Why frozen test fixtures are a problem on large projects and how to avoid them
Posted by amalinovic 22 hours ago
Comments
Comment by matsemann 20 hours ago
If you're able to decouple the ORM from your application with a separate layer, and instead pass plain objects around (not fat DB-backed models), you're much freer to write code that's "pure": this input gives that output. For tests like these you only need to create whatever data structure the function desires, and then verify the output. Worst case, verify that it called some mocks with x, y, z.
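A minimal sketch of that style in Ruby (names here are illustrative, not from the article):

```ruby
# Domain logic takes a plain value object, not a DB-backed model.
Order = Struct.new(:subtotal_cents, :discount_cents, keyword_init: true)

def total_cents(order)
  order.subtotal_cents - order.discount_cents
end

# A test only needs to build the input and verify the output:
order = Order.new(subtotal_cents: 10_000, discount_cents: 1_500)
raise "unexpected total" unless total_cents(order) == 8_500
```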
Comment by radanskoric 20 hours ago
In reality, that is also not free. It imposes some restrictions on the code. Sometimes being pragmatic and backing off from the ideal leads to faster development and quicker delivery of value to the users. Rails is big on these pragmatic tradeoffs. The important thing is that we know when and why we're making the tradeoff.
Usually I go with Rails defaults and usually it's not a problem. Sometimes, when the code is especially complex and perhaps on the critical path, I turn up the purity dial and go down the road you describe exactly for the benefits you describe.
But when I decide that sticking to the defaults is the right tradeoff, I want to get the most out of it and use Fixtures (or Factories) in the optimal way.
Comment by axelthegerman 20 hours ago
No language or abstraction is perfect, but if someone prefers pure functional coding, Rails and Django are just not it; don't try to make them be. Others like 'em just as they are.
Comment by antonymoose 20 hours ago
Comment by jstanley 20 hours ago
Comment by antonymoose 19 hours ago
Nevertheless I’ve found far more God classes that could be refactored into clean layers than the other way around, specifically in the context of the Rails-style web app GP is discussing. Batteries included doesn’t necessarily require large tangled God classes. One can just as well compose a series of layers into a strong default implementation that wraps complex behavior while allowing one to bail out and recompose with necessary overrides, for example reasonable mocks in a test context.
Of course this could then allow one to isolate and test individual units easily, and circle back with an integration test of the overall component.
Comment by onionisafruit 20 hours ago
Still, most of us work on code bases with design issues either of our own making or somebody else’s.
Comment by bluGill 19 hours ago
Fixtures done right ensure that everyone starts with a good standard setup. The question is WHAT state the fixture sets up. I have a fixture that sets up a temporary data directory with nothing in it - you can set up your own state, but everything will read from that temporary data directory.
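In Ruby's minitest, a sketch of such a fixture might look like this (class name and env var are illustrative):

```ruby
require "minitest/autorun"
require "tmpdir"
require "fileutils"

class DataDirTest < Minitest::Test
  def setup
    # Every test starts from an empty temporary data directory.
    @data_dir = Dir.mktmpdir
    ENV["APP_DATA_DIR"] = @data_dir # code under test reads from here
  end

  def teardown
    FileUtils.remove_entry(@data_dir)
  end
end
```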
Unit tests do have a place, but most of us are not writing code that has a strong, well-defined interface that we can't change. As such they don't add much value, since changes to the code also imply changes to the code that uses it. When some algorithm is used in a lot of places, unit test it well - you wouldn't dare change it anyway. But when the algorithm is specific to the one place that calls it, there is no point in a separate test for it even though you could. (There is a lot of grey area in the middle where you may write a few unit tests but trust the comprehensive integration tests.)
> Worst case verify that it called some mocks with x,y,z.
That is the worst case, to be avoided if at all possible (sometimes it isn't possible). That a function is called is an implementation detail; nobody cares. I've seen too many tests fail because I decided to change a function signature and now there is a new parameter A that every test needs to be updated to expect. Sometimes this is your only choice, but mock-heavy tests are a smell in general, and that is really what I'm against. Don't test implementation details; test what the customers care about is my point, and everything else follows from that (and where you have a different practice that follows from that, it may be a good thing I want to know about!)
Comment by abhashanand1501 19 hours ago
Foofactory() will automatically set up all the foreign key dependencies.
It can also generate fuzzy data, although having fuzzy data has its own issues in terms of brittle tests (if not done correctly).
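The same idea in Ruby's FactoryBot (factory and column names here are illustrative): asking for one record transparently builds the rows it depends on.

```ruby
FactoryBot.define do
  factory :user do
    name { "user-#{rand(10_000)}" }
  end

  factory :project do
    # The foreign key dependency is satisfied automatically.
    association :owner, factory: :user
    name { "project-#{rand(10_000)}" }
  end
end

project = FactoryBot.create(:project) # also creates project.owner
```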
Comment by orwin 19 hours ago
[edit] Though in my case we have one fixture that loads a JSON representation of our dev DynamoDB into moto, so we mock internal data, but this data is still read through our data models; it doesn't really replace internal code, only internal "mechanics".
Comment by bluGill 19 hours ago
Comment by swader999 20 hours ago
Comment by ozim 20 hours ago
As easy as it is to curb duplication in application code, for test code it is really hard to get people to understand that all this duplication that should be there in tests is GOOD.
Comment by disgruntledphd2 19 hours ago
There'll always be some duplication, but too much makes it harder to see the important stuff in a test.
Comment by ozim 3 hours ago
Comment by bluGill 19 hours ago
I have lots of test fixtures each responsible for about 10 tests. It is very common to have 10-20 tests that share a startup configuration and then adjust it in various ways.
Comment by radanskoric 20 hours ago
I'm not sure what you mean by inheritance in tests, but DRY is criminally overused in tests. That could be a whole separate article, but the tradeoffs are very different between test and app code, and repetition in test code is much less problematic and sometimes even desirable.
Comment by swader999 13 hours ago
Comment by dkarl 19 hours ago
"Generators" for property-based testing might be similar to what the author is calling "factories." Generators create values of a given type, sometimes with particular properties, and can be combined to create generators of other types. (The terminology varies from one library to another. Different libraries use the terms "generators," "arbitraries," and "strategies" in slightly different and overlapping ways.)
For example, if you have a generator for strings and a generator for non-negative integers, it's trivial to create a generator for a type Person(name, age).
Generators can also be filtered. For example, if you have a generator for Account instances, and you need active Account instances in your test, you can apply a filter to the base generator to select only the instances where _.isActive is true.
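A minimal sketch of both ideas in plain Ruby, without assuming any particular PBT library (all names are illustrative):

```ruby
# Base generators are just zero-argument callables.
string_gen = -> { Array.new(8) { ('a'..'z').to_a.sample }.join }
nat_gen    = -> { rand(0..120) }

# Combine base generators into a generator for a composite type.
Person = Struct.new(:name, :age)
person_gen = -> { Person.new(string_gen.call, nat_gen.call) }

# Filtering: resample until the predicate holds (fine for cheap predicates).
def filtered(gen, &pred)
  -> { loop { value = gen.call; break value if pred.call(value) } }
end

adult_gen = filtered(person_gen) { |p| p.age >= 18 }
adult_gen.call # => a Person with age >= 18
```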
Once you have a base generator for each type you need in your tests, the individual tests become clear and succinct. There is a learning curve for working with generators, but as a rule, the test code is very easy to read, even if it's tricky to write at first.
Comment by radanskoric 19 hours ago
The problem arises when they're used to generate Database records, which is a common approach in Rails applications. Because you're generating a lot of them you end up putting a lot more load on the test database which slows down the whole test suite considerably.
If you use them to generate purely in-memory objects, this problem goes away, and then I also prefer to use factories (or generators, as you describe them).
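With FactoryBot, for example, the difference is just which build strategy you pick (assuming a :project factory exists):

```ruby
FactoryBot.create(:project)        # INSERTs into the test database (slow)
FactoryBot.build(:project)         # in-memory, unsaved (fast)
FactoryBot.build_stubbed(:project) # in-memory, pretends to be persisted (fastest)
```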
Comment by tclancy 19 hours ago
Comment by radanskoric 15 hours ago
Unfortunately, I'm not aware of a good property based testing library in Ruby, although it would be useful to have one.
Even so I'm guessing that property based testing in practice would be too resource intensive to test the entire application with it? You'd probably only test critical domain logic components and use regular example tests for the rest.
Comment by dkarl 19 hours ago
Comment by strehldev 19 hours ago
My rule was to randomize every property by default. The test needs to specify which property needs to have a certain value. E.g. set the address if you're testing something about the address.
So it was immediately obvious which properties a test relied on.
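A sketch of that rule with FactoryBot and Faker (model and attribute names are illustrative):

```ruby
FactoryBot.define do
  factory :customer do
    # Everything is randomized by default...
    name    { Faker::Name.name }
    address { Faker::Address.full_address }
  end
end

# ...so a test that cares about the address must say so explicitly:
customer = FactoryBot.create(:customer, address: "1 Main St")
```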
Comment by dkarl 19 hours ago
A clarification on terminology: the "property" in "property-based testing" refers to properties that the code under test is supposed to obey. For example, in the author's Example 2, the property the test is checking is that the returned collection is sorted.
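A plain-Ruby sketch of checking such a property over many random inputs (`my_sort` is a hypothetical function under test; no PBT library assumed):

```ruby
100.times do
  input  = Array.new(rand(0..20)) { rand(1_000) }
  result = my_sort(input)
  # The property: the output is the sorted permutation of the input.
  raise "property violated for #{input.inspect}" unless result == input.sort
end
```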
Comment by FuckButtons 19 hours ago
Comment by japhyr 19 hours ago
Comment by dkarl 19 hours ago
Comment by RHSeeger 21 hours ago
> This test has just made it impossible to introduce another active project without breaking it, even if the scope was not actually broken. Add a new variant of an active project for an unrelated test and now you have to also update this test.
The article then goes on to test that the known active projects are indeed included in what the call to Project.active returns.
However, that doesn't test that "active scope returns active projects". Rather, it tests that
- active scope returns _at least some of the_ active projects.
And it does not test that
- active scope returns _all_ of the active projects
- active scope does not return non-active projects
Which, admittedly, is only different because the original statement is ambiguous. But the difference is that the test will pass if it returns non-active projects too, which probably is not the expected behavior.
I prefer to set things up so that my test fixtures (test data) are created as close to the test as possible, and then test it in the way the article is saying is wrong (in some cases)... i.e., test that the call to Project.active returns _only_ those projects that should be active.
Another option would be to have 3 different tests that test all those things, but the second one (_all_ of the active projects) is going to fail if the test fixture changes to include more active projects.
Comment by jon-wood 20 hours ago
Comment by radanskoric 20 hours ago
The "doesn't include non-active projects objections is easy", please check the Example 1 test again, there's a line for that:
```ruby
refute_includes active_projects, projects(:inactive)
```
Hm, if you missed it, perhaps I should have emphasised this part more, maybe add a blank line before it ...
Regarding the fact that the test does not check that the scope returns "all" active projects, that's a bit more complex to address, but let me tell you how I'm thinking about it:
The point of tests is to validate expected behaviours and prevent regressions (i.e. breaking old behaviour when introducing new features). It is impossible for tests to do this 100%. E.g. even if you test that the scope returns all active projects present in the fixtures, that doesn't guarantee that the scope always returns all active projects for any possible list of active projects. If you want 100% validation your only choice is to turn to formal proof methods, but that's a whole different topic.
You could always add more active project examples. When you write a test checking that active projects A, B, and C are returned, that is the same test as if your fixtures contained ONLY active projects A, B, and C and you then tested that all of them are returned. In either case it is up to you to make sure that the projects are representative.
So, by rewriting the test to check that (1) these example projects are included and (2) these other example projects are excluded, you can write a test that is equally powerful as if you restricted your fixtures to just those example projects and then made an absolute comparison. You're not losing any testing power. Except you're making the test easier to maintain.
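As a sketch, the rewritten test might look like this (fixture names :active_a, :active_b, :inactive are illustrative):

```ruby
test "active scope includes active and excludes inactive projects" do
  active_projects = Project.active

  assert_includes active_projects, projects(:active_a)
  assert_includes active_projects, projects(:active_b)
  refute_includes active_projects, projects(:inactive)
end
```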
Does that make sense? Let me know which part is still confusing and I'll try to rephrase the explanation.
Comment by RHSeeger 20 hours ago
> The "doesn't include non-active projects objections is easy", please check the Example 1 test again, there's a line for that:
You're correct; I totally missed that.
> In either case it is up to you to make sure that the projects are representative.
That's fair, but that's also the point you're trying to address / make more robust by how you're trying to write tests (what the article is about). Specifically
- The article is about: how to make sure your tests are robust against test fixtures changing
- That comment says: It's up to you to make sure your test fixtures don't change in a way that breaks your tests
> You can write a test that is equally powerful as if you restricted your fixtures just to those example projects and then made an absolute comparison. You're not loosing any testing power. Expect you're making the test easier to maintain.
By restricting your fixtures to just the projects (that are relevant to the test), you're making _the tests_ easier to maintain; not just the one test but the test harness as a whole. What I mean is that you're reducing "action at a distance". When you modify the data for your test, you don't need to worry about what other tests, somewhere else, might also be impacted.
Plus you do gain testing power, because you can test more things. For example, you can confirm it returns _every_ active project.
All that being said, what I'm talking about relies on creating the test data local to the tests. And doing that has a cost (time, generally). So there's a tradeoff there.
Comment by radanskoric 19 hours ago
> Plus you do gain testing power, because you can test more things. For example, you can confirm it returns _every_ active project.
Imagine this:
1. You start with some fixtures. You crafted the fixtures and you're happy that the fixtures are good for the test you're about to write.
2. You write a test where you assert the EXACT collection that is returned. This is, as you say, a test that "confirms the scope returns _every_ active project".
3. You now rewrite the test so that it checks that the collection includes ALL active projects and excludes all inactive projects.
Do you agree that nothing changed when you went from 2 to 3? As long as you don't change the fixtures, those 2 versions of the test will behave exactly the same: if one passes so will the other, and if one fails so will the other. As long as fixtures don't change, they have exactly the same testing power.
If you agree on that, now imagine that you added another project to the fixtures. Has the testing power of the tests changed just because fixtures have been changed?
Comment by RHSeeger 19 hours ago
No, _but_ (and this is a big _but_) you're not testing the contract of the method, which (presumably) is to return all and only active projects.
Testing that it returns _some_ of the active projects is useful, but there are cases where it won't point out an issue. For example, imagine:
- Over time, more tests are added "elsewhere" that use the same fixtures
- More active projects are added to the fixture to support those tests
- The implementation in the method is changed to be faster, and an off-by-one error is introduced; so the last project in the list isn't returned
In that ^ case, testing that _some_ of the active projects are returned will still pass; the bug won't be noticed.
Not directly related to the above, but I'll note that I would also split 2/3 into different tests.
- Make sure all projects returned are active
- Make sure projects returned includes all active projects
I think that's more of a style thing, but I _try_ to stick to each test testing one and only one thing. I don't always do that, but it's a rule of thumb for me.
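A sketch of that split (assuming a `status` column; everything here is illustrative):

```ruby
test "all returned projects are active" do
  assert Project.active.all? { |p| p.status == "active" }
end

test "every active project is returned" do
  expected = Project.where(status: "active").to_a
  assert_empty expected - Project.active.to_a
end
```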
Comment by radanskoric 15 hours ago
Regarding the fact that I'm not fully testing the contract of the method, you're absolutely correct. But also, no example based test suite is fully doing that. As long as the test suite is example based it is always possible to find a counter-case where the contract is violated but the test suite misses it.
These counter-cases will be more contrived and less likely the better the test suite. So all of us at some point decide that we've done enough and that more contrived cases are so unlikely and the cost of mistake is so small that it's not worth it to put in the extra testing effort. Some people don't explicitly think about it but that decision is still made one way or another.
This is a long way of saying that I both agree with you but that also, in most cases, I would still take the tradeoff and go for more maintainable tests.
Comment by sceptic123 18 hours ago
Comment by radanskoric 15 hours ago
Comment by jillesvangurp 20 hours ago
This means that my test can't depend on the database being in some known state or assume it has exclusive access to that database, and it can't, for example, modify anything that might be used by another test. Tests can only modify things that are specific to that test.
Most of my tests work around this limitation by either just creating their own teams, users, and other objects they need with randomized ids, or in some cases deferring their execution until some bit of logic guarded by a lock has created some shared data that is then never modified.
Instead of hard-coded IDs, I tend to use randomized ids (UUIDs typically). I have a person data generator that gives me human-readable names, email addresses, etc. Randomized data like this avoids tests modifying each other's data.
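A sketch of such a generator in Ruby (helper and model names are illustrative):

```ruby
require "securerandom"

# Each test creates its own rows under random UUIDs, so parallel tests
# never collide on shared data.
def create_test_user(prefix: "user")
  handle = "#{prefix}-#{SecureRandom.hex(4)}"
  User.create!(
    id:    SecureRandom.uuid,
    name:  handle,
    email: "#{handle}@example.test"
  )
end
```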
As an example, we have a few tests for an analytics dashboard that lock on a bit of expensive code that creates a lot of content via our APIs to do analytics on. The scenario is quite elaborate and uses a few factories, known timestamps, etc. If I refactor my data model, my factories are also refactored. Using a lock ensures that data is initialized only once. Once that is done, there are a bunch of tests that test different queries against that data.
You might think that all this is slow. It's not. I have about 380 integration tests like this that run in under 30 seconds on my laptop (which has a lot of CPU cores). Having this as a safety net is very empowering. I've been on teams that had fewer tests where running them took ten or more minutes. This I can do quickly before committing.
Testing like this has many advantages, one of which is easy-to-maintain tests. I put some effort into usable test data factories. The "when" part of a BDD-style integration test is usually most of the work. So, by making that as easy as I can, I lower the barrier for writing more tests. And using all my CPU cores minimizes the impact new tests have on execution time, to the point where I don't worry about it.
Another is that for big structural changes my tests usually continue to work if I just fix their shared factories to do the right thing.
Comment by tclancy 19 hours ago
The answer is most definitely, 100%, with no room for argument, to not speak so assuredly, acknowledge other people have the right to think differently, and find synthesis and/or a set of heuristics that apply for given cases.
But this is the Internet, and we need to be arguing PS2 vs X-Box for the rest of our lives, so have at it.
(Me? Factories are great until they aren't, which may not happen if a project or a team is small enough. Generators are great but do have some footguns and I would love to hand over everything to property-based testing, but I _feel_, without any experimenting or trying, they resist anything other than the purest of pure unit tests and can't help with integration tests that much.)
Comment by jrochkind1 21 hours ago
Comment by radanskoric 20 hours ago
Btw, I also have an article with some of my learnings using factories and I make a remark on how it helps with test speed: https://radanskoric.com/articles/test-factories-principal-of...
Comment by jrochkind1 20 hours ago
While I see the pros (and cons) of fixtures, one thing I do _not_ like is Rails' ordinary way of specifying fixtures, in YAML files. It gets especially terrible for associations.
It's occurred to me there's no reason I can't use FactoryBot to create what are actually fixtures -- as they will be run once, at test boot, etc. It would not be that hard to set up a little harness code to use FactoryBot to create objects at test boot and store them (or logic for fetching them, rather) in, I dunno, $fixtures[:some_name] or what have you, for later reference. And that seems much preferable to me, as I consider switching to/introducing fixtures.
But I haven't seen anyone do this or mention it or suggest it. Any thoughts?
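A sketch of what such a harness might look like (all names here are hypothetical):

```ruby
# In test_helper.rb: build "fixtures" with FactoryBot once at test boot,
# then look them up by name, roughly like Rails fixtures.
$fixtures = {}

def register_fixture(name, factory, **attrs)
  $fixtures[name] = FactoryBot.create(factory, **attrs)
end

register_fixture(:active_project,   :project, status: "active")
register_fixture(:inactive_project, :project, status: "inactive")

# In a test:
#   assert_includes Project.active, $fixtures[:active_project]
```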
Comment by onionisafruit 20 hours ago
Read-only tests only need to run the bootstrap code if their particular fixture hasn’t been created on that machine before. Same with some tests that write data but can be encapsulated in a transaction that gets rolled back at the end.
Some more complex tests need an isolated db because their changes can’t be contained in a db transaction (usually because the code under test commits a db transaction). These need to run the fixture bootstrap every time. We don’t have many of these so it’s not a big deal that they take a second or two. If we had more we would probably use separate, smaller fixtures for these.
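For comparison, the transaction-rollback variant is what Rails gives you out of the box; the relevant (real) setting is:

```ruby
class ActiveSupport::TestCase
  # Wrap each test in a DB transaction that is rolled back afterwards,
  # so fixture data survives untouched between tests.
  self.use_transactional_tests = true
end
```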
Comment by radanskoric 20 hours ago
So you can definitely use FactoryBot to create them. However, the reason I think that's rarely done is that you're pretty likely to start recreating a lot of the features of Rails fixtures yourself. And perhaps all you need to do is to dynamically generate the yaml files. Rails yaml fixtures are actually ERB files, so you can treat a fixture file as an ERB template and generate its code dynamically: https://guides.rubyonrails.org/testing.html#embedding-code-i...
If that is flexible enough for you, it's a better path since you'll get all the usual fixture helpers and association resolving logic for free.
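As a sketch, a dynamically generated fixture file could look like this (column names are illustrative; ERB-in-fixtures is the documented Rails feature linked above):

```yaml
# test/fixtures/projects.yml
<% 3.times do |i| %>
active_<%= i %>:
  name: Project <%= i %>
  status: active
<% end %>
```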
Comment by jrochkind1 19 hours ago
I feel like I don't _want_ the association resolving logic, really; that's what I don't like! And if it's live Ruby instead of YAML, it's easy to refer to another fixture object by just looking it up as a fixture like normal? (I guess there are order-of-operation issues though, hm.)
And the rest seems straightforward enough, and better to avoid that "compile to yaml" stage for debugging and such.
We'll see, maybe I'll get around to trying it at some point, and release a perversely named factory_bot_fixtures gem. :)
Comment by yxhuvud 19 hours ago
Comment by mnutt 20 hours ago
Comment by jrochkind1 19 hours ago
Then you just refer to the fixture in your factory definitions? Seems very reasonable.
Comment by mijoharas 20 hours ago
(Now maybe that's what you used to see what was causing the slowdown, but mentioning it for others to help them identify the bottlenecks.)
Comment by erdaniels 20 hours ago
100% agree with "Test only what you want to test".
Comment by stephen 19 hours ago
The article doesn't mention what I hate most about fixtures: the noise of all the other crap in the fixture that doesn't matter to the current test scenario.
I.e. I want to test "merge these two books" -- great -- but now when stepping through the code, I have 30, 40, 100 other books floating around the code/database b/c "they were added by the fixture" that I need to ignore / step through / etc. Gah.
Factories are the way: https://joist-orm.io/testing/test-factories/
Comment by radanskoric 15 hours ago
Personally, I even slightly prefer to use Factories and I also previously wrote about a better way to use them: https://radanskoric.com/articles/test-factories-principal-of...
Comment by perlgeek 19 hours ago
You can also supply defaults and name schemes for individual columns.
For business logic, I prefer to have it structured in a way that it doesn't need the database for testing, but loading and searching stuff from the DB also needs to be tested, and for those, mixer strikes a really good balance. You only need to specify the attributes that are relevant for the test, and you don't need shared fixtures between many tests.
Comment by onionisafruit 20 hours ago
They are also nice because I don’t have to think so much about assertions. They automatically assert the response is exactly the same as before.
Comment by radanskoric 20 hours ago
But how would you do snapshot testing for behaviour? I'm approaching the problem primarily from the backend side and there most tests are about behaviour.
Comment by onionisafruit 19 hours ago
Comment by radanskoric 19 hours ago
But, I spend very little or no time on API endpoints since I don't work on projects where the frontend is an SPA. :)
Comment by Fire-Dragon-DoL 18 hours ago
The data created by the fixtures shouldn't be touched; otherwise factories should be used, like the author suggested.
Comment by Tknl 21 hours ago
Comment by orwin 19 hours ago
We have a solution. Not sure if it is elegant, but use it as an inspiration: it works.
When our project runs its tests, it will generate its database JSON representation itself (only using its models) from a file that contains fake/test data. That database representation will be loaded in the dev environment, and also in the database fixture that then runs our tests. If our tests pass and we have an issue in dev, that means our tests missed something (which happens waaaaaay more often than I like to admit) and we have to add them.
Forcing every test to use this representation also forces us to have a dev environment that contains enough items to run the tests, and we can't forget to generate an item in the dev database, since that would mean our new feature isn't tested.
Comment by radanskoric 21 hours ago
Comment by immibis 20 hours ago
"assert_equal names, names.sort" is a wrong answer. It would accept an empty collection.
Comment by bluGill 19 hours ago
I have a fixture that sets our database to the initial install state. This works for me because we ship an embedded system, and every month we ship a bunch more new systems, so code needs to handle that initial install state. If we change the initial state (which we do all the time) and a test breaks, we want to know and fix that, since customers will see that situation.
However, if you run on a server in a data center, I could well believe you will never again see any specific state, and so a fixture probably isn't right. Maybe ideally every test would take a snapshot of your current production database and test against that (with whatever additional data you add for the test) - if a customer enters data that breaks a test, that is an "all hands on deck" to fix the code before customers hit that code path. Maybe - I don't work in this space and so I'm just speculating about what you need.