Be Genius

me

Bo Jeanes

I am an Australian full-stack developer living and working in San Francisco. I strongly advocate open-source software and the community around it. I am a tool maker. I currently work primarily in Ruby and Rails but do as much Clojure, Go, and Javascript as I can.

Factories breed complexity

Having maintainable code is great. Maintainable code allows you to deliver improvements faster, happier, and more reliably.

Furthermore, the measures that developers need to take and the strategies that we have to employ to achieve maintainable code have been understood for years, if not decades. Especially in the realm of object-oriented programming, but certainly not exclusively, most of these principles boil down to reducing coupling and system complexity. A system whose parts are coupled as loosely as possible is a modular system; the parts know little of each other and a lot about themselves and they have thin and specific interfaces between each other.

Test-driven development is one of the many tools at a developer's disposal to achieve code quality. Unfortunately, there is a lot of naïveté around the benefits of TDD. A lot of developers see TDD as primarily a tool for verifying system correctness. While TDD does of course offer this benefit, and arguably better than retroactive automated testing, the real benefit of TDD is that it offers short feedback loops that guide the design/architecture of the system.

Since it is accepted that a loosely-coupled modular system is a simpler system, it stands that tools, such as TDD, which guide a design towards modularity and simplicity are good tools. A module that is tightly coupled to another is not easily tested in isolation. However, if the isolated tests are written first, it is difficult to write a passing implementation for that module that maintains such a low degree of coupling. Therefore, good TDD should guide you towards a simpler design (though it is certainly not the only way).

Unfortunately, factories work against this goal. Factories debilitate TDD's ability to give you feedback into the complexity of your design.

To be clear, I am not talking about the Factory Method Pattern or the Abstract Factory Pattern — both of which can be described as ways to decouple a particular implementation of an object from code for the creation of such an object (Wikipedia). Instead, I am talking about the "factories" for replacing fixtures in tests — something which has seemingly obsessed the Ruby (Rails, especially) community. The two primary Ruby libraries for factory-based fixture replacement are factory_girl and Machinist.

A tool such as Machinist or factory_girl generates data for the attributes you don't care about, and constructs any necessary associated objects, leaving you to specify only the fields you care about in your test (from Machinist's own README file). This sounds nice at first, because it makes your tests more readable and relevant. However, behind the scenes, these tools are still creating other objects and entities and introducing them into your test environment. By having data and objects in tests that are irrelevant to the functionality that is being tested (in isolation, remember), a developer creates an environment that permits, if not invites, silent dependencies to creep into an implementation.

Furthermore, and perhaps more significantly, by creating objects (and usually entire hierarchies of objects) with such ease and opacity, you are outright masking the dependencies (*cough* complexity *cough* coupling) between your implementation and those entities. If forced to stub out all those intricacies, the system complexity would be screamingly obvious and a developer would quickly avail herself of a rewrite to reduce complexity or thin out the interface.

Instead of having feedback that guides a developer to simplicity, fixture factories seem to guide developers to complexity by masking dependencies as one-line simplicity. In fact, that one (or five, whatever) line setup is a shotgun blast of environmental dependencies that are hidden from the architect. That complexity will come back for revenge after being ignored for so long.

It seems that factory_girl and Machinist exist to make testing components more convenient. This is, at face value, an admirable and desirable goal. However, in unit tests, the cost is too high for any system of considerable size.

Please, do the right thing and avoid the convenience and "fun" of the factory_girl temptress. You will trend towards a simpler system and as a bonus (in fact, an incredible one) your test suite will likely be exponentially faster which, in turn, will breed simplicity by letting you have more feedback more often.

P.S. It shouldn't go without mention that factories can be absolutely awesome for integration tests. Integration tests aren't used for guiding system design nor testing in isolation so the drawbacks of these tools drop away. However, both factory_girl and Machinist use RSpec as some of their very first usage examples and this troubles me deeply.

P.P.S. A lot of these arguments can be applied to fixtures too. However, they usually don't create hordes of objects invisibly and litter your environment with them. Also, they aren't as slow. But yes, the fewer factories and fixtures in a test, the better.

Update for clarity: Firstly, I am absolutely just talking about unit tests. If you are testing code that integrates with ActiveRecord or number of levels of your stack, then factories and fixtures are certainly defensible (though I still prefer to steer clear). Secondly, I've tried to be careful about where I use the words simple, easy, complex, and difficult. For the definitions that I intend, please watch (at least the first 10 minutes of) Simple Made Easy.

Comments

  1. what do you think of using factories in Cucumber?.

    by diego on
  2. Not really convinced. If the tool's primary use case is problematic, then that argument is probably worth making (I feel this way about dynamic mocking -- e.g. RSpec's mocks). But if it is merely possible to abuse and many people do end up abusing it, then saying "don't use that tool" doesn't seem like a solution at all:

    So they will take out FactoryGirl and instead build those associations by hand, it will be fragile and break all the time because they set up their dependencies in each test they need them (as opposed to one authoritative location), their tests will be incomprehensible because of all the lines devoted to setup, they will be frustrated and stop testing.

    Clearly using FactoryGirl poorly is better than not using it at all. People use it this way because it is better than what I just described. If you think they should use it differently, or do something differently, then saying "don't use that tool" isn't helpful, it's actually worse.

    Instead, show them examples of what they are doing and what would be a better way, and explain why it is a better way. Then, when they feel these pains (and they will, with or without FactoryGirl), they will have something to do about it other than add more factories. And once this is understood, then the tool can be used appropriately -- or discarded as unnecessary.

    As a metaphor: If I'm taking aspirin because I've been shot, telling me to stop taking aspirin will just cause me pain, instead, show me how to get to the hospital.

    by Josh Cheek on
  3. I totally agree with Josh Cheek on this one. Also, the article in itself is educational and interesting but totally dismissing factories in tests is too extreme.

    What about factories in tests written by the programmers who wrote the tests? I mean, let's not use factory_girl or whatever else prefabricated in a gem, but instead use a more specialized factory written by the same person who writes the tests. What's your opinion about this? Factories and builders are a great way to avoid duplication in complex test fixtures or setups. Yes, you can abuse them, as you can abuse any other principle or technique, but that does not mean the technique is flowed, it means the user is irresponsible.

    As a metaphor: In some countries you can own a gun and use it for self defense or be punished otherwise. In other countries it is forbidden to owe any kind of weapon. Does this mean the gun is faulty or the user is irresponsible?

    by Patkos Csaba on
  4. I disagree. While factories can get you in trouble, the key is to use them correctly. Here are some suggestions.

    The base factory should be just enough to get validations passing and never contain specific data that is used by a test.

    If you need an association, make a separate factory for it (such as comment_with_article) but keep the base factories free of associations unless they are required to make a valid record.

    Don't attempt to get fancy, such as building has_many associations with factories. Any factory overly complex should be moved up into the test so it is clear what it is doing.

    Always use "build" instead of "create" in factory girl when possible. This will lead to faster, lighter tests.

    Follow these simple rules and factories are awesome. Fixtures hide data important to a test and can quickly become a mess of attributes and associations which test are dependent on. Tests should avoid external dependencies whenever possible.

    I don't consider a factory an external dependency if it does not contain data that makes a test pass. It is meant to be a clean slate just to get validations and attr_accessible working in Active Record.

    by Ryan Bates on
  5. Josh,

    So they will take out FactoryGirl and instead build those associations by hand, it will be fragile and break all the time because they set up their dependencies in each test they need them (as opposed to one authoritative location), their tests will be incomprehensible because of all the lines devoted to setup, they will be frustrated and stop testing.

    This is the very pain that you're supposed to be listening to! TDD with isolated unit tests puts design pressure on your system.

    by Steve Klabnik on
  6. Hi, Steve,

    The problem I see is that telling someone to listen to their tests isn't identifiably actionable. In fact, they probably use factories because they were listening to their tests, factories eased the pain of building objects that needed other information that was irrelevant to the particular test they wanted (IMO, embedding validations in the model is the biggest culprit). I think some concrete examples of a refactoring from a complex factory sustained design to something more modular would go much further.

    As an aside, I've been thinking about this today, and wondering if there is merit to the original argument. Factories get complex because AR associations get complex (this may be presumptuous, but I've only ever seen Machinist and FactoryGirl used for ActiveRecord models). I usually see ActiveRecord models as data structures with associations, which means I avoid placing behaviour on them. So if the factories are complex, doesn't that mean that the associations and domain are complex? And is there really a way around this? If my factory invisibly builds an address and country every time I build a user, what is the better way? (this assumes validations fail if you try to save user w/o an address). Users still need to live at addresses which are still located in countries. So in what way would not using a factory lead me to reduce the complexity of this domain?

    In partial answer to my above question, my first thought is that you can then pass mock objects to the classes that operate on the models. I like this answer, but it has a problem that AR models have a positively enormous API (not to mention I get really uncomfortable with faking out persistent objects -- there are a lot of unexpected pitfalls to persistence such that I feel much more comfortable using real AR objects).

    by Josh Cheek on
  7. In fact, they probably use factories because they were listening to their tests, factories eased the pain of building objects that needed other information that was irrelevant to the particular test they wanted

    Then they solved the pain in the wrong way. ;)

    Users still need to live at addresses which are still located in countries. So in what way would not using a factory lead me to reduce the complexity of this domain?

    The User should not need to know about the Country. If you need a factory to build this relationship, it means that you're reaching across boundaries: User -> Address -> Country. Your User should never touch a country directly, only through an address. So you mock and/or stub out the Address to isolate your User.

    it has a problem that AR models have a positively enormous API

    This is one reason that AR is a pretty poor pattern.

    by Steve Klabnik on
  8. The User should not need to know about the Country. If you need a factory to build this relationship, it means that you're reaching across boundaries: User -> Address -> Country.

    Yes, but you can't save your user without an address, and you can't save an address without a country, and none of these exist because your database starts clean and gets rolled back after each test. So again, validations seem like the real culprit rather than factories.

    by Josh Cheek on
  9. Addendum to the above, if validations sat outside the model (and you could compose them for any given object that will be saving this model -- possibly through a decorator scheme where you can stack only the validations relevant to this particular use case, but in production, stack them all), then factories wouldn't need to be so complex, they would actually be quite helpful. Where you wanted a user you could just FactoryGirl.build :user and where you wanted a user with an address FactoryGirl.build :user_with_address, and they would both work fine without building out huge dependency graphs.

    So I guess it seems to me that factories are the aspirin people take to alleviate the problems caused by validations in the model (meaning the code will be complex w/ or w/o factories, but once validations are extracted, it could then be narrowed down).

    by Josh Cheek on
  10. In a current project, we're using factories similar to how Ryan Bates mentions above. We're not using the active record pattern, so we don't need a database during testing and associations are rarely needed (and if they are, we stub rather than do anything with factories)

    This means we can use factories as "dynamic fixtures", building quick objects when we need them and throwing them away afterward. This seems like a not-too-horrible use of factories.

    by Derek Reeve on
  11. Diego, I have no real opinions or objections to using factories or fixtures in Cucumber or other integration tests. In my opinion, integration tests are more about validating the correctness of a system than the its design, so the negative aspects mentioned above don't as readily apply.

    by Bo Jeanes on
  12. Josh Cheek,

    I'm going to comment on a bunch of your statements across multiple comments to save time.

    So they will take out FactoryGirl and instead build those associations by hand, it will be fragile and break all the time because they set up their dependencies in each test they need them (as opposed to one authoritative location), their tests will be incomprehensible because of all the lines devoted to setup, they will be frustrated and stop testing.

    I agree that the tests will be brittle if you have to mock out the extraneous dependencies. However, the point of this article is that it isn't the effort in testing that is the problem, it's the fact that your design requires those extraneous dependencies. Factories (and fixtures) do you a disservice by hiding that from you. The correct solution is to factor your code better by pulling apart different responsibilities and testing them in true isolation.

    Clearly using FactoryGirl poorly is better than not using it at all. [...] Saying "don't use that tool" isn't helpful, it's actually worse.

    Actually, I don't think this is self-evident. It is not clearly better to use the tools poorly than not at all. Using them poorly is far worse than not using them at all. I'm not saying that factory tools don't help and therefore you shouldn't use them. I'm saying they can be actively damaging to your design (Ryan Bates talks about ways to counter that above, and I'll address that in a separate comment).

    Instead, show them examples of what they are doing and what would be a better way, and explain why it is a better way.

    Yes, this is true. I point out the problems but don't offer solutions. Unfortunately, I did write this post train of thought and a bit ranty and it wasn't an attempt to offer the Right Way™. I plan to address this specifically in a future post.

    Regarding your association of the pain being in associations, I don't agree that it is limited to associations or even predominantly there. While AR (as a pattern and a gem) certainly is a major pain point for coupling and factories, I maintain that these principles apply to any plain old (Ruby/Java/anything) object, too. It has everything to do with coupling and cohesion — CS101, almost.

    Especially in a dynamic language like Ruby, where IDEs won't help you and all classes/constants are globally available to everything else, anything that masks the dependencies between units of your code will, on average, have a negative impact on the design of your system, according to the practices and principles acknowledged as "good" my the majority of the software industry.

    The problem I see is that telling someone to listen to their tests isn't identifiably actionable.

    You are right that isn't immediately actionable to those who don't already know how to understand what tests are telling them — but those people are only getting half the benefit of TDD and should learn.

    In fact, they probably use factories because they were listening to their tests, factories eased the pain of building objects that needed other information that was irrelevant to the particular test they wanted.

    I would agree that they were reacting to their tests when adding factories, but I disagree that they were listening. Your aspirin metaphor is actually very telling. Adding factories are easing the symptoms, not the problems (that is, your code is probably too coupled).

    I think some concrete examples of a refactoring from a complex factory sustained design to something more modular would go much further.

    Great idea. I'll have a think about this.

    So if the factories are complex, doesn't that mean that the associations and domain are complex?

    Possibly, but the inverse isn't necessarily true — and that's the problem. If your domain is complex, your factories might still be simple one-liners so things seem simple (actually, they're just easy — see Simple Made Easy for more on being precise about these words).

    by Bo Jeanes on
  13. Christopher G,

    That's not the most constructive comment, but I suspect you are confusing complexity with difficulty. In any case, fixtures are not simpler than factories, but they can be less destructive. I think using either is a smell.

    Introducing objects (or records in a database) is not my complaint (though it's an unfortunate side effect), it's that you lose transparency into the coupling of your design.

    by Bo Jeanes on
  14. Patkos Csaba,

    My complaint is not about those gems, it's about the philosophy behind them. If the programmer wrote the factories, I don't think this changes anything. The fact is that your system requires different components is not reason to build something to satisfy those dependencies conveniently, it's (usually) reason to consider whether those dependencies are truly justified. And yes, sometimes they really are, but almost always less so than most engineers think.

    by Bo Jeanes on
  15. Ryan Bates,

    First of all, I love the screencasts and thanks for stopping by!

    Your suggestions definitely help to reduce the damage but it's not clear how it makes it non-existent.

    If you need an association, make a separate factory for it (such as comment_with_article) but keep the base factories free of associations unless they are required to make a valid record.

    In my opinion, this is totally untenable in an application of considerable size. Of course, it does help you use leaner factories where possible, but my argument isn't one for having fast tests (though this is, obviously, an awesome side effect) but one for having simple systems. It doesn't matter that your factories are leaner than they could be, it's hiding a dependency between your logic under test and the state of the database and dependent objects or simply just your library (e.g. ActiveRecord, as is often the case).

    Always use "build" instead of "create" in factory girl when possible. This will lead to faster, lighter tests.

    This helps with speed, and if you have to use a factory, definitely do this, but it doesn't address my argument of complexity.

    Fixtures hide data important to a test and can quickly become a mess of attributes and associations which test are dependent on. Tests should avoid external dependencies whenever possible.

    Yes — fixtures do, and so do factories. Fixtures are more awkward, but in the same way as DRY shouldn't be a goal in tests, nor should convenience. Factories make hiding important data too convenient. They, like fixtures, also have the baggage of having irrelevant data (though, still hidden) pulled into the state of your object under test, which can cause very subtle and frustrating bugs that could not exist if you precisely controlled all the dependencies — and could see them all, everywhere.

    I don't consider a factory an external dependency if it does not contain data that makes a test pass. It is meant to be a clean slate just to get validations and attr_accessible working in Active Record.

    If you were only testing logic that was, by definition, related to persistence or ActiveRecord functionality, then this benefit is worth it and I think using factories or fixtures here is defensible. However, these are not unit tests; they are functional at best, if not integration tests. I suspect I wasn't as clear as I should have been on this aspect, but my points are very specifically about unit testing, not testing in general. That is, tests that inform you of your system design, not your system correctness.

    by Bo Jeanes on
  16. Regarding your association of the pain being in associations, I don't agree that it is limited to associations or even predominantly there. While AR (as a pattern and a gem) certainly is a major pain point for coupling and factories, I maintain that these principles apply to any plain old (Ruby/Java/anything) object, too. It has everything to do with coupling and cohesion — CS101, almost.

    I don't understand, can you give an example? I usually think of ActiveRecord models as simple data structures, like hashes, perhaps I need to consider them as having domain specific behaviour for this point to make sense?

    Especially in a dynamic language like Ruby, where IDEs won't help you and all classes/constants are globally available to everything else, anything that masks the dependencies between units of your code will, on average, have a negative impact on the design of your system, according to the practices and principles acknowledged as "good" my the majority of the software industry.

    What dependencies do your models have besides AR and their associated objects? Perhaps we are talking about plugins like CarrierWave? (honestly, I'm ambivalent about these kinds of things)

    You are right that isn't immediately actionable to those who don't already know how to understand what tests are telling them — but those people are only getting half the benefit of TDD and should learn.

    I suppose we see this from different perspectives. You think the pain will help them find the right way to do it. I think the pain will make them quit testing, or will make them dramatically less productive with a nightmarish codebase (probably culminating in a rewrite).

    Possibly, but the inverse isn't necessarily true — and that's the problem. If your domain is complex, your factories might still be simple one-liners so things seem simple (actually, they're just easy — see Simple Made Easy for more on being precise about these words).

    A great talk, but I realize that I think factories are simpler. When you have to perform setup in your tests, you are interleaving the knowledge about the identity and construction of the objects in every place you perform it. You are also doing several things: building objects, performing actions, and checking results. The building objects is problematic because that is implicit by the context of the spec. Now you have two authorities on what this test is about, the context/name of the test, and the setup code necessary to perform the test. If we abstract this setup code out, on the other hand (e.g. in a factory), then we can focus on the specific piece of information that is relevant to this test. This is both easier and simpler (e.g. FactoryGirl.build :underage_user is simpler than User.new age: 12 -- where the reason for specifying age is unclear, and the relevance of 12 requires you to juggle settings values with attribute names and types and domain rules). This is why, for example, I move as much of my setup and values into descriptive let blocks. There are usually only one or two lines of code that I'm actually interested in when I look at a test.

    by Josh Cheek on
  17. I completely agree.

    My response to the vast majority of support question I get for Machinist is, "You're using it for what!!??" Often someone is trying to create a giant tree of objects in a unit test, has some tricky association magic four levels deep in their object hierarchy, and is hoping Machinist will save them.

    If you're unit testing properly (i.e. your tests are isolated), you should have no use for Machinist in unit tests. Machinist is very good at masking poor interface design at this level.

    Integration tests are the place for factories, and then primarily for speed. Ideally integration tests should only hit your UI, but if you had every one of your integration tests go through your sign up process to create a user (for example), it would cost you a lot of performance, and not gain you much confidence in your code. A factory fits in well here.

    Good article! I'm tempted to link to it from the Machinist readme. :)

    by Pete Yandell on
  18. Josh Cheek,

    Some comments on a few of your points:

    A great talk, but I realize that I think factories are simpler.

    No, they are easier.

    You are also doing several things: building objects, performing actions, and checking results.

    I'd say you're confusing complexity with cardinality.

    The building objects is problematic because that is implicit by the context of the spec. Now you have two authorities on what this test is about, the context/name of the test, and the setup code necessary to perform the test.

    That's always the case, one just hides that responsibility from you so that when it changes elsewhere, the intent of the original test can no longer be trusted.

    If we abstract this setup code out, on the other hand (e.g. in a factory), then we can focus on the specific piece of information that is relevant to this test.

    I do think the idea of having the relevant pieces of information to your test highlighted is an important point, but there are many other ways to do this, as you point out yourself (let, subject, etc).

    FactoryGirl.build :underage_user is simpler than User.new age: 12 -- where the reason for specifying age is unclear, and the relevance of 12 requires you to juggle settings values with attribute names and types and domain rules.

    This is not simplicity. It's better readability, maybe, and it's definitely more convenient — but, again, convenience is really just another word for ease, not simplicity.

    This is why, for example, I move as much of my setup and values into descriptive let blocks. There are usually only one or two lines of code that I'm actually interested in when I look at a test.

    Yes, that's great. Do whatever you can for test readability and maintainability — as long as you aren't compromising their reliability. Tests guide your design as well as your functionality and fixtures/factories put you at risk of losing the first benefit.

    by Bo Jeanes on
  19. If the only insight you have into the complexity of your app is through your unit tests, than you have at least 2 problems. Fixing the complexity of your app by requiring greater complexity in your unit tests is not a solution to any of them.

    by carpeliam on
  20. I agree that if it's your only insight into your complexity, that's a problem. However, factories and fixtures don't make your tests any less complex, it just hides the complexity — which is worse. If you strive to maintain visibility in your tests (and everywhere, really), then you'll help guide your design to a place where neither the tests nor the code is complex. That's kind of the point.

    by Bo Jeanes on
  21. I really liked your post. I agree with some in the comments, however IMO there is no strict rule for this. It all depends.

    For example: I agree with some that when you leave all the test data in a single place, it's much easier than have lots of mocks for every scenario of your test code. Using RSpec hooks is really easy to solve that, but still it's going to be spread all over the place. Well, that can be something bad in some perspectives, but on many others it's not, such as the fact that test scenarios should be focused on a single problem, and by assuming such thing you realize that each scenario has it's own requirements to be tested, which goes against factories.

    All in all, it's just a matter of use case. I think factories are perfect as a support to write acceptance tests, but I also think that when making unit tests, it's just WAY better to use something as RSpec stub_model than a factory, for example.

    For instance, stub_model won't complain about validations, so let's say I have 2 models associated, through a has_many association, with the one I want to test. Let's say I have validations requiring both. When using FactoryGirl, it would complain if the validations are not fulfilled. Maybe you could force to skip those, but that doesn't sound a good solution. One would say that you could just create a base factory, and create all the associations in your tests - but that wouldn't be good either, because if it has a validation it means it must be present, so every other test would require that too...

    This is just an example, but in many cases factories are not a good solution. They make it faster to test, which also means you think less before doing it, and many times you end up coupling things.

    by Nicholas Pufal on
  22. Good post, it made me think.

    I'm bumping into the FactoryGirl complexity on my small application. I'm observing that testing my controllers requires a working model layer. Due to the presence of model validation, my generation of a working model layer is getting hard. I've got several layers of association with a many-to-many tossed in there. It just feels like I'm building a small mock in order to get basic tests to work.

    Based on the comments above, it would seem like RSpec stub_model provides a way to avoid populating a complex model layer. But that didn't work as the stub_models aren't part of the overall collection. That meant that things like ClassModel.find() don't work.

    To Nicholas's point above, when you've got associations and/or indexed data, you are likely to need an accurate data set. This is really the heart of most controller tests.

    by Larry Dennison on

You need to login with GitHub in order to comment.