The hardware store
Recently Zach Tellman and Factual open sourced several libraries that they wrote to handle specific needs where nothing else existed. In the comments on Reddit, some people griped that this software might be abandoned in a year or two, and that if they depended on it they would be stuck. I think this mindset comes from a misguided and selfish perception of open source.
As a new software engineer, it can be attractive to treat open source libraries, applications, and frameworks like a hardware store. If you have a problem and the standard library doesn’t address it, then pull in a dependency. Need a util function? Do a search on GitHub, and add a dependency. Want to take advantage of this newfangled single-page app craze? Pull in another one. Need to process XML with Ruby? Just install Nokogiri in a minute and you’re away laughing.
This may work for a while, but it misses something vital about software engineering: software decays over time, a process often described as entropy. Software doesn’t exist in an isolated system; it interacts with other systems that change over time. These include your operating system, memory, external services, databases, CPUs, network, I/O devices (printers, displays), and most importantly users. These systems are replaced or updated with newer systems. Sometimes the changes are backwards compatible, sometimes they aren’t. As a consequence, code is written once but maintained forever. Using someone else’s open source code can be a great help, because you don’t have to write it. However, as things change over time, entropy will kick in and the code will need to be maintained and updated.
It takes a village to raise a library
Instead of treating open source software as a hardware store, I think a better metaphor is that of joining a village raising a child. Every dependency that you pull in needs to be maintained over time, along with its dependencies too, all the way down. The question isn’t whether the maintenance will need to be done, it’s who is going to do it. Larger communities have more resources and time to do this. Older projects have been refined and battle-tested and have more stable APIs. If you’re working in a young, obscure, or fast-moving language, then more of the maintenance may fall on you.
I think of someone releasing open source software as giving a gift to the world, not as claiming a responsibility to maintain it for you. Some projects do claim that responsibility, but it’s not automatically conferred just because someone released a project on GitHub. I think much more of the responsibility falls on the person using it. It’s your code that will be using it, your code that will need to be upgraded, and your code that will break. Here are a few questions to ask before you start using a library or framework.
- What does this software depend on? What are its dependencies’ dependencies? Are they reasonably up to date? Are there any libraries here doing weird things with classloaders, bytecode, or messing with the runtime? These are much more likely to break in new versions of your language or runtime.
- How much benefit do I get from using this instead of another library or framework, or just writing it myself?
- Is this library well written? Are there comprehensive tests for the code? Do they pass?
- Does the author recommend this for use in production, or is it just a proof of concept or exploratory idea?
- Does the author have a history of maintaining open source software? Are they using it themselves? If I want to add a feature or fix a bug, is the author likely to accept it, or is it ‘open source with no pull requests’? This is fine, by the way; it just means that you may need to maintain your own fork if your needs diverge.
- If this is a database driver, is it promptly updated for new versions of the database? For example, Netflix’s Cassandra driver Astyanax is lagging behind the latest version of Cassandra.
- What is my risk tolerance, and what is my employer’s?
- Do I have the time, permission from my employer, and capability to maintain or improve this library myself?
- If relevant, has the library been through a security audit?
- What does the author say about API stability?
- What does the issue tracker for a project look like? Is the author responsive or are they not involved anymore?
- Is the license compatible with the rest of your software?
- If this is supported or released by a commercial company, are they likely to change the license or save important features for enterprise customers in the future?
- Is there a common API that multiple libraries implement so I can swap between them? In Java these are things like JPA and XQJ that can reduce lock-in to one library.
- When was the last nontrivial commit? How old is the whole project?
- Is there a community of users around this? Is there a mailing list?
- What is the lifespan and criticality of the code I’m writing?
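The question above about a common API can also be applied inside your own code: if application code depends only on a small interface that you own, then replacing the library behind it touches one adapter rather than every call site. The following is a minimal sketch of that idea in Python, not a recommendation of any particular library. The names `TitleExtractor`, `StdlibTitleExtractor`, and `summarize` are hypothetical and exist only for illustration; the adapter happens to use the standard library’s `xml.etree.ElementTree`.

```python
from typing import Protocol
import xml.etree.ElementTree as ET


class TitleExtractor(Protocol):
    """Hypothetical interface: the only XML capability this application needs."""

    def titles(self, xml_text: str) -> list[str]: ...


class StdlibTitleExtractor:
    """Adapter backed by the standard library's ElementTree."""

    def titles(self, xml_text: str) -> list[str]:
        root = ET.fromstring(xml_text)
        # iter("title") walks every <title> element in document order.
        return [el.text or "" for el in root.iter("title")]


def summarize(extractor: TitleExtractor, xml_text: str) -> str:
    # Application code sees only the interface, never the parsing library.
    return ", ".join(extractor.titles(xml_text))
```

If a third-party parser were adopted later, it would need only a second adapter class with the same `titles` method, and `summarize` would stay unchanged.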
Once you’ve answered these questions, you’ll have a much better understanding of the risks you inherit from using the library and the likely future direction of the project. If you do decide to adopt it then I recommend joining the mailing list, watching it on GitHub, or otherwise staying up to date with changes.
Pulling in a dependency should be a considered decision, and there are several other options to look at first:
- If you only need a small amount of relatively simple code and the license allows it, then just copy that code into your project.
- Make sure that the standard library doesn’t offer something comparable. If it’s just a wrapper library around another dependency, can you use that one directly?
- If you’re pulling in another dependency for a data structure, is there an alternative algorithm you could use which doesn’t require this data structure?
- Is this something that you should write yourself? While this isn’t always the best option, sometimes there’s nothing around that meets your quality standards and you will need to build it yourself.
- Is there a commercial option? While open source is free as in beer, it’s also free as in baby. Paying someone else to maintain software adds incentive for them to continue maintaining it for you and may be a preferable option for many businesses.
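To make the first two options concrete, here is a hedged example of the kind of util function that is often small enough to keep in your own project rather than pull in a package for. The helper name `chunked` is an illustrative choice, not taken from any particular library. It is also worth checking the standard library first: on Python 3.12 and later, `itertools.batched` covers a similar need.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # Step through the sequence in strides of `size`; the last slice may be shorter.
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

A few lines like these are easy to test and to own outright, which is the trade being weighed against the maintenance cost of another dependency.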
Open source software is a massive boon to programmer productivity, saving man-centuries of effort. But remember, just as you own your availability, you also own your software and everything that goes into it.
I didn’t get to this point on my own. I owe a big thanks to Colin Taylor, Derek Troy-West, and Mark Derricutt for their advice on this, and for rejecting my code reviews when I pulled in unnecessary dependencies!
Update: This has been translated into Chinese at www.labazhou.net/2014/11/while-open-source-is-free-as-in-beer-it-is-also-free-as-in-baby/