Any of those answers can also be a crutch, because while they’re potentially useful they’re not alone sufficient.
And they’re not free.
If those 10,000 tests never catch a bug before a customer sees it then they weren’t actually providing any value.
They were costing development time to write them and run them, even if that run time cost is paid by build machines it still takes that much longer to get the code off the developer’s machine and on its way to production.
Those tests are also friction for making changes, and they have to be maintained just like any customer facing code.
Most of the things in that list directly affect a developer’s quality of life.
They’re going to be working in other people’s changes and debugging and trying to shape it to do new things while still doing old things, and for the most part the nameless developer in this scenario wants to quickly figure out the right place to do something, get it done, move past it, and not have to revisit it later because it turns out that there were some unintended consequences to their work that got discovered in production.
That’s good enough motivation and that can be a fine list for those goals.
If people are dealing with an inconsistent and unpredictable code base then it’s going to lead to frustration and ultimately attrition (one of the non-obvious ways bad code has an impact).
Get enough attrition and it’s going to directly affect the business metric.
So with all that as preamble, here are some of my controversial opinions about code quality that are either more nuanced than modern conventional wisdom, or run completely counter to it.
They’re all formed based on experience rather than ideologies.
Tests have a very specific and limited usefulness.
Generally it’s better to err on the side of fewer testsUnit tests have costs: time to write, time to run, time to maintain.
Unit tests functionally really only do two things: they exercise code, and when coupled with code coverage they can show how to hit error cases in the code.
They don’t verify correctness; they make sure the code behaves the way it was written.
There will never be unit tests that catch something that wasn’t thought of, and code coverage in these tests is super misleading because of the potential explosion of combinatoric states.
Add in mocked dependencies that keep track of how many times a function is called and you just turned your implementation details into part of your interface contract.
Outside of catching egregious errors, unit tests are super valuable for two things:They can provide documentation for how an API is intended to be used.
They’re a line of defense for good API design.
If it is hard to write the test demonstrating how to use the API then the API can be improved.
Treat tests as documentation for a developer, not as a quality gate for preventing bugs.
Tests that run through programmatic variations of all parameters serve exactly the opposite of this purpose.
Avoid dependency injection as a pattern.
Mocking is an anti-pattern, except in really specific scenariosDI is a way of making things more testable without thinking about what or why you’re trying to test it.
It leads away from principles of loose coupling and tight cohesion, even in the name of promoting it.
For everything that can be injected that becomes a concrete part of the contract for the code that it will be implemented in terms of those services.
Libraries that make mocking objects easy reinforce that behavior.
Rather than using parameters to drive configurable behavior it becomes easy to write tests that basically treat implementation details as configuration objects.
It becomes very difficult to reason about the true production state of those dependencies and how they’re being provided and interacting with each other.
The three scenarios where it makes sense to have mocking are storage, network, and time services.
Unless you’re working on an actual storage, network, or time library, those are resources that are entirely outside the domain of most code’s business logic and in as much as there are tests, mocking them gives the ability to have more reliable and repeatable success and failure verification.
Behavioral metrics are superlativeIt is super important to understand how things are running in production.
Whereas tests can only tell you answers to questions someone thought to ask, metrics can yield answers that beg new questions.
Build dashboards based on important behavioral metrics.
Build themed dashboards for different aspects of different features.
Operational run books should have links to these dashboards.
Whenever someone writes a new feature there should be metrics in place, and a sense of what those metrics will look like if things are going right and also if they’re going wrong.
Focus on behavioral metrics.
Latency and counters are easy things to record and match a developer’s mind set in the code.
Avoid falling back on those being the core metrics emitted.
This is the data-science version of “the tests verify that the code runs the way I wrote it.
” Emit metrics based on external, ideally the way the customer experiences it, observations.
Don’t record something just because it was already conveniently expressed as a number.
Instrument competing indicators.
In as much as metrics can be used to measure success of a feature they can also be gamed.
This won’t usually be malicious, but it’s the nature of measuring anything.
If you’re trying to reduce the latency on an API that returns a list of the top 20 items for X a cheap way to reduce it is to change that 20 to 10.
Is that the right/wrong thing to do?.I don’t know.
It shouldn’t be done arbitrarily and there really had better be something else being measured that indicates the holistic impact of any trade off being made.
Tie alarms and deployment to metrics monitoring.
If a code change causes metrics to move in an unexpected way the deployment should be automatically stopped.
A key part of this is that it is automated.
Don’t expect to be watching every dashboard with thorough diligence for every deployment.
There should be so many metrics and so many alarms for any application of reasonable scale that it’s infeasible to manually watch them all.
This mostly applies to server software, but more and more everything is effectively moving this way.
This is equally applicable to client-side telemetry.
Now, tying this back into the original thesis: Do these suggestions help improve the macro business metric?Maybe.
In my experience, I think so.
They can certainly be abused by following the letter and not the spirit.
I’d hate for someone to read this and use it as an argument for not writing tests and then not doing anything else to replace those tests they didn’t write.
There are additional things I have opinions on about actual code level constructs and architecture, but since those aren’t really about keeping the business alive they’re not pertinent to this write up.
These opinions are my own and I reserve the right to get smarter, later.
Let me know if any of this rings true or especially if I seem to have completely missed the mark on something.
Photo by Stephen Walker on Unsplash.