The Quest for Better Tests

Over the past twenty years, I’ve written my fair share of unit tests, mostly just covering the happy path and sending in some bogus inputs to test the edges. Typically following a fat-model-thin-controller method (often recommended by me), I failed to understand the point of integration tests. I tried TDD at the beginning of several greenfield projects, but I was never successful in making it sustainable. Similarly, with Selenium, it worked at first but quickly proved to be too brittle to keep up with rapidly changing UIs. (In retrospect, bad CSS architecture on those projects probably deserved the blame more than Selenium per se.)

Despite my somewhat lackluster attitude toward testing, my employers and customers knew me as a big advocate for test automation—who always insisted that we never release anything to QA without at least a layer of unit tests. Oftentimes I was overruled by more senior leadership. As expected, from time-to-time, we all got burned by bugs that would have been easily caught by more comprehensive tests. We swore we’d write better test next time. But for a million reasons—”speed to market” being the peskiest of them—testing never became a consistent priority.

In February of 2016, I joined Lab Zero. The very first observation I made after starting on a project—a financial services application three years in the making—was the sheer volume of test code. Nearly everywhere I looked, I found at least a 10:1 ratio of lines of test code to lines of “real” code. Shortly after starting on my first story, it became readily apparent that at least a 10:1 ratio of developer effort was required to continue this pattern. We joked about developers who reported their status during daily standup by saying, “I’m done with the code and just need to write some tests,” because we knew that was a euphemism for being less than 10% done with the story!

It didn’t take long before realizing how much catching up I needed to do. In fact, the project leader told me it would take me “a year” to learn how to test properly. After first thinking that he sounded condescending, I came to realize that he was just being realistic. Testing is hard; testing effectively is even harder.

Ten months into my Test Quest, here are some important lessons I’ve picked up about automated testing.

Note: I used Ruby, Rspec and Cucumber to create my code samples, but the lessons learned will likely apply to other ecosystems.

The myth of 100% code coverage

Sure code coverage an important metric, but one that only tells part of the story. Test coverage is not the same as good test coverage. It’s remarkably easy to write tests that test nothing at all, that test the wrong things or that test the right things—but in ways that never fail.

Consider the following example, wherein the remove_employee method has a glaring error, one that will easily be caught by a unit test. Or will it?
class Company
  def initialize
    @employees ||= Set.new
  end
  def add_employee(person)
    @employees << person
    @employees.size
  end
  def remove_employee(person)
    @employees.size - 1 #danger: incorrect implementation!
  end
end
RSpec.describe Company, :type => :model do
  let(:subject) { Company.new }
  describe 'managing employees' do
    let(:person) { double(‘person’) } 
    it ‘removes an `employee`’ do
      employee_count = subject.add_employee(person)
      expect(subject.remove_employee(person)).to eq(employee_count - 1)
    end
end

Because the test for removing employees naively compares only the outputs of the add and remove methods, it passes with flying colors even though the remove_employees method internals are totally wrong.

And this why it’s a good idea to…

Test internals instead of just inputs and outputs

In most—if not all—programming languages, there are many more ways to produce “outputs” than just the return values of method calls.

C/C++ developers can optionally pass primitives to functions by reference (e.g. int &param1), morphing those inputs into potential outputs. More modern languages restrict everything to pass-by-value, but most of the time what’s being passed “by value” is actually a reference to an instance of an object. As a result, it’s possible—and quite commonplace—to mutate the object instance itself in the context of a method, providing another sneaky way for methods to have unexpected “outputs.”

Unfortunately, testing internals can be challenging, but it doesn’t have to be.

Design and write testable code

A previous version of me believed that only a very limited set of circumstances should trump writing elegant code. I recently relaxed this constraint, adopting the belief that it’s okay to over-decompose code (and make other code design compromises) in order to serve the goal of writing code that’s more testable.

For example, I might replace a simple, elegant call to a setter with a method that wraps it, e.g.:

shape.color == :blue

vs.

def is_blue?(shape)
  shape.color == :blue
end

In the past, code like this would make my eyes bleed. However, it’s really easy now to stub out is_blue? so that it returns a mock object or performs some other test-only behavior.

This is a contrived example, but if figuring out if a shape is blue required a database read or a call to an underlying service object, then over-decomposition like this is small price to pay to make the code testable.

Test incrementally

I’ve found TDD (specifically a test-first methodology) to be overly prescriptive, usually leading to diminishing returns as the project gets more complex. If it helps clarify the specs and define edges more easily, then by all means, write tests first! However, I’ve found more productivity (and less head-scratching) comes from writing tests not necessarily first, but in short iterative bursts.

Every time I finish an “idea” in code (for lack of a better term), I switch over and edit the test, usually already open in a split-screen view next to the code. If the “idea” is too complex, I take a step back and flesh out more tests to help me clarify what I’m trying to accomplish in the code.

In the past I’ve also worked in a pairing setup where I wrote the code and switched back-and-forth with another developer writing tests. Though I haven’t done this recently, it’s another technique that’s worked well for me.

DRY code, wet lets

Don’t Repeat Yourself (DRY) is a great rule-of-thumb for writing code, but it can be disastrous  when memoizing test data, e.g. through calls to rspec’s let or let!

With the exception of some truly global concepts (e.g. user_id), all test data should be initialized in close proximity to (read: immediately before) the tests that use it and should not be reused between unrelated tests.

Thinking I was helping, I tried to DRY-up some lets, soonafter realizing that I had no idea what test data was getting passed to what tests. Even it feels cumbersome to repeatedly initialize the same data over and over before each test, it’s the right thing to do.

Re-use Cucumbers with Scenario Outlines

Unlike lets, some parts of the test ecosystem are actually designed for reuse. One example: Scenario Outlines. I recommend using these whenever possible.

With Cucumber, Scenario Outlines represent the “functions” in an otherwise functionless DSL. In addition to the obvious reduction in code bulk, thinking about how I can turn several tests into one test “template” helps me write more thoughtful, self-documenting tests.

Vary only what needs to be varied

It’s tempting to cut corners (and make tests run more efficiently) by favoring randomizing test data over creating different tests for different values. Often this practice is harmless, especially if the specific values—as long as they’re in range, e.g. a person’s age—are inconsequential. (If specific values matter, e.g. people 65 and over get medical benefits, they should of course get their own explicit tests.)

Randomizing test data can also be a trap. For example, a test for a get_birth_year method might start to “flicker” or “flap,” meaning that it passes and fails non-deterministically between test runs—all because of the decision to randomize ages.

To protect against this, it helps to treat each test as a controlled experiment, i.e. by keeping the scientific method in mind. Try to control everything that can be controlled and vary only the specific inputs getting tested. Of course, there are things we can’t control, like the system clock, the speed of the network and the availability and behavior of upstream systems. But whenever things can be controlled, control them.

Write meaningful, descriptive test names

Acknowledging the fact that I just recommended thinking like a scientist, I’m now going to suggest putting on a writer hat. When naming tests cases and writing Cucumber steps (which read like prose already), it’s super-important to be descriptive, concise and accurate.

In a place full of smart people like Lab Zero (#humblebrag), developers are not necessarily the only people looking at tests. Recently I had an agile product owner ask me how a certain feature handled different types of inputs. To answer the question, I walked him through my rspecs, reading each test name aloud and describing the expectations.

Writing coaches always say “show, don’t tell.” There is simply no better way to show—and prove—that a feature works than reading through the tests, which serve as the closest link between the specs and the code.

Putting the Science in “Computer Science”

One of my professors in college said that any discipline that has the word “science” in it is actually not a science. This is especially true for computer science, something that at some schools classify as a fine art (making it possible get a BA in CS). Writing code is a certainly a form of communication, at least to peers and future developers. Of course, they are not the customers. And the best way to “communicate” with customers is to provide something for them that works as designed.

How do we ensure that? With well-written tests.

Tests really put the science in computer science. Think of them as a series of carefully controlled experiments. The hypothesis is that the code implements the spec.

Without tests, there’s really no way to know if it does or not.

* * *

Originally published on Lab Zero’s blog.


Also published on Medium.