Matt Diephouse

Property Tests for Legacy Code

26 February 2020

In Working Effectively with Legacy Code, Michael Feathers taught us that “legacy code is simply code without tests” and gave us techniques to introduce tests. An essential part of that process is writing characterization tests that capture the current behavior of the code. Without tests for the current behavior, you can’t know whether you’ve changed it. That makes change risky.

But how do you know what to test? Feathers provided a few guidelines:

  1. Write a test you know will fail. Use the failed assertion to write a passing test.
  2. Look at the code you’re trying to test.
  3. Test the specific area you plan to change.

That’s good advice, but it doesn’t guarantee good test coverage. Legacy code is often poorly written and poorly understood, so it’s easy to miss important test cases.

This is where property testing can help.

Property tests, as popularized by Haskell’s QuickCheck library, generate random inputs to verify some property of your code.

For example, you might test that:

  • Merging two dictionaries creates a dictionary with the keys from both
  • Serializing and then deserializing some data results in the original data
  • Any data that you encode to JSON is valid according to a schema

QuickCheck will generate random dictionaries, data, etc. and verify that the property is true for all of them. These are weaker assertions than you can make in normal unit tests, but in exchange you can run them over a large number of generated examples. This can help you find cases that you might otherwise miss.
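As a rough illustration of the second property, here is a minimal sketch in plain Swift that does not depend on any property-testing library. It generates random values by hand and checks that a JSON round trip returns the original value. The `Point` type and the `randomPoint` and `roundTripHolds` helpers are made up for this example. A real library such as SwiftCheck would additionally shrink a failing input to a smaller counterexample.

```swift
import Foundation

// Hypothetical data type, used only for this illustration.
struct Point: Codable, Equatable {
    var x: Int
    var y: Int
    var label: String
}

// Generate a random Point, including negative numbers, empty labels,
// and characters that need escaping in JSON.
func randomPoint() -> Point {
    let alphabet: [Character] = ["a", "b", " ", "\"", "\\", "é"]
    let length = Int.random(in: 0...8)
    let characters = (0..<length).map { _ in alphabet.randomElement()! }
    return Point(
        x: Int.random(in: -1_000...1_000),
        y: Int.random(in: -1_000...1_000),
        label: String(characters)
    )
}

// The property: decoding an encoded value gives back the original value.
func roundTripHolds(_ point: Point) -> Bool {
    guard let data = try? JSONEncoder().encode(point),
          let decoded = try? JSONDecoder().decode(Point.self, from: data)
    else { return false }
    return decoded == point
}

// Check the property for many generated values and keep any counterexamples.
let counterexamples = (0..<1_000)
    .map { _ in randomPoint() }
    .filter { !roundTripHolds($0) }
print(counterexamples.isEmpty
    ? "property held for 1000 inputs"
    : "property failed for \(counterexamples)")
```

The assertion is deliberately general: it says nothing about what the encoded JSON looks like, only that the round trip is lossless for every generated value.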

Haskell isn’t the only language with a library for property testing. Swift has SwiftCheck, Elm’s testing library has fuzz, Python has Hypothesis, and Wikipedia has links to many more.

Property testing can help when refactoring or rewriting legacy code by testing an essential property: the new implementation should behave the same as the old.

The steps are straightforward:

  1. Duplicate the code or interface you want to refactor or rewrite
  2. Write a property test that compares their behavior
  3. Refactor or rewrite the duplicated code or interface
  4. When the property test fails, add a unit test for that behavior
  5. Continue fixing issues until the property test passes
  6. Delete the property test and the old implementation

Now you have a set of unit tests that cover the essential behaviors and an improved implementation. This is Test-Driven Development, guided by an existing implementation and a property test. It’s also very similar to the approach taken by Ruby’s Scientist library, which lets you compare old and new implementations in production.

Here’s an example of how this can look, written in Swift:

import SwiftCheck
import XCTest

enum CardType: Equatable {
    case amex, masterCard, visa

    // Determine the card type from the card number.
    // (The original implementation; its body is omitted here.)
    init?(number: String) {  }

    // The new or modified implementation.
    //  - To rewrite, return the simplest thing possible
    //  - To modify, copy the existing implementation
    init?(numberNew _: String) {
        return nil
    }
}

// The property test used while modifying the code
func testNewImplementationBehavesTheSame() {
    // Generate random inputs. It’s important to create good values here.
    let inputs = Gen<Character>
        .frequency([
            (18, .fromElements(in: "0"..."9")),    // Mostly use digits.
            ( 1, .pure("a")),                      // Sometimes use an "a"
            ( 1, .pure("*")),                      // or a "*".
        ])
        .proliferate                               // Create an array
        .suchThat { (10..<20).contains($0.count) } // of 10..<20 characters
        .map(String.init(_:))                      // and make a String.

    // By default, SwiftCheck runs until 100 tests pass. Since this test is
    // temporary, bump that number way up: we don’t care if it’s slow.
    let args = CheckerArguments(maxAllowableSuccessfulTests: 100_000)
    property("New Implementation Behaves The Same", arguments: args)
        <- forAll(inputs) { (string: String) in
            return CardType(number: string) == CardType(numberNew: string)
        }
}

// The unit tests we added when the property test failed
func testInitWithCardNumber() {
    XCTAssertEqual(CardType(number: "4111111010110"), .visa)
    XCTAssertNil(CardType(number: "411111101011a"))
}

Using property tests for this is easiest if the code is free of side effects (yet another reason to prefer pure functions). But this approach is very flexible; you can use it:

  • To replace or modify a single function or method
  • To replace an entire class
  • As an integration test for a larger section of code
  • As a driver to a command-line interface to compare implementations in two different languages
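
For the command-line case, the comparison can be driven by running both programs on the same input and comparing what they print. This is a minimal sketch using Foundation’s `Process`. The helper names and the executable paths in the usage comment are placeholders, and the sketch assumes that both tools take their input as a single command-line argument and write their result to standard output.

```swift
import Foundation

// Run an executable with the given arguments and capture its standard output.
func standardOutput(of executable: String, arguments: [String]) throws -> String {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: executable)
    process.arguments = arguments
    let pipe = Pipe()
    process.standardOutput = pipe
    try process.run()
    // Read before waiting so that a large output can't fill the pipe and block.
    let data = pipe.fileHandleForReading.readDataToEndOfFile()
    process.waitUntilExit()
    return String(decoding: data, as: UTF8.self)
}

// Return the inputs for which the two programs print different output.
func mismatches(old: String, new: String, inputs: [String]) throws -> [String] {
    try inputs.filter { input in
        let expected = try standardOutput(of: old, arguments: [input])
        let actual = try standardOutput(of: new, arguments: [input])
        return expected != actual
    }
}

// Hypothetical usage; substitute the two builds you want to compare:
// let failing = try mismatches(old: "./cardtype-old",
//                              new: "./cardtype-new",
//                              inputs: generatedInputs)
```

Each input in the returned array is a candidate for a new unit test, just as with a failing in-process property test.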

But be aware that property tests are only as good as (1) the properties you write and (2) the random inputs that you generate. You need to be careful about both to get good coverage.
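
One way to sanity-check the second point is to measure how often generated inputs reach the interesting code paths. The sketch below is plain Swift with no testing library. `looksLikeCardNumber` is a hypothetical stand-in for “an input the code under test doesn’t reject immediately”, and both generators are illustrative. It compares uniformly random printable ASCII against a digit-heavy generator shaped like the one in the example above.

```swift
// Hypothetical stand-in for an "interesting" input: 10 to 19 ASCII digits.
func looksLikeCardNumber(_ s: String) -> Bool {
    (10..<20).contains(s.count) && s.allSatisfy { $0.isASCII && $0.isNumber }
}

// Naive generator: any printable ASCII character, 0 to 29 characters long.
func naiveInput() -> String {
    let length = Int.random(in: 0..<30)
    let characters: [Character] = (0..<length).map { _ in
        Character(UnicodeScalar(UInt8.random(in: 32...126)))
    }
    return String(characters)
}

// Targeted generator: 10 to 19 characters, mostly digits (18 times in 20).
func targetedInput() -> String {
    let digits = Array("0123456789")
    let length = Int.random(in: 10..<20)
    let characters: [Character] = (0..<length).map { _ in
        switch Int.random(in: 0..<20) {
        case 0: return "a"
        case 1: return "*"
        default: return digits.randomElement()!
        }
    }
    return String(characters)
}

// Count how many samples from each generator are "interesting".
let samples = 10_000
let naiveHits = (0..<samples).filter { _ in looksLikeCardNumber(naiveInput()) }.count
let targetedHits = (0..<samples).filter { _ in looksLikeCardNumber(targetedInput()) }.count
print("naive: \(naiveHits) of \(samples), targeted: \(targetedHits) of \(samples)")
```

The naive generator almost never produces an all-digit string of the right length, so a property test built on it would mostly exercise the rejection path. The targeted generator reaches the digit-only case regularly while still mixing in invalid characters.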

It may also be possible to enumerate all possible inputs. In the example above, I could in principle have tested every string of 10 to 19 characters instead of relying on SwiftCheck to generate random inputs. That would have been an even stronger test, though with 12 possible characters per position there are more than 12^10 (roughly 62 billion) such strings, so it would have taken far longer to run. Usually there are too many inputs to enumerate.
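
When the input space is small enough, exhaustive enumeration is simple to write. The sketch below is plain Swift. `oldIsValid` and `newIsValid` are hypothetical stand-ins for an old and a rewritten implementation, and the alphabet and maximum length are kept tiny on purpose so that every string can be checked.

```swift
// Every string over `alphabet` with at most `maxLength` characters.
func allStrings(over alphabet: [Character], maxLength: Int) -> [String] {
    var result = [""]
    var previous = [""]
    for _ in 0..<maxLength {
        // Extend every string from the previous round by one character.
        previous = previous.flatMap { prefix in alphabet.map { prefix + String($0) } }
        result += previous
    }
    return result
}

// Hypothetical old implementation: non-empty and made only of digits.
func oldIsValid(_ s: String) -> Bool {
    if s.isEmpty { return false }
    for character in s where !"0123456789".contains(character) {
        return false
    }
    return true
}

// Hypothetical rewrite that is supposed to behave the same.
func newIsValid(_ s: String) -> Bool {
    !s.isEmpty && s.allSatisfy { "0123456789".contains($0) }
}

// Compare the two implementations on every input, not just a sample.
let inputs = allStrings(over: Array("09a*"), maxLength: 4)
let disagreements = inputs.filter { oldIsValid($0) != newIsValid($0) }
print("checked \(inputs.count) inputs, \(disagreements.count) disagreements")
// prints "checked 341 inputs, 0 disagreements"
```

The number of inputs grows exponentially with the length, which is why this only works for small alphabets and short strings.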

Property tests are a powerful, underused tool. They can be a huge help for legacy code. And once you start using them, I’m sure you’ll find other useful applications as well.