Intro to Property-Based Testing in Python with Hypothesis

As software developers, we know the importance of testing our code. Automated tests give us confidence that our programs are correct, and help prevent bugs from creeping in as the codebase evolves.

The most common approach to automated testing is example-based tests like unit tests. We write a suite of tests, each of which sets up some specific inputs, calls our code, and makes assertions about the expected outputs or behavior.

For example, say we have a Python function to calculate the area of a triangle:

def triangle_area(base: float, height: float) -> float:
    return base * height / 2

Some typical unit tests for this function might look like:

def test_triangle_area():
    assert triangle_area(4, 3) == 6
    assert triangle_area(1, 9) == 4.5
    assert triangle_area(0, 5) == 0

These tests verify that triangle_area produces the expected result for a few different inputs. But are these tests sufficient to fully verify the correctness of the function?

The limitation of example-based tests is that they rely on the test author to think of interesting test cases. It‘s easy for a human to think of common "happy path" cases like a simple 3-4-5 right triangle. But we might forget to consider edge cases like a triangle with zero height, negative dimensions, very large inputs, etc.

Even if we do think of a variety of edge cases to test, it‘s tedious to manually write them all out. And no matter how many examples we come up with, there will always be more that we didn‘t consider. Our tests can only cover a small sample of the infinite set of possible inputs.

Wouldn‘t it be nice if we could concisely specify the general behavior we expect from our function and then automatically generate test cases to verify that specification? That‘s exactly what property-based testing aims to achieve!

The Idea of Property-Based Testing

In property-based testing, rather than focusing on specific input-output examples, we define general properties that should always hold true for our code no matter what input it receives.

Then we let the testing library randomly generate many different test cases and verify that our defined properties hold for all of them. If the library finds an example input that falsifies one of the properties, the test fails and reports the counter-example.

Some key advantages of this approach are:

  • Tests are more concise and expressive. We just define the properties we expect rather than a long list of examples.
  • Edge cases are more likely to be explored. The library generates test cases we might not have considered.
  • Failing examples are automatically simplified. When a failure is found, the library tries to "shrink" the failing input to a minimal counter-example.

The original and most well-known library for property-based testing is QuickCheck for Haskell. But today property-based testing libraries are available for most major programming languages.

For Python, the premier library for property-based testing is Hypothesis. Let‘s see it in action!

A First Example with Hypothesis

We‘ll start by seeing how Hypothesis can be applied to testing our simple triangle_area function.

First we need to install Hypothesis:

pip install hypothesis

Now let‘s write a property-based test for triangle_area. We‘ll define a property that should hold true for any valid triangle:

from hypothesis import given
from hypothesis import strategies as st

@given(
    base=st.floats(min_value=0, allow_infinity=False), 
    height=st.floats(min_value=0, allow_infinity=False)
)
def test_triangle_area(base, height):
    area = triangle_area(base, height)
    assert 0 <= area <= base * height

Let‘s break this down:

  • The @given decorator tells Hypothesis to generate random examples for the test function parameters.
  • The base and height arguments use Hypothesis‘ strategies to specify what kinds of values should be generated. Here we specify floats greater than or equal to zero, disallowing infinities.
  • The test function calculates the area using the generated base and height.
  • The assertion specifies the property we expect to hold: the area should always be non-negative and no larger than the area of the enclosing rectangle.

When we run this test, Hypothesis will generate 100 random examples of base and height and verify that the property holds for all of them. Here‘s an example output:

Falsifying example: test_triangle_area(base=8.17963556802085, height=6.789934893741703)

Oops, it looks like Hypothesis found an example that violates our property! The calculated area is greater than the enclosing rectangle‘s area.

The problem is that the triangle_area function performs floating-point division, which is not exact due to rounding errors. This can cause the area to exceed the expected upper bound due to precision limitations.

Hypothesis has helped us find a subtle bug in our code that we probably wouldn‘t have thought to write a test case for. By defining a property and generating random examples, the library explored edge cases we overlooked.

We can fix the function by using Decimal instead for more precision:

from decimal import Decimal

def triangle_area(base: float, height: float) -> float:
    return float(Decimal(base) * Decimal(height) / 2)

With this fix, our property now passes for all the examples Hypothesis generates. We can have much more confidence in the correctness of our function.

Testing a Custom Data Structure

Property-based testing really shines when applied to custom data structures and algorithms. Let‘s consider a more complex example.

Say we‘ve implemented a basic binary search tree data structure in Python:

class Node:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

class BST:
    def __init__(self):
        self.root = None

    def insert(self, val):
        if not self.root:
            self.root = Node(val)
        else:
            self._insert_recursive(val, self.root)

    def _insert_recursive(self, val, node):
        if val < node.val:
            if node.left:
                self._insert_recursive(val, node.left)
            else:
                node.left = Node(val)
        elif val > node.val:
            if node.right:
                self._insert_recursive(val, node.right)
            else:
                node.right = Node(val)

    def search(self, val):
        return self._search_recursive(val, self.root)

    def _search_recursive(self, val, node):
        if not node:
            return False
        if val == node.val:
            return True
        elif val < node.val:
            return self._search_recursive(val, node.left) 
        else:
            return self._search_recursive(val, node.right)

There are a few key properties we expect to hold true for a binary search tree:

  1. The tree maintains the binary search property: for any node, all values in its left subtree are less than its value, and all values in its right subtree are greater than its value.
  2. After inserting a value into the tree, searching for that value should return True.
  3. Searching for a value not in the tree should return False.

Here‘s how we can test those properties with Hypothesis:

from hypothesis import given
from hypothesis import strategies as st

@given(st.lists(st.integers()))
def test_bst_invariants(xs):
    tree = BST()
    for x in xs:
        tree.insert(x)
        assert tree.search(x)

    def validate_tree(node):
        if node is None:
            return True
        if node.left:
            assert node.left.val < node.val
            assert validate_tree(node.left)
        if node.right:
            assert node.right.val > node.val
            assert validate_tree(node.right)
        return True

    assert validate_tree(tree.root)

    assert not tree.search(min(xs) - 1)
    assert not tree.search(max(xs) + 1)

In this test:

  • We use st.lists(st.integers()) to generate random lists of integers. Each list will be used as a sequence of values to insert into a BST.
  • For each generated list, we create a new BST and insert each value in the list into the tree.
  • After each insertion, we assert that searching for the just-inserted value returns True.
  • After constructing the tree, we validate the binary search property by recursively checking each node and its subtrees. An assertion error will be raised if we find any violations.
  • Finally, we check that searching for a value slightly less than the minimum and slightly greater than the maximum both return False as expected.

When we run this test, Hypothesis will generate hundreds of random integer lists and verify that our BST implementation maintains its invariants for all of them.

Excitingly, when I first ran this test on the BST code above, it found a bug!

Falsifying example: test_bst_invariants(xs=[0, 0])

Hypothesis discovered that if we insert duplicate values into the tree, the second insertion is silently ignored and the duplicate is not added. When we later search for the value, it only finds the first occurrence. This violates the property that searching for an inserted value should always return True.

I wouldn‘t have thought to write a test case specifically for duplicate insertions, but Hypothesis generated that edge case on its own and noticed that it violated the spec. By fixing the insert method to properly handle duplicates, we can make the test pass and be confident our implementation is now correct.

Limitations and Challenges

While property-based testing is a powerful tool, it‘s not a silver bullet. There are some important limitations and challenges to be aware of.

One challenge is that it can sometimes be difficult to come up with good properties that fully specify your code‘s behavior. In the case of a arithmetic function like triangle_area it‘s pretty straightforward, but for more complex stateful systems it takes careful thought to define properties that will truly verify correctness.

Another limitation is that property-based testing can be quite slow, especially for complex input types or properties that are expensive to check. Generating hundreds or thousands of test cases and running the property checks on each one takes time. Hypothesis does allow you to control the number of examples generated, but in general property-based tests will be slower than example-based tests.

Finally, while property-based testing can find many edge cases that human testers overlook, it doesn‘t necessarily replace the need for example-based tests entirely. Some bugs only occur for very specific inputs that random generation are unlikely to stumble upon. So it‘s still a good idea to combine property-based tests with some carefully chosen examples covering important cases.

Conclusion

Property-based testing is a powerful approach to verifying the correctness of your code. By focusing on general properties rather than specific examples, it can often find edge cases and subtle bugs that traditional example-based tests miss.

Hypothesis brings this technique to Python, integrating smoothly with standard test frameworks. It lets you write succinct and expressive tests that generate random inputs to exercise your code.

I‘ve found that writing property-based tests forces me to think much more carefully about the expected behavior of my code. It pushes me to consider edge cases more thoroughly and specify invariants more precisely. And I‘m always impressed when Hypothesis finds some tricky bug in my implementations that I would never have discovered on my own.

If you haven‘t tried property-based testing before, I highly recommend giving it a shot! It‘s a valuable tool to add to your testing toolbox. Try applying it to some of your utility functions or data structures and see what bugs it can uncover and what extra confidence it can bring.

Similar Posts