Reproducing Failures

A common concern for people using randomized testing is how to reproduce failing test cases.

Note

It is better to think about the data Hypothesis generates as being arbitrary, rather than random. We deliberately generate any valid data that seems likely to cause errors, so you shouldn’t rely on any expected distribution of or relationships between generated data. You can read about “swarm testing” and “coverage guided fuzzing” if you’re interested, but you don’t need to know about them to use Hypothesis!

Fortunately Hypothesis has a number of features to support reproducing test failures. The one you will use most commonly when developing locally is the example database, which means that you shouldn’t have to think about the problem at all for local use - test failures will just automatically reproduce without you having to do anything.

The example database is perfectly suitable for sharing between machines, but there currently aren’t very good workflows for that, so Hypothesis provides a number of ways to make examples reproducible by adding them to the source code of your tests. This is particularly useful when e.g. you are trying to run an example that has failed on your CI, or otherwise share examples between machines.

Providing explicit examples

The simplest way to reproduce a failed test is to ask Hypothesis to run the failing example it printed. For example, if Falsifying example: test(n=1) was printed you can decorate test with @example(n=1).

@example can also be used to ensure a specific example is always executed as a regression test or to cover some edge case - basically combining a Hypothesis test and a traditional parametrized test.
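As a minimal sketch of the workflow described above, the test below pins the reported input with @example(n=1). The property in the test body, the seen list, and the database=None setting are assumptions made for illustration only; the list simply records the order in which inputs arrive.

```python
from hypothesis import example, given, settings, strategies as st

seen = []  # records every input the test body receives, in order


@settings(database=None)  # assumption: skip the example database so the sketch leaves no files behind
@given(st.integers())
@example(n=1)  # the input reported as "Falsifying example: test(n=1)"
def test(n):
    seen.append(n)
    # Hypothetical property standing in for the real test body.
    assert n + 0 == n


test()
```

After the call, the first recorded input is the explicit example, followed by whatever Hypothesis chose to generate.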

hypothesis.example(*args, **kwargs)[source]

A decorator which ensures a specific example is always tested.

Hypothesis will run all examples you’ve asked for first. If any of them fail it will not go on to look for more examples.

It doesn’t matter whether you put the example decorator before or after given. Any ordering of the two decorators will do the same thing.

Note that examples can be positional or keyword based. If they’re positional then they will be filled in from the right when calling, so either of the following styles will work as expected:

from hypothesis import example, given
from hypothesis.strategies import text

@given(text())
@example("Hello world")
@example(x="Some very long string")
def test_some_code(x):
    assert True

from unittest import TestCase

class TestThings(TestCase):
    @given(text())
    @example("Hello world")
    @example(x="Some very long string")
    def test_some_code(self, x):
        assert True

As with @given, it is not permitted for a single example to be a mix of positional and keyword arguments. Either is fine, and you can use one in one example and the other in another example if for some reason you really want to, but a single example must be consistent.
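To illustrate the restriction, the sketch below builds an example that mixes both styles and catches the resulting error. It assumes the error is hypothesis.errors.InvalidArgument and that it is raised as soon as the example is constructed; the argument values and the name y are placeholders.

```python
from hypothesis import example
from hypothesis.errors import InvalidArgument

try:
    # One positional and one keyword argument in the same example.
    example("Hello world", y="Some very long string")
except InvalidArgument as err:
    message = str(err)  # Hypothesis explains why the example was rejected
else:
    message = None
```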

Reproducing a test run with @seed

hypothesis.seed(seed)[source]

seed: Start the test execution from a specific seed.

May be any hashable object. No exact meaning for seed is provided other than that for a fixed seed value Hypothesis will try the same actions (insofar as it can given external sources of non-determinism, e.g. timing and hash randomization).

Overrides the derandomize setting, which is designed to enable deterministic builds rather than reproducing observed failures.

When a test fails unexpectedly, usually due to a health check failure, Hypothesis will print out a seed that led to that failure, if the test is not already running with a fixed seed. You can then recreate that failure using either the @seed decorator or (if you are running pytest) with --hypothesis-seed.

The seed will not be printed if you could simply use @example instead.
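A rough sketch of how @seed pins a run: the helper below defines the same test twice with the same seed and records the generated inputs. The helper name collect, the seed value 12345, and the max_examples and database settings are arbitrary assumptions; in practice you would paste in the seed that Hypothesis printed.

```python
from hypothesis import given, seed, settings, strategies as st


def collect(seed_value):
    """Run a trivial test under a fixed seed and return the inputs it saw."""
    values = []

    @seed(seed_value)
    @settings(database=None, max_examples=20)
    @given(st.integers())
    def test(n):
        values.append(n)

    test()
    return values


first = collect(12345)
second = collect(12345)
```

Barring external sources of non-determinism, first and second should contain the same sequence of inputs.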

Reproducing an example with @reproduce_failure

Hypothesis has an opaque binary representation that it uses for all examples it generates. This representation is not intended to be stable across versions or with respect to changes in the test, but can be used to reproduce failures with the @reproduce_failure decorator.

hypothesis.reproduce_failure(version, blob)[source]

Run the example that corresponds to this data blob in order to reproduce a failure.

A test with this decorator always runs only one example and always fails. If the provided example does not cause a failure, or is in some way invalid for this test, then this will fail with a DidNotReproduce error.

This decorator is not intended to be a permanent addition to your test suite. It’s simply some code you can add to ease reproduction of a problem in the event that you don’t have access to the test database. Because of this, no compatibility guarantees are made between different versions of Hypothesis - its API may change arbitrarily from version to version.

The intent is that you should never write this decorator by hand, but it is instead provided by Hypothesis. When a test fails with a falsifying example, Hypothesis may print out a suggestion to use @reproduce_failure on the test to recreate the problem as follows:

>>> from hypothesis import settings, given, PrintSettings
>>> import hypothesis.strategies as st
>>> @given(st.floats())
... @settings(print_blob=PrintSettings.ALWAYS)
... def test(f):
...     assert f == f
...
>>> try:
...     test()
... except AssertionError:
...     pass
Falsifying example: test(f=nan)

You can reproduce this example by temporarily adding @reproduce_failure(..., b'AAAA//AAAAAAAAEA') as a decorator on your test case

Adding the suggested decorator to the test should reproduce the failure, as long as everything else is the same: changing the version of Python or anything else involved might of course affect the behaviour of the test. Note that changing the version of Hypothesis will result in a different error, because each @reproduce_failure invocation is specific to a Hypothesis version.

When to do this is controlled by the print_blob setting, which may be one of the following values:

class hypothesis.PrintSettings[source]

Flags to determine whether or not to print a detailed example blob to use with reproduce_failure() for failing test cases.

NEVER = 0

Never print a blob.

INFER = 1

Make an educated guess as to whether it would be appropriate to print the blob.

The current rules are that this will print if:

  1. The output from Hypothesis appears to be unsuitable for use with example(), and
  2. The output is not too long, and
  3. Verbosity is at least normal.

ALWAYS = 2

Always print a blob on failure.