Ghostwriting tests for you

Writing tests with Hypothesis frees you from the tedium of deciding on and writing out specific inputs to test. Now, the hypothesis.extra.ghostwriter module can write your test functions for you too!

The idea is to provide an easy way to start property-based testing, and a seamless transition to more complex test code - because ghostwritten tests are source code that you could have written for yourself.

So just pick a function you’d like tested, and feed it to one of the functions below. They follow imports, use but do not require type annotations, and generally do their best to write you a useful test. You can also use our command-line interface:

$ hypothesis write --help
Usage: hypothesis write [OPTIONS] FUNC...

  `hypothesis write` writes property-based tests for you!

  Type annotations are helpful but not required for our advanced
  introspection and templating logic.  Try running the examples below to see
  how it works:

      hypothesis write gzip
      hypothesis write numpy.matmul
      hypothesis write pandas.from_dummies
      hypothesis write re.compile --except re.error
      hypothesis write --equivalent ast.literal_eval eval
      hypothesis write --roundtrip json.dumps json.loads
      hypothesis write --style=unittest --idempotent sorted
      hypothesis write --binary-op operator.add

  --roundtrip                 start by testing write/read or encode/decode!
  --equivalent                very useful when optimising or refactoring code
  --errors-equivalent         --equivalent, but also allows consistent errors
  --idempotent                check that f(x) == f(f(x))
  --binary-op                 associativity, commutativity, identity element
  --style [pytest|unittest]   pytest-style function, or unittest-style method?
  -e, --except OBJ_NAME       dotted name of exception(s) to ignore
  --annotate / --no-annotate  force ghostwritten tests to be type-annotated
                              (or not).  By default, match the code to test.
  -h, --help                  Show this message and exit.


Using a light theme? Hypothesis respects NO_COLOR and DJANGO_COLORS=light.


The ghostwriter requires black, but the generated code only requires Hypothesis itself.


Legal questions? While the ghostwriter fragments and logic are under the MPL-2.0 license like the rest of Hypothesis, the output from the ghostwriter is made available under the Creative Commons Zero (CC0) public domain dedication, so you can use it without any restrictions.


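hypothesis.extra.ghostwriter.magic(*modules_or_functions, except_=(), style='pytest', annotate=None)

Guess which ghostwriters to use, for a module or collection of functions.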

As for all ghostwriters, the except_ argument should be an Exception or tuple of exceptions, and style may be either "pytest" to write test functions or "unittest" to write test methods and TestCase.

After finding the public functions attached to any modules, the magic ghostwriter looks for pairs of functions to pass to roundtrip(), then checks for binary_operation() and ufunc() functions, and any others are passed to fuzz().

For example, try hypothesis write gzip on the command line!

hypothesis.extra.ghostwriter.fuzz(func, *, except_=(), style='pytest', annotate=None)

Write source code for a property-based test of func.

The resulting test checks that valid input only leads to expected exceptions. For example:

from re import compile, error

from hypothesis.extra import ghostwriter

ghostwriter.fuzz(compile, except_=error)

Gives:

# This test code was written by the `hypothesis.extra.ghostwriter` module
# and is provided under the Creative Commons Zero public domain dedication.
import re

from hypothesis import given, reject, strategies as st

# TODO: replace st.nothing() with an appropriate strategy

@given(pattern=st.nothing(), flags=st.just(0))
def test_fuzz_compile(pattern, flags):
    try:
        re.compile(pattern=pattern, flags=flags)
    except re.error:
        reject()

Note that it includes all the required imports. Because the pattern parameter doesn’t have annotations or a default argument, you’ll need to specify a strategy - for example text() or binary(). After that, you have a test!
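As a sketch of that last step, here is the same test with text() substituted for the placeholder. The choice of text() and the settings values are assumptions made for illustration; binary() or a narrower strategy may suit your code better.

```python
import re

from hypothesis import given, reject, settings, strategies as st


# st.text() replaces the st.nothing() placeholder: an assumption that
# arbitrary strings are the inputs worth exploring for `pattern`.
@settings(max_examples=50, deadline=None)
@given(pattern=st.text(), flags=st.just(0))
def test_fuzz_compile(pattern, flags):
    try:
        re.compile(pattern=pattern, flags=flags)
    except re.error:
        # Invalid patterns are expected, so discard them rather than fail.
        reject()
```

Run it with pytest, or call test_fuzz_compile() directly.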


hypothesis.extra.ghostwriter.idempotent(func, *, except_=(), style='pytest', annotate=None)

Write source code for a property-based test of func.

The resulting test checks that if you call func on its own output, the result does not change. For example:

from typing import Sequence

from hypothesis.extra import ghostwriter

def timsort(seq: Sequence[int]) -> Sequence[int]:
    return sorted(seq)

ghostwriter.idempotent(timsort)

Gives:

# This test code was written by the `hypothesis.extra.ghostwriter` module
# and is provided under the Creative Commons Zero public domain dedication.

from hypothesis import given, strategies as st

@given(seq=st.one_of(st.binary(), st.binary().map(bytearray), st.lists(st.integers())))
def test_idempotent_timsort(seq):
    result = timsort(seq=seq)
    repeat = timsort(seq=result)
    assert result == repeat, (result, repeat)

hypothesis.extra.ghostwriter.roundtrip(*funcs, except_=(), style='pytest', annotate=None)

Write source code for a property-based test of funcs.

The resulting test checks that if you call the first function, pass the result to the second (and so on), the final result is equal to the first input argument.

This is a very powerful property to test, especially when the config options are varied along with the object to round-trip. For example, try ghostwriting a test for json.dumps() - would you have thought of all that?

hypothesis write --roundtrip json.dumps json.loads
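For comparison, a hand-written version of the round-trip property might look like the sketch below. It is deliberately simpler than the ghostwritten test: it leaves every json.dumps() option at its default, and the json_values strategy is an assumption about which inputs count as valid.

```python
import json

from hypothesis import given, settings, strategies as st

# JSON-compatible values. NaN is excluded because NaN != NaN, which would
# fail the equality check even when the round-trip itself worked.
json_values = st.recursive(
    st.none() | st.booleans() | st.integers() | st.floats(allow_nan=False) | st.text(),
    lambda children: st.lists(children) | st.dictionaries(st.text(), children),
    max_leaves=10,
)


@settings(deadline=None)
@given(value=json_values)
def test_roundtrip_json_dumps_loads(value):
    # Encoding and then decoding should give back an equal value.
    assert json.loads(json.dumps(value)) == value
```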

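hypothesis.extra.ghostwriter.equivalent(*funcs, allow_same_errors=False, except_=(), style='pytest', annotate=None)

Write source code for a property-based test of funcs.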

The resulting test checks that calling each of the functions returns an equal value. This can be used as a classic ‘oracle’, such as testing a fast sorting algorithm against the sorted() builtin, or for differential testing where none of the compared functions are fully trusted but any difference indicates a bug (e.g. running a function on different numbers of threads, or simply multiple times).

The functions should have reasonably similar signatures, as only the common parameters will be passed the same arguments - any other parameters will be allowed to vary.

If allow_same_errors is True, then the test will pass if calling each of the functions returns an equal value, or if the first function raises an exception and each of the others raises an exception of the same type. This relaxed mode can be useful for code synthesis projects.
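The oracle case can be sketched by hand as follows. The insertion_sort function is a hypothetical implementation under test, invented for this illustration, and restricting the inputs to lists of integers is an assumption.

```python
from hypothesis import given, strategies as st


def insertion_sort(items):
    """Hypothetical implementation under test; not part of any library."""
    result = []
    for item in items:
        # Find the first position whose element is greater than `item`.
        index = 0
        while index < len(result) and result[index] <= item:
            index += 1
        result.insert(index, item)
    return result


@given(items=st.lists(st.integers()))
def test_equivalent_insertion_sort_sorted(items):
    # sorted() is the trusted oracle; any difference indicates a bug.
    assert insertion_sort(items) == sorted(items)
```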


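hypothesis.extra.ghostwriter.binary_operation(func, *, associative=True, commutative=True, identity=Ellipsis, distributes_over=None, except_=(), style='pytest', annotate=None)

Write property tests for the binary operation func.

While binary operations are not particularly common, they have such nice properties to test that it seems a shame not to demonstrate them with a ghostwriter. For an operator f, test that:

- if associative, f(a, f(b, c)) == f(f(a, b), c)
- if commutative, f(a, b) == f(b, a)
- if identity is not None, f(a, identity) == a
- if distributes_over is +, f(a, b) + f(a, c) == f(a, b+c)

For example, try hypothesis write --binary-op operator.add on the command line.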
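Written out by hand for operator.add, the associativity, commutativity, and identity properties might look like the sketch below. The choice of integers as operands is an assumption: floating-point addition, for instance, is not associative, so the same test would fail on floats.

```python
import operator

from hypothesis import given, strategies as st

# Assumed operand strategy; swap in whatever your operator accepts.
add_operands = st.integers()


@given(a=add_operands, b=add_operands, c=add_operands)
def test_associative_add(a, b, c):
    left = operator.add(a, operator.add(b, c))
    right = operator.add(operator.add(a, b), c)
    assert left == right, (left, right)


@given(a=add_operands, b=add_operands)
def test_commutative_add(a, b):
    assert operator.add(a, b) == operator.add(b, a)


@given(a=add_operands)
def test_identity_add(a):
    # 0 is the identity element for integer addition.
    assert operator.add(a, 0) == a
```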

hypothesis.extra.ghostwriter.ufunc(func, *, except_=(), style='pytest', annotate=None)

Write a property-based test for the array ufunc func.

The resulting test checks that your ufunc or gufunc has the expected broadcasting and dtype casting behaviour. You will probably want to add extra assertions, but as with the other ghostwriters this gives you a great place to start.

hypothesis write numpy.matmul

A note for test-generation researchers

Ghostwritten tests are intended as a starting point for human authorship, to demonstrate best practice, help novices past blank-page paralysis, and save time for experts. They may be ready-to-run, or include placeholders and # TODO: comments to fill in strategies for unknown types. In either case, improving tests for their own code gives users a well-scoped and immediately rewarding context in which to explore property-based testing.

By contrast, most test-generation tools aim to produce ready-to-run test suites… and implicitly assume that the current behavior is the desired behavior. However, the code might contain bugs, and we want our tests to fail if it does! Worse, tools require that the code to be tested is finished and executable, making it impossible to generate tests as part of the development process.

Fraser 2013 found that evolving a high-coverage test suite (e.g. Randoop, EvoSuite, Pynguin) “leads to clear improvements in commonly applied quality metrics such as code coverage [but] no measurable improvement in the number of bugs actually found by developers” and that “generating a set of test cases, even high coverage test cases, does not necessarily improve our ability to test software”. Invariant detection (famously Daikon; in PBT see e.g. Alonso 2022, QuickSpec, Speculate) relies on code execution. Program slicing (e.g. FUDGE, FuzzGen, WINNIE) requires downstream consumers of the code to test.

Ghostwriter inspects the function name, argument names and types, and docstrings. It can be used on buggy or incomplete code, runs in a few seconds, and produces a single semantically-meaningful test per function or group of functions. Rather than detecting regressions, these tests check semantic properties such as encode/decode or save/load round-trips, commutativity, associativity, and distributivity of operations, equivalence between methods, array shapes, and idempotence. Where no property is detected, we simply check for ‘no error on valid input’ and allow the user to supply their own invariants.

Evaluations such as the SBFT24 competition measure performance on a task which the Ghostwriter is not intended to perform. I’d love to see qualitative user studies, such as PBT in Practice for test generation, which could check whether the Ghostwriter is onto something or tilting at windmills. If you’re interested in similar questions, drop me an email!