The Hypothesis example database

When Hypothesis finds a bug it stores enough information in its database to reproduce it. This supports the classic testing workflow of finding a bug and fixing it, and lets you be confident that the fix actually works, because Hypothesis starts each run by retrying the examples that failed last time.

Limitations

The database is best thought of as a cache that you never need to invalidate. Information may be lost when you upgrade Hypothesis or change your test, so you shouldn't rely on it for correctness; if there's an example you want to be tried every time, include it in your source code with the @example decorator. Even so, the database helps the development workflow considerably by making sure that the examples you've just found are reproduced.
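As a minimal sketch of that alternative, the @example decorator pins a specific input into the test source, so it is tried on every run whether or not the database still holds it. The test name and property below are illustrative only.

```python
from hypothesis import example, given, strategies as st


@given(st.integers())
@example(0)  # always tried on every run, independent of the example database
def test_abs_is_non_negative(x):
    assert abs(x) >= 0
```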

The database also records examples that exercise less-used parts of your code, so the database may update even when no failing examples were found.

Upgrading Hypothesis and changing your tests

The Hypothesis database is designed so that you can put arbitrary data in it without getting wrong behaviour. When you upgrade Hypothesis, old data might be invalidated, but this should happen transparently. It can never be the case that, for example, changing the strategy that generates an argument gives you data from the old strategy.

ExampleDatabase implementations

Hypothesis’ default database setting creates a DirectoryBasedExampleDatabase in your current working directory, under .hypothesis/examples. If this location is unusable, e.g. because you do not have read or write permissions, Hypothesis will emit a warning and fall back to an InMemoryExampleDatabase.
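The default can be overridden per test through the database setting. The sketch below points one test at an explicitly chosen directory (a temporary directory, used purely for illustration) and disables example storage for another by passing None. The test names are hypothetical.

```python
import tempfile

from hypothesis import given, settings, strategies as st
from hypothesis.database import DirectoryBasedExampleDatabase

# Illustrative location only; real projects usually keep the default
# .hypothesis/examples directory or a stable path of their choosing.
examples_dir = tempfile.mkdtemp()


@settings(database=DirectoryBasedExampleDatabase(examples_dir))
@given(st.lists(st.integers()))
def test_sorted_is_idempotent(xs):
    assert sorted(sorted(xs)) == sorted(xs)


# database=None turns off saving and replaying examples for this test.
@settings(database=None)
@given(st.text())
def test_without_database(s):
    assert s == s
```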

Hypothesis provides the following ExampleDatabase implementations:

class hypothesis.database.InMemoryExampleDatabase

A non-persistent example database, implemented in terms of a dict of sets.

This can be useful if you call a test function several times in a single session, or for testing other database implementations, but because it does not persist between runs we do not recommend it for general use.
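A small sketch of the key/value model, calling the database methods directly (Hypothesis normally does this for you). Keys and values are both bytes, and each key maps to a set of values; the key names here are arbitrary.

```python
from hypothesis.database import InMemoryExampleDatabase

db = InMemoryExampleDatabase()

db.save(b"test-key", b"\x00\x01")
db.save(b"test-key", b"\x00\x01")  # already present: silently ignored
db.save(b"test-key", b"\x02")
print(sorted(db.fetch(b"test-key")))  # [b'\x00\x01', b'\x02']

db.delete(b"test-key", b"\x02")
print(list(db.fetch(b"test-key")))  # [b'\x00\x01']
```

Everything is held in process memory, so the data is gone once the interpreter exits.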

class hypothesis.database.DirectoryBasedExampleDatabase(path)

Use a directory to store Hypothesis examples as files.

Each test corresponds to a directory, and each example to a file within that directory. While the contents are fairly opaque, a DirectoryBasedExampleDatabase can be shared by checking the directory into version control, for example with the following .gitignore:

# Ignore files cached by Hypothesis...
.hypothesis/*
# except for the examples directory
!.hypothesis/examples/

Note, however, that this only makes sense if you also pin an exact version of Hypothesis. We would usually recommend implementing a shared database with a network datastore instead: see ExampleDatabase and the MultiplexedDatabase helper.
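Because the data lives in ordinary files, any DirectoryBasedExampleDatabase pointed at the same directory sees the same examples, which is what makes sharing a checked-in directory possible. A minimal sketch, using a temporary directory and made-up key and value bytes:

```python
import tempfile

from hypothesis.database import DirectoryBasedExampleDatabase

path = tempfile.mkdtemp()

# One instance writes an example to disk...
writer = DirectoryBasedExampleDatabase(path)
writer.save(b"some-test", b"failing-input")

# ...and a separate instance on the same directory reads it back.
reader = DirectoryBasedExampleDatabase(path)
print(list(reader.fetch(b"some-test")))  # [b'failing-input']
```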

class hypothesis.database.GitHubArtifactDatabase(owner, repo, artifact_name='hypothesis-example-db', cache_timeout=datetime.timedelta(days=1), path=None)

A file-based database loaded from a GitHub Actions artifact.

You can use this for sharing example databases between CI runs and developers, allowing the latter to get read-only access to the former. This is particularly useful for continuous fuzzing (e.g. with HypoFuzz), where the CI system can help find new failing examples through fuzzing, and developers can reproduce them locally without any manual effort.

Note

You must provide GITHUB_TOKEN as an environment variable. In CI, GitHub Actions provides this automatically, but it needs to be set manually for local usage. On a developer machine, this would usually be a Personal Access Token. If the repository is private, the token needs the repo scope (for a classic token) or actions:read (for a fine-grained token).

In most cases, this will be used through the MultiplexedDatabase, by combining a local directory-based database with this one. For example:

import os
from hypothesis import settings
from hypothesis.database import (
    DirectoryBasedExampleDatabase, GitHubArtifactDatabase,
    MultiplexedDatabase, ReadOnlyDatabase,
)

local = DirectoryBasedExampleDatabase(".hypothesis/examples")
shared = ReadOnlyDatabase(GitHubArtifactDatabase("user", "repo"))

settings.register_profile("ci", database=local)
settings.register_profile("dev", database=MultiplexedDatabase(local, shared))
# We don't want to use the shared database in CI, only to populate its local one,
# which the workflow should then upload as an artifact.
settings.load_profile("ci" if os.environ.get("CI") else "dev")

Note

Because this database is read-only, you always need to wrap it with the ReadOnlyDatabase.

A setup like this can be paired with a GitHub Actions workflow including something like the following:

- name: Download example database
  uses: dawidd6/action-download-artifact@v2.24.3
  with:
    name: hypothesis-example-db
    path: .hypothesis/examples
    if_no_artifact_found: warn
    workflow_conclusion: completed

- name: Run tests
  run: pytest

- name: Upload example database
  uses: actions/upload-artifact@v3
  if: always()
  with:
    name: hypothesis-example-db
    path: .hypothesis/examples

In this workflow, we use dawidd6/action-download-artifact to download the latest artifact, because the official actions/download-artifact does not support downloading artifacts from previous workflow runs.

The database automatically implements a simple file-based cache with a default expiration period of 1 day. You can adjust this with the cache_timeout parameter.

For mono-repo support, you can provide a unique artifact_name (e.g. hypofuzz-example-db-frontend).
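Both options are constructor parameters. The sketch below sets a shorter cache expiry and a per-project artifact name, and wraps the result in ReadOnlyDatabase as required. The owner and repository names are hypothetical placeholders. The snippet only constructs the database; actually fetching examples needs network access and, as noted above, a GITHUB_TOKEN.

```python
from datetime import timedelta

from hypothesis.database import GitHubArtifactDatabase, ReadOnlyDatabase

# "my-org" and "my-monorepo" are placeholders; substitute your own repository.
frontend_shared = ReadOnlyDatabase(
    GitHubArtifactDatabase(
        "my-org",
        "my-monorepo",
        artifact_name="hypofuzz-example-db-frontend",  # one artifact per project
        cache_timeout=timedelta(hours=6),  # re-download more often than the 1-day default
    )
)
```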

class hypothesis.database.ReadOnlyDatabase(db)

A wrapper to make the given database read-only.

The implementation passes through fetch, and turns save, delete, and move into silent no-ops.

Note that this disables Hypothesis’ automatic discarding of stale examples. It is designed to allow local machines to access a shared database (e.g. from CI servers), without propagating changes back from a local or in-development branch.
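To see the pass-through and no-op behaviour concretely, here is a short sketch that uses an InMemoryExampleDatabase as a stand-in for a shared datastore. The key and value bytes are arbitrary.

```python
from hypothesis.database import InMemoryExampleDatabase, ReadOnlyDatabase

shared = InMemoryExampleDatabase()  # stand-in for a shared datastore
shared.save(b"key", b"from-ci")

view = ReadOnlyDatabase(shared)
view.save(b"key", b"from-laptop")  # silently ignored
view.delete(b"key", b"from-ci")  # silently ignored

print(list(view.fetch(b"key")))  # [b'from-ci']
```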

class hypothesis.database.MultiplexedDatabase(*dbs)

A wrapper around multiple databases.

Each save, fetch, move, or delete operation will be run against all of the wrapped databases. fetch does not yield duplicate values, even if the same value is present in two or more of the wrapped databases.
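A minimal sketch of both behaviours with two in-memory databases: a value stored in both is fetched only once, and a save through the wrapper reaches every wrapped database. The values are illustrative.

```python
from hypothesis.database import InMemoryExampleDatabase, MultiplexedDatabase

a = InMemoryExampleDatabase()
b = InMemoryExampleDatabase()
a.save(b"key", b"shared-value")
b.save(b"key", b"shared-value")
b.save(b"key", b"only-in-b")

both = MultiplexedDatabase(a, b)
print(sorted(both.fetch(b"key")))  # [b'only-in-b', b'shared-value']

# Writes fan out to every wrapped database.
both.save(b"key", b"new")
```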

This combines well with a ReadOnlyDatabase, as follows:

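import os
from hypothesis import settings
from hypothesis.database import (
    DirectoryBasedExampleDatabase, MultiplexedDatabase, ReadOnlyDatabase,
)

local = DirectoryBasedExampleDatabase("/tmp/hypothesis/examples/")
shared = CustomNetworkDatabase()  # placeholder for your own ExampleDatabase subclass

settings.register_profile("ci", database=shared)
settings.register_profile(
    "dev", database=MultiplexedDatabase(local, ReadOnlyDatabase(shared))
)
settings.load_profile("ci" if os.environ.get("CI") else "dev")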

This way, your CI system or fuzzing runs can populate a central shared database, while local runs on development machines can reproduce any failures from CI but will only cache their own failures locally and cannot remove examples from the shared database.
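The "dev" profile's behaviour can be checked without any network service. In this sketch a second InMemoryExampleDatabase stands in for the hypothetical CustomNetworkDatabase: a save through the combined database lands only in the local copy, while fetches see both.

```python
from hypothesis.database import (
    InMemoryExampleDatabase,
    MultiplexedDatabase,
    ReadOnlyDatabase,
)

local = InMemoryExampleDatabase()
shared = InMemoryExampleDatabase()  # stand-in for a network-backed database
shared.save(b"key", b"found-in-ci")

dev = MultiplexedDatabase(local, ReadOnlyDatabase(shared))
dev.save(b"key", b"found-locally")  # reaches local only

print(sorted(dev.fetch(b"key")))  # [b'found-in-ci', b'found-locally']
print(list(shared.fetch(b"key")))  # [b'found-in-ci']
```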

class hypothesis.extra.redis.RedisExampleDatabase(redis, *, expire_after=datetime.timedelta(days=8), key_prefix=b'hypothesis-example:')

Store Hypothesis examples as sets in the given Redis datastore.

This is particularly useful for shared databases, as per the recipe for a MultiplexedDatabase.

Note

If a test has not been run for expire_after, those examples will be allowed to expire. The default time-to-live persists examples between weekly runs.

Defining your own ExampleDatabase

You can define your own ExampleDatabase, for example to use a shared datastore, with just a few methods:

class hypothesis.database.ExampleDatabase(*args, **kwargs)

An abstract base class for storing examples in Hypothesis’ internal format.

An ExampleDatabase maps each bytes key to many distinct bytes values, like a Mapping[bytes, AbstractSet[bytes]].

abstract save(key, value)

Save value under key.

If this value is already present for this key, silently do nothing.

abstract fetch(key)

Return an iterable over all values matching this key.

abstract delete(key, value)

Remove this value from this key.

If this value is not present, silently do nothing.

move(src, dest, value)

Move value from key src to key dest. Equivalent to delete(src, value) followed by save(dest, value), but may have a more efficient implementation.

Note that value will be inserted at dest regardless of whether it is currently present at src.
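As a minimal sketch of the interface, the hypothetical DictExampleDatabase below implements the three abstract methods on top of a plain dict of sets and relies on the inherited move. A real implementation would talk to a shared datastore instead; the dict here only stands in for one.

```python
from collections import defaultdict
from typing import Iterable

from hypothesis.database import ExampleDatabase


class DictExampleDatabase(ExampleDatabase):
    """Hypothetical example: a custom database backed by a dict of sets."""

    def __init__(self) -> None:
        super().__init__()
        self._data: "defaultdict[bytes, set[bytes]]" = defaultdict(set)

    def save(self, key: bytes, value: bytes) -> None:
        # Adding to a set makes repeated saves of the same value a no-op.
        self._data[key].add(value)

    def fetch(self, key: bytes) -> Iterable[bytes]:
        # Iterate over a snapshot so callers may save/delete while iterating.
        yield from list(self._data.get(key, ()))

    def delete(self, key: bytes, value: bytes) -> None:
        # Missing keys or values are silently ignored.
        if key in self._data:
            self._data[key].discard(value)


db = DictExampleDatabase()
db.save(b"src", b"v")
db.move(b"src", b"dest", b"v")  # inherited: delete from src, then save to dest
print(list(db.fetch(b"src")), list(db.fetch(b"dest")))  # [] [b'v']
```

An instance like this can then be passed wherever a database is accepted, e.g. settings(database=DictExampleDatabase()).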