The Hypothesis example database
When Hypothesis finds a bug, it stores enough information in its database to reproduce it. This enables the classic testing workflow of finding a bug, fixing it, and being confident that the fix actually works, because Hypothesis starts by retrying the examples that failed last time.
Limitations
The database is best thought of as a cache that you never need to invalidate. Information may be lost when you upgrade Hypothesis or change your test, so you shouldn’t rely on it for correctness. If there’s an example you want to ensure runs every time, include it explicitly in your source code with the @example decorator. The database nevertheless helps the development workflow considerably by making sure that the examples you’ve just found are reproduced.
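As a minimal sketch of pinning an example in source code rather than relying on the database, the test below uses the @example decorator. The test name, the `seen` list, and the property being checked are illustrative assumptions; the database is disabled only to show that the explicit example does not depend on it.

```python
from hypothesis import example, given, settings, strategies as st

seen = []  # records every input the test body receives, for illustration only


@settings(database=None)  # no database: the explicit example still runs
@given(st.integers())
@example(0)  # explicit examples are tried before any generated ones
def test_abs_is_non_negative(n):
    seen.append(n)
    assert abs(n) >= 0


test_abs_is_non_negative()
```

Because the example lives in the source file, it survives Hypothesis upgrades and test changes that could invalidate database entries.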
The database also records examples that exercise less-used parts of your code, so the database may update even when no failing examples were found.
Upgrading Hypothesis and changing your tests
The design of the Hypothesis database is such that you can put arbitrary data in the database and not get wrong behaviour. When you upgrade Hypothesis, old data might be invalidated, but this should happen transparently. It can never be the case that e.g. changing the strategy that generates an argument gives you data from the old strategy.
ExampleDatabase implementations
Hypothesis’ default database setting creates a DirectoryBasedExampleDatabase in your current working directory, under .hypothesis/examples. If this location is unusable, e.g. because you do not have read or write permissions, Hypothesis will emit a warning and fall back to an InMemoryExampleDatabase.
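As a rough sketch of overriding the default location, the snippet below passes an explicit DirectoryBasedExampleDatabase through the database setting. The temporary directory and the test itself are illustrative assumptions, not something Hypothesis requires.

```python
import tempfile

from hypothesis import given, settings, strategies as st
from hypothesis.database import DirectoryBasedExampleDatabase

# Illustrative location: any directory you can read and write will do.
db_dir = tempfile.mkdtemp(prefix="hypothesis-examples-")
db = DirectoryBasedExampleDatabase(db_dir)


@settings(database=db)
@given(st.integers())
def test_str_round_trip(n):
    # Failing examples, if there were any, would be stored under db_dir.
    assert int(str(n)) == n


test_str_round_trip()
```

The same database object could instead be registered on a settings profile, so that every test in a run shares it.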
Hypothesis provides the following ExampleDatabase implementations:
- class hypothesis.database.InMemoryExampleDatabase
A non-persistent example database, implemented in terms of a dict of sets.
This can be useful if you call a test function several times in a single session, or for testing other database implementations, but because it does not persist between runs we do not recommend it for general use.
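The dict-of-sets behaviour can be seen by calling the database methods directly. This is only a sketch: the keys and values are arbitrary bytes chosen for illustration, whereas Hypothesis itself uses its own internal format for both.

```python
from hypothesis.database import InMemoryExampleDatabase

db = InMemoryExampleDatabase()

db.save(b"some-test", b"first")
db.save(b"some-test", b"second")
db.save(b"some-test", b"first")  # already present, so nothing changes

values = set(db.fetch(b"some-test"))

db.delete(b"some-test", b"first")
remaining = set(db.fetch(b"some-test"))

# A key that was never written simply yields no values.
missing = set(db.fetch(b"unknown-test"))
```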
- class hypothesis.database.DirectoryBasedExampleDatabase(path)
Use a directory to store Hypothesis examples as files.
Each test corresponds to a directory, and each example to a file within that directory. While the contents are fairly opaque, a DirectoryBasedExampleDatabase can be shared by checking the directory into version control, for example with the following .gitignore:

    # Ignore files cached by Hypothesis...
    .hypothesis/*
    # except for the examples directory
    !.hypothesis/examples/

Note however that this only makes sense if you also pin to an exact version of Hypothesis, and we would usually recommend implementing a shared database with a network datastore - see ExampleDatabase, and the MultiplexedDatabase helper.
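To illustrate that the directory itself is the persistent state, the sketch below writes through one database object and reads through a second one pointing at the same path. The temporary directory, key, and value are assumptions made for the example.

```python
import tempfile

from hypothesis.database import DirectoryBasedExampleDatabase

with tempfile.TemporaryDirectory() as path:
    writer = DirectoryBasedExampleDatabase(path)
    writer.save(b"some-test", b"example-bytes")

    # A fresh object on the same path sees what the first one stored,
    # which is what lets examples survive between test runs.
    reader = DirectoryBasedExampleDatabase(path)
    persisted = set(reader.fetch(b"some-test"))
```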
- class hypothesis.database.GitHubArtifactDatabase(owner, repo, artifact_name='hypothesis-example-db', cache_timeout=datetime.timedelta(days=1), path=None)
A file-based database loaded from a GitHub Actions artifact.
You can use this for sharing example databases between CI runs and developers, allowing the latter to get read-only access to the former. This is particularly useful for continuous fuzzing (i.e. with HypoFuzz), where the CI system can help find new failing examples through fuzzing, and developers can reproduce them locally without any manual effort.
Note
You must provide GITHUB_TOKEN as an environment variable. In CI, GitHub Actions provides this automatically, but it needs to be set manually for local usage. On a developer machine, this would usually be a Personal Access Token. If the repository is private, the token must have repo scope in the case of a classic token, or actions:read in the case of a fine-grained token.
In most cases, this will be used through the MultiplexedDatabase, by combining a local directory-based database with this one. For example:

    import os
    from hypothesis import settings
    from hypothesis.database import (DirectoryBasedExampleDatabase, GitHubArtifactDatabase,
                                     MultiplexedDatabase, ReadOnlyDatabase)

    local = DirectoryBasedExampleDatabase(".hypothesis/examples")
    shared = ReadOnlyDatabase(GitHubArtifactDatabase("user", "repo"))

    settings.register_profile("ci", database=local)
    settings.register_profile("dev", database=MultiplexedDatabase(local, shared))
    # We don't want to use the shared database in CI, only to populate its local one,
    # which the workflow should then upload as an artifact.
    settings.load_profile("ci" if os.environ.get("CI") else "dev")
Note
Because this database is read-only, you always need to wrap it with the ReadOnlyDatabase.
A setup like this can be paired with a GitHub Actions workflow including something like the following:

    - name: Download example database
      uses: dawidd6/action-download-artifact@v2.24.3
      with:
        name: hypothesis-example-db
        path: .hypothesis/examples
        if_no_artifact_found: warn
        workflow_conclusion: completed

    - name: Run tests
      run: pytest

    - name: Upload example database
      uses: actions/upload-artifact@v3
      if: always()
      with:
        name: hypothesis-example-db
        path: .hypothesis/examples

In this workflow, we use dawidd6/action-download-artifact to download the latest artifact, given that the official actions/download-artifact does not support downloading artifacts from previous workflow runs.
The database automatically implements a simple file-based cache with a default expiration period of 1 day. You can adjust this through the cache_timeout property.
For mono-repo support, you can provide a unique artifact_name (e.g. hypofuzz-example-db-frontend).
- class hypothesis.database.ReadOnlyDatabase(db)
A wrapper to make the given database read-only.
The implementation passes through fetch, and turns save, delete, and move into silent no-ops.
Note that this disables Hypothesis’ automatic discarding of stale examples. It is designed to allow local machines to access a shared database (e.g. from CI servers), without propagating changes back from a local or in-development branch.
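A small sketch of the pass-through and no-op behaviour, using an InMemoryExampleDatabase as a stand-in for a shared database. The keys and values are arbitrary bytes chosen for illustration.

```python
from hypothesis.database import InMemoryExampleDatabase, ReadOnlyDatabase

# Stand-in for a shared database, e.g. one populated by CI.
shared = InMemoryExampleDatabase()
shared.save(b"some-test", b"from-ci")

view = ReadOnlyDatabase(shared)
view.save(b"some-test", b"from-laptop")  # silently ignored
view.delete(b"some-test", b"from-ci")    # silently ignored

visible = set(view.fetch(b"some-test"))       # reads pass through
underlying = set(shared.fetch(b"some-test"))  # wrapped database is unchanged
```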
- class hypothesis.database.MultiplexedDatabase(*dbs)
A wrapper around multiple databases.
Each save, fetch, move, or delete operation will be run against all of the wrapped databases. fetch does not yield duplicate values, even if the same value is present in two or more of the wrapped databases.
This combines well with a ReadOnlyDatabase, as follows:

    local = DirectoryBasedExampleDatabase("/tmp/hypothesis/examples/")
    shared = CustomNetworkDatabase()

    settings.register_profile("ci", database=shared)
    settings.register_profile(
        "dev", database=MultiplexedDatabase(local, ReadOnlyDatabase(shared))
    )
    settings.load_profile("ci" if os.environ.get("CI") else "dev")

So your CI system or fuzzing runs can populate a central shared database, while local runs on development machines can reproduce any failures from CI but will only cache their own failures locally and cannot remove examples from the shared database.
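The recipe above can be tried out without any network service. In this sketch two InMemoryExampleDatabase objects stand in for the local and shared stores, which is an assumption made purely so the example is self-contained.

```python
from hypothesis.database import (
    InMemoryExampleDatabase,
    MultiplexedDatabase,
    ReadOnlyDatabase,
)

local = InMemoryExampleDatabase()
shared = InMemoryExampleDatabase()  # stand-in for a CI-populated database

# The same failure is known to both databases.
shared.save(b"some-test", b"ci-failure")
local.save(b"some-test", b"ci-failure")

dev = MultiplexedDatabase(local, ReadOnlyDatabase(shared))
dev.save(b"some-test", b"local-failure")  # reaches only the writable database

fetched = list(dev.fetch(b"some-test"))  # duplicates are collapsed
```

Wrapping only the shared side in ReadOnlyDatabase is what keeps local experiments from leaking into the central store.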
- class hypothesis.extra.redis.RedisExampleDatabase(redis, *, expire_after=datetime.timedelta(days=8), key_prefix=b'hypothesis-example:')
Store Hypothesis examples as sets in the given Redis datastore.
This is particularly useful for shared databases, as per the recipe for a MultiplexedDatabase.
Note
If a test has not been run for expire_after, those examples will be allowed to expire. The default time-to-live persists examples between weekly runs.
Defining your own ExampleDatabase
You can define your own ExampleDatabase, for example to use a shared datastore, with just a few methods:
- class hypothesis.database.ExampleDatabase(*args, **kwargs)
An abstract base class for storing examples in Hypothesis’ internal format.
An ExampleDatabase maps each bytes key to many distinct bytes values, like a Mapping[bytes, AbstractSet[bytes]].
- abstract save(key, value)
Save value under key.
If this value is already present for this key, silently do nothing.
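As a minimal sketch of a custom implementation, the class below keeps everything in a plain dict. The class name and storage are hypothetical; a real shared database would replace the dict operations with calls to its datastore. It implements save, fetch, and delete, and relies on the base class for move.

```python
from collections import defaultdict
from typing import Iterable

from hypothesis.database import ExampleDatabase


class DictBackedDatabase(ExampleDatabase):
    """Hypothetical toy backend mapping each bytes key to a set of bytes values."""

    def __init__(self) -> None:
        super().__init__()
        self._store = defaultdict(set)

    def save(self, key: bytes, value: bytes) -> None:
        # Adding to a set makes a repeated save a silent no-op.
        self._store[key].add(value)

    def fetch(self, key: bytes) -> Iterable[bytes]:
        yield from self._store.get(key, ())

    def delete(self, key: bytes, value: bytes) -> None:
        # Deleting a value that is not present is also a no-op.
        self._store.get(key, set()).discard(value)


db = DictBackedDatabase()
db.save(b"some-test", b"value")
db.save(b"some-test", b"value")  # already present: nothing happens
stored = list(db.fetch(b"some-test"))

# move is inherited from the base class: it deletes, then saves.
db.move(b"some-test", b"other-test", b"value")
```

An instance like this can be passed as the database setting in the same way as the built-in implementations.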