What you can generate and how

The general philosophy of Hypothesis data generation is that everything should be possible to generate and most things should be easy. For most things in the standard library this is still more aspirational than achieved, but the state of the art is already pretty good.

This document is a guide to what strategies are available for generating data and how to build them. Strategies have a variety of other important internal features, such as how they simplify, but the data they can generate is the only public part of their API.

Functions for building strategies are all available in the hypothesis.strategies module. The salient functions from it are as follows:

hypothesis.strategies.just(value)

Return a strategy which only generates value.

Note: value is not copied. Be wary of using mutable values.

hypothesis.strategies.none()

Return a strategy which only generates None.

hypothesis.strategies.one_of(arg, *args)

Return a strategy which generates values from any of the argument strategies.

hypothesis.strategies.integers(min_value=None, max_value=None)

Returns a strategy which generates integers (in Python 2 these may be ints or longs).

If min_value is not None then all values will be >= min_value. If max_value is not None then all values will be <= max_value.

hypothesis.strategies.booleans()

Returns a strategy which generates instances of bool.

hypothesis.strategies.floats(min_value=None, max_value=None)

Returns a strategy which generates floats. If min_value is not None, all values will be >= min_value. If max_value is not None, all values will be <= max_value.

Where not explicitly ruled out by the bounds, all of infinity, -infinity and NaN are possible values generated by this strategy.

hypothesis.strategies.complex_numbers()

Returns a strategy that generates complex numbers.

hypothesis.strategies.tuples(*args)

Return a strategy which generates a tuple of the same length as args by generating the value at index i from args[i].

e.g. tuples(integers(), integers()) would generate a tuple of length two with both values an integer.

hypothesis.strategies.sampled_from(elements)

Returns a strategy which generates any value present in the iterable elements.

Note that, as with just, values will not be copied, so you should be careful when using mutable data.

hypothesis.strategies.lists(elements=None, min_size=None, average_size=None, max_size=None, unique_by=None)

Returns a strategy which generates lists containing values drawn from elements, with length in the interval [min_size, max_size] (there is no bound in a given direction if the corresponding argument is None). If max_size is 0 then elements may be None and only the empty list will be drawn.

average_size may be used as a size hint to roughly control the size of the list, but it may not be the actual average of the sizes you get, due to a variety of factors.

If unique_by is not None it must be a function returning a hashable type when given a value drawn from elements. The resulting list will satisfy the condition that for i != j, unique_by(result[i]) != unique_by(result[j]).
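The unique_by condition can be checked by hand. The following is a minimal stdlib-only sketch, not part of Hypothesis; satisfies_unique_by is a hypothetical helper name used only for illustration.

```python
from itertools import combinations


def satisfies_unique_by(result, unique_by):
    """Return True if no two distinct positions share a unique_by key."""
    keys = [unique_by(value) for value in result]
    return all(a != b for a, b in combinations(keys, 2))


# Keys here are string lengths, so two strings of equal length collide.
print(satisfies_unique_by(["a", "bb", "ccc"], len))  # True
print(satisfies_unique_by(["a", "bb", "cc"], len))   # False
```

Note that the values themselves may differ ("bb" and "cc") and still be rejected, because only the keys are compared.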

hypothesis.strategies.sets(elements=None, min_size=None, average_size=None, max_size=None)

This has the same behaviour as lists, but returns sets instead.

Note that Hypothesis cannot tell whether values drawn from elements are hashable until it runs the test, so you can define a strategy for sets of an unhashable type but it will fail at test time.

hypothesis.strategies.frozensets(elements=None, min_size=None, average_size=None, max_size=None)

This is identical to the sets function but instead returns frozensets.

hypothesis.strategies.fixed_dictionaries(mapping)

Generate a dictionary of the same type as mapping with a fixed set of keys mapping to strategies. mapping must be a dict subclass.

Generated values have all keys present in mapping, with the corresponding values drawn from mapping[key]. If mapping is an instance of OrderedDict the keys will also be in the same order, otherwise the order is arbitrary.

hypothesis.strategies.dictionaries(keys, values, dict_class=dict, min_size=None, average_size=None, max_size=None)

Generates dictionaries of type dict_class with keys drawn from the keys argument and values drawn from the values argument.

The size parameters have the same interpretation as for lists.

hypothesis.strategies.streaming(elements)

Generates an infinite stream of values where each value is drawn from elements.

The result is iterable (the iterator will never terminate) and indexable.

hypothesis.strategies.text(alphabet=None, min_size=None, average_size=None, max_size=None)

Generates values of a unicode text type (unicode on Python 2, str on Python 3) with values drawn from alphabet, which should be an iterable of length-one strings or a strategy generating such strings. If it is None it will default to generating the full unicode range. If it is an empty collection this will only generate empty strings.

min_size, max_size and average_size have the usual interpretations.

hypothesis.strategies.binary(min_size=None, average_size=None, max_size=None)

Generates the appropriate binary type (str in Python 2, bytes in Python 3).

min_size, average_size and max_size have the usual interpretations.

hypothesis.strategies.basic(basic=None, generate_parameter=None, generate=None, simplify=None, copy=None)

Provides a facility to write your own strategies with significantly less work.

See documentation for more details.

hypothesis.strategies.fractions()

Generates instances of fractions.Fraction.

hypothesis.strategies.decimals()

Generates instances of decimal.Decimal.

hypothesis.strategies.builds(target, *args, **kwargs)

Generates values by drawing from args and kwargs and passing them to target in the appropriate argument position.

e.g. builds(target, integers(), flag=booleans()) would draw an integer i and a boolean b and call target(i, flag=b).

hypothesis.strategies.recursive(base, extend, max_leaves=100)

  • base: A strategy to start from.
  • extend: A function which takes a strategy and returns a new strategy.
  • max_leaves: The maximum number of elements to be drawn from base on a given run.

This returns a strategy S such that S = extend(base | S). That is, values may be drawn from base, or from any strategy reachable by mixing applications of | and extend.

An example may clarify: recursive(booleans(), lists) would return a strategy that may return arbitrarily nested and mixed lists of booleans. So e.g. False, [True], [False, []], [[[[True]]]], are all valid values to be drawn from that strategy.

hypothesis.strategies.composite(f)

Defines a strategy that is built out of potentially arbitrarily many other strategies.

This is intended to be used as a decorator. See the full documentation for more details about how to use this function.

Infinite streams

Sometimes you need examples of a particular type to keep your test going but you’re not sure how many you’ll need in advance. For this, we have streaming types.

>>> from hypothesis import strategy
>>> from hypothesis.strategies import streaming, integers
>>> x = strategy(streaming(integers())).example()
>>> x
Stream(...)
>>> x[2]
209
>>> x
Stream(32, 132, 209, ...)
>>> x[10]
130
>>> x
Stream(32, 132, 209, 843, -19, 58, 141, -1046, 37, 243, 130, ...)

Think of a Stream as an infinite list where we’ve only evaluated as much as we need to. As per above, you can index into it and the stream will be evaluated up to that index and no further.

You can iterate over it too (warning: iter on a stream given to you by Hypothesis in this way will never terminate):

>>> it = iter(x)
>>> next(it)
32
>>> next(it)
132
>>> next(it)
209
>>> next(it)
843

Slicing will also work, and will give you back Streams. If you set an upper bound then iter on those streams will terminate:

>>> list(x[:5])
[32, 132, 209, 843, -19]
>>> y = x[1::2]
>>> y
Stream(...)
>>> y[0]
132
>>> y[1]
843
>>> y
Stream(132, 843, ...)

You can also apply a function to transform a stream:

>>> t = strategy(streaming(int)).example()
>>> tm = t.map(lambda n: n * 2)
>>> tm[0]
26
>>> t[0]
13
>>> tm
Stream(26, ...)
>>> t
Stream(13, ...)

map creates a new stream where each element of the stream is the function applied to the corresponding element of the original stream. Evaluating the new stream will force evaluating the original stream up to that index.

(Warning: This isn’t the map builtin. In Python 3 the builtin map should do more or less the right thing, but in Python 2 it will never terminate and will just eat up all your memory as it tries to build an infinitely long list)

These are the only operations a Stream supports. There are a few more internal ones, but you shouldn’t rely on them.
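To make the "infinite list evaluated on demand" idea concrete, here is a rough stdlib-only model. LazyStream is a hypothetical class written for this illustration; it is not Hypothesis's Stream and only mimics indexing, map and the repr shown above.

```python
import itertools


class LazyStream:
    """Toy model of a stream: an infinite list evaluated only on demand."""

    def __init__(self, source):
        self._source = iter(source)   # where unevaluated values come from
        self._seen = []               # values evaluated so far

    def __getitem__(self, index):
        # Evaluate up to the requested index and no further.
        while len(self._seen) <= index:
            self._seen.append(next(self._source))
        return self._seen[index]

    def map(self, f):
        # A new stream; forcing it forces this stream up to the same index.
        return LazyStream(f(self[i]) for i in itertools.count())

    def __repr__(self):
        shown = [repr(v) for v in self._seen] + ["..."]
        return "LazyStream(%s)" % ", ".join(shown)


s = LazyStream(itertools.count(1))   # stands in for randomly drawn integers
doubled = s.map(lambda n: n * 2)
print(doubled[2])   # 6
print(s)            # LazyStream(1, 2, 3, ...)
print(doubled)      # LazyStream(2, 4, 6, ...)
```

Indexing the mapped stream at 2 forced the original stream up to index 2 as well, which is the behaviour described for map above.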

Adapting strategies

Often it is the case that a strategy doesn’t produce exactly what you want it to and you need to adapt it. Sometimes you can do this in the test, but this hurts reuse because you then have to repeat the adaptation in every test.

Hypothesis gives you ways to build strategies from other strategies given functions for transforming the data.

Mapping

Map is probably the easiest and most useful of these to use. If you have a strategy s and a function f, then s.map(f).example() is f(s.example()), i.e. we draw an example from s and then apply f to it.

e.g.:

>>> strategy([int]).map(sorted).example()
[1, 5, 17, 21, 24, 30, 45, 82, 88, 88, 90, 96, 105]

Note that many things that you might use mapping for can also be done with the builds function in hypothesis.strategies.

Filtering

filter lets you reject some examples. s.filter(f).example() is some example x of s such that f(x) is truthy.

>>> strategy(int).filter(lambda x: x > 11).example()
1873
>>> strategy(int).filter(lambda x: x > 11).example()
73

It’s important to note that filter isn’t magic and if your condition is too hard to satisfy then this can fail:

>>> strategy(int).filter(lambda x: False).example()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/david/projects/hypothesis/src/hypothesis/searchstrategy/strategies.py", line 175, in example
    'Could not find any valid examples in 20 tries'
hypothesis.errors.NoExamples: Could not find any valid examples in 20 tries

In general you should try to use filter only to avoid corner cases that you don’t want rather than attempting to cut out a large chunk of the search space.
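One way to picture why a hard condition fails is to treat filtering as rejection sampling with a bounded number of tries. The sketch below is a stdlib-only model under that assumption, not Hypothesis's implementation; filtered_example, draw_int and the local NoExamples class are hypothetical stand-ins.

```python
import random


class NoExamples(Exception):
    """Local stand-in for the error shown above; not imported from Hypothesis."""


def filtered_example(draw, condition, rng, max_tries=20):
    """Draw values until one satisfies condition, giving up after max_tries."""
    for _ in range(max_tries):
        value = draw(rng)
        if condition(value):
            return value
    raise NoExamples("Could not find any valid examples in %d tries" % max_tries)


def draw_int(rng):
    return rng.randint(-1000, 1000)


rng = random.Random(0)
# Excluding a single corner case almost never needs a retry.
value = filtered_example(draw_int, lambda x: x != 0, rng)
try:
    # A condition that can never hold exhausts every try.
    filtered_example(draw_int, lambda x: False, rng)
except NoExamples as error:
    print(error)
```

The cost of a filter grows with the fraction of draws it rejects, which is why it suits rare corner cases better than large cuts.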

A technique that often works well here is to use map to first transform the data and then use filter to remove things that didn’t work out. So for example if you wanted pairs of integers (x,y) such that x < y you could do the following:

>>> strategy((int, int)).map(
... lambda x: tuple(sorted(x))).filter(lambda x: x[0] != x[1]).example()
(42, 1281698)
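The benefit of transforming before filtering can be counted exhaustively on a small domain. This is a plain-Python illustration over all pairs of digits rather than a Hypothesis strategy:

```python
from itertools import product

pairs = list(product(range(10), repeat=2))   # every (x, y) with 0 <= x, y < 10

# Filtering directly on x < y rejects more than half of the candidates.
kept_by_filter = [p for p in pairs if p[0] < p[1]]

# Sorting first means only the ties (x == y) need to be rejected.
kept_by_map_then_filter = [
    p for p in (tuple(sorted(q)) for q in pairs) if p[0] != p[1]
]

print(len(pairs), len(kept_by_filter), len(kept_by_map_then_filter))  # 100 45 90
```

Both approaches only ever produce pairs with x < y, but the second discards 10 candidates out of 100 instead of 55.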

Chaining strategies together

Finally there is flatmap. Flatmap draws an example, then turns that example into a strategy, then draws an example from that strategy.

It may not be obvious why you want this at first, but it turns out to be quite useful because it lets you generate different types of data with relationships to each other.

For example suppose we wanted to generate a list of lists of the same length:

>>> from hypothesis.strategies import integers, lists
>>> from hypothesis import find
>>> rectangle_lists = integers(min_value=0, max_value=10).flatmap(lambda n:
... lists(lists(integers(), min_size=n, max_size=n)))
>>> find(rectangle_lists, lambda x: True)
[]
>>> find(rectangle_lists, lambda x: len(x) >= 10)
[[], [], [], [], [], [], [], [], [], []]
>>> find(rectangle_lists, lambda t: len(t) >= 3 and len(t[0])  >= 3)
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
>>> find(rectangle_lists, lambda t: sum(len(s) for s in t) >= 10)
[[0], [0], [0], [0], [0], [0], [0], [0], [0], [0]]

In this example we first choose a length for our inner lists, then we build a strategy which generates lists containing lists of precisely that length. The finds show what simple examples for this look like.
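The draw-then-expand-then-draw sequence can be modelled without Hypothesis by treating a strategy as a function from a Random to a value. Everything below (toy_integers, toy_lists, flatmap as a free function) is a hypothetical stand-in for illustration, not the library's API.

```python
import random


def toy_integers(lo, hi):
    """Toy strategy: a function from a Random to a value."""
    return lambda rng: rng.randint(lo, hi)


def toy_lists(elements, min_size, max_size):
    def draw(rng):
        size = rng.randint(min_size, max_size)
        return [elements(rng) for _ in range(size)]
    return draw


def flatmap(strategy, expand):
    """Draw x, turn it into a strategy with expand, then draw from that."""
    return lambda rng: expand(strategy(rng))(rng)


# First choose a row length n, then build lists of rows of exactly length n.
rectangle_lists = flatmap(
    toy_integers(0, 10),
    lambda n: toy_lists(toy_lists(toy_integers(-5, 5), n, n), 0, 4),
)

rows = rectangle_lists(random.Random(0))
print(len({len(row) for row in rows}) <= 1)  # True: all rows share one length
```

The second strategy depends on a value drawn from the first, which is something map and filter alone cannot express.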

Most of the time you probably don’t want flatmap, but unlike filter and map which are just conveniences for things you could just do in your tests, flatmap allows genuinely new data generation that you wouldn’t otherwise be able to easily do.

(If you know Haskell: Yes, this is more or less a monadic bind. If you don’t know Haskell, ignore everything in these parentheses. You do not need to understand anything about monads to use this, or anything else in Hypothesis).

Recursive data

Sometimes the data you want to generate has a recursive definition. For example, if you wanted to generate JSON data, valid JSON is:

  1. None, any float, any boolean, any unicode string.
  2. Any list of valid JSON data
  3. Any dictionary mapping unicode strings to valid JSON data.

The problem is that you cannot call a strategy recursively and expect it to not just blow up and eat all your memory.

The way Hypothesis handles this is with the ‘recursive’ function in hypothesis.strategies, to which you pass a base case and a function that, given a strategy for your data type, returns a new strategy for it. So for example:

>>> import hypothesis.strategies as st
>>> json = st.recursive(st.floats() | st.booleans() | st.text() | st.none(),
... lambda children: st.lists(children) | st.dictionaries(st.text(), children))
>>> json.example()
{'': None, '\U000b3407\U000b3407\U000b3407': {
    '': '"é""é\x11', '\x13': 1.6153068016570349e-282,
    '\x00': '\x11\x11\x11"\x11"é"éé\x11""éé"\x11"éé\x11éé\x11é\x11',
  '\x80': 'é\x11\x11\x11\x11\x11\x11', '\x13\x13\x00\x80\x80\x00': 4.643602465868519e-144
  }, '\U000b3407': None}
>>> json.example()
[]
>>> json.example()
'\x06ě\U000d25e4H\U000d25e4\x06ě'

That is, we start with our leaf data and then we augment it by allowing lists and dictionaries of anything we can generate as JSON data.

The size control of this works by limiting the maximum number of values that can be drawn from the base strategy. So for example if we wanted to only generate really small nested lists we could do this as:

>>> small_lists = st.recursive(st.booleans(), st.lists, max_leaves=5)
>>> small_lists.example()
False
>>> small_lists.example()
[[False], [], [], [], [], []]
>>> small_lists.example()
False
>>> small_lists.example()
[]
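To see what max_leaves is counting, the leaves of a generated value can be tallied by hand. This is a small stdlib-only sketch that assumes a "leaf" means any non-list value, i.e. one drawn from the base strategy; count_leaves is a hypothetical helper.

```python
def count_leaves(value):
    """Count the values that came from the base strategy (non-list values)."""
    if isinstance(value, list):
        return sum(count_leaves(child) for child in value)
    return 1


print(count_leaves(False))                          # 1
print(count_leaves([[False], [], [], [], [], []]))  # 1
print(count_leaves([True, [False, [True]]]))        # 3
```

Empty lists contribute no leaves, so a value can contain many lists while still drawing very little from the base strategy.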

Composite strategies

The @composite decorator lets you combine other strategies in more or less arbitrary ways.

Advance warning: You’re going to end up wanting to use this API for a lot of things, and it’s not that you shouldn’t do that, but it has certain intrinsic limitations which mean that overuse of it can hurt performance and example quality.

If it’s convenient to do so you should use builds instead. Otherwise feel free to use this, and if you end up with bad examples or poor performance then you should look here first as the culprit.

The composite decorator works by giving you a function as the first argument that you can use to draw examples from other strategies. For example, the following gives you a list and an index into it:

from hypothesis.strategies import composite, integers, lists

@composite
def list_and_index(draw, elements=integers()):
    xs = draw(lists(elements, min_size=1))
    i = draw(integers(min_value=0, max_value=len(xs) - 1))
    return (xs, i)

‘draw(s)’ is a function that should be thought of as returning s.example(), except that the result is reproducible and will minimize correctly. The decorated function has the initial argument removed from the list, but will accept all the others in the expected order. Defaults are preserved.

>>> list_and_index()
list_and_index()
>>> list_and_index().example()
([5585, 4073], 1)

>>> list_and_index(booleans())
list_and_index(elements=booleans())
>>> list_and_index(booleans()).example()
([False, False, True], 1)

Note that the repr will work exactly like it does for all the built-in strategies: It will be a function that you can call to get the strategy in question, with values provided only if they do not match the defaults.
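The way the decorator removes the initial draw argument can be sketched with a toy version. This is a stdlib-only model in which a strategy is just a function from a Random to a value; toy_composite and toy_integers are hypothetical names, and the real decorator also handles reproducibility, minimization and the repr, which this sketch does not.

```python
import random


def toy_composite(f):
    """Toy version of the decorator: strip the leading draw argument from f."""
    def accept(*args, **kwargs):
        def strategy(rng):
            # draw(s) produces one value from the toy strategy s.
            def draw(s):
                return s(rng)
            return f(draw, *args, **kwargs)
        return strategy
    return accept


def toy_integers(lo, hi):
    return lambda rng: rng.randint(lo, hi)


@toy_composite
def list_and_index(draw, max_len=5):
    size = draw(toy_integers(1, max_len))
    xs = [draw(toy_integers(0, 99)) for _ in range(size)]
    i = draw(toy_integers(0, len(xs) - 1))
    return (xs, i)


xs, i = list_and_index()(random.Random(0))   # called without the draw argument
print(0 <= i < len(xs))  # True
```

Because the index is drawn after the list, its bounds can depend on the list that was actually generated.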

You can use assume inside composite functions:

from hypothesis import assume
from hypothesis.strategies import composite, text

@composite
def distinct_strings_with_common_characters(draw):
    x = draw(text(min_size=1))
    y = draw(text(alphabet=x))
    assume(x != y)
    return (x, y)

This works as assume normally would, filtering out any examples for which the passed in argument is falsey.

Defining entirely new strategies

The full SearchStrategy API is only “semi-public”, in that it may (but usually won’t) break between minor versions but won’t break between patch releases.

However Hypothesis exposes a simplified version of the interface that you can use to build pretty good strategies. In general it’s pretty strongly recommended that you don’t use this if you can build your strategy out of existing ones, but it works perfectly well.

Here is an example of using the simplified interface:

from hypothesis.searchstrategy import BasicStrategy


class Bitfields(BasicStrategy):

    """A BasicStrategy for generating 128 bit integers to be treated as if they
    were bitfields."""

    def generate_parameter(self, random):
        # This controls the shape of the data that can be generated by
        # randomly screening off some bits.
        return random.getrandbits(128)

    def generate(self, random, parameter_value):
        # This generates a random value subject to a parameter we have
        # previously generated
        return parameter_value & random.getrandbits(128)

    def simplify(self, random, value):
        # Simplify by setting bits to zero.
        for i in range(128):
            k = 1 << i
            # It's important to test this because otherwise it would create a
            # cycle where value simplifies to value. This would cause
            # Hypothesis to get stuck on that value and not be able to simplify
            # it further.
            if value & k:
                yield value & (~k)

    def copy(self, value):
        # integers are immutable so there's no need to copy them
        return value

Only generate is strictly necessary to implement. copy will default to using deepcopy, generate_parameter will default to returning None, and simplify will default to not simplifying.

The reason why the parameters are important is that they let you “shape” the data so that it works with adaptive assumptions, which work by being more likely to reuse parameter values that don’t cause assumptions to be violated.

Simplify is of course what Hypothesis uses to produce simpler examples. It will greedily apply it to your data to produce the simplest example it possibly can. You should avoid having cycles or unbounded paths in the graph, as this will tend to hurt example quality and performance.
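Greedy application of simplify can be sketched as a loop that keeps moving to the first simpler candidate that still triggers the failure. This is a stdlib-only model of the idea, not Hypothesis's actual search; greedy_minimize and the failing condition are hypothetical.

```python
def simplify(value):
    """Yield simpler candidates: value with one set bit cleared (as in Bitfields)."""
    for i in range(128):
        k = 1 << i
        if value & k:
            yield value & ~k


def greedy_minimize(value, still_fails):
    """Repeatedly move to the first simpler candidate that still fails."""
    improved = True
    while improved:
        improved = False
        for candidate in simplify(value):
            if still_fails(candidate):
                value = candidate
                improved = True
                break
    return value


# Hypothetical failing condition: bits 3 and 5 are both set.
needs_bits = (1 << 3) | (1 << 5)
print(greedy_minimize(0xFFFF, lambda v: (v & needs_bits) == needs_bits))  # 40
```

Every step clears a bit, so the loop always terminates. A simplify that could yield its own input would let this loop spin forever, which is the cycle problem mentioned above.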

Instances of BasicStrategy are not actually strategies and must be converted to them using the basic function from hypothesis.strategies. You can convert either a class or an instance:

>>> basic(Bitfields).example()
70449389301502165026254673882738917538
>>> strategy(Bitfields()).example()
180947746395888412520415493036267606532

You can also skip the class definition if you prefer and just pass functions to basic. e.g.

>>> basic(generate=lambda random, _: random.getrandbits(8)).example()
88

The arguments to basic have the same names as the methods you would define on BasicStrategy.

Caveats:

  • Remember that BasicStrategy is not a subclass of SearchStrategy, only convertible to one.
  • The values produced by BasicStrategy are opaque to Hypothesis in a way that ones it is more intimately familiar with are not, because it’s impossible to safely and sensibly deduplicate arbitrary Python objects. This is mostly fine but it blocks certain heuristics and optimisations Hypothesis uses for improving the simplification process. As such implementations using BasicStrategy might get slightly worse examples than the equivalent native ones.
  • You should not use BasicStrategy for anything which you need control over the life cycle of, e.g. ORM objects. Hypothesis will keep instances of these values around for a potentially arbitrarily long time and will not do any clean up for disposing of them other than letting them be GCed as normal.

However if it’s genuinely the best way for you to do it, you should feel free to use BasicStrategy. These caveats should be read in the light of the fact that the full Hypothesis SearchStrategy interface is really very powerful, and the ones using BasicStrategy are merely a bit better than the normal quickcheck interface.

Using the SearchStrategy API directly

If you’re really super enthused about this search strategies thing and you want to learn all the gory details of how it works under the hood, you can use the full blown raw SearchStrategy interface to experience the full power of Hypothesis.

This is only semi-public API, meaning that it may break between minor versions but will not break in patch versions, but it should be considered relatively stable and most minor versions won’t break it.

class hypothesis.strategies.SearchStrategy

A SearchStrategy is an object that knows how to explore data of a given type.

Except where noted otherwise, methods on this class are not part of the public API and their behaviour may change significantly between minor version releases. They will generally be stable between patch releases.

With that in mind, here is how SearchStrategy works.

A search strategy is responsible for generating, simplifying and serializing examples for saving.

In order to do this a strategy has three types (where type here is more precise than just the class of the value. For example a tuple of ints should be considered different from a tuple of strings):

  1. The strategy parameter type
  2. The strategy template type
  3. The generated type

Of these, the first two should be considered to be private implementation details of a strategy and the only valid thing to do with them is to pass them back to the search strategy. Additionally, templates may be compared for equality and hashed.

Templates must be of quite a restricted type. A template may be any of the following:

  1. Any instance of the types bool, float, int, str (unicode on 2.7)
  2. None
  3. Any tuple or namedtuple of valid template types
  4. Any frozenset of valid template types

This may be relaxed a bit in future, but the requirement that templates are hashable probably won’t be.

This may all seem overly complicated but it’s for a fairly good reason. For more discussion of the motivation see http://hypothesis.readthedocs.org/en/master/internals.html

Given these, data generation happens in three phases:

  1. Draw a parameter value from a random number generator (defined by draw_parameter)
  2. Given a parameter value and a Random, draw a random template
  3. Reify a template value, deterministically turning it into a value of the desired type.

Data simplification proceeds on template values, taking a template and providing a generator over some examples of similar but simpler templates.
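The three phases can be illustrated with a toy class that follows the same method names. This is a stdlib-only sketch that does not subclass SearchStrategy; ToyListStrategy and its choice of parameter and template are assumptions made for the example.

```python
import random


class ToyListStrategy:
    """Toy three-phase strategy producing lists of small integers."""

    def draw_parameter(self, rng):
        # Parameter: an upper bound on list length that shapes this run.
        return rng.randint(0, 5)

    def draw_template(self, rng, parameter_value):
        # Template: a hashable tuple of ints, one of the permitted template types.
        size = rng.randint(0, parameter_value)
        return tuple(rng.randint(0, 9) for _ in range(size))

    def reify(self, template):
        # Deterministically turn the template into the generated type.
        return list(template)


toy = ToyListStrategy()
rng = random.Random(0)
parameter = toy.draw_parameter(rng)
template = toy.draw_template(rng, parameter)
value = toy.reify(template)
print(value == toy.reify(template))  # True: reify is deterministic
```

The template stays immutable and hashable while the reified value is an ordinary mutable list, so a test can modify the value without affecting what simplification works on.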

example()

Provide an example of the sort of value that this strategy generates. This is biased to be slightly simpler than is typical for values from this strategy, for clarity purposes.

This method shouldn’t be taken too seriously. It’s here for interactive exploration of the API, not for any sort of real testing.

This method is part of the public API.

map(pack)

Returns a new strategy that generates values by generating a value from this strategy and then calling pack() on the result, giving that.

This method is part of the public API.

flatmap(expand)

Returns a new strategy that generates values by generating a value from this strategy, say x, then generating a value from strategy(expand(x))

This method is part of the public API.

filter(condition)

Returns a new strategy that generates values from this strategy which satisfy the provided condition. Note that if the condition is too hard to satisfy this might result in your tests failing with Unsatisfiable.

This method is part of the public API.

draw_parameter(random)

Produce a random valid parameter for this strategy, using only data from the provided random number generator.

draw_template(random, parameter_value)

Given this Random and this parameter value, produce a random valid template for this strategy.

reify(template)

Given a template value, deterministically convert it into a value of the desired final type.

to_basic(template)

Convert a template value for this strategy into basic data.

Basic data is any of:

  1. A bool, None, an int that fits into 64 bits, or a unicode string
  2. A list of basic data

from_basic(value)

Convert basic data back to a template, raising BadData if the provided data cannot be converted into a valid template for this strategy.

It is not required that from_basic(to_basic(template)) == template. It is however required that to_basic(from_basic(data)) == data (if this does not raise an exception).
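The round-trip requirement can be illustrated for a template that is a tuple of ints. The functions below are free-standing stdlib-only stand-ins rather than methods on a real strategy, and BadData is defined locally for the sketch; the 64-bit range check is omitted for brevity.

```python
class BadData(ValueError):
    """Local stand-in for the BadData error named above."""


def to_basic(template):
    # Template here is a tuple of ints; basic data is a list of ints.
    return list(template)


def from_basic(data):
    # Reject anything that is not a list of plain ints.
    if not isinstance(data, list):
        raise BadData("expected a list, got %r" % (data,))
    for item in data:
        if isinstance(item, bool) or not isinstance(item, int):
            raise BadData("expected an int, got %r" % (item,))
    return tuple(data)


data = [3, 1, 4]
print(to_basic(from_basic(data)) == data)  # True
```

Invalid input such as from_basic("nope") raises BadData instead of returning a template, which matches the "if this does not raise an exception" clause above.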

template_upper_bound = inf

Provide an upper bound on the number of available templates. The intended interpretation is that template_upper_bound means “if you’ve only found this many templates don’t worry about it”. It is also used internally in a few places for certain optimisations. Generally speaking once this reaches numbers >= 2 ** 32 or so you might as well just return float('inf'). Note that there may be more distinct templates than there are representable values, because some templates may not reify and some may lead to the same value.

strictly_simpler(x, y)

Is the left hand argument strictly simpler than the right hand side?

Required properties:

  1. not strictly_simpler(x, x)
  2. not (strictly_simpler(x, y) and strictly_simpler(y, x))
  3. not (strictly_simpler(x, y) and strictly_simpler(y, z) and strictly_simpler(z, x))

This is used for hinting in certain cases. The default implementation of it always returns False and this is perfectly acceptable to leave as is.
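The three required properties can be spot-checked over a sample of templates. The sketch below is stdlib-only; the length-based ordering and check_required_properties are hypothetical examples, not something Hypothesis provides.

```python
from itertools import product


def strictly_simpler(x, y):
    """Example ordering on tuples: shorter is simpler, ties are incomparable."""
    return len(x) < len(y)


def check_required_properties(simpler, samples):
    for x in samples:
        assert not simpler(x, x)
    for x, y in product(samples, repeat=2):
        assert not (simpler(x, y) and simpler(y, x))
    for x, y, z in product(samples, repeat=3):
        assert not (simpler(x, y) and simpler(y, z) and simpler(z, x))
    return True


samples = [(), (1,), (2,), (1, 2), (3, 4, 5)]
print(check_required_properties(strictly_simpler, samples))     # True
print(check_required_properties(lambda x, y: False, samples))   # True (the default)
```

An implementation that always returns False satisfies all three properties trivially, which is why the default is acceptable.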

simplifiers(random, template)

Yield a sequence of functions which each take a Random object and a single template and produce a generator over “simpler” versions of that template.

The only other required invariant that each simplifier must satisfy is it should not be the case that strictly_simpler(x, y) for any y in simplify(random, x). That is, it’s OK if the simplify doesn’t produce a strictly simpler value but it must not produce a strictly more complex one.

General tips for a good simplify function:

  1. The generator shouldn’t yield too many values. A few hundred is fine, but if you’re generating millions of simplifications you may wish to reconsider your life choices and evaluate which ones actually matter to you.
  2. Cycles in simplify are fine, but the simplify graph should be bounded in the sense that there should be no infinite acyclic paths where a1 simplifies to a2 simplifies to ...
  3. Try major simplifications first to see if you get lucky. Yield a minimal element, throw out half of your data, etc. Providing shortcuts in the graph will speed up the simplification process a lot.

The template argument is provided to allow picking simplifiers that are likely to be useful. It should be considered only a hint, and each simplifier must be valid (in the sense of not erroring. It doesn’t have to do anything useful) for all templates valid for this strategy.

By default this just yields the basic_simplify function (which in turn by default does not do anything useful). If you override this function and also override basic_simplify you should make sure to yield it, or it will not be called.
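The tips above can be illustrated with a simplifier for tuple templates that tries the biggest reductions first. This is a stdlib-only sketch with the (random, template) signature described above; simplify_tuple is a hypothetical name and the random argument is unused here.

```python
def simplify_tuple(random, template):
    """Yield simpler tuples, biggest reductions first."""
    if not template:
        return
    yield ()                                   # the minimal element
    half = len(template) // 2
    if half > 0:
        yield template[:half]                  # throw out half of the data
        yield template[half:]
    for i in range(len(template)):             # then drop one element at a time
        yield template[:i] + template[i + 1:]


print(list(simplify_tuple(None, (1, 2, 3))))
# [(), (1,), (2, 3), (2, 3), (1, 3), (1, 2)]
```

Every candidate is strictly shorter than its input, so the simplify graph has no cycles and no unbounded paths, and the number of candidates grows only linearly with the template length.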

full_simplify(random, template)

A convenience method.

Run each simplifier over this template and yield the results in turn.

The order in which simplifiers are run is lightly randomized from the order in which simplifiers provides them, in order to avoid certain pathological cases.

basic_simplify(random, template)

A convenience method for subclasses that do not have complex simplification requirements to override.

See simplifiers for details.