Hypothesis for the Scientific Stack
numpy
Hypothesis offers a number of strategies for NumPy testing, available in the hypothesis[numpy] extra. They live in the hypothesis.extra.numpy package.
The centerpiece is the arrays() strategy, which generates arrays with any dtype, shape, and contents you can specify or give a strategy for. To make this as useful as possible, strategies are provided to generate array shapes and all kinds of fixed-size or compound dtypes.

hypothesis.extra.numpy.from_dtype(dtype)
Creates a strategy which can generate any value of the given dtype.
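As a minimal sketch of how from_dtype() might be used inside a test: the test name, the choice of uint8 and the settings values below are illustrative assumptions, not something prescribed by the library.

```python
import numpy as np
from hypothesis import given, settings
from hypothesis.extra.numpy import from_dtype


# Hypothetical property: every value drawn for an 8-bit unsigned dtype
# should fit in the range 0..255.
@settings(max_examples=50, deadline=None, database=None)
@given(from_dtype(np.dtype("uint8")))
def test_uint8_values_in_range(value):
    assert 0 <= int(value) <= 255


# Calling the decorated function runs the property directly
# (a test runner such as pytest would normally do this for you).
test_uint8_values_in_range()
```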

hypothesis.extra.numpy.arrays(dtype, shape, elements=None, fill=None, unique=False)
Returns a strategy for generating numpy.ndarrays.
 dtype may be any valid input to numpy.dtype (this includes dtype objects), or a strategy that generates such values.
 shape may be an integer >= 0, a tuple of length >= 0 of such integers, or a strategy that generates such values.
 elements is a strategy for generating values to put in the array. If it is None, a suitable value will be inferred based on the dtype, which may give any legal value (including e.g. NaN for floats). If you have more specific requirements, you should supply your own elements strategy.
 fill is a strategy that may be used to generate a single background value for the array. If None, a suitable default will be inferred based on the other arguments. If set to nothing() then filling behaviour will be disabled entirely and every element will be generated independently.
 unique specifies if the elements of the array should all be distinct from one another. Note that in this case multiple NaN values may still be allowed. If fill is also set, the only valid values for it to return are NaN values (anything for which numpy.isnan returns True, so e.g. for complex numbers nan+1j is also a valid fill). Note that if unique is set to True the generated values must be hashable.
Arrays of specified dtype and shape are generated for example like this:
>>> import numpy as np
>>> from hypothesis.extra.numpy import arrays
>>> arrays(np.int8, (2, 3)).example()
array([[8, 6, 3],
       [6, 4, 6]], dtype=int8)

>>> import numpy as np
>>> from hypothesis.strategies import floats
>>> arrays(np.float64, 3, elements=floats(0, 1)).example()
array([ 0.88974794,  0.77387938,  0.1977879 ])
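The same strategy can be passed to @given rather than sampled with .example(). The following is a sketch under assumed names and bounds (the test name, the 0..1000 range and the settings values are illustrative) that also exercises unique=True:

```python
import numpy as np
from hypothesis import given, settings, strategies as st
from hypothesis.extra.numpy import arrays


# Hypothetical property: ten distinct integers between 0 and 1000.
@settings(max_examples=50, deadline=None, database=None)
@given(arrays(np.int64, 10, elements=st.integers(0, 1000), unique=True))
def test_unique_int_array(arr):
    assert arr.dtype == np.int64
    assert arr.shape == (10,)
    # unique=True means no value appears twice.
    assert np.unique(arr).size == arr.size


# Run the property directly; pytest would normally collect it.
test_unique_int_array()
```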
Array values are generated in two parts:
 Some subset of the coordinates of the array are populated with a value drawn from the elements strategy (or its inferred form).
 If any coordinates were not assigned in the previous step, a single value is drawn from the fill strategy and is assigned to all remaining places.
You can set fill to nothing() if you want to disable this behaviour and draw a value for every element.
If fill is set to None then Hypothesis will attempt to infer the correct behaviour automatically: if unique is True, no filling will occur by default. Otherwise, if it looks safe to reuse the values of elements across multiple coordinates (this will be the case for any inferred strategy, and for most of the builtins, but is not the case for mutable values or strategies built with flatmap, map, composite, etc.) then it will use the elements strategy as the fill; otherwise it will default to having no fill.
Having a fill helps Hypothesis craft high-quality examples, but its main importance is when the array generated is large: Hypothesis is primarily designed around testing small examples. If you have arrays with hundreds or more elements, having a fill value is essential if you want your tests to run in reasonable time.
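To illustrate the point about large arrays, here is a sketch that supplies an explicit fill. The array size of 1000, the fill value of 0.0 and the test name are assumptions chosen for the example only.

```python
import numpy as np
from hypothesis import given, settings, strategies as st
from hypothesis.extra.numpy import arrays


# Some coordinates are drawn from `elements`; the remaining ones all
# receive a single value drawn from `fill`.
large_arrays = arrays(
    np.float64,
    1000,
    elements=st.floats(0, 1),
    fill=st.just(0.0),
)


@settings(max_examples=20, deadline=None, database=None)
@given(large_arrays)
def test_large_array_with_fill(arr):
    assert arr.shape == (1000,)
    # Both the element values and the fill value lie in [0, 1].
    assert ((arr >= 0) & (arr <= 1)).all()


test_large_array_with_fill()
```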

hypothesis.extra.numpy.array_shapes(min_dims=1, max_dims=3, min_side=1, max_side=10)
Return a strategy for array shapes (tuples of int >= 1).
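Because shape may itself be a strategy, array_shapes() can be passed straight to arrays(). A minimal sketch, assuming we want small two-dimensional arrays (the bounds and the test name are illustrative):

```python
import numpy as np
from hypothesis import given, settings
from hypothesis.extra.numpy import array_shapes, arrays

# Shapes with exactly two dimensions, each side between 1 and 4.
matrix_shapes = array_shapes(min_dims=2, max_dims=2, min_side=1, max_side=4)


@settings(max_examples=50, deadline=None, database=None)
@given(arrays(np.float64, matrix_shapes))
def test_matrix_shapes(arr):
    assert arr.ndim == 2
    assert all(1 <= side <= 4 for side in arr.shape)


test_matrix_shapes()
```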

hypothesis.extra.numpy.scalar_dtypes()
Return a strategy that can return any non-flexible scalar dtype.

hypothesis.extra.numpy.unsigned_integer_dtypes(endianness='?', sizes=(8, 16, 32, 64))
Return a strategy for unsigned integer dtypes.
endianness may be < for little-endian, > for big-endian, = for native byte order, or ? to allow either byte order. This argument only applies to dtypes of more than one byte.
sizes must be a collection of integer sizes in bits. The default (8, 16, 32, 64) covers the full range of sizes.

hypothesis.extra.numpy.integer_dtypes(endianness='?', sizes=(8, 16, 32, 64))
Return a strategy for signed integer dtypes.
endianness and sizes are treated as for unsigned_integer_dtypes().
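The endianness and sizes arguments of the two integer dtype strategies can be sketched as follows. The particular sizes and byte order, and the test name, are assumptions picked for illustration.

```python
import numpy as np
from hypothesis import given, settings
from hypothesis.extra.numpy import integer_dtypes, unsigned_integer_dtypes


@settings(max_examples=30, deadline=None, database=None)
@given(
    unsigned_integer_dtypes(endianness="<", sizes=(16, 32)),
    integer_dtypes(sizes=(8,)),
)
def test_integer_dtype_choices(unsigned, signed):
    # Sizes are given in bits, so 16 and 32 mean 2 and 4 bytes.
    assert unsigned.kind == "u"
    assert unsigned in (np.dtype("<u2"), np.dtype("<u4"))
    # A one-byte dtype has no byte order to vary.
    assert signed == np.dtype(np.int8)


test_integer_dtype_choices()
```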

hypothesis.extra.numpy.floating_dtypes(endianness='?', sizes=(16, 32, 64))
Return a strategy for floating-point dtypes.
sizes is the size in bits of the floating-point number. Larger floats (96 and 128 bit) are not supported on all platforms and are therefore not generated by default. To generate these dtypes, include these values in the sizes argument.

hypothesis.extra.numpy.complex_number_dtypes(endianness='?', sizes=(64, 128))
Return a strategy for complex-number dtypes.
sizes is the total size in bits of a complex number, which consists of two floats. Complex halves (a 16-bit real part) are not supported by numpy and will not be generated by this strategy.
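The difference between the two sizes conventions (size of one float versus total size of a complex number) can be checked with a short sketch. The chosen sizes and the test name are illustrative assumptions.

```python
from hypothesis import given, settings
from hypothesis.extra.numpy import complex_number_dtypes, floating_dtypes


@settings(max_examples=30, deadline=None, database=None)
@given(floating_dtypes(sizes=(32, 64)), complex_number_dtypes(sizes=(64,)))
def test_float_and_complex_sizes(float_dtype, complex_dtype):
    # 32 or 64 bits per float, i.e. 4 or 8 bytes.
    assert float_dtype.kind == "f"
    assert float_dtype.itemsize in (4, 8)
    # 64 bits in total: two 32-bit floats, i.e. 8 bytes.
    assert complex_dtype.kind == "c"
    assert complex_dtype.itemsize == 8


test_float_and_complex_sizes()
```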

hypothesis.extra.numpy.datetime64_dtypes(max_period='Y', min_period='ns', endianness='?')
Return a strategy for datetime64 dtypes, with various precisions from year to attosecond.

hypothesis.extra.numpy.timedelta64_dtypes(max_period='Y', min_period='ns', endianness='?')
Return a strategy for timedelta64 dtypes, with various precisions from year to attosecond.
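A sketch of how max_period and min_period might narrow the generated units. The unit ranges and the test name are assumptions for illustration; numpy.datetime_data is used to read the unit back from each dtype.

```python
import numpy as np
from hypothesis import given, settings
from hypothesis.extra.numpy import datetime64_dtypes, timedelta64_dtypes


@settings(max_examples=30, deadline=None, database=None)
@given(
    datetime64_dtypes(max_period="D", min_period="s"),
    timedelta64_dtypes(max_period="h", min_period="ms"),
)
def test_time_units(dt_dtype, td_dtype):
    # datetime64 restricted to day .. second precision.
    assert dt_dtype.kind == "M"
    assert np.datetime_data(dt_dtype)[0] in ("D", "h", "m", "s")
    # timedelta64 restricted to hour .. millisecond precision.
    assert td_dtype.kind == "m"
    assert np.datetime_data(td_dtype)[0] in ("h", "m", "s", "ms")


test_time_units()
```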

hypothesis.extra.numpy.byte_string_dtypes(endianness='?', min_len=0, max_len=16)
Return a strategy for generating byte string dtypes, of various lengths and byte order.

hypothesis.extra.numpy.unicode_string_dtypes(endianness='?', min_len=0, max_len=16)
Return a strategy for generating unicode string dtypes, of various lengths and byte order.
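The length bounds of the two string dtype strategies can be sketched like this. The bounds of 1 to 4 and the test name are illustrative assumptions; the check on the unicode dtype relies on NumPy storing four bytes per character.

```python
from hypothesis import given, settings
from hypothesis.extra.numpy import byte_string_dtypes, unicode_string_dtypes


@settings(max_examples=30, deadline=None, database=None)
@given(
    byte_string_dtypes(min_len=1, max_len=4),
    unicode_string_dtypes(min_len=1, max_len=4),
)
def test_string_dtype_lengths(bytes_dtype, unicode_dtype):
    # Byte strings: one byte per character.
    assert bytes_dtype.kind == "S"
    assert 1 <= bytes_dtype.itemsize <= 4
    # Unicode strings: four bytes per character.
    assert unicode_dtype.kind == "U"
    assert 1 <= unicode_dtype.itemsize // 4 <= 4


test_string_dtype_lengths()
```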

hypothesis.extra.numpy.array_dtypes(subtype_strategy=scalar_dtypes(), min_size=1, max_size=5, allow_subarrays=False)
Return a strategy for generating array (compound) dtypes, with members drawn from the given subtype strategy.

hypothesis.extra.numpy.nested_dtypes(subtype_strategy=scalar_dtypes(), max_leaves=10, max_itemsize=None)
Return the most-general dtype strategy.
Elements drawn from this strategy may be simple (from the subtype_strategy), or several such values drawn from array_dtypes() with allow_subarrays=True. Subdtypes in an array dtype may be nested to any depth, subject to the max_leaves argument.
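A sketch of the two compound dtype strategies side by side. The size limits and the test name are assumptions; the assertions only check properties that follow from the descriptions above.

```python
import numpy as np
from hypothesis import given, settings
from hypothesis.extra.numpy import array_dtypes, nested_dtypes


@settings(max_examples=30, deadline=None, database=None)
@given(array_dtypes(min_size=2, max_size=3), nested_dtypes(max_leaves=4))
def test_compound_dtypes(compound, nested):
    # array_dtypes() yields a structured dtype with named fields.
    assert compound.names is not None
    assert 2 <= len(compound.names) <= 3
    # nested_dtypes() may yield a simple or a compound dtype.
    assert isinstance(nested, np.dtype)


test_compound_dtypes()
```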
pandas
Hypothesis provides strategies for several of the core pandas data types: pandas.Index, pandas.Series and pandas.DataFrame.
The general approach taken by the pandas module is that there are multiple strategies for generating indexes, and all of the other strategies take the number of entries they contain from their index strategy (with sensible defaults). So e.g. a Series is specified by its numpy.dtype (and/or a strategy for generating elements for it).

hypothesis.extra.pandas.indexes(elements=None, dtype=None, min_size=0, max_size=None, unique=True)
Provides a strategy for producing a pandas.Index.
Arguments:
 elements is a strategy which will be used to generate the individual values of the index. If None, it will be inferred from the dtype. Note: even if the elements strategy produces tuples, the generated value will not be a MultiIndex, but instead be a normal index whose elements are tuples.
 dtype is the dtype of the resulting index. If None, it will be inferred from the elements strategy. At least one of dtype or elements must be provided.
 min_size is the minimum number of elements in the index.
 max_size is the maximum number of elements in the index. If None then it will default to a suitable small size. If you want larger indexes you should pass a max_size explicitly.
 unique specifies whether all of the elements in the resulting index should be distinct.
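The arguments above can be combined as in the following sketch. The element range, the size bounds and the test name are illustrative assumptions.

```python
import pandas as pd
from hypothesis import given, settings, strategies as st
from hypothesis.extra.pandas import indexes


# Only `elements` is given, so the dtype is left to be inferred.
@settings(max_examples=30, deadline=None, database=None)
@given(indexes(elements=st.integers(0, 100), min_size=1, max_size=5))
def test_small_unique_index(index):
    assert isinstance(index, pd.Index)
    assert 1 <= len(index) <= 5
    # unique defaults to True for indexes().
    assert index.is_unique


test_small_unique_index()
```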

hypothesis.extra.pandas.range_indexes(min_size=0, max_size=None)
Provides a strategy which generates an Index whose values are 0, 1, …, n for some n.
Arguments:
 min_size is the smallest number of elements the index can have.
 max_size is the largest number of elements the index can have. If None it will default to some suitable value based on min_size.
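A short sketch of range_indexes() with explicit size bounds (the bounds and the test name are assumptions). It checks that the generated values count up from zero in steps of one.

```python
from hypothesis import given, settings
from hypothesis.extra.pandas import range_indexes


@settings(max_examples=30, deadline=None, database=None)
@given(range_indexes(min_size=2, max_size=6))
def test_range_index_counts_from_zero(index):
    assert 2 <= len(index) <= 6
    # Values start at 0 and increase by 1.
    assert list(index) == list(range(len(index)))


test_range_index_counts_from_zero()
```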

hypothesis.extra.pandas.series(elements=None, dtype=None, index=None, fill=None, unique=False)
Provides a strategy for producing a pandas.Series.
Arguments:
 elements: a strategy that will be used to generate the individual values in the series. If None, we will attempt to infer a suitable default from the dtype.
 dtype: the dtype of the resulting series; may be any value that can be passed to numpy.dtype. If None, pandas’s standard behaviour is used to infer it from the type of the elements values. Note that if the type of values that comes out of your elements strategy varies, then so will the resulting dtype of the series.
 index: If not None, a strategy for generating indexes for the resulting Series. This can generate either pandas.Index objects or any sequence of values (which will be passed to the Index constructor).
You will probably find it most convenient to use the indexes() or range_indexes() function to produce values for this argument.
Usage:
>>> from hypothesis.extra.pandas import series
>>> series(dtype=int).example()
0    2001747478
1    1153062837
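A sketch of series() used with all three documented arguments inside a test. The element range, the index bounds and the test name are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from hypothesis import given, settings, strategies as st
from hypothesis.extra.pandas import range_indexes, series

# The length of the Series is taken from its index strategy.
unit_floats = series(
    elements=st.floats(0, 1),
    dtype=float,
    index=range_indexes(min_size=1, max_size=5),
)


@settings(max_examples=30, deadline=None, database=None)
@given(unit_floats)
def test_series_values(s):
    assert isinstance(s, pd.Series)
    assert s.dtype == np.float64
    assert 1 <= len(s) <= 5
    assert s.between(0, 1).all()


test_series_values()
```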

class hypothesis.extra.pandas.column(name=None, elements=None, dtype=None, fill=None, unique=False)
Data object for describing a column in a DataFrame.
Arguments:
 name: the column name, or None to default to the column position. Must be hashable, but can otherwise be any value supported as a pandas column name.
 elements: the strategy for generating values in this column, or None to infer it from the dtype.
 dtype: the dtype of the column, or None to infer it from the element strategy. At least one of dtype or elements must be provided.
 fill: A default value for elements of the column. See arrays() for a full explanation.
 unique: If all values in this column should be distinct.

hypothesis.extra.pandas.columns(names_or_number, dtype=None, elements=None, fill=None, unique=False)
A convenience function for producing a list of column objects of the same general shape.
The names_or_number argument is either a sequence of values, the elements of which will be used as the name for individual column objects, or a number, in which case that many unnamed columns will be created. All other arguments are passed through verbatim to create the columns.
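The two forms of names_or_number can be sketched as follows. The column names, the dtypes and the test name are assumptions for illustration; the resulting list is then handed to data_frames().

```python
from hypothesis import given, settings
from hypothesis.extra.pandas import columns, data_frames

# A sequence of names gives one column object per name.
named = columns(["a", "b"], dtype=float)
# A number gives that many unnamed columns (name is None).
unnamed = columns(3, dtype=int)

named_labels = [c.name for c in named]
unnamed_labels = [c.name for c in unnamed]


@settings(max_examples=20, deadline=None, database=None)
@given(data_frames(columns=named))
def test_frame_has_named_float_columns(df):
    assert list(df.columns) == ["a", "b"]
    assert all(dtype.kind == "f" for dtype in df.dtypes)


test_frame_has_named_float_columns()
```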

hypothesis.extra.pandas.data_frames(columns=None, rows=None, index=None)
Provides a strategy for producing a pandas.DataFrame.
Arguments:
 columns: An iterable of column objects describing the shape of the generated DataFrame.
 rows: A strategy for generating a row object. Should generate either dicts mapping column names to values or a sequence mapping column position to the value in that position (note that unlike the pandas.DataFrame constructor, single values are not allowed here. Passing e.g. an integer is an error, even if there is only one column).
At least one of rows and columns must be provided. If both are provided then the generated rows will be validated against the columns and an error will be raised if they don’t match.
Caveats on using rows:
 In general you should prefer using columns to rows, and only use rows if the columns interface is insufficiently flexible to describe what you need; you will get better performance and example quality that way.
 If you provide rows and not columns, then the shape and dtype of the resulting DataFrame may vary. For example, if you have a mix of int and float in the values for one column in your row entries, the column will sometimes have an integral dtype and sometimes a float dtype.
 index: If not None, a strategy for generating indexes for the resulting DataFrame. This can generate either pandas.Index objects or any sequence of values (which will be passed to the Index constructor).
You will probably find it most convenient to use the indexes() or range_indexes() function to produce values for this argument.
Usage:
The expected usage pattern is that you use column and columns() to specify a fixed shape of the DataFrame you want as follows. For example the following gives a two-column data frame:
>>> from hypothesis.extra.pandas import column, data_frames
>>> data_frames([
...     column('A', dtype=int), column('B', dtype=float)]).example()
            A              B
0  2021915903  1.793898e+232
1  1146643993            inf
2  2096165693   1.000000e+07
If you want the values in different columns to interact in some way you can use the rows argument. For example the following gives a two-column DataFrame where the value in the first column is always at most the value in the second:
>>> from hypothesis.extra.pandas import column, data_frames
>>> import hypothesis.strategies as st
>>> data_frames(
...     rows=st.tuples(st.floats(allow_nan=False),
...                    st.floats(allow_nan=False)).map(sorted)
... ).example()
               0             1
0  -3.402823e+38  9.007199e+15
1  1.562796e-298  5.000000e-01
You can also combine the two:
>>> from hypothesis.extra.pandas import columns, data_frames
>>> import hypothesis.strategies as st
>>> data_frames(
...     columns=columns(["lo", "hi"], dtype=float),
...     rows=st.tuples(st.floats(allow_nan=False),
...                    st.floats(allow_nan=False)).map(sorted)
... ).example()
               lo            hi
0    9.314723e-49  4.353037e+45
1    9.999900e-01  1.000000e+07
2  -2.152861e+134  1.069317e-73
(Note that the column dtype must still be specified and will not be inferred from the rows. This restriction may be lifted in future).
Combining rows and columns has the following behaviour:
 The column names and dtypes will be used.
 If the column is required to be unique, this will be enforced.
 Any values missing from the generated rows will be provided using the column’s fill.
 Any values in the row not present in the column specification will result in InvalidArgument being raised. For dict rows this means keys with no corresponding column name; for sequence rows it means too many items.
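The combination of rows and columns can also be exercised as a property test. This is a sketch only: the column names lo and hi follow the example above, while the test name and the settings values are assumptions.

```python
from hypothesis import given, settings, strategies as st
from hypothesis.extra.pandas import columns, data_frames

# Each row is a pair of non-NaN floats in ascending order.
sorted_pairs = st.tuples(
    st.floats(allow_nan=False),
    st.floats(allow_nan=False),
).map(sorted)

frames = data_frames(
    columns=columns(["lo", "hi"], dtype=float),  # names and dtypes
    rows=sorted_pairs,  # relationship between the two values
)


@settings(max_examples=30, deadline=None, database=None)
@given(frames)
def test_lo_never_exceeds_hi(df):
    assert list(df.columns) == ["lo", "hi"]
    assert (df["lo"] <= df["hi"]).all()


test_lo_never_exceeds_hi()
```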
Supported Versions
There is quite a lot of variation between pandas versions. We only commit to supporting the latest version of pandas, but older minor versions are supported on a “best effort” basis. Hypothesis is currently tested against and confirmed working with pandas 0.19, 0.20, 0.21, 0.22, and 0.23.
Releases that are not the latest patch release of their minor version are not tested or officially supported, but will probably also work unless you hit a pandas bug.