Test Catalogue 📖
Below is a full list of all available tests within Wimsey.
mean_should
Tests that a column’s mean falls within a specified range.
Example use for yaml:
be_greater_than: 7
be_greater_than_or_equal_to: 8
be_less_than: 10
be_less_than_or_equal_to: 9
column: rating
test: mean_should
Example use for json:
{
"test": "mean_should",
"column": "rating",
"be_less_than": 10,
"be_less_than_or_equal_to": 9,
"be_greater_than": 7,
"be_greater_than_or_equal_to": 8
}
Example use for python:
from wimsey.tests import mean_should
my_test = mean_should(
column="rating",
be_less_than=10,
be_less_than_or_equal_to=9,
be_greater_than=7,
be_greater_than_or_equal_to=8,
)
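Range tests such as mean_should accept up to four bounds. The sketch below shows one way they might combine, assuming every bound you supply must hold at once and omitted bounds are ignored. check_bounds is a hypothetical helper for illustration, not part of Wimsey.

```python
from statistics import mean
from typing import Optional


def check_bounds(
    value: float,
    be_less_than: Optional[float] = None,
    be_less_than_or_equal_to: Optional[float] = None,
    be_greater_than: Optional[float] = None,
    be_greater_than_or_equal_to: Optional[float] = None,
) -> bool:
    """Hypothetical helper: True only if value satisfies every supplied bound."""
    checks = [
        be_less_than is None or value < be_less_than,
        be_less_than_or_equal_to is None or value <= be_less_than_or_equal_to,
        be_greater_than is None or value > be_greater_than,
        be_greater_than_or_equal_to is None or value >= be_greater_than_or_equal_to,
    ]
    return all(checks)


# Made-up ratings; their mean is 8.6.
ratings = [9, 8, 9, 8, 9]
print(check_bounds(mean(ratings), be_less_than=10, be_less_than_or_equal_to=9,
                   be_greater_than=7, be_greater_than_or_equal_to=8))
```

With the example bounds, the stricter bound on each side wins, so the effective passing range is 8 <= mean <= 9.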
min_should
Tests that a column’s min falls within a specified range.
Example use for yaml:
be_greater_than: 4
be_greater_than_or_equal_to: 6
be_less_than: 8
be_less_than_or_equal_to: 7
column: rating
test: min_should
Example use for json:
{
"test": "min_should",
"column": "rating",
"be_less_than": 8,
"be_less_than_or_equal_to": 7,
"be_greater_than": 4,
"be_greater_than_or_equal_to": 6
}
Example use for python:
from wimsey.tests import min_should
my_test = min_should(
column="rating",
be_less_than=8,
be_less_than_or_equal_to=7,
be_greater_than=4,
be_greater_than_or_equal_to=6,
)
std_should
Tests that a column’s standard deviation falls within a specified range.
Example use for yaml:
be_greater_than: 1
be_greater_than_or_equal_to: 1.14
be_less_than: 2
be_less_than_or_equal_to: 1.15
column: rating
test: std_should
Example use for json:
{
"test": "std_should",
"column": "rating",
"be_less_than": 2,
"be_less_than_or_equal_to": 1.15,
"be_greater_than": 1,
"be_greater_than_or_equal_to": 1.14
}
Example use for python:
from wimsey.tests import std_should
my_test = std_should(
column="rating",
be_less_than=2,
be_less_than_or_equal_to=1.15,
be_greater_than=1,
be_greater_than_or_equal_to=1.14,
)
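Standard deviation has two common definitions, and bounds as tight as the example's (1.14 to 1.15) can pass under one and fail under the other. The sketch below uses Python's statistics module on made-up ratings to show the gap between the sample form (dividing by n - 1) and the population form (dividing by n). Which form Wimsey uses is an assumption here; Polars and Narwhals both default to the sample form (ddof=1).

```python
from statistics import pstdev, stdev

# Made-up ratings, chosen only to illustrate the difference.
ratings = [9, 7, 10, 8, 9]

sample_std = stdev(ratings)       # divides by n - 1, about 1.140
population_std = pstdev(ratings)  # divides by n, about 1.020

# Against the example bounds (1.14 <= std <= 1.15), only the sample form passes.
for name, value in [("sample", sample_std), ("population", population_std)]:
    print(name, round(value, 3), 1.14 <= value <= 1.15)
```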
count_should
Tests that a column’s count falls within a specified range.
Example use for yaml:
be_greater_than: 3
be_greater_than_or_equal_to: 4
be_less_than: 6
be_less_than_or_equal_to: 5
column: first_name
test: count_should
Example use for json:
{
"test": "count_should",
"column": "first_name",
"be_less_than": 6,
"be_less_than_or_equal_to": 5,
"be_greater_than": 3,
"be_greater_than_or_equal_to": 4
}
Example use for python:
from wimsey.tests import count_should
my_test = count_should(
column="first_name",
be_less_than=6,
be_less_than_or_equal_to=5,
be_greater_than=3,
be_greater_than_or_equal_to=4,
)
row_count_should
Tests that the table’s row count falls within a specified range.
Example use for yaml:
be_greater_than: 3
be_greater_than_or_equal_to: 4
be_less_than: 6
be_less_than_or_equal_to: 5
test: row_count_should
Example use for json:
{
"test": "row_count_should",
"be_less_than": 6,
"be_less_than_or_equal_to": 5,
"be_greater_than": 3,
"be_greater_than_or_equal_to": 4
}
Example use for python:
from wimsey.tests import row_count_should
my_test = row_count_should(
be_less_than=6,
be_less_than_or_equal_to=5,
be_greater_than=3,
be_greater_than_or_equal_to=4,
)
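count_should and row_count_should can disagree when a column contains nulls. The sketch below assumes count_should counts non-null values (as the count methods in Narwhals and Polars do) while row_count_should counts every row. Plain Python lists stand in for a dataframe, with None as null.

```python
# Made-up first_name column; None stands in for a null value.
first_name = ["Peter", "Jane", None, "Hercule", "Beatrice"]

row_count = len(first_name)                              # every row
non_null_count = sum(v is not None for v in first_name)  # non-null values only

# With the example bounds (4 <= x <= 5) both pass here, but a second null
# would drop the non-null count to 3 and fail count_should only.
print(row_count, non_null_count)
```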
average_difference_from_other_column_should
Compares the values in two columns and tests that, on average, they are within a given distance of each other.
Example use for yaml:
be_greater_than: 25
be_greater_than_or_equal_to: 20
be_less_than: 30
be_less_than_or_equal_to: 27
column: cases_solved
other_column: rating
test: average_difference_from_other_column_should
Example use for json:
{
"test": "average_difference_from_other_column_should",
"column": "cases_solved",
"other_column": "rating",
"be_less_than": 30,
"be_less_than_or_equal_to": 27,
"be_greater_than": 25,
"be_greater_than_or_equal_to": 20
}
Example use for python:
from wimsey.tests import average_difference_from_other_column_should
my_test = average_difference_from_other_column_should(
column="cases_solved",
other_column="rating",
be_less_than=30,
be_less_than_or_equal_to=27,
be_greater_than=25,
be_greater_than_or_equal_to=20,
)
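A minimal sketch of the quantity this test appears to check, on made-up data. It assumes the test averages the signed row-wise difference column - other_column; both the sign convention and the absence of an absolute value are assumptions. Because the mean is linear, the result equals the difference of the two column means.

```python
from statistics import mean

# Made-up values for two columns on the same rows.
cases_solved = [33, 35, 34, 36, 34]
rating = [9, 8, 9, 8, 9]

# Row-wise difference, then its average.
avg_difference = mean(c - r for c, r in zip(cases_solved, rating))

# Same number via the difference of means, since the mean is linear.
assert abs(avg_difference - (mean(cases_solved) - mean(rating))) < 1e-9
print(avg_difference)
```

Against the example bounds, this value passes, since 25 < 25.8 <= 27.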
average_ratio_to_other_column_should
Compares the values in two columns and tests that, on average, their ratio falls within a given range. For example, a ratio of 2 would mean that column is twice as large as other_column.
Example use for yaml:
be_greater_than: 4
be_greater_than_or_equal_to: 4.3
be_less_than: 5
be_less_than_or_equal_to: 4.4
column: cases_solved
other_column: rating
test: average_ratio_to_other_column_should
Example use for json:
{
"test": "average_ratio_to_other_column_should",
"column": "cases_solved",
"other_column": "rating",
"be_less_than": 5,
"be_less_than_or_equal_to": 4.4,
"be_greater_than": 4,
"be_greater_than_or_equal_to": 4.3
}
Example use for python:
from wimsey.tests import average_ratio_to_other_column_should
my_test = average_ratio_to_other_column_should(
column="cases_solved",
other_column="rating",
be_less_than=5,
be_less_than_or_equal_to=4.4,
be_greater_than=4,
be_greater_than_or_equal_to=4.3,
)
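"On average" can mean two different things for a ratio: the mean of the row-wise ratios, or the ratio of the two column means. The sketch below computes both on made-up data to show that they need not agree. Which one Wimsey uses is an assumption you should confirm before choosing tight bounds.

```python
from statistics import mean

# Made-up columns where the two readings of "average ratio" diverge.
column = [10, 40]
other_column = [1, 20]

mean_of_ratios = mean(a / b for a, b in zip(column, other_column))  # (10 + 2) / 2
ratio_of_means = mean(column) / mean(other_column)                  # 25 / 10.5

print(mean_of_ratios, ratio_of_means)
```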
max_string_length_should
Tests that the maximum string length in a column falls within a given range.
Example use for yaml:
be_greater_than: 6
be_greater_than_or_equal_to: 7
be_less_than: 10
be_less_than_or_equal_to: 9
column: last_name
test: max_string_length_should
Example use for json:
{
"test": "max_string_length_should",
"column": "last_name",
"be_less_than": 10,
"be_less_than_or_equal_to": 9,
"be_greater_than": 6,
"be_greater_than_or_equal_to": 7
}
Example use for python:
from wimsey.tests import max_string_length_should
my_test = max_string_length_should(
column="last_name",
be_less_than=10,
be_less_than_or_equal_to=9,
be_greater_than=6,
be_greater_than_or_equal_to=7,
)
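A minimal sketch of the measured quantity on made-up data, assuming length means the number of characters (Python's len on a str) and that nulls are skipped. Character length and byte length differ for non-ASCII text, so it is worth confirming which one your dataframe engine reports.

```python
# Made-up last_name column; None stands in for a null value.
last_name = ["Parker", "Marple", None, "Poirot", "Lestrade"]

# Character length of each non-null value, then the maximum.
max_length = max(len(name) for name in last_name if name is not None)
print(max_length)
```

Here the longest value is "Lestrade" at 8 characters, which passes the example bounds (7 <= x <= 9).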
all_values_should
Tests individual values, either for appearing (or not appearing) in a list, or for matching a regular expression. Note that regex matching may not be supported by all dataframe libraries.
Example use for yaml:
be_one_of:
- Peter
- Jane
- Father
- Hercule
- Beatrice
column: first_name
match_regex: \b[A-Z][a-zA-Z]*\b
not_be_one_of:
- George
- Sandah
test: all_values_should
Example use for json:
{
"test": "all_values_should",
"column": "first_name",
"be_one_of": [
"Peter",
"Jane",
"Father",
"Hercule",
"Beatrice"
],
"not_be_one_of": [
"George",
"Sandah"
],
"match_regex": "\\b[A-Z][a-zA-Z]*\\b"
}
Example use for python:
from wimsey.tests import all_values_should
my_test = all_values_should(
column="first_name",
be_one_of=['Peter', 'Jane', 'Father', 'Hercule', 'Beatrice'],
not_be_one_of=['George', 'Sandah'],
match_regex=r"\b[A-Z][a-zA-Z]*\b",  # raw string so \b is a word boundary, not a backspace
)
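Two details matter when writing match_regex in Python. First, the pattern should be a raw string, because in an ordinary Python string "\b" is a backspace character rather than a regex word boundary. Second, whether a value must match in full or only contain a match depends on the engine. The sketch below uses Python's re module on made-up values to contrast search semantics (as in Polars' str.contains) with full-match semantics; which one Wimsey applies is an assumption.

```python
import re

# Raw string: \b stays a regex word boundary. Without the r prefix,
# Python turns "\b" into a single backspace character.
pattern = r"\b[A-Z][a-zA-Z]*\b"

# Made-up values.
values = ["Peter", "Father Brown", "jane"]

# Search semantics: passes if any part of the value matches.
searched = [re.search(pattern, v) is not None for v in values]
# Full-match semantics: passes only if the whole value matches.
full = [re.fullmatch(pattern, v) is not None for v in values]

print(searched)  # [True, True, False]
print(full)      # [True, False, False]
```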
type_should
Tests a column’s type. Note that, regardless of dataframe library, you should give the
Narwhals (Polars-esque) datatype. For instance, Pandas may store dates internally as
np.datetime64, but Wimsey will determine this type as Datetime when running tests.
This means tests should give consistent results on the same dataset when
run with different dataframe engines.
Example use for yaml:
be: String
be_one_of:
- String
- Float64
column: first_name
not_be: Int64
test: type_should
Example use for json:
{
"test": "type_should",
"column": "first_name",
"be": "String",
"not_be": "Int64",
"be_one_of": [
"String",
"Float64"
]
}
Example use for python:
from wimsey.tests import type_should
my_test = type_should(
column="first_name",
be="String",
not_be="Int64",
be_one_of=['String', 'Float64'],
)
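A minimal sketch of how the three type options might combine, assuming every option you supply must hold. Plain dtype-name strings stand in for Narwhals dtypes, and type_matches is a hypothetical helper for illustration, not part of Wimsey.

```python
from typing import Optional, Sequence


def type_matches(
    dtype_name: str,
    be: Optional[str] = None,
    not_be: Optional[str] = None,
    be_one_of: Optional[Sequence[str]] = None,
) -> bool:
    """Hypothetical helper: True only if dtype_name satisfies every supplied option."""
    if be is not None and dtype_name != be:
        return False
    if not_be is not None and dtype_name == not_be:
        return False
    if be_one_of is not None and dtype_name not in be_one_of:
        return False
    return True


print(type_matches("String", be="String", not_be="Int64",
                   be_one_of=["String", "Float64"]))  # True
print(type_matches("Float64", be="String"))  # False
```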
columns_should
Tests for the presence of columns within a dataframe. be specifies the exact columns
and fails if there are any additional (or missing) columns, whilst have and not_have
will not cause failure unless explicitly violated.
Example use for yaml:
be:
- first_name
- last_name
- rating
- cases_solved
have:
- first_name
not_have:
- middle_name
- secret_spy_name
test: columns_should
Example use for json:
{
"test": "columns_should",
"have": [
"first_name"
],
"not_have": [
"middle_name",
"secret_spy_name"
],
"be": [
"first_name",
"last_name",
"rating",
"cases_solved"
]
}
Example use for python:
from wimsey.tests import columns_should
my_test = columns_should(
have=['first_name'],
not_have=['middle_name', 'secret_spy_name'],
be=['first_name', 'last_name', 'rating', 'cases_solved'],
)
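The three options map naturally onto set operations. The sketch below assumes be compares the set of column names (ignoring order) and that every option you supply must hold; columns_match is a hypothetical helper for illustration, not part of Wimsey.

```python
from typing import Iterable, Optional


def columns_match(
    columns: Iterable[str],
    be: Optional[Iterable[str]] = None,
    have: Optional[Iterable[str]] = None,
    not_have: Optional[Iterable[str]] = None,
) -> bool:
    """Hypothetical helper mirroring be / have / not_have as set checks."""
    present = set(columns)
    if be is not None and present != set(be):
        return False  # extra or missing columns fail `be`
    if have is not None and not set(have) <= present:
        return False  # every listed column must be present
    if not_have is not None and present & set(not_have):
        return False  # none of the listed columns may be present
    return True


df_columns = ["first_name", "last_name", "rating", "cases_solved"]
print(columns_match(df_columns, be=df_columns, have=["first_name"],
                    not_have=["middle_name", "secret_spy_name"]))  # True
# An extra column is fine for `have`, but fails `be`.
print(columns_match(df_columns + ["middle_name"], have=["first_name"]))  # True
print(columns_match(df_columns + ["middle_name"], be=df_columns))  # False
```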
null_count_should
Tests that the null count of a column falls within a given range.
Example use for yaml:
be_greater_than_or_equal_to: 0
be_less_than: 1
column: rating
test: null_count_should
Example use for json:
{
"test": "null_count_should",
"column": "rating",
"be_less_than": 1,
"be_greater_than_or_equal_to": 0
}
Example use for python:
from wimsey.tests import null_count_should
my_test = null_count_should(
column="rating",
be_less_than=1,
be_greater_than_or_equal_to=0,
)
null_percentage_should
Tests that the null percentage of a column falls within a given range.
Example use for yaml:
be_greater_than_or_equal_to: 0
be_less_than: 0.1
column: rating
test: null_percentage_should
Example use for json:
{
"test": "null_percentage_should",
"column": "rating",
"be_less_than": 0.1,
"be_greater_than_or_equal_to": 0
}
Example use for python:
from wimsey.tests import null_percentage_should
my_test = null_percentage_should(
column="rating",
be_less_than=0.1,
be_greater_than_or_equal_to=0,
)
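A minimal sketch relating the two null tests on made-up data. It assumes the percentage is expressed as a fraction of all rows (so 0.1 means 10%); the catalogue does not say whether Wimsey uses a 0 to 1 or a 0 to 100 scale, so confirm this before choosing bounds. It also shows that be_less_than is strict.

```python
# Made-up rating column with one null (None) in ten rows.
rating = [9, 8, None, 9, 8, 9, 7, 10, 8, 9]

null_count = sum(v is None for v in rating)
null_fraction = null_count / len(rating)  # assumes a 0-1 scale

# Example bounds from the two tests above.
passes_count = 0 <= null_count < 1          # False: one null is already too many
passes_fraction = 0 <= null_fraction < 0.1  # False: 0.1 is not strictly below 0.1

print(null_count, null_fraction, passes_count, passes_fraction)
```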
median_should
Tests that the median value of a column falls within a given range.
Example use for yaml:
be_greater_than: 7
be_greater_than_or_equal_to: 8
be_less_than: 10
be_less_than_or_equal_to: 9
column: rating
test: median_should
Example use for json:
{
"test": "median_should",
"column": "rating",
"be_less_than": 10,
"be_less_than_or_equal_to": 9,
"be_greater_than": 7,
"be_greater_than_or_equal_to": 8
}
Example use for python:
from wimsey.tests import median_should
my_test = median_should(
column="rating",
be_less_than=10,
be_less_than_or_equal_to=9,
be_greater_than=7,
be_greater_than_or_equal_to=8,
)
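With an even number of values there is no single middle value. The sketch below uses Python's statistics.median on made-up data, which averages the two middle values; this is also the usual default in pandas and Polars, though it is worth confirming for the engine you run Wimsey with.

```python
from statistics import median

odd = [7, 8, 9, 9, 10]
even = [7, 8, 9, 10]

print(median(odd))   # 9, the middle value
print(median(even))  # 8.5, the mean of 8 and 9
```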
sum_should
Tests that the total sum of a column falls within a given range.
Example use for yaml:
be_greater_than: 40
be_greater_than_or_equal_to: 43
be_less_than: 45
be_less_than_or_equal_to: 43
column: rating
test: sum_should
Example use for json:
{
"test": "sum_should",
"column": "rating",
"be_less_than": 45,
"be_less_than_or_equal_to": 43,
"be_greater_than": 40,
"be_greater_than_or_equal_to": 43
}
Example use for python:
from wimsey.tests import sum_should
my_test = sum_should(
column="rating",
be_less_than=45,
be_less_than_or_equal_to=43,
be_greater_than=40,
be_greater_than_or_equal_to=43,
)
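In the sum_should example, be_greater_than_or_equal_to and be_less_than_or_equal_to are both 43, which pins the sum to exactly 43, and the strict bounds of 40 and 45 then add nothing. A minimal sketch on made-up data, assuming nulls are skipped when summing (as Polars and pandas do by default):

```python
# Made-up rating column; None stands in for a null value and is skipped.
rating = [9, 8, None, 9, 8, 9]

total = sum(v for v in rating if v is not None)

# All four example bounds together reduce to total == 43.
passes = 40 < total < 45 and 43 <= total <= 43
print(total, passes)
```

For float columns, pinning a sum to a single value is fragile, since floating-point addition can differ in the last digits between engines; a narrow range is safer.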