Statistical Tests
This module contains statistical tests for A/B testing.
AbsoluteIndependentTTest
Bases: BaseTest
Source code in aboba/tests/absolute_ttest.py
__init__
__init__(value_column='target', equal_var=True, random_state=None, alternative='two-sided', alpha=0.05)
Independent t-test for absolute difference between two groups.
This test compares the means of two independent groups to determine if there is a statistically significant difference between them in absolute terms.
| PARAMETER | DESCRIPTION |
|---|---|
| `value_column` | Name of the column containing the values to test. |
| `equal_var` | If True, perform a standard independent two-sample test that assumes equal population variances. If False, perform Welch's t-test, which does not assume equal population variances. |
| `random_state` | Seed for the random number generator. |
| `alternative` | Defines the alternative hypothesis. One of 'two-sided' (default), 'less', or 'greater'. |
| `alpha` | Significance level for confidence intervals (default: 0.05). |
Examples:

```python
import pandas as pd
import numpy as np
from aboba.tests.absolute_ttest import AbsoluteIndependentTTest

# Create sample data
np.random.seed(42)
group_a = pd.DataFrame({'target': np.random.normal(10, 2, 100)})
group_b = pd.DataFrame({'target': np.random.normal(12, 2, 100)})

# Perform the test
test = AbsoluteIndependentTTest(value_column='target')
result = test.test([group_a, group_b], {})
print(f"P-value: {result.pvalue:.4f}")
print(f"Effect: {result.effect:.4f}")
```
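For intuition, the comparison in the example can be reproduced directly with scipy. This is an illustrative sketch of the underlying statistic, not aboba's implementation, and assumes scipy is available:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(10, 2, 100)
treatment = rng.normal(12, 2, 100)

# absolute effect: difference in sample means
effect = treatment.mean() - control.mean()

# standard independent two-sample t-test (equal_var=True)
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=True)
```

Passing `equal_var=False` instead would give Welch's t-test, matching the class's `equal_var` parameter.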
Source code in aboba/tests/absolute_ttest.py
AbsoluteRelatedTTest
Bases: BaseTest
Performs a paired (related) two-sample t-test on absolute data.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `value_column` | Name of the column containing the values to test. |
| `alternative` | Defines the alternative hypothesis (default 'two-sided'): 'two-sided' means the means of the distributions underlying the samples are unequal; 'greater' means the mean of the distribution underlying the first sample is greater; 'less' means it is smaller. |
| `alpha` | Significance level for confidence intervals. |
Source code in aboba/tests/absolute_ttest.py
__init__
Related (paired) t-test for absolute difference between two groups.
This test compares the means of two related groups to determine if there is a statistically significant difference between them in absolute terms. It is typically used when the same subjects are measured twice (before/after).
| PARAMETER | DESCRIPTION |
|---|---|
| `value_column` | Name of the column containing the values to test. |
| `alternative` | Defines the alternative hypothesis. One of 'two-sided' (default), 'less', or 'greater'. |
Examples:

```python
import pandas as pd
import numpy as np
from aboba.tests.absolute_ttest import AbsoluteRelatedTTest

# Create sample paired data
np.random.seed(42)
before = np.random.normal(10, 2, 50)
after = before + np.random.normal(0.5, 1, 50)  # Adding effect
group_a = pd.DataFrame({'target': before})
group_b = pd.DataFrame({'target': after})

# Perform the test
test = AbsoluteRelatedTTest(value_column='target')
result = test.test([group_a, group_b], {})
print(f"P-value: {result.pvalue:.4f}")
print(f"Effect: {result.effect:.4f}")
```
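The paired comparison above can be sketched with scipy's paired t-test; an illustrative approximation under the same data-generating assumptions, not aboba's code:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
before = rng.normal(10, 2, 50)
after = before + rng.normal(0.5, 1, 50)  # simulated treatment effect

# effect is the mean within-pair difference
effect = (after - before).mean()

# paired (related) two-sample t-test
t_stat, p_value = stats.ttest_rel(after, before)
```

Pairing removes the shared between-subject variance, which is why the paired test is more powerful here than an independent test on the same data.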
Source code in aboba/tests/absolute_ttest.py
RelativeIndependentTTest
Bases: BaseTest
Performs an independent t-test using a ratio-based measure for effect size relative to the control group.
Compatible with CUPED preprocessing: when used with CupedProcessor, automatically uses the original (pre-CUPED) control mean for denominator calculation, ensuring correct interpretation of relative effects while benefiting from variance reduction.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `value_column` | Name of the column containing the values to test. |
| `alternative` | Defines the alternative hypothesis. Must be one of {'two-sided', 'less', 'greater'}. |
| `alpha` | Significance level for confidence intervals (default: 0.05). |
Source code in aboba/tests/relative_ttest.py
__init__
Independent t-test for relative difference between two groups.
This test compares the means of two independent groups to determine if there is a statistically significant relative difference between them. The relative difference is calculated as (test_mean - control_mean) / control_mean.
When used after CUPED preprocessing, automatically uses the original control mean (before CUPED transformation) for correct relative effect calculation. This provides the best of both worlds: variance reduction from CUPED and correct relative effect interpretation.
| PARAMETER | DESCRIPTION |
|---|---|
| `value_column` | Name of the column containing the values to test. |
| `alternative` | Defines the alternative hypothesis. One of 'two-sided' (default), 'less', or 'greater'. |
| `alpha` | Significance level for confidence intervals (default: 0.05). |
Examples:

```python
import pandas as pd
import numpy as np
from aboba.tests.relative_ttest import RelativeIndependentTTest

# Example 1: Basic usage without CUPED
np.random.seed(42)
control = pd.DataFrame({'target': np.random.normal(100, 10, 100)})
test = pd.DataFrame({'target': np.random.normal(105, 10, 100)})  # ~5% increase
test_instance = RelativeIndependentTTest(value_column='target')
result = test_instance.test([control, test], {})
print(f"P-value: {result.pvalue:.4f}")
print(f"Relative Effect: {result.effect:.4f} ({result.effect * 100:.2f}%)")

# Example 2: Usage with CUPED (artifacts passed automatically by pipeline)
# The test will automatically detect and use original control mean from artifacts
```
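The relative effect in Example 1 reduces to (test_mean - control_mean) / control_mean. A rough numpy/scipy sketch of that quantity, using Welch's test for significance of the underlying difference; the library's actual interval construction (e.g. any delta-method correction) may differ:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(100, 10, 100)
treatment = rng.normal(105, 10, 100)

# relative effect w.r.t. the control mean
rel_effect = (treatment.mean() - control.mean()) / control.mean()

# significance of the absolute difference (Welch's t-test)
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
```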
Source code in aboba/tests/relative_ttest.py
test
Perform the relative independent t-test on the provided groups.
| PARAMETER | DESCRIPTION |
|---|---|
| `groups` | List of two DataFrames representing the groups to compare. The first group is treated as the control group, the second as the test group. |
| `artefacts` | Artifacts from the preprocessing pipeline. If 'cuped_original_control_mean' is present, it is used as the denominator for the relative effect. This enables correct relative effect calculation when using CUPED. |

| RETURNS | DESCRIPTION |
|---|---|
| `TestResult` | Object containing: `pvalue` (statistical significance), `effect` (relative effect, (test - control) / control_mean), `effect_type` ("relative_control"), and `effect_interval` (confidence interval for the effect). |
Source code in aboba/tests/relative_ttest.py
StratifiedTTest
Bases: BaseTest
Source code in aboba/tests/stratified_ttest.py
__init__
__init__(group_column: str, group_size: int, method: str, strata_columns: List[str], strata_weights: Union[Series, dict], col_name: str = 'target', alpha: float = 0.05)
Performs a stratified t-test on the data.
This test performs a t-test while accounting for stratification in the data. Strata weights must be provided by the caller and represent global (population) proportions for each stratum — they are not inferred from the sample.
| PARAMETER | DESCRIPTION |
|---|---|
| `group_column` | Name of the column containing group identifiers. TYPE: `str` |
| `group_size` | Size of groups to sample (used externally by splitters). TYPE: `int` |
| `method` | Weighting method. One of 'random', 'stratified', 'post_stratified'. TYPE: `str` |
| `strata_columns` | List of columns to stratify by. TYPE: `List[str]` |
| `strata_weights` | Global (population) weights for each stratum. Keys/index must match the unique values of the strata columns. Will be normalised to sum to 1. TYPE: `Union[Series, dict]` |
| `col_name` | Name of the column to test. TYPE: `str` |
| `alpha` | Significance level for confidence interval. TYPE: `float` |
Examples:

```python
import pandas as pd
import numpy as np
from aboba.tests.stratified_ttest import StratifiedTTest

np.random.seed(42)
data = pd.DataFrame({
    'group': np.repeat(['A', 'B'], 100),
    'strata': np.tile(['X', 'Y'], 100),
    'target': np.concatenate([
        np.random.normal(10, 2, 100),
        np.random.normal(12, 2, 100)
    ])
})
strata_weights = pd.Series({'X': 0.4, 'Y': 0.6})
test = StratifiedTTest(
    group_column='group',
    group_size=50,
    method='stratified',
    strata_columns=['strata'],
    strata_weights=strata_weights,
    col_name='target',
)
groups = [data[data['group'] == 'A'], data[data['group'] == 'B']]
result = test.test(groups, {})
print(f"P-value: {result.pvalue:.4f}")
print(f"Effect: {result.effect:.4f}")
```
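The core idea of stratified weighting is to replace the plain sample mean with a weighted sum of per-stratum means under the supplied global weights. A minimal pandas sketch of that estimator (the library's exact variance calculation may differ):

```python
import pandas as pd

df = pd.DataFrame({
    'strata': ['X', 'X', 'Y', 'Y', 'Y'],
    'target': [1.0, 3.0, 10.0, 12.0, 14.0],
})
weights = pd.Series({'X': 0.4, 'Y': 0.6})
weights = weights / weights.sum()  # normalise to sum to 1

# stratified mean = sum over strata of weight_s * mean_s
strat_mean = sum(weights[s] * g['target'].mean() for s, g in df.groupby('strata'))
```

Here the X-stratum mean is 2 and the Y-stratum mean is 12, so the stratified mean is 0.4 * 2 + 0.6 * 12 = 8.0 regardless of how many rows each stratum happens to contribute to the sample.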
Source code in aboba/tests/stratified_ttest.py
test
Perform the stratified t-test on the provided groups.
| PARAMETER | DESCRIPTION |
|---|---|
| `groups` | List of two DataFrames representing the groups. |
| `artefacts` | Unused; kept for interface compatibility. |

| RETURNS | DESCRIPTION |
|---|---|
| `TestResult` | Object containing the p-value and effect size. |
Source code in aboba/tests/stratified_ttest.py
CupedLinearRegressionTTest
Bases: BaseTest
Source code in aboba/tests/cuped_lreg.py
__init__
__init__(covariate_names: Optional[List[str]] = None, group_column: str = 'group', value_column: str = 'target', alpha: float = 0.05, center_on_control: bool = True, weight_column: Optional[str] = None, include_extra: bool = False, strata_column: Optional[str] = None, strata_weights: Optional[Union[Dict, Series]] = None) -> None
CUPED (Controlled-experiment Using Pre-Experiment Data) via linear regression.
This test uses linear regression to adjust for pre-experiment covariates, reducing variance and increasing statistical power. The method centers covariates on the control group mean and estimates the treatment effect using OLS or WLS regression with heteroscedasticity-robust standard errors (HC3).
| PARAMETER | DESCRIPTION |
|---|---|
| `covariate_names` | List of pre-experiment covariates to adjust for. These should be variables measured before the experiment that correlate with the outcome metric. TYPE: `Optional[List[str]]` |
| `group_column` | Name of the column containing group assignment (A/B). Default "group". TYPE: `str` |
| `value_column` | Name of the column containing metric values to test. Default "target". TYPE: `str` |
| `alpha` | Significance level for confidence interval. Default 0.05. TYPE: `float` |
| `center_on_control` | If True, covariates are centered by their mean in the control group. This is recommended for variance reduction. Default True. TYPE: `bool` |
| `weight_column` | Column with observation weights for weighted least squares regression. If None, uses ordinary least squares. TYPE: `Optional[str]` |
| `include_extra` | If True, includes additional regression artifacts (parameters, design matrix, residuals) in TestResult.extra. Default False. TYPE: `bool` |
| `strata_column` | Column containing stratum identifiers. Used together with strata_weights to derive per-observation WLS weights when weight_column is not provided. TYPE: `Optional[str]` |
| `strata_weights` | Global (population) weight for each stratum. When strata_column is set and weight_column is None, the per-observation weight is computed as w_i = strata_weight[stratum_i] / count(stratum_i in group_i). TYPE: `Optional[Union[Dict, Series]]` |
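The per-observation weight formula for `strata_weights` can be illustrated in a few lines of pandas; a sketch of the stated rule, not the library's code:

```python
import pandas as pd

strata = pd.Series(['X', 'X', 'Y'])          # stratum of each observation
strata_weights = {'X': 0.4, 'Y': 0.6}        # global (population) weights
counts = strata.value_counts()

# w_i = strata_weight[stratum_i] / count(stratum_i in group_i)
obs_weights = strata.map(lambda s: strata_weights[s] / counts[s])
```

Each stratum's global weight is split evenly across its observations, so the weights sum to 1 within the group.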
Examples:

```python
import pandas as pd
import numpy as np
from aboba.tests.cuped_lreg import CupedLinearRegressionTTest

# Create sample data with pre-experiment covariate
np.random.seed(42)
n = 200
pre_metric = np.random.normal(100, 15, n)

# Control group
group_a = pd.DataFrame({
    'target': pre_metric[:100] + np.random.normal(0, 10, 100),
    'pre_metric': pre_metric[:100],
    'group': 0
})

# Treatment group with effect
group_b = pd.DataFrame({
    'target': pre_metric[100:] + np.random.normal(5, 10, 100),
    'pre_metric': pre_metric[100:],
    'group': 1
})

# Perform CUPED test
test = CupedLinearRegressionTTest(
    covariate_names=['pre_metric'],
    value_column='target',
    group_column='group'
)
result = test.test([group_a, group_b], {})
print(f"P-value: {result.pvalue:.4f}")
print(f"Effect: {result.effect:.4f}")
print(f"CI: [{result.effect_interval[0]:.4f}, {result.effect_interval[1]:.4f}]")
```
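The regression behind this test (outcome on an intercept, the group indicator, and covariates centered on the control-group mean) can be sketched in plain numpy. This is an illustrative OLS fit only; it omits the HC3 robust standard errors and WLS options the class provides:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
pre = rng.normal(100, 15, n)                 # pre-experiment covariate
group = np.repeat([0, 1], 100)               # 0 = control, 1 = treatment
y = pre + 5 * group + rng.normal(0, 10, n)   # outcome with true effect ~5

# centre the covariate on the control-group mean (center_on_control=True)
pre_centered = pre - pre[group == 0].mean()

# design matrix: intercept, group indicator, centered covariate
X = np.column_stack([np.ones(n), group, pre_centered])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

effect = beta[1]  # coefficient on the group indicator = adjusted treatment effect
```

Because the covariate absorbs much of the outcome's variance, the standard error of the group coefficient is far smaller than in an unadjusted comparison, which is the point of CUPED.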