
Tutorial: Running AB Tests with aboba

This tutorial will guide you through the core concepts of the aboba library by walking through practical examples. You'll learn how to set up experiments, run tests, and analyze results.

Core Concepts

Before diving into examples, let's understand the key components:

  • Test: The statistical test you want to run (t-test, Tukey's HSD, etc.)
  • Pipeline: A sequence of data processors and splitters that prepare your data
  • Splitter: Determines how to split data into groups
  • Processor: Transforms data (e.g., CUPED, bucketing)
  • Effect Modifier: Simulates synthetic effects for power analysis
  • Experiment: Orchestrates multiple test runs and visualizes results

Example 1: Basic Synthetic Data Experiment

Let's start with a simple example using synthetic data to understand the workflow.

Step 1: Generate Data

First, we'll create two groups from the same distribution N(0, 1):

import numpy as np
import pandas as pd
import scipy.stats as sps
from aboba import tests, splitters, effect_modifiers, experiment, pipeline

# Generate synthetic data
n = 1000
data_a = sps.norm.rvs(size=n, loc=0, scale=1)
data_b = sps.norm.rvs(size=n, loc=0, scale=1)

# Create a DataFrame with two groups
data = pd.DataFrame({
    'value': np.concatenate([data_a, data_b]),
    'b_group': np.concatenate([
        np.repeat(0, n),
        np.repeat(1, n)
    ])
})

Step 2: Create a Pipeline

The pipeline defines how to sample data. Here we'll sample 100 observations from each group:

group_size = 100
data_pipeline = pipeline.Pipeline([
    ('GroupSplitter', splitters.GroupSplitter(size=group_size, column='b_group')),
])
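Conceptually, the splitter draws an independent sample of the requested size from each group on every iteration. A rough plain-pandas equivalent (a sketch of the idea, not the library's implementation):

```python
import numpy as np
import pandas as pd

# Toy frame shaped like the tutorial's: a value column and a binary group column
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'value': rng.normal(size=2000),
    'b_group': np.repeat([0, 1], 1000),
})

# Per-group sampling, analogous to GroupSplitter(size=100, column='b_group')
sample = df.groupby('b_group').sample(n=100, random_state=0)
print(len(sample))
```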

Step 3: Define the Test

We'll use an absolute independent t-test:

test = tests.AbsoluteIndependentTTest(
    value_column='value',
)

Step 4: Create an Experiment

The experiment object orchestrates multiple test groups and visualizes their results:

exp = experiment.AbobaExperiment()

Step 5: Run AA Test (Validation)

First, run an AA test, in which both groups are drawn from the same distribution, to verify that the test is correctly calibrated:

aa_group = exp.group(
    "AA Test",
    test=test,
    data=data,
    data_pipeline=data_pipeline,
    n_iter=100,
    joblib_kwargs={"n_jobs": -1, "backend": "threading"}
).run()

The AA test should show p-values uniformly distributed between 0 and 1, confirming that the false-positive rate matches the significance level (at α = 0.05, roughly 5% of runs come out significant).
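You can replicate the AA loop outside the library with plain scipy: repeatedly sample two groups from the same distribution, record the t-test p-value, and check the empirical false-positive rate at α = 0.05. (A standalone sketch, not aboba code.)

```python
import numpy as np
import scipy.stats as sps

rng = np.random.default_rng(42)
pvalues = []
for _ in range(500):
    a = rng.normal(size=100)  # control sample
    b = rng.normal(size=100)  # "treatment" sample from the same N(0, 1)
    pvalues.append(sps.ttest_ind(a, b).pvalue)

# Share of runs that are (falsely) significant at alpha = 0.05
fp_rate = np.mean(np.array(pvalues) < 0.05)
print(f"false-positive rate: {fp_rate:.3f}")
```

The printed rate should land close to 0.05, which is exactly what a well-calibrated test promises.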

Step 6: Run AB Test with Synthetic Effect

Now add a synthetic effect to group 1 and run the test:

ab_group = exp.group(
    "AB Test (effect=0.3)",
    test=test,
    data=data,
    data_pipeline=data_pipeline,
    synthetic_effect=effect_modifiers.GroupModifier(
        effects={1: 0.3},  # Add 0.3 to group 1
        value_column='value',
        group_column='b_group',
    ),
    n_iter=100,
).run()

Step 7: Visualize Results

exp.draw()

This will show p-value distributions for both AA and AB tests. The AB test should show most p-values near 0, indicating the test successfully detected the effect.
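The share of significant p-values in the AB run is an empirical estimate of the test's power. The same idea in a standalone scipy sketch (no aboba involved):

```python
import numpy as np
import scipy.stats as sps

rng = np.random.default_rng(7)
effect, n_iter, group_size = 0.3, 500, 100
rejections = 0
for _ in range(n_iter):
    a = rng.normal(size=group_size)
    b = rng.normal(size=group_size) + effect  # synthetic additive effect
    if sps.ttest_ind(a, b).pvalue < 0.05:
        rejections += 1

power = rejections / n_iter  # fraction of runs where the effect was detected
print(f"estimated power at effect={effect}: {power:.2f}")
```

With 100 observations per group and an effect of 0.3 standard deviations, the t-test detects the effect only a bit more than half the time, which is why power analysis before the real experiment matters.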

Extracting Results

You can access detailed results from each group:

results_df = ab_group.get_data()
print(results_df.head())

Example 2: Real Data with CUPED

Now let's work with real data and use CUPED (Controlled-experiment Using Pre-Experiment Data) for variance reduction.

Understanding CUPED

CUPED is a variance reduction technique that uses pre-experiment data (covariates) to improve test sensitivity. It adjusts your target metric using information from a correlated covariate.
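The adjustment itself is simple: with covariate X and metric Y, CUPED computes θ = cov(Y, X) / var(X) and replaces Y with Y − θ·(X − mean(X)). The mean is unchanged, while the variance shrinks by a factor of (1 − ρ²), where ρ is the correlation between Y and X. A minimal numpy illustration of the formula (not the library's CupedProcessor):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)            # pre-experiment covariate
y = 2.0 * x + rng.normal(size=10_000)  # metric correlated with the covariate

theta = np.cov(y, x)[0, 1] / np.var(x, ddof=1)  # OLS slope of y on x
y_cuped = y - theta * (x - x.mean())            # variance-reduced metric

print(f"var(y)       = {np.var(y):.2f}")
print(f"var(y_cuped) = {np.var(y_cuped):.2f}")
```

Here ρ² ≈ 0.8, so the adjusted metric has roughly a fifth of the original variance while keeping the same mean.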

Step 1: Load Real Data

# Load Moscow flats dataset
data = pd.read_csv('data/flats_moscow.txt', sep='\t')

Step 2: Create a Custom Data Processor

Since CUPED needs the whole dataset before sampling, we'll create a processor to assign groups:

import aboba

class RandomGroupAssigner(aboba.base.BaseDataProcessor):
    def __init__(self, groups_n=2, column_name='group'):
        self.groups_n = groups_n
        self.column_name = column_name

    def transform(self, data: pd.DataFrame):
        data = data.copy()  # avoid mutating the caller's DataFrame
        n = data.shape[0]
        groups = np.random.randint(0, self.groups_n, size=n)
        data[self.column_name] = groups
        return data, None

Step 3: Build CUPED Pipeline

sample_size = 100
covariate = 'totsp'  # Total space as covariate

cuped_pipeline = pipeline.Pipeline([
    RandomGroupAssigner(groups_n=2),
    aboba.processing.EnsureColsProcessor(['price', 'group', covariate]),
    aboba.processing.CupedProcessor(
        value_column='price',
        covariate_column=covariate,
        result_column='price_cuped',
        group_column='group',
        group_test=1,
        group_control=0,
    ),
    splitters.GroupSplitter(column='group', size=sample_size),
    aboba.processing.EnsureColsProcessor(['price_cuped']),
])

Step 4: Run Tests with CUPED

exp = experiment.AbobaExperiment()

cuped_test = tests.AbsoluteIndependentTTest(
    value_column='price_cuped',
)

# AA test with CUPED
exp.group(
    "AA, CUPED",
    test=cuped_test,
    data=data,
    data_pipeline=cuped_pipeline,
    n_iter=100,
).run()

# AB test with CUPED
exp.group(
    "AB, CUPED (effect=10)",
    test=cuped_test,
    data=data,
    data_pipeline=cuped_pipeline,
    synthetic_effect=effect_modifiers.GroupModifier(
        effects={1: 10},
        value_column='price_cuped',
        group_column='group',
    ),
    n_iter=100,
).run()

exp.draw()

CUPED typically yields greater statistical power than a plain t-test on the raw metric, meaning it can detect smaller effects with the same sample size.
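This gain can be checked with a small simulation: generate a metric strongly correlated with a covariate, add a fixed effect to one group, and compare rejection rates of a t-test on the raw metric versus the CUPED-adjusted one. A standalone sketch (not the aboba pipeline):

```python
import numpy as np
import scipy.stats as sps

rng = np.random.default_rng(3)
n, effect, n_iter = 100, 0.3, 300
wins_raw = wins_cuped = 0

for _ in range(n_iter):
    # Covariate and a metric that is mostly explained by it
    x_a, x_b = rng.normal(size=n), rng.normal(size=n)
    y_a = 2.0 * x_a + rng.normal(size=n)
    y_b = 2.0 * x_b + rng.normal(size=n) + effect

    # CUPED adjustment with a pooled theta
    x = np.concatenate([x_a, x_b])
    y = np.concatenate([y_a, y_b])
    theta = np.cov(y, x)[0, 1] / np.var(x, ddof=1)
    c_a = y_a - theta * (x_a - x.mean())
    c_b = y_b - theta * (x_b - x.mean())

    wins_raw += sps.ttest_ind(y_a, y_b).pvalue < 0.05
    wins_cuped += sps.ttest_ind(c_a, c_b).pvalue < 0.05

print(f"power raw:   {wins_raw / n_iter:.2f}")
print(f"power CUPED: {wins_cuped / n_iter:.2f}")
```

Because the covariate explains most of the metric's variance, the CUPED-adjusted test rejects far more often at the same sample size.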

Example 3: Creating Custom Tests

You can create custom tests by inheriting from BaseTest. Here's an example of a relative t-test:

class RelativeIndependentTTest(aboba.base.BaseTest):
    def __init__(self, value_column="target", alternative="two-sided"):
        super().__init__()
        self.value_column = value_column
        self.alternative = alternative
        assert alternative in {"two-sided", "less", "greater"}

    def test(self, groups, artefacts):
        control_group, test_group = groups

        Y, X = control_group[self.value_column], test_group[self.value_column]
        var_1, var_2 = np.var(X, ddof=1), np.var(Y, ddof=1)
        a_1, a_2 = np.mean(X), np.mean(Y)

        # Calculate relative difference
        R = (a_1 - a_2) / a_2
        var_R = var_1 / (a_2**2) + (a_1**2) / (a_2**4) * var_2

        n = len(test_group)  # assumes equal group sizes
        stat = np.sqrt(n) * R / np.sqrt(var_R)

        if self.alternative == "two-sided":
            pvalue = 2 * min(sps.norm.cdf(stat), sps.norm.sf(stat))
            pvalue = min(pvalue, 1)
        elif self.alternative == "less":
            pvalue = sps.norm.cdf(stat)
        elif self.alternative == "greater":
            pvalue = sps.norm.sf(stat)

        return aboba.base.TestResult(
            pvalue=pvalue, 
            effect=R, 
            effect_type="relative_control"
        )
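The same delta-method computation can be exercised without the library scaffolding. A standalone function version (same formulas as the class above, still assuming equal group sizes):

```python
import numpy as np
import scipy.stats as sps

def relative_ttest(control, test):
    """Delta-method z-test for the relative difference (test - control) / control."""
    a_c, a_t = np.mean(control), np.mean(test)
    v_c, v_t = np.var(control, ddof=1), np.var(test, ddof=1)

    r = (a_t - a_c) / a_c
    # Delta-method variance of the ratio estimator
    var_r = v_t / a_c**2 + (a_t**2) / (a_c**4) * v_c

    stat = np.sqrt(len(test)) * r / np.sqrt(var_r)
    pvalue = 2 * min(sps.norm.cdf(stat), sps.norm.sf(stat))
    return r, min(pvalue, 1.0)

rng = np.random.default_rng(1)
control = rng.normal(loc=10.0, scale=1.0, size=5000)
test = rng.normal(loc=10.5, scale=1.0, size=5000)  # ~5% relative uplift
r, p = relative_ttest(control, test)
print(f"relative effect: {r:.3f}, p-value: {p:.2e}")
```

Note that the variance formula treats the control mean as the denominator, so the test is not symmetric in the two groups.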

Using Your Custom Test

relative_test = RelativeIndependentTTest(value_column='price')

# Random assignment and per-group sampling, without the CUPED steps
random_pipeline = pipeline.Pipeline([
    RandomGroupAssigner(groups_n=2),
    splitters.GroupSplitter(column='group', size=sample_size),
])

exp.group(
    "AB, Relative Test",
    test=relative_test,
    data=data,
    data_pipeline=random_pipeline,
    synthetic_effect=effect_modifiers.GroupModifier(
        effects={1: 10},
        value_column='price',
        group_column='group',
    ),
    n_iter=100,
).run()

exp.draw()

Advanced: Flexible Effect Modifiers

Effect modifiers support multiple ways to add effects:

1. Constant Effect

effect_modifiers.GroupModifier(
    effects={1: 0.3},  # Add constant 0.3 to group 1
    value_column='value',
    group_column='b_group',
)

2. Function-Based Effect

def my_effect(obj):
    obj['value'] += 0.3
    return obj

effect_modifiers.GroupModifier(
    effects={0: my_effect},
    value_column='value',
    group_column='b_group',
)

3. Distribution-Based Effect

effect_modifiers.GroupModifier(
    effects={
        0: 0.9,
        1: sps.norm(0.3, 0.001)  # Random effect from normal distribution
    },
    value_column='value',
    group_column='b_group',
)
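All three variants boil down to transforming the value column for the rows of a given group. A plain-pandas sketch of the three cases (an illustration of the idea, not aboba internals):

```python
import numpy as np
import pandas as pd
import scipy.stats as sps

df = pd.DataFrame({'value': np.zeros(6), 'b_group': [0, 0, 0, 1, 1, 1]})
mask = df['b_group'] == 1

# 1. Constant: shift every value in the group by a fixed amount
df.loc[mask, 'value'] += 0.3

# 2. Function: apply an arbitrary transformation to the group's rows
df.loc[mask, 'value'] = df.loc[mask, 'value'].apply(lambda v: v * 2)

# 3. Distribution: add i.i.d. draws from a frozen scipy distribution
noise = sps.norm(0.3, 0.001).rvs(size=mask.sum(), random_state=0)
df.loc[mask, 'value'] += noise

print(df.loc[mask, 'value'].round(1).tolist())  # roughly [0.9, 0.9, 0.9]
```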

Next Steps