
How to Perform A/B Testing in Python: A Simple Guide

To perform A/B testing in Python, you split your data into two groups and compare their results with a statistical test, such as the independent-samples t-test from the scipy.stats library. This tells you, with a quantified level of confidence, whether one version performs better than the other.
📝

Syntax

A/B testing involves these main steps:

  • Split your data into two groups: A (control) and B (variant).
  • Calculate the metric you want to compare (e.g., conversion rate) for each group.
  • Use a statistical test like scipy.stats.ttest_ind() to check if the difference is significant.
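The first step, random assignment, can be sketched with NumPy (the user IDs and seed below are illustrative, not part of any real dataset):

```python
import numpy as np

# Hypothetical user IDs; in practice these come from your own data
user_ids = np.arange(1000)

# Randomly assign each user to control (A) or variant (B)
rng = np.random.default_rng(seed=42)
assignments = rng.choice(["A", "B"], size=len(user_ids))

group_a_ids = user_ids[assignments == "A"]
group_b_ids = user_ids[assignments == "B"]
print(len(group_a_ids), len(group_b_ids))  # roughly a 50/50 split
```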

Here is the basic syntax for the t-test:

```python
from scipy.stats import ttest_ind

# group_a and group_b are lists or arrays of numeric results
# equal_var=False runs Welch's t-test, which does not assume equal variances
statistic, p_value = ttest_ind(group_a, group_b, equal_var=False)

if p_value < 0.05:
    print("Significant difference detected")
else:
    print("No significant difference")
```
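For binary conversion data, the metric in step two is simply the group mean (the sample lists here are illustrative):

```python
# 1 = conversion, 0 = no conversion
group_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
group_b = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]

rate_a = sum(group_a) / len(group_a)  # 0.5
rate_b = sum(group_b) / len(group_b)  # 0.8
print(f"Conversion rate A: {rate_a:.0%}, B: {rate_b:.0%}")
```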
💻

Example

This example runs an A/B test on two groups of sample conversion data, using Welch's t-test to check whether the difference is statistically significant.

```python
from scipy.stats import ttest_ind

# Sample data: 1 means conversion, 0 means no conversion
group_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # Control group
group_b = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]  # Variant group

statistic, p_value = ttest_ind(group_a, group_b, equal_var=False)

print(f"t-statistic: {statistic:.3f}")
print(f"p-value: {p_value:.3f}")

if p_value < 0.05:
    print("Result: Significant difference detected")
else:
    print("Result: No significant difference")
```
Output
```
t-statistic: -1.406
p-value: 0.178
Result: No significant difference
```
⚠️

Common Pitfalls

Some common mistakes when performing A/B testing in Python include:

  • Not splitting data randomly, which can bias results.
  • Using the wrong statistical test for your data type.
  • Ignoring assumptions like equal variances (use equal_var=False if unsure).
  • Misinterpreting p-values (a p-value above 0.05 means no strong evidence of difference, not that groups are identical).

Example of a wrong approach and the correct way:

```python
# Wrong: running a t-test on aggregated counts instead of raw data
# group_a_conversions = 50
# group_a_total = 100
# group_b_conversions = 60
# group_b_total = 100
# ttest_ind([50], [60])  # Incorrect: single values carry no variance information

# Right: use counts and totals with a proper proportion test (e.g. z-test or chi-square)
from statsmodels.stats.proportion import proportions_ztest

count = [50, 60]   # conversions in groups A and B
nobs = [100, 100]  # total observations per group
stat, pval = proportions_ztest(count, nobs)
print(f"z-statistic: {stat:.3f}, p-value: {pval:.3f}")
```
Output
```
z-statistic: -1.421, p-value: 0.155
```
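The third pitfall, the equal-variance assumption, can be checked directly with Levene's test from scipy.stats (the samples below are made up to show visibly different spread):

```python
from scipy.stats import levene

# Illustrative samples: similar means, very different spread
group_a = [10, 12, 11, 13, 12, 11, 10, 12]
group_b = [5, 20, 8, 18, 2, 22, 6, 19]

stat, p = levene(group_a, group_b)
if p < 0.05:
    print("Variances differ: prefer Welch's t-test (equal_var=False)")
else:
    print("No strong evidence of unequal variances")
```

In practice, equal_var=False (Welch's test) is a safe default either way, which is why the earlier examples use it.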
📊

Quick Reference

Tips for A/B testing in Python:

  • Always randomize your groups to avoid bias.
  • Use scipy.stats.ttest_ind() for comparing means of two independent samples.
  • For proportions, consider statsmodels.stats.proportion.proportions_ztest().
  • Check assumptions like variance equality and sample size.
  • Interpret p-values carefully: below 0.05 usually means significant difference.
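The proportion comparison from the pitfalls section can also be run as a chi-square test on a 2x2 contingency table (a sketch using scipy; the counts match the earlier example):

```python
from scipy.stats import chi2_contingency

# Rows: groups A and B; columns: conversions, non-conversions
table = [[50, 50],
         [60, 40]]

# Yates' continuity correction is applied by default for 2x2 tables
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2: {chi2:.3f}, p-value: {p:.3f}")
```

With correction=False, the chi-square statistic equals the square of the z-statistic from proportions_ztest, so the two tests agree; the default correction makes the chi-square version slightly more conservative.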
✅

Key Takeaways

  • Split your data randomly into control and variant groups before testing.
  • Use the appropriate statistical test (t-test for means, proportion z-test for conversion rates) for your data.
  • Check assumptions such as equal variances and sample size for valid results.
  • A p-value below 0.05 usually indicates a significant difference between groups.
  • Avoid common mistakes like testing aggregated counts with the wrong method.