
How to Perform A/B Testing in Python: A Simple Guide

To perform A/B testing in Python, you split your data into two groups and compare their results with a statistical test, such as the independent-samples t-test from the scipy.stats library. This tells you, with a quantified level of confidence, whether one version performs better than the other.
📝

Syntax

A/B testing involves these main steps:

  • Split your data into two groups: A (control) and B (variant).
  • Calculate the metric you want to compare (e.g., conversion rate) for each group.
  • Use a statistical test like scipy.stats.ttest_ind() to check if the difference is significant.
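The first step, random assignment, can be sketched with NumPy (the user IDs and seed below are illustrative, not part of any real dataset):

```python
import numpy as np

# Hypothetical user IDs; in practice these come from your own data
user_ids = np.arange(1000)

# Randomly assign each user to control (A) or variant (B)
rng = np.random.default_rng(seed=42)
assignments = rng.choice(["A", "B"], size=len(user_ids))

group_a_ids = user_ids[assignments == "A"]
group_b_ids = user_ids[assignments == "B"]
print(len(group_a_ids), len(group_b_ids))  # roughly a 50/50 split
```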

Here is the basic syntax for the t-test:

```python
from scipy.stats import ttest_ind

# group_a and group_b are lists or arrays of numeric results
# equal_var=False runs Welch's t-test, which does not assume equal variances
statistic, p_value = ttest_ind(group_a, group_b, equal_var=False)

if p_value < 0.05:
    print("Significant difference detected")
else:
    print("No significant difference")
```
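For binary conversion data, the metric in step two is simply the group mean (the sample lists here are illustrative):

```python
# 1 = conversion, 0 = no conversion
group_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
group_b = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]

rate_a = sum(group_a) / len(group_a)  # 0.5
rate_b = sum(group_b) / len(group_b)  # 0.8
print(f"Conversion rate A: {rate_a:.0%}, B: {rate_b:.0%}")
```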
💻

Example

This example runs an A/B test on two groups of sample conversion data, using Welch's t-test to check whether the difference is statistically significant.

```python
from scipy.stats import ttest_ind

# Sample data: 1 means conversion, 0 means no conversion
group_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # Control group
group_b = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]  # Variant group

statistic, p_value = ttest_ind(group_a, group_b, equal_var=False)

print(f"t-statistic: {statistic:.3f}")
print(f"p-value: {p_value:.3f}")

if p_value < 0.05:
    print("Result: Significant difference detected")
else:
    print("Result: No significant difference")
```
Output
```
t-statistic: -1.406
p-value: 0.178
Result: No significant difference
```
⚠️

Common Pitfalls

Some common mistakes when performing A/B testing in Python include:

  • Not splitting data randomly, which can bias results.
  • Using the wrong statistical test for your data type.
  • Ignoring assumptions like equal variances (use equal_var=False if unsure).
  • Misinterpreting p-values (a p-value above 0.05 means no strong evidence of difference, not that groups are identical).

Example of a wrong approach and the correct way:

```python
# Wrong: running a t-test on aggregated counts instead of raw data
# group_a_conversions = 50
# group_a_total = 100
# group_b_conversions = 60
# group_b_total = 100
# ttest_ind([50], [60])  # Incorrect: single values carry no variance information

# Right: use counts and totals with a proper proportion test (e.g. z-test or chi-square)
from statsmodels.stats.proportion import proportions_ztest

count = [50, 60]   # conversions in groups A and B
nobs = [100, 100]  # total observations per group
stat, pval = proportions_ztest(count, nobs)
print(f"z-statistic: {stat:.3f}, p-value: {pval:.3f}")
```
Output
```
z-statistic: -1.421, p-value: 0.155
```
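The third pitfall, the equal-variance assumption, can be checked directly with Levene's test from scipy.stats (the samples below are made up to show visibly different spread):

```python
from scipy.stats import levene

# Illustrative samples: similar means, very different spread
group_a = [10, 12, 11, 13, 12, 11, 10, 12]
group_b = [5, 20, 8, 18, 2, 22, 6, 19]

stat, p = levene(group_a, group_b)
if p < 0.05:
    print("Variances differ: prefer Welch's t-test (equal_var=False)")
else:
    print("No strong evidence of unequal variances")
```

In practice, equal_var=False (Welch's test) is a safe default either way, which is why the earlier examples use it.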
📊

Quick Reference

Tips for A/B testing in Python:

  • Always randomize your groups to avoid bias.
  • Use scipy.stats.ttest_ind() for comparing means of two independent samples.
  • For proportions, consider statsmodels.stats.proportion.proportions_ztest().
  • Check assumptions like variance equality and sample size.
  • Interpret p-values carefully: below 0.05 usually means significant difference.
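The proportion comparison from the pitfalls section can also be run as a chi-square test on a 2x2 contingency table (a sketch using scipy; the counts match the earlier example):

```python
from scipy.stats import chi2_contingency

# Rows: groups A and B; columns: conversions, non-conversions
table = [[50, 50],
         [60, 40]]

# Yates' continuity correction is applied by default for 2x2 tables
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2: {chi2:.3f}, p-value: {p:.3f}")
```

With correction=False, the chi-square statistic equals the square of the z-statistic from proportions_ztest, so the two tests agree; the default correction makes the chi-square version slightly more conservative.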
✅

Key Takeaways

  • Split your data randomly into control and variant groups before testing.
  • Use the appropriate statistical test (t-test for means, proportion z-test for conversion rates) for your data.
  • Check assumptions such as equal variances and sample size for valid results.
  • A p-value below 0.05 usually indicates a significant difference between groups.
  • Avoid common mistakes like testing aggregated counts with the wrong method.