How to Perform AB Testing in Python: Simple Guide
To perform AB testing in Python, you typically split your data into two groups and compare their results with a statistical test, such as the t-test from the scipy.stats library. This helps you decide, with a quantified level of confidence, whether one version performs better than the other.
Syntax
AB testing involves these main steps:
- Split your data into two groups: A (control) and B (variant).
- Calculate the metric you want to compare (e.g., conversion rate) for each group.
- Use a statistical test like scipy.stats.ttest_ind() to check whether the difference is significant.
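The randomized split in step 1 can be sketched with Python's standard random module (the user_ids list and the 50/50 split here are illustrative assumptions, not part of any particular library's API):

```python
import random

random.seed(42)  # fixed seed so the example split is reproducible

user_ids = list(range(1000))  # hypothetical user identifiers
random.shuffle(user_ids)      # randomize order so neither group is biased

midpoint = len(user_ids) // 2
group_a = user_ids[:midpoint]  # control
group_b = user_ids[midpoint:]  # variant

print(len(group_a), len(group_b))  # 500 500
```

In practice you would assign users at the moment they arrive (for example, by hashing a user ID), but the key property is the same: membership in A or B must not depend on anything that could affect the metric.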
Here is the basic syntax for the t-test:
```python
from scipy.stats import ttest_ind

# group_a and group_b are lists or arrays of numeric results
statistic, p_value = ttest_ind(group_a, group_b, equal_var=False)

if p_value < 0.05:
    print("Significant difference detected")
else:
    print("No significant difference")
```
Example
This example shows how to perform AB testing on two groups with sample conversion rates. It uses a t-test to check if the difference is statistically significant.
```python
from scipy.stats import ttest_ind

# Sample data: 1 means conversion, 0 means no conversion
group_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # Control group
group_b = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]  # Variant group

statistic, p_value = ttest_ind(group_a, group_b, equal_var=False)

print(f"t-statistic: {statistic:.3f}")
print(f"p-value: {p_value:.3f}")

if p_value < 0.05:
    print("Result: Significant difference detected")
else:
    print("Result: No significant difference")
```
Output
t-statistic: -1.406
p-value: 0.178
Result: No significant difference
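Alongside the test result, it helps to report the observed conversion rates themselves, since a p-value alone says nothing about the size of the difference. A quick sketch using the same sample data as above:

```python
group_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # control: 5 conversions out of 10
group_b = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]  # variant: 8 conversions out of 10

rate_a = sum(group_a) / len(group_a)  # 0.5
rate_b = sum(group_b) / len(group_b)  # 0.8

print(f"Control rate:  {rate_a:.1%}")            # 50.0%
print(f"Variant rate:  {rate_b:.1%}")            # 80.0%
print(f"Absolute lift: {rate_b - rate_a:.1%}")   # 30.0%
```

Here the observed lift is large but the sample is tiny, which is exactly why the t-test above fails to reach significance.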
Common Pitfalls
Some common mistakes when performing AB testing in Python include:
- Not splitting data randomly, which can bias results.
- Using the wrong statistical test for your data type.
- Ignoring assumptions like equal variances (use equal_var=False if unsure).
- Misinterpreting p-values (a p-value above 0.05 means there is no strong evidence of a difference, not that the groups are identical).
Example of a wrong approach and the correct way:
```python
# Wrong: using a t-test on aggregated counts instead of raw data
# group_a_conversions = 50
# group_a_total = 100
# group_b_conversions = 60
# group_b_total = 100
# ttest_ind([50], [60])  # Incorrect: single values, not samples

# Right: use raw binary data, or a proper test for proportions
# (e.g., a chi-square test or a two-proportion z-test)
from statsmodels.stats.proportion import proportions_ztest

count = [50, 60]   # conversions in each group
nobs = [100, 100]  # total observations in each group
stat, pval = proportions_ztest(count, nobs)
print(f"z-statistic: {stat:.3f}, p-value: {pval:.3f}")
```
Output
z-statistic: -1.421, p-value: 0.155
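As an alternative sketch, the same aggregated counts can go into a 2x2 contingency table and scipy.stats.chi2_contingency. With the continuity correction disabled, this is equivalent to the pooled two-proportion z-test: the chi-square statistic equals the squared z-statistic.

```python
from scipy.stats import chi2_contingency

# Rows: groups A and B; columns: conversions, non-conversions
table = [[50, 50],   # group A: 50 of 100 converted
         [60, 40]]   # group B: 60 of 100 converted

# correction=False disables Yates' continuity correction so the result
# matches the pooled two-proportion z-test (chi2 == z**2)
chi2, pval, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2: {chi2:.3f}, p-value: {pval:.3f}")  # chi2: 2.020, p-value: 0.155
```

The chi-square form generalizes more naturally if you later compare more than two variants at once.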
Quick Reference
Tips for AB testing in Python:
- Always randomize your groups to avoid bias.
- Use scipy.stats.ttest_ind() for comparing the means of two independent samples.
- For proportions, consider statsmodels.stats.proportion.proportions_ztest().
- Check assumptions like variance equality and sample size.
- Interpret p-values carefully: a value below 0.05 usually means a significant difference.
Key Takeaways
- Split your data randomly into control and variant groups before testing.
- Use appropriate statistical tests like t-test or proportion z-test depending on your data.
- Check assumptions such as equal variances and sample size for valid results.
- A p-value below 0.05 usually indicates a significant difference between groups.
- Avoid common mistakes like testing aggregated counts with wrong methods.