How to Perform Hypothesis Testing in Python Easily
To perform hypothesis testing in Python, use the scipy.stats library, which provides functions such as ttest_ind for t-tests. You set up your null and alternative hypotheses, run the test function, and interpret the p-value to decide whether the results are statistically significant.
Syntax
Hypothesis testing in Python commonly uses functions from scipy.stats. For example, the independent t-test syntax is:
scipy.stats.ttest_ind(sample1, sample2, equal_var=True)
Here:
- sample1 and sample2 are your data samples.
- equal_var sets whether the test assumes equal population variances (True, the standard Student's t-test) or not (False, Welch's t-test).
- The function returns a test statistic and a p-value.
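```python
from scipy import stats

# Placeholder samples; replace with your own data
sample1 = [5.1, 4.9, 5.6, 5.8, 6.0]
sample2 = [4.2, 4.8, 4.4, 5.0, 4.6]

# Syntax for independent t-test
test_statistic, p_value = stats.ttest_ind(sample1, sample2, equal_var=True)
```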
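To make the effect of equal_var concrete, here is a minimal sketch that runs the same comparison both ways. The two samples (narrow and wide) are made-up illustrative data, not taken from this article; they were chosen so that one group is much more spread out than the other.

```python
from scipy import stats

# Hypothetical samples with visibly different spreads (illustrative data only)
narrow = [10.1, 10.3, 9.9, 10.0, 10.2]
wide = [8.0, 12.5, 9.0, 13.0, 11.5, 7.5]

# Student's t-test: assumes both populations share the same variance
student = stats.ttest_ind(narrow, wide, equal_var=True)

# Welch's t-test: does not assume equal variances
welch = stats.ttest_ind(narrow, wide, equal_var=False)

print(f"Student: t = {student.statistic:.3f}, p = {student.pvalue:.3f}")
print(f"Welch:   t = {welch.statistic:.3f}, p = {welch.pvalue:.3f}")
```

When group sizes and spreads differ, the two variants generally give different statistics and p-values, so the choice of equal_var should follow from what you know about your data.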
Example
This example shows how to test if two groups have different average values using an independent t-test.
```python
from scipy import stats

# Sample data: heights of two groups
group1 = [170, 172, 168, 165, 174]
group2 = [160, 162, 158, 155, 164]

# Perform independent t-test
stat, p = stats.ttest_ind(group1, group2)

print(f"Test Statistic: {stat:.3f}")
print(f"P-value: {p:.3f}")

# Interpret result
if p < 0.05:
    print("Reject null hypothesis: groups differ significantly.")
else:
    print("Fail to reject null hypothesis: no significant difference.")
```
Output
Test Statistic: 4.527
P-value: 0.002
Reject null hypothesis: groups differ significantly.
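To see what ttest_ind computes under its default equal-variance setting, the following sketch rebuilds the statistic by hand from the pooled variance and compares it with SciPy's result. It reuses the height data from the example; the variable names (pooled_var, std_err, t_manual, p_manual) are chosen here for illustration.

```python
import math
from statistics import mean, variance

from scipy import stats

group1 = [170, 172, 168, 165, 174]
group2 = [160, 162, 158, 155, 164]
n1, n2 = len(group1), len(group2)
df = n1 + n2 - 2  # degrees of freedom for the pooled test

# Pooled sample variance (statistics.variance uses the n - 1 denominator)
pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df

# Standard error of the difference between the two means
std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# t statistic: difference in means divided by its standard error
t_manual = (mean(group1) - mean(group2)) / std_err

# Two-sided p-value from the t distribution's survival function
p_manual = 2 * stats.t.sf(abs(t_manual), df)

t_scipy, p_scipy = stats.ttest_ind(group1, group2)
print(f"Manual: t = {t_manual:.3f}, p = {p_manual:.3f}")
print(f"SciPy:  t = {t_scipy:.3f}, p = {p_scipy:.3f}")
```

Both lines should print the same values, which is a quick way to confirm that the test is doing what you expect.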
Common Pitfalls
Common mistakes when performing hypothesis testing include:
- Not checking if data meets test assumptions like normality or equal variances.
- Misinterpreting the p-value: a small p-value means evidence against the null hypothesis, not proof of the alternative.
- Using the wrong test for your data type or sample design.
Always check assumptions and choose the correct test.
```python
from scipy import stats

# Wrong: using t-test on non-normal data without checking
non_normal_data1 = [1, 2, 2, 3, 100]
non_normal_data2 = [2, 3, 3, 4, 5]

# Right: check normality first
stat1, p1 = stats.shapiro(non_normal_data1)
stat2, p2 = stats.shapiro(non_normal_data2)

print(f"Group1 normality p-value: {p1:.3f}")
print(f"Group2 normality p-value: {p2:.3f}")

if p1 < 0.05 or p2 < 0.05:
    print("Data not normal, consider non-parametric test like Mann-Whitney U test.")
else:
    stat, p = stats.ttest_ind(non_normal_data1, non_normal_data2)
    print(f"T-test p-value: {p:.3f}")
```
Output
Group1 normality p-value: 0.017
Group2 normality p-value: 0.200
Data not normal, consider non-parametric test like Mann-Whitney U test.
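As a follow-up to that recommendation, here is a minimal sketch of the Mann-Whitney U test on the same two samples. It assumes a two-sided alternative, which is a choice made here for illustration; pick the alternative that matches your own hypothesis.

```python
from scipy import stats

non_normal_data1 = [1, 2, 2, 3, 100]
non_normal_data2 = [2, 3, 3, 4, 5]

# Mann-Whitney U compares the groups using ranks, so the outlier (100)
# counts only as "the largest value" rather than by its magnitude
u_stat, p_value = stats.mannwhitneyu(
    non_normal_data1, non_normal_data2, alternative="two-sided"
)

print(f"U statistic: {u_stat:.1f}")
print(f"P-value: {p_value:.3f}")

if p_value < 0.05:
    print("Reject null hypothesis: the distributions differ.")
else:
    print("Fail to reject null hypothesis: no significant difference.")
```

Because the test works on ranks, a single extreme value cannot dominate the result the way it can in a t-test.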
Quick Reference
Here is a quick guide to common hypothesis tests in Python:
| Test | Function | Use Case |
|---|---|---|
| Independent t-test | scipy.stats.ttest_ind | Compare means of two independent groups |
| Paired t-test | scipy.stats.ttest_rel | Compare means of two related groups |
| One-sample t-test | scipy.stats.ttest_1samp | Compare sample mean to known value |
| Mann-Whitney U | scipy.stats.mannwhitneyu | Non-parametric test for two independent groups |
| Chi-square test | scipy.stats.chi2_contingency | Test association between categorical variables |
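The sketch below shows how three of the other functions in the table are typically called. All data here (the before and after scores, the hypothesised mean of 70, and the 2x2 contingency table) are hypothetical values invented for illustration.

```python
from scipy import stats

# Paired t-test: the same subjects measured twice (hypothetical scores)
before = [72, 75, 78, 71, 69, 80]
after = [74, 78, 79, 75, 70, 83]
paired = stats.ttest_rel(before, after)
print(f"Paired t-test: t = {paired.statistic:.3f}, p = {paired.pvalue:.3f}")

# One-sample t-test: compare the sample mean to a hypothesised value (70)
one_sample = stats.ttest_1samp(before, popmean=70)
print(f"One-sample t-test: t = {one_sample.statistic:.3f}, p = {one_sample.pvalue:.3f}")

# Chi-square test of independence on a 2x2 table of observed counts
# (rows: two groups, columns: two outcome categories)
table = [[30, 10], [20, 25]]
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)
print(f"Chi-square: chi2 = {chi2:.3f}, p = {p_chi2:.3f}, dof = {dof}")
```

Note that ttest_rel requires both samples to have the same length, since each value in one sample is matched with a value in the other.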
Key Takeaways
Use scipy.stats functions like ttest_ind to perform hypothesis tests in Python.
Always check assumptions like normality before choosing a test.
Interpret p-values correctly: a small p-value is evidence against the null hypothesis, not proof of the alternative.
Choose the right test based on your data type and study design.
Non-parametric tests are alternatives when data does not meet assumptions.