How to Perform Hypothesis Testing in Python Easily

To perform hypothesis testing in Python, use the scipy.stats module, which provides functions such as ttest_ind for t-tests. You state your null and alternative hypotheses, run the test function, and compare the resulting p-value with your chosen significance level (commonly 0.05) to decide whether the result is statistically significant.

📐

Syntax

Hypothesis testing in Python commonly uses functions from scipy.stats. For example, the independent t-test syntax is:

scipy.stats.ttest_ind(sample1, sample2, equal_var=True)

Here:

sample1 and sample2 are your data samples.
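equal_var controls whether the test assumes equal population variances: True runs the standard Student's t-test, False runs Welch's t-test, which does not assume equal variances.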
The function returns a test statistic and a p-value.

python

from scipy import stats

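# Placeholder samples so the snippet runs; substitute your own data
sample1 = [5.1, 4.9, 5.6, 5.8, 6.0]
sample2 = [4.2, 4.8, 4.5, 5.0, 4.4]

# Syntax for independent t-test
test_statistic, p_value = stats.ttest_ind(sample1, sample2, equal_var=True)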
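To make the effect of equal_var concrete, here is a minimal sketch that runs both variants on the same data. The two samples are made up for illustration, with one deliberately much more spread out than the other.

```python
from scipy import stats

# Hypothetical samples with visibly different spreads (illustration only)
narrow = [10.1, 10.3, 9.9, 10.0, 10.2, 10.1]
wide = [8.0, 12.5, 9.1, 13.4, 7.6, 11.8]

# Student's t-test: assumes both populations share one variance
student = stats.ttest_ind(narrow, wide, equal_var=True)

# Welch's t-test: drops the equal-variance assumption
welch = stats.ttest_ind(narrow, wide, equal_var=False)

print(f"Student: t={student.statistic:.3f}, p={student.pvalue:.3f}")
print(f"Welch:   t={welch.statistic:.3f}, p={welch.pvalue:.3f}")
```

Because both groups here have the same size, the two test statistics coincide. Only the degrees of freedom differ, so Welch's p-value is the same or larger.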

💻

Example

This example shows how to test if two groups have different average values using an independent t-test.

python

from scipy import stats

# Sample data: heights of two groups
group1 = [170, 172, 168, 165, 174]
group2 = [160, 162, 158, 155, 164]

# Perform independent t-test
stat, p = stats.ttest_ind(group1, group2)

print(f"Test Statistic: {stat:.3f}")
print(f"P-value: {p:.3f}")

# Interpret result
if p < 0.05:
    print("Reject null hypothesis: groups differ significantly.")
else:
    print("Fail to reject null hypothesis: no significant difference.")

Output

Test Statistic: 4.527
P-value: 0.002
Reject null hypothesis: groups differ significantly.
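The statistic that ttest_ind reports can be reproduced by hand, which helps show what the test measures. The sketch below assumes the default equal-variance (pooled) form of the test and reuses the same two groups. The variable names are illustrative.

```python
import math
from statistics import mean, variance

from scipy import stats

group1 = [170, 172, 168, 165, 174]
group2 = [160, 162, 158, 155, 164]
n1, n2 = len(group1), len(group2)

# Pooled sample variance (statistics.variance uses the n - 1 denominator)
pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)

# Standard error of the difference between the two means
std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# t statistic and two-sided p-value with n1 + n2 - 2 degrees of freedom
t_manual = (mean(group1) - mean(group2)) / std_err
p_manual = 2 * stats.t.sf(abs(t_manual), n1 + n2 - 2)

t_scipy, p_scipy = stats.ttest_ind(group1, group2)
print(f"manual: t={t_manual:.3f}, p={p_manual:.3f}")
print(f"scipy:  t={t_scipy:.3f}, p={p_scipy:.3f}")
```

Both lines should print the same values, since the manual steps follow the same pooled-variance formula.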

⚠️

Common Pitfalls

Common mistakes when performing hypothesis testing include:

Not checking if data meets test assumptions like normality or equal variances.
Misinterpreting the p-value: a small p-value means evidence against the null hypothesis, not proof of the alternative.
Using the wrong test for your data type or sample design.

Always check assumptions and choose the correct test.

python

from scipy import stats

# Risky: running a t-test on this data without checking assumptions
# (group 1 contains an extreme outlier, 100)
non_normal_data1 = [1, 2, 2, 3, 100]
non_normal_data2 = [2, 3, 3, 4, 5]

# Better: check normality first with the Shapiro-Wilk test
stat1, p1 = stats.shapiro(non_normal_data1)
stat2, p2 = stats.shapiro(non_normal_data2)

print(f"Group1 normality p-value: {p1:.3f}")
print(f"Group2 normality p-value: {p2:.3f}")

if p1 < 0.05 or p2 < 0.05:
    print("Data not normal, consider non-parametric test like Mann-Whitney U test.")
else:
    stat, p = stats.ttest_ind(non_normal_data1, non_normal_data2)
    print(f"T-test p-value: {p:.3f}")

Output

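Group1's normality p-value comes out well below 0.01, because the outlier 100 makes the sample strongly non-normal. Group2's p-value is roughly 0.8; exact digits can vary slightly by SciPy version. The script therefore ends with:

Data not normal, consider non-parametric test like Mann-Whitney U test.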
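As a follow-up, here is a minimal sketch of the Mann-Whitney U test suggested by that message, applied to the same two samples. The choice of a two-sided alternative and the 0.05 threshold are assumptions carried over from the earlier examples.

```python
from scipy import stats

non_normal_data1 = [1, 2, 2, 3, 100]
non_normal_data2 = [2, 3, 3, 4, 5]

# Mann-Whitney U compares ranks, so the outlier 100 counts only as
# "the largest value" and its magnitude does not distort the result
u_stat, p = stats.mannwhitneyu(
    non_normal_data1, non_normal_data2, alternative="two-sided"
)

print(f"U statistic: {u_stat}")
print(f"P-value: {p:.3f}")

if p < 0.05:
    print("Reject null hypothesis: the two distributions differ.")
else:
    print("Fail to reject null hypothesis: no significant difference.")
```

With only five values per group the test has little power, so a non-significant result here should not be read as evidence that the groups are the same.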

📊

Quick Reference

Here is a quick guide to common hypothesis tests in Python:

Test	Function	Use Case
Independent t-test	scipy.stats.ttest_ind	Compare means of two independent groups
Paired t-test	scipy.stats.ttest_rel	Compare means of two related groups
One-sample t-test	scipy.stats.ttest_1samp	Compare sample mean to known value
Mann-Whitney U	scipy.stats.mannwhitneyu	Non-parametric test for two independent groups
Chi-square test	scipy.stats.chi2_contingency	Test association between categorical variables
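The other functions in the table are called in much the same way. The following sketch runs a paired t-test, a one-sample t-test, and a chi-square test of independence. All data, the comparison value 80, and the contingency table are hypothetical and exist only to show the call signatures.

```python
from scipy import stats

# Hypothetical scores for the same six subjects, measured twice
before = [82, 75, 90, 68, 77, 85]
after = [85, 79, 91, 73, 80, 88]

# Paired t-test: compares the per-subject differences
paired = stats.ttest_rel(before, after)
print(f"Paired t-test: t={paired.statistic:.3f}, p={paired.pvalue:.3f}")

# One-sample t-test: is the mean of `before` different from 80?
one_sample = stats.ttest_1samp(before, popmean=80)
print(f"One-sample t-test: t={one_sample.statistic:.3f}, p={one_sample.pvalue:.3f}")

# Chi-square test of independence on a 2x2 table of observed counts
# (rows: two groups, columns: two outcome categories)
observed = [[30, 10], [20, 25]]
chi2, chi2_p, dof, expected = stats.chi2_contingency(observed)
print(f"Chi-square: chi2={chi2:.3f}, p={chi2_p:.3f}, dof={dof}")
```

Note that chi2_contingency takes a table of counts rather than raw observations, and for 2x2 tables it applies Yates' continuity correction by default.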

✅

Key Takeaways

Use scipy.stats functions like ttest_ind to perform hypothesis tests in Python.

Always check assumptions like normality before choosing a test.

Interpret p-values correctly: a small p-value is evidence against the null hypothesis, not proof of the alternative.

Choose the right test based on your data type and study design.

Non-parametric tests are alternatives when your data does not meet the assumptions of a parametric test.