
How to Perform Hypothesis Testing in Python Easily

To perform hypothesis testing in Python, use the scipy.stats module, which provides functions such as ttest_ind for t-tests. State your null and alternative hypotheses, run the test function, and compare the p-value with your chosen significance level (commonly 0.05) to decide whether the result is statistically significant.

Syntax

Hypothesis testing in Python commonly uses functions from scipy.stats. For example, the independent t-test syntax is:

scipy.stats.ttest_ind(sample1, sample2, equal_var=True)

Here:

  • sample1 and sample2 are your data samples.
  • equal_var sets whether the test assumes equal population variances: True (the default) runs Student's t-test, False runs Welch's t-test.
  • The function returns a test statistic and a p-value.
python
from scipy import stats

# Syntax for independent t-test
test_statistic, p_value = stats.ttest_ind(sample1, sample2, equal_var=True)
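
To make the effect of equal_var concrete, here is a minimal sketch that runs both variants on the same data. The sample values are hypothetical and invented only for illustration.

```python
from scipy import stats

# Hypothetical samples, invented for illustration only
sample1 = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]
sample2 = [4.2, 6.9, 3.1, 7.5, 5.0, 2.8]

# Student's t-test: assumes both populations share the same variance
student = stats.ttest_ind(sample1, sample2, equal_var=True)

# Welch's t-test: does not assume equal variances
welch = stats.ttest_ind(sample1, sample2, equal_var=False)

# The result can be unpacked as a tuple or read by attribute
print(f"Student: t = {student.statistic:.3f}, p = {student.pvalue:.3f}")
print(f"Welch:   t = {welch.statistic:.3f}, p = {welch.pvalue:.3f}")
```

Because both groups here have the same size, the two t statistics coincide and only the degrees of freedom, and therefore the p-values, differ. With unequal group sizes the statistics generally differ as well.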

Example

This example shows how to test if two groups have different average values using an independent t-test.

python
from scipy import stats

# Sample data: heights of two groups
group1 = [170, 172, 168, 165, 174]
group2 = [160, 162, 158, 155, 164]

# Perform independent t-test
stat, p = stats.ttest_ind(group1, group2)

print(f"Test Statistic: {stat:.3f}")
print(f"P-value: {p:.3f}")

# Interpret result
if p < 0.05:
    print("Reject null hypothesis: groups differ significantly.")
else:
    print("Fail to reject null hypothesis: no significant difference.")
Output
Test Statistic: 4.527
P-value: 0.002
Reject null hypothesis: groups differ significantly.
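
The test above is two-sided: the alternative hypothesis is only that the group means differ. If the alternative is directional (for example, that group1 is taller on average), a one-sided test may be more appropriate. The sketch below assumes SciPy 1.6 or later, where ttest_ind accepts an alternative parameter.

```python
from scipy import stats

# Same height data as in the example above
group1 = [170, 172, 168, 165, 174]
group2 = [160, 162, 158, 155, 164]

# H0: the two group means are equal
# H1 (two-sided): the means differ
two_sided = stats.ttest_ind(group1, group2)

# H1 (one-sided): the mean of group1 is greater than the mean of group2
one_sided = stats.ttest_ind(group1, group2, alternative="greater")

print(f"Two-sided p-value: {two_sided.pvalue:.4f}")
print(f"One-sided p-value: {one_sided.pvalue:.4f}")
```

When the observed difference lies in the hypothesized direction, as it does here, the one-sided p-value is half the two-sided one. Choose the direction before looking at the data, not after.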

Common Pitfalls

Common mistakes when performing hypothesis testing include:

  • Not checking if data meets test assumptions like normality or equal variances.
  • Misinterpreting the p-value: a small p-value means evidence against the null hypothesis, not proof of the alternative.
  • Using the wrong test for your data type or sample design.

Always check assumptions and choose the correct test.

python
from scipy import stats

# Wrong: using t-test on non-normal data without checking
non_normal_data1 = [1, 2, 2, 3, 100]
non_normal_data2 = [2, 3, 3, 4, 5]

# Right: check normality first
stat1, p1 = stats.shapiro(non_normal_data1)
stat2, p2 = stats.shapiro(non_normal_data2)

print(f"Group1 normality p-value: {p1:.3f}")
print(f"Group2 normality p-value: {p2:.3f}")

if p1 < 0.05 or p2 < 0.05:
    print("Data not normal, consider non-parametric test like Mann-Whitney U test.")
else:
    stat, p = stats.ttest_ind(non_normal_data1, non_normal_data2)
    print(f"T-test p-value: {p:.3f}")
Output
Group1 normality p-value: 0.000
Group2 normality p-value: 0.814
Data not normal, consider non-parametric test like Mann-Whitney U test.
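
As a follow-up to the normality check, this is a minimal sketch of the Mann-Whitney U test the message recommends, run on the same two samples. It compares ranks rather than means, so the outlier 100 has far less influence than it would in a t-test.

```python
from scipy import stats

# Same samples as in the normality check above
non_normal_data1 = [1, 2, 2, 3, 100]
non_normal_data2 = [2, 3, 3, 4, 5]

# Non-parametric alternative to the independent t-test
u_stat, p_value = stats.mannwhitneyu(
    non_normal_data1, non_normal_data2, alternative="two-sided"
)

print(f"U statistic: {u_stat}")
print(f"P-value: {p_value:.3f}")

if p_value < 0.05:
    print("Reject null hypothesis: the distributions differ.")
else:
    print("Fail to reject null hypothesis: no significant difference.")
```

With samples this small and with tied values, the p-value is approximate, so treat the result as a rough indication.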

Quick Reference

Here is a quick guide to common hypothesis tests in Python:

| Test | Function | Use Case |
| --- | --- | --- |
| Independent t-test | scipy.stats.ttest_ind | Compare means of two independent groups |
| Paired t-test | scipy.stats.ttest_rel | Compare means of two related groups |
| One-sample t-test | scipy.stats.ttest_1samp | Compare sample mean to known value |
| Mann-Whitney U | scipy.stats.mannwhitneyu | Non-parametric test for two independent groups |
| Chi-square test | scipy.stats.chi2_contingency | Test association between categorical variables |
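
The sketch below shows how three of the functions in this reference might be called. All data values, and the reference mean of 80, are hypothetical and chosen only for illustration.

```python
from scipy import stats

# Hypothetical scores for the same six subjects, measured twice
before = [82, 75, 90, 68, 77, 85]
after = [85, 79, 91, 72, 80, 84]

# Paired t-test: compares the two related measurements
paired = stats.ttest_rel(before, after)
print(f"Paired t-test: t = {paired.statistic:.3f}, p = {paired.pvalue:.3f}")

# One-sample t-test: compares the sample mean with an assumed reference value of 80
one_sample = stats.ttest_1samp(before, popmean=80)
print(f"One-sample t-test: t = {one_sample.statistic:.3f}, p = {one_sample.pvalue:.3f}")

# Chi-square test of independence on a hypothetical 2x2 table of counts
# (rows: two groups, columns: two outcomes)
table = [[30, 10], [20, 40]]
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"Chi-square: chi2 = {chi2:.3f}, p = {p:.3f}, dof = {dof}")
```

Note that chi2_contingency also returns the degrees of freedom and the table of expected counts, which is useful for checking that the expected counts are not too small.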

Key Takeaways

  • Use scipy.stats functions such as ttest_ind to perform hypothesis tests in Python.
  • Always check assumptions such as normality before choosing a test.
  • Interpret p-values correctly: a small p-value is evidence against the null hypothesis, not proof of the alternative.
  • Choose the right test based on your data type and study design.
  • Non-parametric tests are alternatives when the data do not meet a test's assumptions.