Chi-squared test in Data Analysis Python

Introduction

The Chi-squared test of independence checks whether two categorical variables are associated, or whether the differences seen in the counts could plausibly be due to chance.

You want to see if a new teaching method affects pass rates in a class.
You want to check if people prefer different flavors of ice cream by age group.
You want to find out if a medicine works differently for men and women.
You want to test if a website layout affects user clicks based on device type.
Syntax
Data Analysis Python
from scipy.stats import chi2_contingency

chi2, p, dof, expected = chi2_contingency(observed_table)

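observed_table is a 2D list or array of observed counts (a contingency table), with one row per category of the first variable and one column per category of the second.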

The function returns the test statistic, p-value, degrees of freedom, and expected counts.
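
As a rough sketch of what those four return values mean, the expected counts, statistic, degrees of freedom, and p-value can be reproduced by hand with NumPy and compared against SciPy. The variable names with a `_manual` suffix are illustrative only. A 2x3 table is used so that SciPy's default continuity correction, which only applies to tables with one degree of freedom, does not come into play.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

observed = np.array([[15, 25, 30], [10, 20, 40]])

# Row and column totals, kept 2D so they can be multiplied as matrices
row_totals = observed.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 3)
n = observed.sum()

# Expected count under independence: row total * column total / grand total
expected_manual = row_totals @ col_totals / n

# Pearson statistic: sum of (observed - expected)^2 / expected over all cells
stat_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

# Degrees of freedom: (rows - 1) * (columns - 1)
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# p-value: upper tail of the chi-squared distribution
p_manual = chi2.sf(stat_manual, dof_manual)

stat, p, dof, expected = chi2_contingency(observed)
print(np.allclose(expected, expected_manual))
print(np.isclose(stat, stat_manual), dof == dof_manual, np.isclose(p, p_manual))
```

If the hand calculation and the library call agree, every comparison prints True.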

Examples
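Basic example with a 2x2 table of counts. The two rows here are exactly proportional, so the observed counts match the expected counts and the p-value comes out as 1.0.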
Data Analysis Python
from scipy.stats import chi2_contingency

observed = [[10, 20], [20, 40]]
chi2, p, dof, expected = chi2_contingency(observed)
print(f"p-value: {p}")
Example with a 2x3 table to test independence across more categories.
Data Analysis Python
from scipy.stats import chi2_contingency

observed = [[15, 25, 30], [10, 20, 40]]
chi2, p, dof, expected = chi2_contingency(observed)
print(f"Chi2: {chi2}, p-value: {p}")
Sample Program

This program uses a Chi-squared test to check whether preference for flavor A or flavor B depends on gender.

Data Analysis Python
from scipy.stats import chi2_contingency

# Observed counts of people liking two flavors by gender
observed = [
    [30, 10],  # Men: like flavor A, like flavor B
    [20, 40]   # Women: like flavor A, like flavor B
]

chi2, p, dof, expected = chi2_contingency(observed)

print(f"Chi-squared statistic: {chi2:.2f}")
print(f"Degrees of freedom: {dof}")
print(f"p-value: {p:.4f}")
print("Expected frequencies:")
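for row in expected:
    # float() keeps the printed values plain across NumPy versions
    print([round(float(x), 2) for x in row])
Output
Chi-squared statistic: 15.04
Degrees of freedom: 1
p-value: 0.0001
Expected frequencies:
[20.0, 20.0]
[30.0, 30.0]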
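
One detail worth knowing for 2x2 tables like this one: by default, `chi2_contingency` applies Yates' continuity correction whenever the table has one degree of freedom, so its statistic differs from the plain Pearson formula. The sketch below runs the sample table both ways; the variable names are illustrative only.

```python
from scipy.stats import chi2_contingency

observed = [[30, 10], [20, 40]]

# Default behaviour: Yates' continuity correction is applied (dof == 1)
stat_corrected, p_corrected, _, _ = chi2_contingency(observed)

# Plain Pearson chi-squared statistic, without the correction
stat_plain, p_plain, _, _ = chi2_contingency(observed, correction=False)

print(f"With correction:    chi2 = {stat_corrected:.2f}, p = {p_corrected:.5f}")
print(f"Without correction: chi2 = {stat_plain:.2f}, p = {p_plain:.5f}")
```

The corrected statistic is the smaller of the two, which makes the test slightly more conservative. For tables larger than 2x2 the `correction` argument has no effect.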
Important Notes

A small p-value (conventionally below 0.05) is evidence against independence, suggesting the two variables are associated. It does not tell you how strong the association is.

The test needs counts, not percentages or averages.
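
If the data starts as one record per person rather than as a table of counts, the counts have to be tallied first. The following is a minimal sketch assuming pandas is available; the DataFrame and its `gender` and `flavor` columns are hypothetical and built to reproduce the sample table above.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical raw data: one row per surveyed person
df = pd.DataFrame({
    "gender": ["Men"] * 40 + ["Women"] * 60,
    "flavor": ["A"] * 30 + ["B"] * 10 + ["A"] * 20 + ["B"] * 40,
})

# Tally the raw records into a contingency table of counts
table = pd.crosstab(df["gender"], df["flavor"])
print(table)

# Pass the counts (not percentages) to the test
chi2, p, dof, expected = chi2_contingency(table.to_numpy())
print(f"p-value: {p:.4f}")
```

Converting the table to percentages before testing would discard the sample size, which the test needs in order to judge how much variation chance alone could produce.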

Make sure the expected counts are not too small for reliable results. A common rule of thumb is an expected count of at least 5 in every cell.
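
One possible way to act on this is to inspect the `expected` array that `chi2_contingency` returns before trusting its p-value. The sketch below uses a made-up small 2x2 table and a threshold of 5, which is a convention rather than a hard rule, and falls back to Fisher's exact test from SciPy when the threshold is not met.

```python
from scipy.stats import chi2_contingency, fisher_exact

# Made-up small 2x2 table, for illustration only
observed = [[3, 1], [1, 3]]

_, p_chi2, _, expected = chi2_contingency(observed)

if expected.min() < 5:
    # Expected counts too small: use Fisher's exact test (2x2 tables)
    _, p_exact = fisher_exact(observed)
    print(f"Smallest expected count: {expected.min():.1f}")
    print(f"Fisher's exact p-value: {p_exact:.4f}")
else:
    print(f"Chi-squared p-value: {p_chi2:.4f}")
```

This fallback only covers 2x2 tables; for larger tables with sparse cells, merging rare categories is another common option.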

Summary

The Chi-squared test checks whether two categorical variables are associated or independent.

Use it with count data arranged in a table.

Look at the p-value to decide whether the association is statistically significant.