Concept Flow - Chi-squared test

Start with observed data

↓

Calculate expected data

↓

Compute Chi-squared statistic

↓

Calculate p-value

↓

Compare p-value to significance level

↓

Reject or fail to reject H0

The test starts with observed data, calculates expected counts, then computes the Chi-squared statistic and p-value, and finally decides if the difference is significant.

Execution Sample

SciPy

from scipy.stats import chi2_contingency

observed = [[10, 20], [20, 40]]
chi2, p, dof, expected = chi2_contingency(observed)
print(p)

This code runs a Chi-squared test on a 2x2 table of observed counts and prints the p-value.
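The call returns four values, but the sample prints only the p-value. As a small sketch on the same table, the other three can be inspected too. The variable names follow the sample, and the rounding is only for display.

```python
from scipy.stats import chi2_contingency

observed = [[10, 20], [20, 40]]

# chi2_contingency returns the statistic, the p-value, the degrees of
# freedom, and the table of expected counts, in that order.
chi2, p, dof, expected = chi2_contingency(observed)

print("statistic:", round(float(chi2), 6))
print("p-value:", round(float(p), 6))
print("degrees of freedom:", dof)
print("expected counts:", expected.tolist())
```

For a 2x2 table, SciPy applies Yates' continuity correction by default (the `correction` argument). It changes nothing here because the observed and expected counts already agree.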

Execution Table

Step	Action	Value/Calculation	Result
1	Input observed data	[[10, 20], [20, 40]]	Observed counts set
2	Calculate row sums	[30, 60]	Row sums computed
3	Calculate column sums	[30, 60]	Column sums computed
4	Calculate total sum	90	Total count computed
5	Calculate expected counts	Expected = (row_sum * col_sum) / total	[[10.0, 20.0], [20.0, 40.0]]
6	Compute Chi-squared statistic	Sum((observed - expected)^2 / expected)	0.0
7	Calculate p-value	Using Chi-squared distribution with dof=1	1.0
8	Compare p-value to 0.05	1.0 > 0.05	Fail to reject null hypothesis
9	End	-	Test complete

💡 p-value is greater than 0.05, so we fail to reject the null hypothesis
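The steps in the table can be reproduced by hand. The following is a minimal sketch with NumPy, assuming the same variable names as the Variable Tracker. It uses `scipy.stats.chi2.sf` for the p-value and leaves out the continuity correction that `chi2_contingency` applies to 2x2 tables by default, which makes no difference for this data.

```python
import numpy as np
from scipy.stats import chi2

# Step 1: observed counts
observed = np.array([[10, 20], [20, 40]])

# Steps 2-4: marginal totals
row_sums = observed.sum(axis=1)   # one total per row
col_sums = observed.sum(axis=0)   # one total per column
total = observed.sum()

# Step 5: expected count for each cell = row_sum * col_sum / total
expected = np.outer(row_sums, col_sums) / total

# Step 6: sum over all cells of (observed - expected)^2 / expected
chi2_statistic = ((observed - expected) ** 2 / expected).sum()

# Step 7: upper-tail probability with (rows - 1) * (cols - 1) degrees of freedom
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = chi2.sf(chi2_statistic, dof)

# Step 8: compare with the significance level
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(chi2_statistic, p_value, decision)
```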

Variable Tracker

Variable	Start	After Step 2	After Step 3	After Step 4	After Step 5	After Step 6	After Step 7	Final
observed	[[10, 20], [20, 40]]	[[10, 20], [20, 40]]	[[10, 20], [20, 40]]	[[10, 20], [20, 40]]	[[10, 20], [20, 40]]	[[10, 20], [20, 40]]	[[10, 20], [20, 40]]	[[10, 20], [20, 40]]
row_sums	N/A	[30, 60]	[30, 60]	[30, 60]	[30, 60]	[30, 60]	[30, 60]	[30, 60]
col_sums	N/A	N/A	[30, 60]	[30, 60]	[30, 60]	[30, 60]	[30, 60]	[30, 60]
total	N/A	N/A	N/A	90	90	90	90	90
expected	N/A	N/A	N/A	N/A	[[10.0, 20.0], [20.0, 40.0]]	[[10.0, 20.0], [20.0, 40.0]]	[[10.0, 20.0], [20.0, 40.0]]	[[10.0, 20.0], [20.0, 40.0]]
chi2_statistic	N/A	N/A	N/A	N/A	N/A	0.0	0.0	0.0
p_value	N/A	N/A	N/A	N/A	N/A	N/A	1.0	1.0

Key Moments - 2 Insights

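Why are the expected counts the same as the observed counts in this example?

The second row [20, 40] is exactly twice the first row [10, 20], so both rows split their counts in the same 1:2 ratio. The table already matches what independence predicts, so every expected count equals the observed count.

What does a p-value of 1.0 mean in this test?

The Chi-squared statistic is 0.0, the smallest value it can take, so any table drawn under the null hypothesis would deviate at least as much. The data give no evidence against the null hypothesis.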

Visual Quiz - 3 Questions

Test your understanding

Look at the execution table at step 6. What is the Chi-squared statistic value?

A. 10.0

B. 1.0

C. 0.0

D. 90.0

Concept Snapshot

Chi-squared test syntax:
from scipy.stats import chi2_contingency
chi2, p, dof, expected = chi2_contingency(observed)

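- observed: 2D array of counts (a contingency table)
- Returns the chi2 statistic, p-value, degrees of freedom, and expected counts
- p < 0.05 means reject the null hypothesis at the 0.05 significance level
- Tests whether the row and column variables are independent by comparing observed counts with the counts expected under independence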

Full Transcript

The Chi-squared test compares observed counts to expected counts to see whether the differences are statistically significant. We start with observed data and calculate the expected count for each cell from its row total, its column total, and the grand total. We then compute the Chi-squared statistic: for each cell, square the difference between observed and expected, divide by the expected count, and sum the results over all cells. Next, we find the p-value from the Chi-squared distribution with the appropriate degrees of freedom, which is (rows - 1) × (columns - 1) for a contingency table. If the p-value is less than the significance level of 0.05, we reject the null hypothesis, meaning the observed data differ significantly from the expected counts. Otherwise, we fail to reject it, meaning the data show no significant difference. In the example, observed and expected counts are equal, so the Chi-squared statistic is 0 and the p-value is 1: the data show no difference at all from what the null hypothesis predicts.
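The worked example only reaches the "fail to reject" branch. To illustrate the other branch, here is a sketch with a hypothetical table whose rows split their counts in opposite directions. The counts and the names `skewed` and `alpha` are made up for illustration and do not come from the example above.

```python
from scipy.stats import chi2_contingency

# Hypothetical counts: row 1 leans toward column 1, row 2 toward column 2.
skewed = [[30, 10], [10, 30]]

chi2, p, dof, expected = chi2_contingency(skewed)

alpha = 0.05  # significance level used throughout this page
decision = "reject H0" if p < alpha else "fail to reject H0"

print("expected counts:", expected.tolist())
print("decision:", decision)
```

Here every expected count is 20, far from the observed 30 and 10, so the statistic is large and the p-value falls well below 0.05.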