How to Perform a Chi-Square Test in Python Easily
You can perform a chi-square test in Python using scipy.stats.chi2_contingency. This function takes a contingency table (a 2D list or array of observed counts) as input and returns the test statistic, the p-value, the degrees of freedom, and the table of expected frequencies.
Syntax
The chi-square test in Python is done using the chi2_contingency function from the scipy.stats module.
Its basic syntax is:
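- chi2_contingency(observed): observed is a 2D array or list representing the contingency table of observed frequencies.
- The function returns four values: chi2 (test statistic), p (p-value), dof (degrees of freedom), and expected (expected frequencies table).
- For tables with one degree of freedom (for example a 2x2 table), Yates' continuity correction is applied by default; pass correction=False to disable it.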
```python
from scipy.stats import chi2_contingency

# observed is a 2D list or array of counts (example values)
observed = [[10, 20], [20, 40]]
chi2, p, dof, expected = chi2_contingency(observed)
```
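The expected frequencies and degrees of freedom follow simple formulas: each expected count is (row total * column total) / grand total, and dof = (rows - 1) * (columns - 1). The sketch below recomputes both by hand with NumPy for a made-up 2x3 table and checks them against what chi2_contingency returns. The table values are illustrative only.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Made-up 2x3 table: 2 categories of one variable (rows), 3 of the other (columns)
observed = np.array([[12, 18, 30],
                     [8, 22, 10]])

# Expected count for each cell = row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 3)
expected_manual = row_totals * col_totals / observed.sum()

# Degrees of freedom = (number of rows - 1) * (number of columns - 1)
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

chi2, p, dof, expected = chi2_contingency(observed)
print(np.allclose(expected, expected_manual))  # True
print(dof == dof_manual)                       # True
```

Broadcasting a (2, 1) column of row totals against a (1, 3) row of column totals builds the whole expected table in one step, which works for any table shape.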
Example
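This example performs a chi-square test on a 2x2 contingency table to check whether two categorical variables are independent. The counts are chosen so that the two rows are proportional (10:20 and 20:40), which makes every observed count equal to its expected count; the statistic is therefore 0 and the p-value is 1.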
```python
from scipy.stats import chi2_contingency

# Create a 2x2 contingency table
observed = [[10, 20], [20, 40]]

# Perform the chi-square test
chi2, p, dof, expected = chi2_contingency(observed)

print(f"Chi-square statistic: {chi2:.4f}")
print(f"p-value: {p:.4f}")
print(f"Degrees of freedom: {dof}")
print("Expected frequencies:")
print(expected)
```
Output
```
Chi-square statistic: 0.0000
p-value: 1.0000
Degrees of freedom: 1
Expected frequencies:
[[10. 20.]
 [20. 40.]]
```
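The statistic itself is Pearson's sum of (observed - expected)^2 / expected over all cells, and the p-value is the upper tail of a chi-square distribution with dof degrees of freedom. As a minimal sketch, the code below computes both by hand for an illustrative 2x2 table whose rows are not proportional, compares them with chi2_contingency(..., correction=False), and then shows that the default call (Yates' correction on, since dof is 1) gives a smaller statistic. The alias chi2_dist is used only to avoid clashing with the chi2 variable name used elsewhere.

```python
import numpy as np
from scipy.stats import chi2 as chi2_dist
from scipy.stats import chi2_contingency

# Illustrative 2x2 table whose rows are NOT proportional
observed = np.array([[30, 10], [15, 25]])

# SciPy's result without the Yates continuity correction
stat_plain, p_plain, dof, expected = chi2_contingency(observed, correction=False)

# Pearson statistic by hand: sum over all cells of (O - E)^2 / E
stat_manual = ((observed - expected) ** 2 / expected).sum()

# p-value: probability of a statistic at least this large if independence holds
p_manual = chi2_dist.sf(stat_manual, dof)

print(f"manual: {stat_manual:.4f}, scipy: {stat_plain:.4f}")  # manual: 11.4286, scipy: 11.4286

# Default call: dof == 1, so Yates' correction is applied, which shrinks the
# statistic and increases the p-value
stat_yates, p_yates, _, _ = chi2_contingency(observed)
print(stat_yates < stat_plain, p_yates > p_plain)  # True True
```

The correction only kicks in for one degree of freedom, so for larger tables the default call and correction=False give the same result.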
Common Pitfalls
- Wrong input format: The input must be a 2D list or array of observed counts, not raw data.
- Small sample sizes: The chi-square approximation may be unreliable when expected frequencies are small; a common rule of thumb is that they should be at least 5.
- Misinterpreting the p-value: A high p-value only means there is not enough evidence to reject independence; it is not proof of independence.
```python
from scipy.stats import chi2_contingency

# Wrong: passing raw data instead of a contingency table
# raw_data = ["A", "A", "B", "B"]  # This is incorrect

# Right: create the contingency table of counts first
observed = [[10, 20], [20, 40]]
chi2, p, dof, expected = chi2_contingency(observed)
```
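If your data arrive as raw observations (one entry per subject), you first have to count each combination of categories. The sketch below does this with NumPy alone for a tiny, made-up sample; the variable names groups and answers are hypothetical. pandas.crosstab does the same job if you already use pandas. A sample this small would violate the expected-count rule of thumb, so it only illustrates how the table is built.

```python
import numpy as np

# Hypothetical raw data: one (group, answer) pair per respondent
groups = ["A", "A", "B", "B", "A", "B", "A", "B"]
answers = ["yes", "no", "no", "no", "yes", "yes", "yes", "no"]

# Map each category to an integer index; np.unique returns sorted labels
group_labels, group_idx = np.unique(groups, return_inverse=True)
answer_labels, answer_idx = np.unique(answers, return_inverse=True)

# Count every (group, answer) combination into a rows x columns table
observed = np.zeros((len(group_labels), len(answer_labels)), dtype=int)
np.add.at(observed, (group_idx, answer_idx), 1)

print(observed)  # rows: A, B; columns: no, yes
# [[1 3]
#  [3 1]]
```

The resulting observed array is a proper contingency table and can be passed to chi2_contingency(observed).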
Quick Reference
Remember these key points when using chi2_contingency:
| Step | Description |
|---|---|
| Prepare data | Create a 2D contingency table of observed counts |
| Call function | chi2_contingency(observed) returns test results |
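| Check p-value | If p < 0.05 (a common significance level), reject independence: the variables are likely associated |
| Check assumptions | Expected counts should generally be at least 5 for the chi-square approximation to be valid |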
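The last two rows of the table can be combined into a small helper. check_independence below is a hypothetical function, not part of SciPy, written as a sketch: it runs chi2_contingency, flags expected counts below 5, and, for 2x2 tables with small expected counts, swaps in Fisher's exact test (scipy.stats.fisher_exact), a standard alternative that does not rely on the large-sample approximation. For larger tables it only raises the flag. The 0.05 threshold is a convention, not a rule.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact


def check_independence(observed, alpha=0.05, min_expected=5):
    """Hypothetical helper: chi-square test with an expected-count check.

    Falls back to Fisher's exact test for 2x2 tables whose expected
    counts are below min_expected.
    """
    observed = np.asarray(observed)
    _, p, _, expected = chi2_contingency(observed)
    small_counts = bool((expected < min_expected).any())
    method = "chi-square"
    if small_counts and observed.shape == (2, 2):
        # Fisher's exact test computes the p-value exactly instead of
        # relying on the chi-square approximation
        _, p = fisher_exact(observed)
        method = "fisher_exact"
    return {
        "method": method,
        "p_value": float(p),
        "small_expected_counts": small_counts,
        # A low p-value is evidence against independence; a high one is
        # NOT proof of independence
        "reject_independence": bool(p < alpha),
    }


print(check_independence([[30, 10], [15, 25]]))  # chi-square test, independence rejected
print(check_independence([[1, 3], [3, 1]]))      # small counts, Fisher's exact test used
```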
Key Takeaways
Use scipy.stats.chi2_contingency with a 2D table of observed frequencies to perform a chi-square test of independence.
The function returns the chi-square statistic, the p-value, the degrees of freedom, and the expected frequencies.
Ensure your input is a contingency table of counts, not raw data.
Check that expected frequencies are large enough (typically at least 5) before trusting the results.
Interpret the p-value carefully: a low p-value suggests dependence, while a high p-value only means there is no strong evidence against independence.