
How to Perform a Chi-Square Test in Python Easily

You can perform a chi-square test in Python using scipy.stats.chi2_contingency. This function takes a contingency table (a 2D list or array) as input and returns the test statistic, p-value, degrees of freedom, and expected frequencies.

Syntax

The chi-square test in Python is done using the chi2_contingency function from the scipy.stats module.

Its basic syntax is:

  • chi2_contingency(observed): where observed is a 2D array or list representing the contingency table of observed frequencies.
  • The function returns four values: chi2 (test statistic), p (p-value), dof (degrees of freedom), and expected (expected frequencies table).
python
from scipy.stats import chi2_contingency

# observed is a 2D list or array
chi2, p, dof, expected = chi2_contingency(observed)
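
To make the four return values concrete, here is a minimal sketch on a hypothetical 2x3 table (the counts are invented for illustration and are not from a real dataset). For a table with r rows and c columns, the degrees of freedom are (r - 1) * (c - 1), and the expected table has the same shape as the input.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x3 table of counts (values invented for illustration)
observed = np.array([[12, 18, 30],
                     [28, 22, 10]])

chi2, p, dof, expected = chi2_contingency(observed)

# dof = (rows - 1) * (columns - 1) = (2 - 1) * (3 - 1)
print(f"Degrees of freedom: {dof}")
# expected has the same shape as observed
print(f"Shape of expected: {expected.shape}")
```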

Example

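This example runs a chi-square test of independence on a 2x2 contingency table, checking whether two categorical variables are independent. The rows of this table are exactly proportional (the second row is twice the first), so the observed counts equal the expected counts and the statistic comes out as 0.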

python
from scipy.stats import chi2_contingency

# Create a 2x2 contingency table
observed = [[10, 20],
            [20, 40]]

# Perform chi-square test
chi2, p, dof, expected = chi2_contingency(observed)

print(f"Chi-square statistic: {chi2:.4f}")
print(f"p-value: {p:.4f}")
print(f"Degrees of freedom: {dof}")
print("Expected frequencies:")
print(expected)
Output
Chi-square statistic: 0.0000
p-value: 1.0000
Degrees of freedom: 1
Expected frequencies:
[[10. 20.]
 [20. 40.]]
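
To see where the expected frequencies and the statistic come from, the sketch below recomputes them by hand with NumPy and compares the result with chi2_contingency. The table is hypothetical (invented counts). Note that for 2x2 tables SciPy applies Yates' continuity correction by default, so the hand calculation is compared against a call with correction=False.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table (values invented for illustration)
observed = np.array([[12, 8],
                     [5, 15]])

# Expected count for each cell = row total * column total / grand total
row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
expected_manual = np.outer(row_totals, col_totals) / observed.sum()

# Pearson chi-square statistic without any continuity correction
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

# correction=False switches off Yates' continuity correction, which
# SciPy applies by default when the table has one degree of freedom
chi2_plain, p_plain, dof, expected = chi2_contingency(observed, correction=False)
chi2_yates, p_yates, _, _ = chi2_contingency(observed)  # default: correction=True

print(f"Manual statistic:         {chi2_manual:.4f}")
print(f"SciPy, correction=False:  {chi2_plain:.4f}")
print(f"SciPy, default (Yates):   {chi2_yates:.4f}")
```

The corrected statistic is smaller than the uncorrected one, which makes the default behaviour slightly more conservative for 2x2 tables.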

Common Pitfalls

  • Wrong input format: The input must be a 2D list or array of observed counts, not raw data.
  • Small sample sizes: The chi-square approximation may be unreliable when expected frequencies are low (a common rule of thumb is that every expected count should be at least 5).
  • Misinterpreting p-value: A high p-value means no evidence to reject independence, not proof of independence.
python
from scipy.stats import chi2_contingency

# Wrong: passing raw data instead of contingency table
# raw_data = ["A", "A", "B", "B"]  # This is incorrect

# Right: create contingency table first
observed = [[10, 20], [20, 40]]
chi2, p, dof, expected = chi2_contingency(observed)
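
If you start from raw observations, one way to build the contingency table is pandas.crosstab. This is a minimal sketch that assumes pandas is installed; the column names group and outcome and the data are hypothetical. The counts here are far too small for a trustworthy test and only show the mechanics.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical raw data: one row per observation (values invented)
df = pd.DataFrame({
    "group":   ["A", "A", "A", "B", "B", "B", "B", "A"],
    "outcome": ["yes", "no", "yes", "no", "no", "yes", "no", "yes"],
})

# Count how often each (group, outcome) pair occurs
table = pd.crosstab(df["group"], df["outcome"])
print(table)

# Pass the table of counts, not the raw labels, to the test
chi2, p, dof, expected = chi2_contingency(table.to_numpy())
print(f"p-value: {p:.4f}")
```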

Quick Reference

Remember these key points when using chi2_contingency:

Step               Description
Prepare data       Create a 2D contingency table of observed counts
Call function      chi2_contingency(observed) returns the test results
Check p-value      If p < 0.05, reject independence at the 5% level (evidence of an association)
Check assumptions  Expected counts should be >= 5 for the test to be reliable
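
The steps above can be sketched as a small helper. The function name check_independence and its parameters alpha and min_expected are hypothetical, not part of SciPy. The fallback to scipy.stats.fisher_exact for a 2x2 table with small expected counts is one common choice, not the only one.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

def check_independence(observed, alpha=0.05, min_expected=5):
    """Hypothetical helper: run the test and flag low expected counts."""
    observed = np.asarray(observed)
    chi2, p, dof, expected = chi2_contingency(observed)
    return {
        "chi2": chi2,
        "p": p,
        "dof": dof,
        # Rule of thumb: every expected count should be at least min_expected
        "assumption_ok": bool((expected >= min_expected).all()),
        # True means independence is rejected at significance level alpha
        "reject_independence": bool(p < alpha),
    }

result = check_independence([[10, 20], [20, 40]])
print(result)

# For a 2x2 table with small expected counts, Fisher's exact test
# is a common alternative to the chi-square approximation
small = [[3, 1], [1, 3]]
if not check_independence(small)["assumption_ok"]:
    odds_ratio, p_exact = fisher_exact(small)
    print(f"Fisher's exact p-value: {p_exact:.4f}")
```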

Key Takeaways

  • Use scipy.stats.chi2_contingency with a 2D table of observed frequencies to perform a chi-square test.
  • The function returns the chi-square statistic, p-value, degrees of freedom, and expected frequencies.
  • Ensure your input is a contingency table, not raw data.
  • Check that expected frequencies are not too low before trusting the test results.
  • Interpret the p-value carefully: a low p-value suggests dependence, while a high p-value means there is no strong evidence against independence.