
Goodness of fit evaluation in SciPy - Step-by-Step Execution

Concept Flow - Goodness of fit evaluation
1. Collect observed data
2. Define the expected distribution
3. Calculate the test statistic
4. Compare the statistic to the chi-square distribution
5. Get the p-value
6. Decide whether the fit is good
We start with observed data and an expected distribution, calculate a test statistic, and then find a p-value to decide whether the data fits well.
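The flow above can be sketched directly with NumPy and scipy.stats.chi2. The counts are the lesson's example data; variable names are illustrative:

```python
import numpy as np
from scipy.stats import chi2

observed = np.array([16, 18, 16, 14, 12, 12])
expected = np.array([15, 15, 15, 15, 15, 15])

# Test statistic: sum of (observed - expected)^2 / expected.
statistic = np.sum((observed - expected) ** 2 / expected)

# p-value from the chi-square distribution with df = categories - 1.
df = len(observed) - 1
p_value = chi2.sf(statistic, df)

print(statistic)          # 2.0
print(round(p_value, 3))  # 0.849
```

Using the survival function `chi2.sf(x, df)` is equivalent to `1 - CDF(x, df)` but numerically more stable for small p-values.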
Execution Sample
SciPy
from scipy.stats import chisquare
observed = [16, 18, 16, 14, 12, 12]
expected = [15, 15, 15, 15, 15, 15]
stat, p = chisquare(f_obs=observed, f_exp=expected)
print(stat, p)
This code runs a chi-square goodness of fit test comparing observed counts to expected counts.
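As an aside, SciPy's `chisquare` defaults to equal expected counts when `f_exp` is omitted, so the call below should match the sample above (assuming the documented default behavior):

```python
from scipy.stats import chisquare

observed = [16, 18, 16, 14, 12, 12]

# With f_exp omitted, chisquare assumes all categories are
# equally likely, i.e. expected = [15, 15, 15, 15, 15, 15] here.
stat, p = chisquare(f_obs=observed)

print(stat)         # 2.0
print(round(p, 3))  # 0.849
```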
Execution Table
| Step | Action | Calculation | Result |
|---|---|---|---|
| 1 | Calculate differences (observed - expected) | [16-15, 18-15, 16-15, 14-15, 12-15, 12-15] | [1, 3, 1, -1, -3, -3] |
| 2 | Square the differences | [1^2, 3^2, 1^2, (-1)^2, (-3)^2, (-3)^2] | [1, 9, 1, 1, 9, 9] |
| 3 | Divide squared differences by expected | [1/15, 9/15, 1/15, 1/15, 9/15, 9/15] | [0.0667, 0.6, 0.0667, 0.0667, 0.6, 0.6] |
| 4 | Sum all values | 0.0667 + 0.6 + 0.0667 + 0.0667 + 0.6 + 0.6 | 2.0 |
| 5 | Calculate p-value from chi-square distribution with df = 5 | p = 1 - CDF(2.0, df=5) | p = 0.849 |
| 6 | Decision | p > 0.05: no evidence of poor fit | Fail to reject the null hypothesis |
💡 Test ends after p-value calculation and decision step
Variable Tracker
Each variable keeps its value once assigned, so the tracker condenses to the step where each is first set:

| Variable | First set | Value |
|---|---|---|
| observed | start | [16, 18, 16, 14, 12, 12] |
| expected | start | [15, 15, 15, 15, 15, 15] |
| diff | step 1 | [1, 3, 1, -1, -3, -3] |
| squared_diff | step 2 | [1, 9, 1, 1, 9, 9] |
| chi_components | step 3 | [0.0667, 0.6, 0.0667, 0.0667, 0.6, 0.6] |
| statistic | step 4 | 2.0 |
| p_value | step 5 | 0.849 |

All values remain unchanged through the final step.
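The tracker's intermediate arrays can be reproduced with a short NumPy sketch; the variable names mirror the tracker rows:

```python
import numpy as np

observed = np.array([16, 18, 16, 14, 12, 12])
expected = np.array([15, 15, 15, 15, 15, 15])

diff = observed - expected                 # step 1
squared_diff = diff ** 2                   # step 2
chi_components = squared_diff / expected   # step 3
statistic = chi_components.sum()           # step 4

print(diff.tolist())          # [1, 3, 1, -1, -3, -3]
print(squared_diff.tolist())  # [1, 9, 1, 1, 9, 9]
print(statistic)              # 2.0
```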
Key Moments - 3 Insights
Why do we square the differences between observed and expected counts?
Squaring makes every difference positive and emphasizes larger deviations, as shown in step 2 of the execution table.
What does a high p-value mean in this test?
A high p-value (like 0.849 in step 5) means the observed data is consistent with the expected distribution, so we fail to reject the null hypothesis.
Why do we divide squared differences by expected counts?
Dividing by expected counts normalizes the differences, so categories with larger expected counts don't dominate, as in step 3.
Visual Quiz - 3 Questions
Test your understanding
Looking at the execution table at step 4, what is the sum of the chi-square components?
A. 2.0
B. 1.5
C. 3.0
D. 0.85
💡 Hint
Check the 'Sum all values' calculation in step 4 of the execution table.
According to the variable tracker, what is the value of p_value after step 5?
A. 0.15
B. 0.849
C. 0.05
D. 2.0
💡 Hint
Look at the 'p_value' row under 'After Step 5' in the variable tracker.
If the observed counts were all equal to expected counts, what would the chi-square statistic be?
A. 6
B. 1
C. 0
D. 15
💡 Hint
When observed equals expected, differences are zero, so sum of squared differences is zero (see step 1 and 2).
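This can be confirmed with a quick check: with identical observed and expected counts, every difference is zero, so the statistic is exactly zero and the p-value is 1.

```python
from scipy.stats import chisquare

# Observed counts that exactly match the expected counts.
stat, p = chisquare(f_obs=[15, 15, 15, 15, 15, 15],
                    f_exp=[15, 15, 15, 15, 15, 15])
print(stat, p)  # 0.0 1.0
```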
Concept Snapshot
Goodness of fit test compares observed data to expected distribution.
Calculate chi-square statistic: sum((observed - expected)^2 / expected).
Find p-value from chi-square distribution with degrees of freedom = categories - 1.
High p-value means data fits well; low p-value means poor fit.
Use scipy.stats.chisquare for easy calculation.
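For contrast with the high p-value in this lesson, here is a sketch of a poorly fitting sample; the counts are made up for illustration:

```python
from scipy.stats import chisquare

# Hypothetical heavily skewed counts (90 total) against
# equal expected counts of 15 per category.
stat, p = chisquare(f_obs=[40, 5, 10, 10, 10, 15], f_exp=[15] * 6)

print(round(stat, 2))  # 53.33
print(p < 0.05)        # True: low p-value, poor fit
```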
Full Transcript
Goodness of fit evaluation checks if observed data matches an expected pattern. We start by collecting observed counts and defining expected counts. Then, we calculate the differences between observed and expected, square them, and divide by expected counts. Summing these gives the chi-square statistic. Using the chi-square distribution with the right degrees of freedom, we find the p-value. A high p-value means the observed data fits the expected distribution well. This process is shown step-by-step in the execution table and variable tracker. Key points include squaring differences to avoid negatives, normalizing by expected counts, and interpreting the p-value correctly.