What is goodness-of-fit evaluation in SciPy?

Introduction

Goodness of fit evaluation checks whether observed data match a specific expected pattern or distribution. Typical uses include:

Checking if test scores follow a normal distribution.

Verifying if dice rolls are fair and uniform.

Testing if customer arrival times fit a Poisson distribution.

Comparing observed survey results to expected proportions.
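As a rough illustration of the Poisson use case above, the sketch below compares binned arrival counts with the counts a Poisson model would predict. The data, the arrival rate of 2 per interval, and the choice to lump everything above 3 arrivals into one "4 or more" bin are all assumptions made for the example, not values from a real dataset.

```python
import numpy as np
from scipy.stats import chisquare, poisson

# Hypothetical data: number of intervals (out of 100) in which
# 0, 1, 2, 3, or "4 or more" customers arrived.
observed = np.array([14, 27, 27, 18, 14])

# Assumed arrival rate under the null hypothesis (not estimated from the data).
rate = 2.0

# Poisson probabilities for 0..3 arrivals, plus the upper tail P(X >= 4).
probs = np.append(poisson.pmf(np.arange(4), rate), poisson.sf(3, rate))

# Scale the probabilities to expected counts with the same total as the data.
expected = probs * observed.sum()

poisson_result = chisquare(f_obs=observed, f_exp=expected)
print(f"Chi-square statistic: {poisson_result.statistic:.3f}")
print(f"P-value: {poisson_result.pvalue:.3f}")
```

If the rate were estimated from the same data instead of fixed in advance, the usual adjustment is to pass ddof=1 so that one extra degree of freedom is removed.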

Syntax

SciPy

from scipy.stats import chisquare

chisquare(f_obs, f_exp=None, ddof=0, axis=0)

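f_obs is the observed frequency counts in each category (your data).

f_exp is the expected frequency counts in each category (if None, all categories are assumed equally likely).

ddof adjusts the degrees of freedom used for the p-value, which is computed from a chi-square distribution with k - 1 - ddof degrees of freedom for k categories.

axis is the axis along which the test is applied when the inputs are multi-dimensional.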
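To make clear what chisquare computes from these arguments, here is a minimal sketch that rebuilds Pearson's statistic by hand, summing (observed - expected)^2 / expected over the categories, and compares it with the library result. The variable names manual_statistic and manual_pvalue are illustrative only.

```python
import numpy as np
from scipy.stats import chi2, chisquare

f_obs = np.array([25, 35, 40])
f_exp = np.array([30, 30, 40])

# Pearson's statistic: sum over categories of (observed - expected)^2 / expected.
manual_statistic = np.sum((f_obs - f_exp) ** 2 / f_exp)

# Degrees of freedom are k - 1 - ddof; here k = 3 categories and ddof = 0.
dof = len(f_obs) - 1

# The p-value is the upper-tail probability of the chi-square distribution.
manual_pvalue = chi2.sf(manual_statistic, dof)

result = chisquare(f_obs, f_exp)
print(manual_statistic, result.statistic)
print(manual_pvalue, result.pvalue)
```

Both printed pairs should agree up to floating-point rounding.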

Examples

Test whether observed counts fit a uniform distribution. Because f_exp is omitted, equal expected counts are assumed.

SciPy

from scipy.stats import chisquare

observed = [20, 30, 50]
result = chisquare(observed)
print(result)

Test whether observed counts fit the given expected counts. The expected counts should add up to the same total as the observed counts.

SciPy

from scipy.stats import chisquare

observed = [25, 35, 40]
expected = [30, 30, 40]
result = chisquare(observed, expected)
print(result)
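Expected values are often stated as proportions, not counts. Since chisquare works on counts, a small sketch of the conversion may help; the survey numbers and proportions below are made up for illustration.

```python
import numpy as np
from scipy.stats import chisquare

# Hypothetical survey: number of respondents choosing each of three options.
observed = np.array([48, 35, 17])

# Assumed expected proportions for the three options (they sum to 1).
expected_proportions = np.array([0.5, 0.3, 0.2])

# Convert proportions to counts with the same total as the observed data.
expected = expected_proportions * observed.sum()

survey_result = chisquare(f_obs=observed, f_exp=expected)
print(f"Chi-square statistic: {survey_result.statistic:.4f}")
print(f"P-value: {survey_result.pvalue:.4f}")
```

Scaling by observed.sum() keeps both totals equal, which matters because recent SciPy versions raise an error when the observed and expected totals differ noticeably.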

Sample Program

This program uses the chi-square test to check whether the candy colors in a bag are equally likely.

SciPy

from scipy.stats import chisquare

# Observed counts of colors in a bag of candies
observed_counts = [50, 30, 20]

# Expected counts if colors are equally likely: split the total evenly
n_colors = len(observed_counts)
expected_counts = [sum(observed_counts) / n_colors] * n_colors

# Perform chi-square goodness of fit test
result = chisquare(f_obs=observed_counts, f_exp=expected_counts)

print(f"Chi-square statistic: {result.statistic:.2f}")
print(f"P-value: {result.pvalue:.4f}")

Output

Chi-square statistic: 14.00
P-value: 0.0009

Important Notes

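A low p-value (commonly below 0.05) is evidence that the observed data do not follow the expected distribution. A high p-value only means the test found no significant deviation; it does not prove that the fit is good.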

The chi-square approximation is reliable only when expected frequencies are not too small; a common rule of thumb is at least 5 in every category.
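One common way to handle small expected frequencies is to merge sparse categories before testing. The sketch below assumes a threshold of 5 and merges the last two categories; the helper has_small_expected_counts and the data are hypothetical, and which categories to merge is a judgment call that depends on the problem.

```python
import numpy as np
from scipy.stats import chisquare

def has_small_expected_counts(expected, minimum=5):
    """Return True if any expected count is below the rule-of-thumb minimum."""
    return bool(np.any(np.asarray(expected, dtype=float) < minimum))

# Hypothetical counts for five categories; the last two are sparse.
observed = np.array([30, 45, 18, 4, 3])
expected = np.array([32.0, 42.0, 19.0, 4.0, 3.0])

if has_small_expected_counts(expected):
    # Merge the last two sparse categories into a single category.
    observed = np.append(observed[:3], observed[3:].sum())
    expected = np.append(expected[:3], expected[3:].sum())

merged_result = chisquare(f_obs=observed, f_exp=expected)
print(f"Chi-square statistic: {merged_result.statistic:.4f}")
print(f"P-value: {merged_result.pvalue:.4f}")
```

Merging reduces the number of categories, so the test runs with fewer degrees of freedom than it would on the original five categories.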

Summary

Goodness of fit tests check how well data matches an expected pattern.

Use scipy.stats.chisquare to perform the chi-square test easily.

Interpret the p-value to decide whether the data are consistent with the expected distribution.
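The decision step can be sketched as a small helper that compares the p-value with a chosen significance level. The function describe_fit, the 0.05 threshold, and the two datasets are assumptions for illustration; the threshold should be chosen before looking at the data.

```python
from scipy.stats import chisquare

ALPHA = 0.05  # assumed significance level

def describe_fit(observed, expected=None, alpha=ALPHA):
    """Return a verdict on the null hypothesis that the data fit the expected counts."""
    result = chisquare(observed, f_exp=expected)
    if result.pvalue < alpha:
        return "reject"
    return "fail to reject"

# Hypothetical dice rolls that look roughly uniform across six faces.
fair_looking = [18, 22, 20, 19, 21, 20]

# Counts that are clearly uneven across three categories.
skewed = [50, 30, 20]

print(describe_fit(fair_looking))
print(describe_fit(skewed))
```

"Fail to reject" means the test found no significant deviation from the expected counts, which is weaker than confirming that the expected distribution is correct.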