What is the output of the following code that performs a chi-square goodness of fit test?
```python
from scipy.stats import chisquare

observed = [20, 30, 50]
expected = [25, 25, 50]
stat, p = chisquare(f_obs=observed, f_exp=expected)
print(round(stat, 2), round(p, 3))
```
Recall the chi-square formula: sum((observed - expected)^2 / expected).
Chi-square statistic: \chi^2 = \sum \frac{(O-E)^2}{E} = \frac{(20-25)^2}{25} + \frac{(30-25)^2}{25} + \frac{(50-50)^2}{50} = 1 + 1 + 0 = 2.0
With df = 3 - 1 = 2, the p-value is the chi-square survival function chi2.sf(2, 2) = e^{-1} \approx 0.3679, so round(p, 3) = 0.368 and the printed output is 2.0 0.368.
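The hand computation above can be checked directly against scipy; a minimal sketch using standard scipy.stats calls:

```python
from scipy.stats import chi2, chisquare

observed = [20, 30, 50]
expected = [25, 25, 50]
stat, p = chisquare(f_obs=observed, f_exp=expected)
# p is the survival function of a chi-square with df = k - 1 = 2
assert abs(p - chi2.sf(stat, df=2)) < 1e-12
print(round(stat, 2), round(p, 3))  # 2.0 0.368
```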
Given the following code testing if data fits a normal distribution, what is the output?
```python
import numpy as np
from scipy.stats import kstest, norm

data = np.array([1.2, 2.3, 2.9, 3.1, 4.0])
stat, p = kstest(data, 'norm', args=(2.5, 1))
print(round(stat, 3), round(p, 3))
```
Remember the KS test compares empirical and theoretical CDFs.
The KS statistic measures the largest difference between the data's CDF and the normal CDF with mean=2.5 and std=1.
Running the code gives stat = 0.255: the largest gap occurs at x = 2.9, where the theoretical CDF norm.cdf(2.9, 2.5, 1) ≈ 0.655 exceeds the empirical CDF value 0.4 by about 0.255. The p-value is large (roughly 0.84), so the test does not reject the hypothesis that the data comes from a normal distribution with mean 2.5 and std 1.
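The KS statistic can be recomputed by hand to confirm where the largest CDF gap lies. A sketch that evaluates the empirical CDF just before and just after each sorted data point:

```python
import numpy as np
from scipy.stats import norm

data = np.sort(np.array([1.2, 2.3, 2.9, 3.1, 4.0]))
n = len(data)
theoretical = norm.cdf(data, loc=2.5, scale=1)
# the empirical CDF jumps from (i-1)/n to i/n at the i-th sorted point
d_plus = np.max(np.arange(1, n + 1) / n - theoretical)
d_minus = np.max(theoretical - np.arange(0, n) / n)
print(round(max(d_plus, d_minus), 3))  # 0.255
```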
What error does the following code produce?
```python
from scipy.stats import chisquare

observed = [10, 20, 30]
expected = [15, 15]
stat, p = chisquare(f_obs=observed, f_exp=expected)
print(stat, p)
```
Check if observed and expected lists have the same length.
The observed and expected arrays have different lengths (3 vs 2), so NumPy cannot broadcast them together and a ValueError is raised.
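A small sketch showing how to surface the error without crashing (the exact message text varies between SciPy versions, but the exception type is ValueError):

```python
from scipy.stats import chisquare

observed = [10, 20, 30]
expected = [15, 15]  # one element short of observed
try:
    chisquare(f_obs=observed, f_exp=expected)
except ValueError as e:
    print(type(e).__name__)  # ValueError
```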
You have a dataset of 1000 values and want to check if it follows a uniform distribution. Which scipy test is most appropriate?
Think about tests that compare sample distribution to a continuous theoretical distribution.
The Kolmogorov-Smirnov test (kstest) compares the sample distribution to a continuous theoretical distribution like uniform.
Chi-square requires binning the data into frequency counts, which discards information and makes it less precise for continuous data.
ttest_1samp tests means, not distribution shape.
pearsonr tests correlation, not goodness of fit.
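To illustrate the recommended choice, a minimal sketch using kstest against a uniform distribution. A deterministic evenly spaced grid stands in for the 1000-value dataset here so the result is reproducible:

```python
import numpy as np
from scipy.stats import kstest

# a perfectly even grid on (0, 1), standing in for 1000 observed values
data = (np.arange(1000) + 0.5) / 1000
stat, p = kstest(data, 'uniform')  # compare against the Uniform(0, 1) CDF
print(round(stat, 4))  # 0.0005 -- the smallest possible KS distance for n = 1000
```

The large p-value that accompanies this tiny statistic means the uniform hypothesis is not rejected.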
Which statement best describes the meaning of a p-value of 0.03 in a goodness of fit test?
Recall the definition of p-value in hypothesis testing.
The p-value is the probability of observing data as extreme or more extreme than the observed, assuming the null hypothesis is true.
It is NOT the probability that the null hypothesis is true, nor a measure of how well the data fits.
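The definition can be made concrete with a small simulation (a hypothetical setup reusing the expected counts from the first question): when the null hypothesis really holds, about 3% of datasets should produce a p-value at or below 0.03.

```python
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(42)
expected = [25, 25, 50]
pvals = []
for _ in range(5000):
    # sample counts under the null: 100 draws with probabilities 0.25/0.25/0.5
    obs = rng.multinomial(100, [0.25, 0.25, 0.5])
    _, p = chisquare(f_obs=obs, f_exp=expected)
    pvals.append(p)
# fraction of null datasets with p <= 0.03 should be close to 0.03
print(np.mean(np.array(pvals) <= 0.03))
```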