Given the sample data below, what is the 95% confidence interval for the mean using scipy.stats.t.interval?
    import numpy as np
    from scipy import stats

    data = np.array([5, 7, 8, 6, 9, 10, 7, 6])
    mean = np.mean(data)
    sem = stats.sem(data)
    conf_int = stats.t.interval(0.95, len(data)-1, loc=mean, scale=sem)
    print(conf_int)
Use stats.sem to get the standard error of the mean, then use stats.t.interval with degrees of freedom = sample size - 1.
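The t-distribution is used because the population standard deviation is unknown and has to be estimated from a small sample (n = 8). stats.sem computes the standard error of the mean (SEM), and stats.t.interval returns the interval centred on the sample mean with n - 1 = 7 degrees of freedom. For this data the sample mean is 7.25 and the interval is approximately (5.85, 8.65).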
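To make the explanation concrete, here is a minimal sketch that rebuilds the same interval by hand as mean ± t* × SEM and compares it with the library call. The variable names `t_crit`, `margin`, `manual_ci` and `library_ci` are illustrative and not part of the original exercise.

```python
import numpy as np
from scipy import stats

data = np.array([5, 7, 8, 6, 9, 10, 7, 6])
n = len(data)
mean = np.mean(data)
sem = stats.sem(data)  # sample standard deviation (ddof=1) divided by sqrt(n)

# Two-sided 95% critical value from the t-distribution with n - 1 degrees of freedom
t_crit = stats.t.ppf(0.975, n - 1)
margin = t_crit * sem
manual_ci = (mean - margin, mean + margin)

# The library call from the exercise, for comparison
library_ci = stats.t.interval(0.95, n - 1, loc=mean, scale=sem)

print(manual_ci)
print(library_ci)
```

Both tuples should match up to floating-point rounding, which shows that stats.t.interval is just the critical value scaled by the SEM and shifted to the sample mean.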
After fitting a linear regression, you get the 95% confidence interval for the slope as (1.2, 3.4). What does this interval mean?
Confidence intervals estimate where the true parameter lies with a certain confidence level.
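The interval means we are 95% confident that the true slope parameter lies between 1.2 and 3.4: if the study were repeated many times, about 95% of the intervals built this way would contain the true slope. It is not a range that contains 95% of the data points, and it does not give the probability of any exact slope value.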
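As a rough illustration of where such a slope interval can come from, the sketch below fits a line with scipy.stats.linregress and builds slope ± t* × stderr with n - 2 degrees of freedom. The `x` and `y` values are made up for this example and are unrelated to the (1.2, 3.4) interval in the question.

```python
import numpy as np
from scipy import stats

# Hypothetical data, invented purely for illustration
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.9, 4.1, 7.2, 8.8, 11.1, 12.7, 15.4, 16.9])

res = stats.linregress(x, y)

# Simple linear regression estimates two parameters, so df = n - 2
df = len(x) - 2
t_crit = stats.t.ppf(0.975, df)

# 95% confidence interval for the slope
slope_ci = (res.slope - t_crit * res.stderr, res.slope + t_crit * res.stderr)
print(res.slope, slope_ci)
```

The interval is centred on the estimated slope, and its width is driven by the slope's standard error and the sample size.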
What error will this code raise?
    import numpy as np
    from scipy import stats

    data = np.array([2, 4, 6, 8, 10])
    mean = np.mean(data)
    sem = stats.sem(data)
    conf_int = stats.t.interval(0.95, len(data), loc=mean, scale=sem)
    print(conf_int)
Check the degrees of freedom parameter for stats.t.interval.
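The degrees of freedom should be the sample size minus 1, which is 4 here. Passing len(data) sets them to 5, which is still a valid value, so no error is raised: the code runs and prints an interval that is slightly too narrow. Even a zero or negative value would not raise an exception in SciPy; the returned bounds would be nan.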
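A short sketch of the practical consequence: with the same data, the interval computed with len(data) degrees of freedom runs without complaint but is narrower than the one computed with len(data) - 1. The names `ci_wrong`, `ci_right`, `width_wrong` and `width_right` are illustrative.

```python
import numpy as np
from scipy import stats

data = np.array([2, 4, 6, 8, 10])
mean = np.mean(data)
sem = stats.sem(data)

# Degrees of freedom as written in the exercise (one too many)
ci_wrong = stats.t.interval(0.95, len(data), loc=mean, scale=sem)
# Degrees of freedom as they should be: n - 1
ci_right = stats.t.interval(0.95, len(data) - 1, loc=mean, scale=sem)

width_wrong = ci_wrong[1] - ci_wrong[0]
width_right = ci_right[1] - ci_right[0]
print(width_wrong, width_right)
```

Because the t critical value shrinks as the degrees of freedom grow, the off-by-one silently understates the uncertainty instead of failing loudly.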
You have a sample variance of 4.0 from 10 observations. Using the chi-square distribution, what is the 95% confidence interval for the population variance?
    import scipy.stats as stats

    n = 10
    sample_var = 4.0
    alpha = 0.05
    lower = (n - 1) * sample_var / stats.chi2.ppf(1 - alpha/2, n - 1)
    upper = (n - 1) * sample_var / stats.chi2.ppf(alpha/2, n - 1)
    print((lower, upper))
Use chi-square percent point function (ppf) with degrees of freedom = n - 1.
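The confidence interval for the variance uses chi-square quantiles with n - 1 degrees of freedom. The formula is ((n-1)*s² / chi2_upper, (n-1)*s² / chi2_lower), where chi2_upper and chi2_lower are the 97.5th and 2.5th percentiles, so the larger quantile produces the lower bound. For n = 10 and s² = 4.0 this gives approximately (1.89, 13.33). The formula assumes the data are normally distributed.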
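The formula can be wrapped in a small helper, sketched below. The function name `variance_ci` and its `confidence` parameter are hypothetical, introduced only for this example; the square roots at the end give the corresponding interval for the standard deviation.

```python
import math
from scipy import stats

def variance_ci(sample_var, n, confidence=0.95):
    """Chi-square confidence interval for a population variance (assumes normal data)."""
    alpha = 1 - confidence
    df = n - 1
    chi2_upper = stats.chi2.ppf(1 - alpha / 2, df)  # larger quantile -> lower bound
    chi2_lower = stats.chi2.ppf(alpha / 2, df)      # smaller quantile -> upper bound
    return (df * sample_var / chi2_upper, df * sample_var / chi2_lower)

var_lower, var_upper = variance_ci(4.0, 10)
sd_lower, sd_upper = math.sqrt(var_lower), math.sqrt(var_upper)

print((var_lower, var_upper))
print((sd_lower, sd_upper))
```

Note that the interval is not symmetric around the sample variance of 4.0, because the chi-square distribution is skewed.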
Why is the t-distribution preferred over the normal distribution when calculating confidence intervals for the mean with small sample sizes?
Think about what changes when the sample size is small and variance is estimated.
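The population variance is usually unknown and must be estimated from the sample, and with a small sample that estimate is itself noisy. The t-distribution has heavier tails than the normal distribution to account for this extra uncertainty, so it produces wider intervals. As the sample size grows, the t-distribution converges to the normal distribution.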
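One way to see the effect is to compare the two-sided 95% critical values directly. The sketch below prints the t critical value for a few sample sizes next to the fixed normal value; the sample sizes in `sample_sizes` are arbitrary choices for illustration.

```python
from scipy import stats

# Normal critical value for a two-sided 95% interval (about 1.96)
z_crit = stats.norm.ppf(0.975)

# Arbitrary sample sizes chosen for illustration
sample_sizes = [3, 5, 10, 30, 100]
t_crits = [stats.t.ppf(0.975, n - 1) for n in sample_sizes]

for n, t_crit in zip(sample_sizes, t_crits):
    print(f"n={n:>3}  t={t_crit:.3f}  z={z_crit:.3f}")
```

The t value is always larger than the normal one, and the gap closes as n increases, which is why the distinction matters most for small samples.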