Why statistics validates hypotheses in Data Analysis Python

Introduction

Statistics helps us decide whether a pattern in data reflects a real effect or could plausibly have happened by chance. Typical uses include:

Checking if a new medicine works better than the old one.

Deciding if a marketing campaign increased sales.

Testing if students learn more with a new teaching method.

Seeing if a machine part lasts longer after a design change.

Syntax

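1. State a null hypothesis (the "no effect" default) and an alternative hypothesis (the idea to test).
2. Collect data.
3. Use a statistical test to get a p-value.
4. Compare the p-value to a significance threshold chosen in advance (like 0.05).
5. If the p-value is below the threshold, reject the null hypothesis; otherwise, the data do not give enough evidence to reject it.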

The p-value is the probability of observing data at least as extreme as what was actually observed, assuming the null hypothesis is true.

A small p-value provides evidence against the null hypothesis.
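The definition of a p-value can be made concrete with a simulation. The sketch below is only an illustration: it assumes a hypothetical experiment of 60 heads in 100 coin flips, and the variable names, seed, and number of simulations are arbitrary choices. It simulates many experiments in which the null hypothesis (a fair coin) is true and counts how often the result is at least as far from 50 heads as the observed one.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is repeatable

n_flips = 100        # flips per experiment
observed_heads = 60  # assumed observation, for illustration only
n_sims = 100_000     # number of simulated experiments

# Simulate many experiments in a world where the null hypothesis is true
# (a fair coin: each flip lands heads with probability 0.5).
sim_heads = rng.binomial(n=n_flips, p=0.5, size=n_sims)

# "At least as extreme" for a two-sided question: at least as far from the
# expected 50 heads as the observed count, in either direction.
expected = n_flips * 0.5
as_extreme = np.abs(sim_heads - expected) >= abs(observed_heads - expected)

# The fraction of simulated experiments that are as extreme estimates the p-value.
p_value_sim = as_extreme.mean()
print(f"Simulated two-sided p-value: {p_value_sim:.3f}")
```

The estimate varies slightly with the seed and the number of simulations; an exact test avoids that noise.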

Examples

This checks if 60 heads out of 100 flips is unusual for a fair coin.

# Example: Test if coin is fair
# Hypothesis: Coin is fair (50% heads)
# Data: 60 heads in 100 flips

from scipy import stats

# Perform binomial test
res = stats.binomtest(60, n=100, p=0.5, alternative='two-sided')
p_value = res.pvalue
print(f"p-value: {p_value:.3f}")
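To see where this p-value comes from, the two-sided result can be reproduced by hand from the binomial distribution. This is a cross-check sketch, not a description of SciPy's internals, and it relies on the coin being fair: only then is the distribution symmetric, so that doubling one tail gives the two-sided value.

```python
from scipy import stats

n, k = 100, 60

# P(X >= 60) for X ~ Binomial(100, 0.5); sf(k - 1) is P(X > k - 1).
upper_tail = stats.binom.sf(k - 1, n, 0.5)

# A fair coin is symmetric, so 40 or fewer heads is just as extreme
# as 60 or more: the two-sided p-value doubles the upper tail.
p_manual = 2 * upper_tail

# Compare with the library result for the same data.
p_scipy = stats.binomtest(k, n=n, p=0.5, alternative='two-sided').pvalue
print(f"manual: {p_manual:.4f}, binomtest: {p_scipy:.4f}")
```

For this data the p-value comes out slightly above 0.05, so at the 0.05 threshold the test does not reject the fair-coin hypothesis, even though 60 heads may look lopsided.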

This tests whether the population average height differs from 170 cm, based on a small sample.

# Example: Test if average height differs
# Hypothesis: Average height is 170 cm
# Data: Sample heights

import numpy as np
from scipy import stats

heights = np.array([168, 172, 171, 169, 173])

# Perform one-sample t-test
stat, p_value = stats.ttest_1samp(heights, 170)
print(f"p-value: {p_value:.3f}")
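The t-test above can also be computed step by step, which shows what the test statistic measures: how many standard errors the sample mean lies from the hypothesized mean. The sketch below uses the textbook one-sample formula t = (mean - mu0) / (s / sqrt(n)) with n - 1 degrees of freedom; the names `mu0`, `t_manual`, and `p_manual` are chosen here for illustration.

```python
import numpy as np
from scipy import stats

heights = np.array([168, 172, 171, 169, 173])
mu0 = 170  # mean under the null hypothesis

n = heights.size
sample_mean = heights.mean()
sample_sd = heights.std(ddof=1)     # ddof=1 gives the sample standard deviation
std_error = sample_sd / np.sqrt(n)  # standard error of the mean

# Test statistic: distance from mu0 measured in standard errors.
t_manual = (sample_mean - mu0) / std_error

# Two-sided p-value from the t distribution with n - 1 degrees of freedom.
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# Compare with the library result.
t_scipy, p_scipy = stats.ttest_1samp(heights, mu0)
print(f"manual: t={t_manual:.3f}, p={p_manual:.3f}")
print(f"scipy:  t={t_scipy:.3f}, p={p_scipy:.3f}")
```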

Sample Program

This program tests if the average score with the new method is higher than 75.

import numpy as np
from scipy import stats

# Suppose we want to test if a new teaching method improves test scores
# Null hypothesis: mean score is 75 (no improvement)
# Alternative hypothesis: new method improves scores (mean > 75)

# Sample scores from students using new method
scores = np.array([78, 82, 85, 79, 90, 88, 76, 84])

# Perform a one-sided, one-sample t-test against 75.
# alternative='greater' tests whether the mean exceeds 75 (requires SciPy >= 1.6).
stat, p_value_one_sided = stats.ttest_1samp(scores, 75, alternative='greater')

print(f"Test statistic: {stat:.3f}")
print(f"One-sided p-value: {p_value_one_sided:.3f}")

# Decide if new method improves scores at 0.05 significance
if p_value_one_sided < 0.05:
    print("We have evidence that the new method improves scores.")
else:
    print("We do not have enough evidence to say the new method improves scores.")

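Output

Running the program prints the test statistic, the one-sided p-value, and the conclusion. For these scores the sample mean is 82.75 and the test statistic is about 4.45, so the one-sided p-value is well below 0.05 and the program reports evidence that the new method improves scores.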

Important Notes

Always define your hypothesis clearly before testing.

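Statistical tests give probabilities, not absolute truths. A p-value is not the probability that the hypothesis is true, and a true null hypothesis will still be rejected occasionally.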
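One way to see why a test result is not an absolute truth is to simulate experiments in which the null hypothesis really is true and count how often the test rejects it anyway. The sketch below makes several assumptions for illustration: normally distributed data with a true mean of 0, 20 observations per experiment, and a 0.05 threshold.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)  # fixed seed so the sketch is repeatable

n_experiments = 10_000
sample_size = 20

# Every row is one experiment drawn from a population whose true mean is 0,
# so the null hypothesis "mean == 0" is true in every experiment.
samples = rng.normal(loc=0.0, scale=1.0, size=(n_experiments, sample_size))

# Run one t-test per row.
_, p_values = stats.ttest_1samp(samples, 0.0, axis=1)

# Fraction of experiments that reject a true null hypothesis at 0.05.
false_positive_rate = (p_values < 0.05).mean()
print(f"False positive rate: {false_positive_rate:.3f}")
```

The rate should land close to the 0.05 threshold: that threshold is the share of true null hypotheses we accept rejecting by mistake.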

Choosing the right test depends on your data and question.
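As an example of matching the test to the question, comparing two independent groups calls for a two-sample test rather than the one-sample tests used earlier. The sketch below uses made-up daily sales figures (hypothetical data, not from a real campaign) and Welch's t-test, which does not assume the two groups have equal variance. The `alternative` argument requires SciPy 1.6 or later.

```python
import numpy as np
from scipy import stats

# Hypothetical daily sales before and after a marketing campaign.
sales_before = np.array([200, 195, 210, 205, 198, 202, 207])
sales_after = np.array([212, 208, 220, 215, 209, 218, 211])

# Null hypothesis: mean sales after the campaign are not higher than before.
# Alternative: mean sales after the campaign are higher.
# equal_var=False selects Welch's t-test for two independent samples.
stat, p_value = stats.ttest_ind(
    sales_after, sales_before, equal_var=False, alternative='greater'
)

print(f"Test statistic: {stat:.3f}")
print(f"One-sided p-value: {p_value:.4f}")
```

If the same stores or days were measured before and after, the samples would be paired rather than independent, and a paired test such as `stats.ttest_rel` would be the better fit.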

Summary

Statistics helps check if data supports an idea or if results are by chance.

We use tests and p-values to make decisions about hypotheses.

Small p-values indicate stronger evidence against the null hypothesis.