
P-values and significance in Data Analysis Python

Introduction

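A p-value is the probability of getting a result at least as extreme as the one observed, assuming the null hypothesis (no real effect, no difference) is true. A small p-value means the data would be unusual if there were no real effect, so it counts as evidence against the null hypothesis.

Common situations where p-values are used: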

Checking if a new medicine works better than an old one.
Testing if a marketing campaign increased sales.
Seeing if students' test scores improved after extra tutoring.
Determining if a coin is fair or biased.
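To make the definition concrete, here is a minimal sketch for the coin example. It assumes hypothetical data of 60 heads in 100 flips. It computes an exact two-sided p-value with `scipy.stats.binomtest` (available in SciPy 1.7 and later). It then approximates the same probability by simulating a fair coin with NumPy, which shows what "at least as extreme, assuming the null hypothesis" means in practice.

```python
import numpy as np
from scipy import stats

# Hypothetical data: 60 heads observed in 100 flips.
n_flips = 100
n_heads = 60

# Exact two-sided binomial test of H0: P(heads) = 0.5
result = stats.binomtest(n_heads, n=n_flips, p=0.5)
print(f"Exact p-value: {result.pvalue:.4f}")

# Same idea by simulation: repeat "flip a fair coin 100 times" many times
# and count how often the head count lands at least as far from 50 as 60 does.
rng = np.random.default_rng(seed=0)
sim_heads = rng.binomial(n=n_flips, p=0.5, size=100_000)
observed_gap = abs(n_heads - n_flips * 0.5)
sim_p = np.mean(np.abs(sim_heads - n_flips * 0.5) >= observed_gap)
print(f"Simulated p-value: {sim_p:.4f}")
```

The two numbers should agree closely. The simulated value only approximates the exact one, and it gets closer as `size` grows.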
Syntax
from scipy import stats

# Two-sided p-value from a t statistic (t_stat) and its degrees of freedom (df).
# sf is the survival function (1 - CDF); doubling it covers both tails.
p_value = stats.t.sf(abs(t_stat), df) * 2

# Or run the whole test directly on two data samples (independent two-sample t-test)
stat, p_value = stats.ttest_ind(sample1, sample2)
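The two syntax lines above are two routes to the same number. The following sketch uses the samples from Example 2 and assumes the default Student's t-test, which pools the variances of the two groups. It computes the pooled-variance t statistic by hand, converts it with `stats.t.sf(...) * 2`, and checks the result against `stats.ttest_ind`.

```python
import numpy as np
from scipy import stats

sample1 = np.array([5, 6, 7, 8, 9], dtype=float)
sample2 = np.array([7, 8, 9, 10, 11], dtype=float)

# Route 1: let SciPy run the whole test (Student's t-test, equal variances).
stat, p_direct = stats.ttest_ind(sample1, sample2)

# Route 2: pooled-variance t statistic by hand, then a two-sided p-value.
n1, n2 = len(sample1), len(sample2)
pooled_var = ((n1 - 1) * sample1.var(ddof=1) + (n2 - 1) * sample2.var(ddof=1)) / (n1 + n2 - 2)
t_stat = (sample1.mean() - sample2.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))
df = n1 + n2 - 2
p_manual = stats.t.sf(abs(t_stat), df) * 2

print(f"t = {t_stat:.3f}, df = {df}")
print(f"ttest_ind p-value: {p_direct:.4f}")
print(f"manual p-value:    {p_manual:.4f}")
```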

A p-value is a probability, so it always lies between 0 and 1.

Lower p-values mean stronger evidence against the null hypothesis (the assumption that there is no real effect).
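A quick way to see that a lower p-value means stronger evidence is to keep the spread fixed and increase the difference between two groups. In this sketch, each comparison group is a hypothetical copy of the example sample shifted up by 0 to 3 points. The p-value should fall as the shift grows.

```python
from scipy import stats

baseline = [5, 6, 7, 8, 9]
p_values = []

# Hypothetical comparison groups: same spread as baseline, shifted up by 0-3 points.
for shift in [0, 1, 2, 3]:
    shifted = [x + shift for x in baseline]
    _, p = stats.ttest_ind(baseline, shifted)
    p_values.append(float(p))
    print(f"shift = {shift}: p-value = {p:.4f}")
```

With no shift the two groups are identical, so the t statistic is 0 and the p-value is 1.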

Examples
This one-sample t-test checks whether the sample mean differs from 6.
from scipy import stats

# Example 1: One sample t-test
sample = [5, 6, 7, 8, 9]
stat, p_value = stats.ttest_1samp(sample, 6)
print(f"p-value: {p_value:.3f}")
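To see what `ttest_1samp` computes, here is a sketch that rebuilds the one-sample t statistic and compares it with SciPy's result. The statistic is t = (sample mean - 6) / (s / sqrt(n)), with n - 1 degrees of freedom. The sketch uses only the standard library plus `stats.t.sf`. The name `mu0` for the hypothesized mean is our own choice.

```python
import math
from scipy import stats

sample = [5, 6, 7, 8, 9]
mu0 = 6  # hypothesized mean under H0

n = len(sample)
mean = sum(sample) / n
# Sample standard deviation (divides by n - 1)
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
t_stat = (mean - mu0) / (sd / math.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_stat), n - 1)

stat, p_scipy = stats.ttest_1samp(sample, mu0)
print(f"manual: t = {t_stat:.3f}, p = {p_manual:.3f}")
print(f"scipy:  t = {stat:.3f}, p = {p_scipy:.3f}")
```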
This independent two-sample t-test checks whether two samples have different means.
from scipy import stats

# Example 2: Two sample t-test
sample1 = [5, 6, 7, 8, 9]
sample2 = [7, 8, 9, 10, 11]
stat, p_value = stats.ttest_ind(sample1, sample2)
print(f"p-value: {p_value:.3f}")
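The t-test gets its p-value from a t distribution. A different technique, a permutation test (not what `ttest_ind` does), computes a p-value for Example 2 directly from the definition. If the null hypothesis is true, the group labels are interchangeable. The sketch therefore enumerates all 252 ways to split the 10 values into two groups of 5 and counts how often the absolute mean difference is at least as large as the observed one. The helper `mean_diff` is hypothetical, written only for this example.

```python
from itertools import combinations
from scipy import stats

sample1 = [5, 6, 7, 8, 9]
sample2 = [7, 8, 9, 10, 11]
pooled = sample1 + sample2
n1 = len(sample1)

def mean_diff(a, b):
    """Difference in means, group a minus group b."""
    return sum(a) / len(a) - sum(b) / len(b)

observed = mean_diff(sample1, sample2)

extreme = 0
total = 0
# Every way of choosing which 5 of the 10 values form "group 1"
for idx in combinations(range(len(pooled)), n1):
    chosen = set(idx)
    g1 = [pooled[i] for i in idx]
    g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
    total += 1
    # Two-sided: count splits at least as extreme as the observed one
    # (small tolerance guards against floating-point rounding).
    if abs(mean_diff(g1, g2)) >= abs(observed) - 1e-12:
        extreme += 1

p_perm = extreme / total
_, p_t = stats.ttest_ind(sample1, sample2)
print(f"permutation p-value: {p_perm:.3f} ({extreme}/{total} splits)")
print(f"t-test p-value:      {p_t:.3f}")
```

The two p-values need not match exactly. The permutation test makes no normality assumption, while the t-test relies on one.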
Sample Program

This program compares the exam scores of two groups to check whether their means differ significantly.

from scipy import stats

# Two groups of exam scores
group_A = [85, 87, 90, 88, 86]
group_B = [80, 82, 79, 81, 83]

# Independent two-sample t-test (assumes equal variances by default)
stat, p_value = stats.ttest_ind(group_A, group_B)

print(f"Test statistic: {stat:.3f}")
print(f"P-value: {p_value:.3f}")

# Decide significance at 0.05 level
if p_value < 0.05:
    print("Result is significant: groups differ.")
else:
    print("Result is not significant: no clear difference.")
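By default (`equal_var=True`), `ttest_ind` assumes that both groups have the same variance. If that assumption is doubtful, SciPy also offers Welch's t-test through `equal_var=False`. The sketch below compares both tests on the same exam scores. Because the two groups are the same size, the t statistic comes out identical. Only the degrees of freedom, and therefore the p-value, change.

```python
from scipy import stats

group_A = [85, 87, 90, 88, 86]
group_B = [80, 82, 79, 81, 83]

# Student's t-test: pooled variance, the default used in the program above.
stat_student, p_student = stats.ttest_ind(group_A, group_B)

# Welch's t-test: does not assume equal variances.
stat_welch, p_welch = stats.ttest_ind(group_A, group_B, equal_var=False)

print(f"Student: t = {stat_student:.3f}, p = {p_student:.4f}")
print(f"Welch:   t = {stat_welch:.3f}, p = {p_welch:.4f}")
```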
Important Notes

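A common significance level (alpha) is 0.05. It does not mean there is a 5% chance that the results are random. It means that if the null hypothesis were true, a test at this level would wrongly call a result significant about 5% of the time.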
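The meaning of the 0.05 level can be checked by simulation. This sketch generates 10,000 experiments in which the null hypothesis is true by construction and counts how often the t-test reports p < 0.05. It assumes hypothetical, normally distributed scores with mean 80, SD 5, and groups of 5. The fraction of "significant" results should come out close to 0.05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)
alpha = 0.05
n_experiments = 10_000

# Both groups come from the SAME normal distribution, so H0 is true.
group1 = rng.normal(loc=80, scale=5, size=(n_experiments, 5))
group2 = rng.normal(loc=80, scale=5, size=(n_experiments, 5))

# axis=1 runs one t-test per row, giving one p-value per experiment.
_, sim_p_values = stats.ttest_ind(group1, group2, axis=1)
false_positive_rate = np.mean(sim_p_values < alpha)
print(f"Fraction 'significant' when H0 is true: {false_positive_rate:.3f}")
```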

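A p-value does not measure the size or practical importance of an effect. It only measures the strength of evidence against the null hypothesis. With large samples, even a tiny difference can produce a very small p-value.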
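To report effect size alongside the p-value, one common choice (not part of this lesson's method) is Cohen's d: the mean difference divided by the pooled standard deviation. This sketch uses hypothetical simulated data with a true difference of only 0.5 points on a scale with SD 10, and 50,000 values per group. The p-value comes out tiny, while d stays around 0.05, a very small effect. The helper `cohens_d` is written for this example and is not a SciPy function.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
# Hypothetical data: true means differ by only 0.5 points (SD 10).
a = rng.normal(loc=70.0, scale=10.0, size=50_000)
b = rng.normal(loc=70.5, scale=10.0, size=50_000)

_, p = stats.ttest_ind(a, b)

def cohens_d(x, y):
    """Standardized mean difference (y minus x) using the pooled sample SD."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(y) - np.mean(x)) / np.sqrt(pooled_var)

d = cohens_d(a, b)
print(f"p-value:   {p:.2e}")
print(f"Cohen's d: {d:.3f}")
```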

Always consider the context, study design, and data quality, not just the p-value.

Summary

A p-value measures how surprising the data would be if there were no real effect (the null hypothesis).

Lower p-values mean stronger evidence against the null hypothesis.

To decide whether a result is statistically significant, compare the p-value with a significance level chosen in advance, such as 0.05.