
P-values and significance in Data Analysis Python - Deep Dive

Overview - P-values and significance
What is it?
A p-value is a number that helps us decide if the results we see in data are likely due to chance or if they show a real effect. It is used in significance testing to measure how surprising the data is if there was actually no effect. A small p-value means the data is unlikely to happen by chance, so we say the result is significant. This helps us make decisions based on data rather than guesses.
Why it matters
Without p-values and significance testing, we would not have a clear way to tell if patterns in data are real or just random noise. This could lead to wrong conclusions in science, medicine, and business, causing wasted effort or harmful decisions. P-values give a common language to judge evidence and make data-driven choices more reliable and trustworthy.
Where it fits
Before learning p-values, you should understand basic statistics like averages, variability, and probability. After this, you can learn about confidence intervals, hypothesis testing in detail, and advanced topics like Bayesian statistics or machine learning model evaluation.
Mental Model
Core Idea
A p-value measures how surprising the observed data is if there was actually no real effect.
Think of it like...
Imagine tossing a coin 10 times and getting 9 heads. The p-value tells you how surprising this result is if the coin was fair. A very surprising result suggests the coin might be biased.
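This coin-toss intuition can be made exact with a short calculation (a sketch using only the standard library; the two-sided p-value counts every outcome at least as far from the expected 5 heads as the one observed):

```python
from math import comb

def coin_p_value(heads, tosses):
    """Two-sided exact p-value for a fair coin (null: P(heads) = 0.5).

    Adds up the probability of every outcome at least as far from the
    expected count (tosses / 2) as the one observed.
    """
    expected = tosses / 2
    extreme = abs(heads - expected)
    total = 0.0
    for k in range(tosses + 1):
        if abs(k - expected) >= extreme:
            # probability of exactly k heads under a fair coin
            total += comb(tosses, k) * 0.5 ** tosses
    return total

p = coin_p_value(9, 10)
print(f"p-value for 9 heads in 10 tosses: {p:.4f}")  # 0.0215
```

Since 0.0215 is below the usual 0.05 cutoff, this result would be called statistically significant: a fair coin produces something this lopsided only about 2% of the time.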
Hypothesis Testing Flow:

  ┌───────────────┐
  │ Start with    │
  │ Null Hyp.     │
  │ (no effect)   │
  └──────┬────────┘
         │
         ▼
  ┌───────────────┐
  │ Collect Data  │
  └──────┬────────┘
         │
         ▼
  ┌───────────────┐
  │ Calculate     │
  │ p-value       │
  └──────┬────────┘
         │
         ▼
  ┌───────────────┐
  │ Is p-value <  │
  │ alpha (e.g.,  │
  │ 0.05)?        │
  └──────┬────────┘
    Yes /    \ No
       /      \
      ▼        ▼
Reject Null  Fail to Reject
Hypothesis   Null Hypothesis
Build-Up - 6 Steps
Step 1 (Foundation): Understanding the Null Hypothesis
Concept: The null hypothesis is the starting assumption that there is no real effect or difference.
In any test, we begin by assuming nothing special is happening. For example, if testing a new medicine, the null hypothesis says the medicine has no effect compared to a placebo. This assumption is what we try to challenge with data.
Result
You learn to frame questions in a way that assumes no effect first, which is essential for testing.
Understanding the null hypothesis sets the baseline for deciding if data shows something new or just random chance.
Step 2 (Foundation): What is a P-value?
Concept: A p-value quantifies how likely the observed data would be if the null hypothesis were true.
After collecting data, we calculate the p-value. It answers: 'If there really was no effect, how often would we see data this extreme or more extreme?' A small p-value means such data is rare under the null hypothesis.
Result
You get a number between 0 and 1 that measures surprise against the null assumption.
Knowing that p-values measure surprise helps avoid misinterpreting them as the chance the null hypothesis is true.
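The question "how often would we see data this extreme if there were no effect?" can be answered directly with a permutation test, which builds the null distribution by shuffling group labels (a sketch; the two groups of measurements are made-up illustrative numbers):

```python
import random

random.seed(0)

# Illustrative measurements from two groups (e.g., treatment vs. control)
group_a = [5.1, 4.8, 5.5, 5.2, 4.9, 5.4]
group_b = [4.6, 4.4, 4.9, 4.5, 4.7, 4.3]

def mean(xs):
    return sum(xs) / len(xs)

observed = mean(group_a) - mean(group_b)

pooled = group_a + group_b
n_a = len(group_a)
count_extreme = 0
n_shuffles = 10_000
for _ in range(n_shuffles):
    random.shuffle(pooled)           # pretend the labels are meaningless (the null)
    diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
    if abs(diff) >= abs(observed):   # at least as extreme, in either direction
        count_extreme += 1

p_value = count_extreme / n_shuffles
print(f"observed difference: {observed:.3f}, estimated p-value: {p_value:.4f}")
```

The estimated p-value is simply the fraction of label shufflings that produce a difference at least as large as the real one; no distributional formula is needed.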
Step 3 (Intermediate): Choosing a Significance Level
🤔 Before reading on: Do you think a significance level of 0.05 means there is a 5% chance the result is wrong? Commit to your answer.
Concept: The significance level (alpha) is a threshold to decide when to reject the null hypothesis.
Commonly, 0.05 is used as the cutoff. If the p-value is less than 0.05, we say the result is statistically significant and reject the null hypothesis. This means we accept a 5% chance of wrongly rejecting a true null hypothesis (a false positive).
Result
You learn how to make decisions based on p-values and control error rates.
Understanding significance levels clarifies the balance between detecting real effects and avoiding false alarms.
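The 5% false-positive rate can be checked by simulation: generate many datasets where the null hypothesis is true by construction, test each one, and count how often p falls below 0.05 (a sketch using a two-sided z-test with known sigma; sample sizes and counts are arbitrary choices):

```python
import random
from math import sqrt, erfc

random.seed(1)

def z_test_p(sample, sigma=1.0):
    """Two-sided p-value for H0: mean = 0, with known standard deviation."""
    n = len(sample)
    z = (sum(sample) / n) * sqrt(n) / sigma
    return erfc(abs(z) / sqrt(2))   # 2 * P(Z >= |z|) for a standard normal

false_positives = 0
n_experiments = 2000
for _ in range(n_experiments):
    sample = [random.gauss(0, 1) for _ in range(30)]  # null is true: mean really is 0
    if z_test_p(sample) < 0.05:
        false_positives += 1

rate = false_positives / n_experiments
print(f"false positive rate: {rate:.3f}")  # close to 0.05 by design
```

Even though every single dataset was generated with no effect, roughly 1 in 20 tests still comes out "significant" — which is exactly the error rate alpha promises to control.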
Step 4 (Intermediate): Interpreting P-values Correctly
🤔 Before reading on: Does a p-value of 0.01 mean there is a 99% chance the alternative hypothesis is true? Commit to your answer.
Concept: P-values do not give the probability that hypotheses are true or false.
A p-value tells how unusual the data is under the null, not the chance the null or alternative hypothesis is true. Misreading p-values leads to overconfidence or wrong conclusions.
Result
You avoid common mistakes like thinking p=0.01 means 99% certainty in the result.
Knowing what p-values do and do not mean prevents misinterpretation and misuse in decision-making.
Step 5 (Advanced): Limitations and Misuse of P-values
🤔 Before reading on: Do you think a very small p-value always means a large or important effect? Commit to your answer.
Concept: P-values can be small even for tiny effects if the sample size is large, and they do not measure effect size or importance.
A tiny p-value might come from a huge dataset detecting a very small difference that is not practically meaningful. Also, multiple testing without correction inflates false positives.
Result
You learn to combine p-values with effect sizes and context for better conclusions.
Understanding p-value limitations helps avoid overinterpreting results and encourages more thoughtful analysis.
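The sample-size effect is easy to demonstrate: hold a negligible effect fixed and watch the p-value collapse as n grows (a sketch using a two-sided z-test; the effect size of 0.02 standard deviations is an illustrative number):

```python
from math import sqrt, erfc

effect_size = 0.02   # tiny standardized mean difference (illustrative)

for n in (100, 10_000, 1_000_000):
    z = effect_size * sqrt(n)      # z-statistic for a one-sample test
    p = erfc(abs(z) / sqrt(2))     # two-sided p-value under a normal null
    print(f"n={n:>9,}  z={z:6.2f}  p={p:.2e}")
```

The effect never changed; only the sample size did. A p-value alone cannot distinguish "big effect" from "big dataset", which is why effect sizes must be reported alongside it.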
Step 6 (Expert): Advanced Alternatives and Bayesian Views
🤔 Before reading on: Can Bayesian methods provide direct probabilities of hypotheses unlike p-values? Commit to your answer.
Concept: Bayesian statistics offer a different approach by calculating the probability of hypotheses given data, addressing some p-value limitations.
Bayesian methods update beliefs with data and provide intuitive probabilities for hypotheses. They require prior information and more computation but can complement or replace p-values in some cases.
Result
You gain awareness of modern alternatives and their tradeoffs.
Knowing Bayesian alternatives expands your toolkit and deepens understanding of statistical evidence.
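To contrast with the p-value, here is what a Bayesian analysis of the same 9-heads-in-10 coin looks like: starting from a uniform prior on the heads probability theta, the posterior after 9 heads and 1 tail is Beta(10, 2), and we can ask directly for P(theta > 0.5 | data). This is a sketch; the uniform prior is an assumption, and a different prior would give a different answer.

```python
# Posterior for theta (heads probability) after 9 heads and 1 tail,
# starting from a uniform Beta(1, 1) prior: Beta(10, 2).
# Its density is 110 * theta^9 * (1 - theta), which integrates in closed form.

def prob_theta_above(t):
    """P(theta > t) under the Beta(10, 2) posterior, via the exact antiderivative."""
    F = lambda x: 110 * (x**10 / 10 - x**11 / 11)   # antiderivative of the density
    return F(1.0) - F(t)

prob_biased = prob_theta_above(0.5)
print(f"P(theta > 0.5 | 9 heads in 10) = {prob_biased:.4f}")  # 0.9941
```

Unlike the earlier p-value of about 0.02, this 0.994 is a direct probability statement about the hypothesis itself, given the data and the chosen prior — exactly the kind of statement a p-value cannot make.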
Under the Hood
P-values are calculated by assuming the null hypothesis is true and then finding the probability of observing data as extreme or more extreme than what was actually observed. This involves the sampling distribution of a test statistic under the null. The test statistic summarizes the data into a single number, such as a t-score or chi-square value. The p-value is the tail area of this distribution beyond the observed statistic.
Why designed this way?
The p-value concept was developed to provide a simple, standardized way to measure evidence against a null hypothesis without requiring full probability models for all hypotheses. It balances ease of use with mathematical rigor. Alternatives like Bayesian methods existed but were less practical historically due to computational limits.
Data Collection
    │
    ▼
Calculate Test Statistic (e.g., t, z)
    │
    ▼
Sampling Distribution under Null Hypothesis
    │
    ▼
Calculate p-value = P(test statistic at least as extreme as observed)
    │
    ▼
Compare p-value to Significance Level
    │
    ▼
Decision: Reject or Fail to Reject Null
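The pipeline above can be sketched end to end for a one-sample z-test with known standard deviation (a sketch; the sample values are made up, and a real analysis with an unknown sigma would use a t-test instead):

```python
from math import sqrt, erfc
from statistics import mean

# Step 1: collect data (illustrative values; H0 says the true mean is 0)
sample = [0.4, -0.1, 0.7, 0.3, 0.5, 0.2, 0.6, 0.0, 0.4, 0.3]
sigma = 0.5   # assumed known population standard deviation

# Step 2: calculate the test statistic
n = len(sample)
z = mean(sample) * sqrt(n) / sigma

# Step 3: tail area under the null's sampling distribution (standard normal)
p_value = erfc(abs(z) / sqrt(2))   # two-sided

# Step 4: compare to the significance level and decide
alpha = 0.05
decision = "reject" if p_value < alpha else "fail to reject"
print(f"z = {z:.2f}, p = {p_value:.4f} -> {decision} the null")
```

Each line maps onto one box of the flow: data in, statistic computed, tail area extracted, threshold compared, decision out.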
Myth Busters - 4 Common Misconceptions
Quick: Does a p-value tell you the chance that the null hypothesis is true? Commit to yes or no.
Common Belief: A p-value gives the probability that the null hypothesis is true.
Reality: A p-value measures how surprising the data is assuming the null hypothesis is true; it does not give the probability that the null hypothesis itself is true or false.
Why it matters: Misunderstanding this leads to overconfidence in results and incorrect scientific claims.
Quick: Does a p-value below 0.05 guarantee the effect is important? Commit to yes or no.
Common Belief: If the p-value is less than 0.05, the effect is large and important.
Reality: A small p-value only means the data is unlikely under the null; it does not measure the size or practical importance of the effect.
Why it matters: This misconception causes people to overstate findings and make poor decisions based on trivial effects.
Quick: Does a p-value of 0.01 mean there is a 99% chance the alternative hypothesis is true? Commit to yes or no.
Common Belief: A p-value of 0.01 means there is a 99% chance the alternative hypothesis is true.
Reality: P-values do not provide probabilities for hypotheses; they only assess data extremeness under the null hypothesis.
Why it matters: Believing this leads to misunderstanding uncertainty and misreporting scientific results.
Quick: Does repeating tests many times without adjustment keep the false positive rate at 5%? Commit to yes or no.
Common Belief: Running many tests at a 0.05 significance level keeps the overall false positive rate at 5%.
Reality: Multiple testing increases the chance of false positives unless corrections are applied.
Why it matters: Ignoring this inflates false discoveries and undermines research reliability.
Expert Zone
1. P-values depend heavily on sample size; large samples can produce tiny p-values for negligible effects.
2. The choice of test statistic and its assumptions (normality, independence) affects p-value validity.
3. P-values do not measure evidence strength directly; likelihood ratios or Bayes factors can provide richer information.
When NOT to use
P-values are not suitable when prior knowledge is important or when multiple comparisons are uncorrected. Alternatives include Bayesian inference, confidence intervals, or false discovery rate control methods.
Production Patterns
In practice, p-values are combined with effect sizes, confidence intervals, and domain knowledge. Reporting standards require transparency about significance levels, multiple testing corrections, and assumptions. Automated pipelines often flag results by p-value thresholds but rely on expert review.
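A reporting helper along these lines pairs the p-value with an effect size and a confidence interval instead of a bare "significant" flag (a sketch using a normal-approximation interval; the function name, field names, and sample values are illustrative, and a careful analysis at small n would use a t-based interval):

```python
from math import sqrt, erfc
from statistics import mean, stdev

def summarize_effect(sample, null_mean=0.0):
    """Report effect size, 95% CI, and p-value together (normal approximation)."""
    n = len(sample)
    m, s = mean(sample), stdev(sample)
    se = s / sqrt(n)
    z = (m - null_mean) / se
    return {
        "effect": m - null_mean,                      # estimated effect size
        "ci95": (m - 1.96 * se, m + 1.96 * se),       # approximate 95% interval
        "p_value": erfc(abs(z) / sqrt(2)),            # two-sided p-value
        "n": n,
    }

report = summarize_effect([0.4, -0.1, 0.7, 0.3, 0.5, 0.2, 0.6, 0.0, 0.4, 0.3])
print(report)
```

Downstream reviewers then see how big the effect is and how precisely it was estimated, not just whether it crossed an arbitrary threshold.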
Connections
Bayesian Probability
Alternative approach to hypothesis testing
Understanding Bayesian methods helps grasp the limitations of p-values and offers a way to calculate direct probabilities of hypotheses.
Signal Detection Theory
Shares concepts of false positives and false negatives
Knowing signal detection clarifies the tradeoffs in setting significance levels and interpreting errors in hypothesis testing.
Legal Evidence Standards
Both assess strength of evidence to make decisions
Comparing p-values to legal standards like 'beyond reasonable doubt' reveals how different fields handle uncertainty and decision thresholds.
Common Pitfalls
#1Interpreting p-value as the probability that the null hypothesis is true.
Wrong approach:
p_value = 0.03
print(f"There is a {p_value:.0%} chance the null hypothesis is true.")
Correct approach:
p_value = 0.03
print(f"If the null hypothesis is true, there is a {p_value:.0%} chance of observing data this extreme.")
Root cause:Confusing the conditional probability of data given hypothesis with the probability of the hypothesis given data.
#2Using a fixed 0.05 cutoff without considering context or effect size.
Wrong approach:
if p_value < 0.05:
    print("Result is important and meaningful.")
Correct approach:
if p_value < 0.05:
    print("Result is statistically significant; check effect size and context before concluding importance.")
Root cause:Overreliance on arbitrary thresholds without integrating practical significance.
#3Ignoring multiple testing and reporting many p-values as significant.
Wrong approach:
for p in p_values:
    if p < 0.05:
        print("Significant")
Correct approach:
from statsmodels.stats.multitest import multipletests
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method='bonferroni')
for is_sig in reject:
    print("Significant" if is_sig else "Not significant")
Root cause:Not accounting for increased false positive risk when performing many tests.
Key Takeaways
P-values measure how surprising the observed data is assuming no real effect exists, not the probability that hypotheses are true.
A small p-value suggests evidence against the null hypothesis but does not measure effect size or practical importance.
Significance levels set thresholds for decision-making but involve tradeoffs between false positives and false negatives.
Misinterpretations of p-values are common and can lead to wrong conclusions; understanding their meaning is crucial.
Advanced methods like Bayesian inference provide complementary ways to assess evidence beyond p-values.