Overview - Probability density and cumulative functions

What is it?

Probability density and cumulative distribution functions describe how probability is spread over the values of a continuous random variable. The probability density function (PDF) gives the relative likelihood of values near each point. The cumulative distribution function (CDF) gives the total probability that the value is less than or equal to a given point. Together, they let us understand and work with continuous data in statistics.

Why it matters

These functions are how we measure and predict the behavior of continuous data such as heights, temperatures, or test scores. They let us calculate probabilities for ranges of values, which is essential for decision-making, risk assessment, and scientific analysis. Without them, we would be left guessing or relying on incomplete information.

Where it fits

Before learning this, you should understand basic probability and random variables. After this, you can explore statistical inference, hypothesis testing, and machine learning models that use probability distributions.

Mental Model

Core Idea

The PDF shows how dense the probability is at each point, while the CDF accumulates that density (the area under the PDF) up to a point to show the total probability so far.

Think of it like...

Imagine sand poured along a table so that its profile forms a curve. The PDF is how thick the sand is at each spot, and the CDF is how much sand lies between the left end of the table and that spot.

Probability Functions

PDF (Probability Density Function):
  Height of curve at each x shows likelihood density

CDF (Cumulative Distribution Function):
  Increases from 0 to 1 as x moves right

  1 ┤                 ╭───────
    │                ╭╯
 0.5┤          ╭─────╯
    │         ╭╯
  0 ┼─────────╯────────────
     0       x

PDF curve is the slope of the CDF curve at each point.
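The slope relationship can be checked numerically. The sketch below assumes a standard normal distribution and a central finite difference with step size `h`; the evaluation points and `h` are illustrative choices, not part of the lesson.

```python
import numpy as np
from scipy.stats import norm

# Points at which to compare the CDF slope with the PDF (illustrative choice).
x = np.array([-2.0, -0.5, 0.0, 1.0, 2.5])
h = 1e-5  # small step for the finite difference (assumed value)

# A central difference approximates the derivative of the CDF.
cdf_slope = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)
pdf_values = norm.pdf(x)

for xi, slope, dens in zip(x, cdf_slope, pdf_values):
    print(f"x={xi:+.1f}  slope of CDF={slope:.6f}  PDF={dens:.6f}")
```

The two columns should agree to several decimal places, which is the numerical counterpart of "the PDF is the slope of the CDF".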

Build-Up - 7 Steps

1

Foundation: Understanding continuous random variables

Concept: Continuous random variables can take any value in a range, unlike discrete variables which have separate values.

A continuous random variable might be the height of people in a room. It can be 170.1 cm, 170.12 cm, or any value in between. We can't list all possible values because there are infinitely many.

Result

You understand that continuous variables need special tools to describe their probabilities because you can't count outcomes one by one.

Knowing the difference between continuous and discrete variables is key to choosing the right probability tools.
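As a rough illustration of why outcomes cannot be counted one by one, the sketch below draws heights from a hypothetical model (normal with mean 170 cm and standard deviation 10 cm; these parameters are assumptions for the example) and then computes a probability for a range instead of a single value.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical model: heights ~ Normal(mean=170 cm, sd=10 cm).
heights = norm.rvs(loc=170, scale=10, size=1000, random_state=rng)

# With a continuous variable, repeated exact values are essentially never seen.
n_distinct = np.unique(heights).size
print("distinct values among 1000 draws:", n_distinct)

# Probabilities are assigned to ranges instead, e.g. between 169.5 and 170.5 cm.
p_range = norm.cdf(170.5, loc=170, scale=10) - norm.cdf(169.5, loc=170, scale=10)
print("P(169.5 < height <= 170.5):", round(p_range, 4))
```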

2

Foundation: What is a probability density function (PDF)?

3

Intermediate: What is a cumulative distribution function (CDF)?

4

Intermediate: Using scipy to compute PDF and CDF

5

Intermediate: Visualizing PDF and CDF with scipy

6

Advanced: Interpreting PDF and CDF for decision making

7

Expert: Handling edge cases and numerical stability

Under the Hood

The PDF is the derivative of the CDF, meaning the CDF is the integral (area under the curve) of the PDF. Internally, scipy uses mathematical formulas or numerical integration to compute these functions efficiently. For standard distributions, closed-form formulas are used. For others, numerical methods approximate the values. The CDF accumulates probability from the left, ensuring it is always between 0 and 1 and non-decreasing.

Why designed this way?

Probability theory defines continuous distributions with PDFs and CDFs to handle infinite possible values. Using derivatives and integrals connects these functions mathematically. scipy implements these to provide fast, accurate, and reliable calculations for many distributions, avoiding manual integration and reducing errors.
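A minimal sketch of what these built-in calculations look like in practice, using `scipy.stats.norm` and `scipy.stats.expon`. The height model (mean 170, standard deviation 10) and the exponential mean of 2 are made-up parameters for illustration.

```python
from scipy.stats import norm, expon

# Standard normal: density and cumulative probability at x = 1.
print(norm.pdf(1.0))   # density at 1, not a probability
print(norm.cdf(1.0))   # P(X <= 1)

# "Frozen" distribution with fixed parameters (hypothetical height model).
heights = norm(loc=170, scale=10)
p_below_180 = heights.cdf(180)      # P(height <= 180)
median = heights.ppf(0.5)           # inverse CDF: value with 50% below it

# The same interface works for other distributions, e.g. the exponential.
p_wait = expon(scale=2.0).cdf(2.0)  # P(T <= 2) when the mean is 2
print(p_below_180, median, p_wait)
```

Freezing a distribution keeps the parameters in one place, so later calls to `pdf`, `cdf`, or `ppf` cannot accidentally use different values.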

Continuous Distribution Functions

  PDF (f(x))
  ┌─────────────┐
  │             │
  │   ╭───╮     │
  │  ╭╯   ╰╮    │
  │ ╭╯     ╰╮   │
  └─╯       ╰───┘

  CDF (F(x))
  ┌─────────────┐
  │        ╭────│
  │      ╭─╯    │
  │    ╭─╯      │
  └────╯────────┘

Relationship:
  F(x) = ∫ f(t) dt from -∞ to x
  f(x) = dF(x)/dx
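The integral form of this relationship can be verified with a short numerical check. This is only a sketch for a standard normal at an arbitrarily chosen point `x = 1.5`; in everyday use the built-in `cdf` is the simpler route.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

x = 1.5  # illustrative evaluation point

# F(x) as the area under f(t) from -infinity to x.
# quad returns a tuple: (integral estimate, absolute error estimate).
area, abs_err = integrate.quad(norm.pdf, -np.inf, x)

print("integral of PDF:", area)
print("norm.cdf(x):    ", norm.cdf(x))
```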

Myth Busters - 3 Common Misconceptions

Quick: Do you think the PDF value at a point is the probability of that exact value? Commit to yes or no.

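Common Belief: The PDF value at a point gives the probability that the variable equals that exact value.

Reality: The PDF value is a density, not a probability. For a continuous variable, the probability of any single exact value is zero; probabilities come from areas under the PDF over a range.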

Quick: Do you think the CDF can decrease at some points? Commit to yes or no.

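Common Belief: The CDF can go down if the PDF is negative in some regions.

Reality: A valid PDF is never negative, so the CDF, which accumulates probability from the left, is non-decreasing and always stays between 0 and 1.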

Quick: Do you think PDF values must always be less than or equal to 1? Commit to yes or no.

Common Belief: PDF values cannot be greater than 1 because probabilities are between 0 and 1.

Reality: PDF values can exceed 1 for narrow distributions. Only the total area under the PDF curve must equal 1.

Expert Zone

1

PDF values represent density, not probability, so comparing PDF heights across different distributions requires care.

2

Numerical computation of CDFs for extreme values uses special algorithms to avoid floating-point underflow or overflow.

3

Some distributions have no closed-form PDF or CDF, requiring numerical approximation methods that trade off speed and accuracy.
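The second point, on extreme values, can be illustrated with scipy's survival function `sf` and log-CDF `logcdf`. The sketch assumes a standard normal and two deliberately extreme arguments (10 and -40) chosen to trigger the floating-point problems.

```python
import numpy as np
from scipy.stats import norm

x = 10.0

# Upper-tail probability P(X > 10) for a standard normal.
naive_tail = 1.0 - norm.cdf(x)   # cdf(10) rounds to 1.0, so this is 0.0
stable_tail = norm.sf(x)         # survival function, computed directly

# Log of a tiny lower-tail probability.
with np.errstate(divide="ignore"):
    naive_log = np.log(norm.cdf(-40.0))  # cdf underflows to 0.0, log gives -inf
stable_log = norm.logcdf(-40.0)          # stays finite

print(naive_tail, stable_tail)
print(naive_log, stable_log)
```

The naive versions lose all information about the tail, while `sf` and `logcdf` return small but usable values.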

When NOT to use

PDFs do not apply to discrete variables; use a probability mass function (PMF) instead. The CDF is still defined for discrete variables, but it is a step function rather than a smooth curve. For very complex or unknown distributions, consider kernel density estimation or empirical distribution functions.
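A brief sketch of these alternatives follows. The binomial parameters and the synthetic two-cluster sample are invented for the example, and `ecdf` is a hypothetical helper written here, not a scipy function.

```python
import numpy as np
from scipy.stats import binom, gaussian_kde

# Discrete case: the PMF gives actual probabilities (hypothetical 10 fair coin flips).
k = np.arange(0, 11)
pmf = binom.pmf(k, n=10, p=0.5)
print("P(exactly 5 heads):", pmf[5])
print("PMF sums to:", pmf.sum())

# Unknown distribution: estimate from data (synthetic sample for illustration).
rng = np.random.default_rng(1)
sample = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(1, 1.0, 700)])

kde = gaussian_kde(sample)        # kernel density estimate of the PDF
density_at_zero = kde(0.0)[0]

def ecdf(data, x):
    """Empirical CDF: fraction of observations less than or equal to x."""
    data = np.sort(data)
    return np.searchsorted(data, x, side="right") / data.size

print("KDE density at 0:", density_at_zero)
print("ECDF at 0:", ecdf(sample, 0.0))
```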

Production Patterns

In real-world systems, PDFs and CDFs are used for risk modeling, anomaly detection, and probabilistic forecasting. Professionals often precompute CDF lookup tables for speed or use vectorized scipy functions for batch processing large datasets.
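Both patterns can be sketched in a few lines. The batch below is synthetic, and the lookup grid (2001 points on [-6, 6]) is an assumed design choice; a real system would size the grid to its own accuracy needs.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
scores = rng.normal(size=100_000)  # synthetic batch standing in for real data

# Vectorized call: one CDF evaluation over the whole array, no Python loop.
exact = norm.cdf(scores)

# Precomputed lookup table with linear interpolation (grid choice is an assumption).
grid = np.linspace(-6.0, 6.0, 2001)
table = norm.cdf(grid)
approx = np.interp(scores, grid, table)

max_err = np.max(np.abs(approx - exact))
print("largest lookup-table error:", max_err)
```

A lookup table trades a small, measurable error for speed, so it is worth printing the worst-case error as done here before relying on it.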

Connections

Integral calculus

PDF and CDF are connected through integration and differentiation.

Understanding integrals and derivatives helps grasp how probability accumulates and how densities relate to totals.

Signal processing

PDFs resemble signal amplitude distributions; CDFs relate to cumulative energy.

Techniques for analyzing signals can inspire methods for understanding probability distributions.

Economics - cumulative wealth distribution

A CDF of wealth gives the share of a population whose wealth is at or below a given level, in the same way a probability CDF accumulates probability.

Seeing CDFs in economics shows how cumulative functions describe real-world distributions beyond math.

Common Pitfalls

#1 Confusing PDF value with probability at a point

Wrong approach: prob = norm.pdf(0)  # Treating this as the probability of exactly 0

Correct approach: prob = norm.cdf(0.0001) - norm.cdf(-0.0001)  # Probability in a small range around 0

Root cause: Misunderstanding that continuous probabilities are areas, not point values.
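The difference between the two quantities is easy to see when they are printed side by side. This sketch assumes a standard normal and the same small range as the snippet above.

```python
from scipy.stats import norm

density = norm.pdf(0.0)  # density at 0 for a standard normal

# Probability of landing within +/- 0.0001 of zero.
half_width = 1e-4
prob = norm.cdf(half_width) - norm.cdf(-half_width)

# For a narrow range, probability is roughly density times width.
approx = density * (2 * half_width)

print("density at 0:            ", density)
print("P(-0.0001 < X <= 0.0001):", prob)
print("density * width:         ", approx)
```

The density is about 0.4, while the probability of the narrow range is several thousand times smaller; the density only becomes a probability after it is multiplied by a width.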

#2 Manually integrating the PDF to get the CDF instead of using the built-in

Wrong approach: cdf, abs_err = integrate.quad(norm.pdf, -np.inf, x)  # Complex and error-prone; quad returns (value, error estimate)

Correct approach: cdf = norm.cdf(x)  # Use the built-in scipy function

Root cause: Not knowing that scipy provides direct CDF functions leads to unnecessary complexity.

#3 Assuming PDF must be less than or equal to 1

Wrong approach: if norm.pdf(x) > 1: print('Invalid PDF')  # Wrong rejection

Correct approach: print('PDF value:', norm.pdf(x))  # Accept values > 1 for narrow distributions

Root cause: Confusing probability range with density scale.
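A narrow distribution makes this concrete. The sketch assumes a normal distribution with standard deviation 0.1 (an arbitrary illustrative value) and integrates its PDF over plus or minus ten standard deviations.

```python
from scipy import integrate
from scipy.stats import norm

narrow = norm(loc=0.0, scale=0.1)  # standard deviation of 0.1 (illustrative)

peak = narrow.pdf(0.0)  # density at the centre, well above 1

# Area under the PDF across +/- 10 standard deviations.
area, abs_err = integrate.quad(narrow.pdf, -1.0, 1.0)

print("peak density:", peak)
print("area under PDF:", area)
```

The peak is close to 4, yet the area is still 1, which is the only constraint a density has to satisfy.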

Key Takeaways

Probability density functions (PDFs) describe how probability is spread over continuous values as a density, not as direct probabilities.

Cumulative distribution functions (CDF) accumulate probabilities up to a point, making it easy to find chances below thresholds.

scipy provides built-in, reliable functions to compute PDF and CDF for many distributions, avoiding manual integration.

PDF values can exceed 1 for narrow distributions; what matters is that the total area under the PDF curve equals 1.

Understanding the difference between density and probability prevents common mistakes and deepens statistical intuition.