
Probability density and cumulative functions in SciPy

Introduction

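A probability density function (PDF) describes how densely probability is concentrated around each value of a continuous distribution. A cumulative distribution function (CDF) gives the probability that a value is less than or equal to a given number. Typical uses:

To find how relatively likely a value is, or how probable a range of values is, in data.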

To calculate the chance that a value is less than or equal to a certain number.

To visualize the shape of data distribution.

To compare different data distributions.

To make decisions based on probabilities in real-life events like weather or quality control.

Syntax

SciPy

from scipy.stats import distribution_name

# Probability Density Function (PDF)
pdf_value = distribution_name.pdf(x, *params)

# Cumulative Distribution Function (CDF)
cdf_value = distribution_name.cdf(x, *params)

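distribution_name is the name of the distribution, such as norm (normal) or expon (exponential).

x is the value or array of values where you want to evaluate the function.

params are the distribution's parameters: any shape parameters the distribution requires, followed by loc (location, default 0) and scale (default 1). For norm, loc is the mean and scale is the standard deviation.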
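As a minimal sketch of how the parameters are passed, the snippet below evaluates a normal distribution with mean 10 and standard deviation 2. These numbers are chosen only for illustration and do not come from the text above. It also passes an array for x to show that the functions work element-wise.

```python
import numpy as np
from scipy.stats import norm

mean, std = 10.0, 2.0   # illustrative values, not from the text above

# Single value: density and cumulative probability at the mean
pdf_at_mean = norm.pdf(10.0, loc=mean, scale=std)
cdf_at_mean = norm.cdf(10.0, loc=mean, scale=std)

# Array input: the functions are evaluated element-wise
xs = np.array([8.0, 10.0, 12.0])
cdf_values = norm.cdf(xs, loc=mean, scale=std)

print(f"PDF at the mean: {pdf_at_mean:.4f}")
print(f"CDF at the mean: {cdf_at_mean:.4f}")
print("CDF at 8, 10, 12:", np.round(cdf_values, 4))
```

Half of the probability lies at or below the mean, so the CDF there is 0.5 regardless of the standard deviation.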

Examples

This calculates the probability density at 0 for a normal distribution with mean 0 and standard deviation 1.

SciPy

from scipy.stats import norm

# PDF at x=0 for standard normal distribution
pdf_val = norm.pdf(0)

This calculates the probability that a value is less than or equal to 1 in the standard normal distribution.

SciPy

from scipy.stats import norm

# CDF at x=1 for standard normal distribution
cdf_val = norm.cdf(1)
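Building on the CDF example, the probability of a range can be sketched as a difference of two CDF values, and the probability of exceeding a value as one minus the CDF. The bounds used here are illustrative.

```python
from scipy.stats import norm

# Probability that a standard normal value falls between a and b:
# P(a < X <= b) = CDF(b) - CDF(a)
a, b = -1.0, 1.0
prob_between = norm.cdf(b) - norm.cdf(a)

# Probability that a value exceeds 1: P(X > 1) = 1 - CDF(1)
prob_above = 1.0 - norm.cdf(1.0)

print(f"P(-1 < X <= 1) = {prob_between:.4f}")
print(f"P(X > 1)       = {prob_above:.4f}")
```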

This finds the density at 2 for an exponential distribution with rate 1. SciPy's expon is parameterized by scale = 1 / rate, so a rate of 1 corresponds to scale=1.

SciPy

from scipy.stats import expon

# PDF at x=2 for exponential distribution with scale=1
pdf_val = expon.pdf(2, scale=1)
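When the rate is not 1, the conversion to scale matters. The sketch below uses a hypothetical rate of 0.5 and compares SciPy's result with the closed-form exponential formulas, rate * exp(-rate * x) for the PDF and 1 - exp(-rate * x) for the CDF.

```python
import math
from scipy.stats import expon

rate = 0.5            # hypothetical rate, chosen for illustration
scale = 1.0 / rate    # SciPy's expon expects scale = 1 / rate

x = 2.0
pdf_scipy = expon.pdf(x, scale=scale)
pdf_manual = rate * math.exp(-rate * x)    # closed-form PDF

cdf_scipy = expon.cdf(x, scale=scale)
cdf_manual = 1.0 - math.exp(-rate * x)     # closed-form CDF

print(f"PDF: scipy={pdf_scipy:.6f}, manual={pdf_manual:.6f}")
print(f"CDF: scipy={cdf_scipy:.6f}, manual={cdf_manual:.6f}")
```

Passing the rate directly as scale is a common mistake; the two only coincide when the rate is 1.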

Sample Program

This program calculates and prints the PDF and CDF values at 0 for the standard normal distribution. It also plots the PDF and CDF curves from -3 to 3 to show how probability is distributed and how it accumulates.

SciPy

from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt

# Values from -3 to 3
x = np.linspace(-3, 3, 100)

# Calculate PDF and CDF for standard normal distribution
pdf = norm.pdf(x)
cdf = norm.cdf(x)

# Print some values
print(f"PDF at 0: {norm.pdf(0):.4f}")
print(f"CDF at 0: {norm.cdf(0):.4f}")

# Plot PDF and CDF
plt.plot(x, pdf, label='PDF')
plt.plot(x, cdf, label='CDF')
plt.title('Normal Distribution PDF and CDF')
plt.xlabel('x')
plt.ylabel('Density (PDF) / cumulative probability (CDF)')
plt.legend()
plt.grid(True)
plt.show()

Output

PDF at 0: 0.3989
CDF at 0: 0.5000
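The accumulation shown by the CDF curve can also be checked numerically: the CDF at a point should equal the area under the PDF up to that point. The sketch below uses scipy.integrate.quad for the integration, with an illustrative upper limit of 1.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

upper = 1.0   # illustrative upper limit

# Area under the standard normal PDF from -infinity up to `upper`
area, abs_error = quad(norm.pdf, -np.inf, upper)

print(f"Integral of PDF up to {upper}: {area:.6f}")
print(f"CDF at {upper}:               {norm.cdf(upper):.6f}")
```

The two printed values should agree to within the integrator's error estimate.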

Important Notes

PDF values are not probabilities themselves but densities; they can be greater than 1.

CDF values always range from 0 to 1 and show cumulative probability up to x.

Use the same parameters for PDF and CDF to compare correctly.
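The notes above can be sketched with a "frozen" distribution, which fixes the parameters once so the PDF and CDF cannot drift apart. The narrow normal used here (standard deviation 0.1) is an illustrative choice that makes the density exceed 1.

```python
from scipy.stats import norm

# Freeze the parameters once so PDF and CDF always use the same ones
narrow = norm(loc=0.0, scale=0.1)   # illustrative: a very narrow normal

# A density, not a probability: it can exceed 1
density_at_peak = narrow.pdf(0.0)

# A probability: difference of two CDF values, always between 0 and 1
prob_near_peak = narrow.cdf(0.1) - narrow.cdf(-0.1)

print(f"Density at the peak:      {density_at_peak:.4f}")
print(f"P(-0.1 < X <= 0.1):       {prob_near_peak:.4f}")
```

The density at the peak is close to 4, yet the probability of landing within one standard deviation of the mean is still about 0.68.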

Summary

PDF shows how densely probability is concentrated around each point; the area under it over a range is the probability of that range.

CDF shows the total probability up to a point.

Both help understand and work with data distributions.