How to Calculate Descriptive Statistics in Python Easily
You can calculate descriptive statistics in Python with functions such as mean(), median(), and mode() from the standard library's statistics module, or with the pandas describe() method for a quick summary. These tools give you the average, middle value, most common value, spread, and more for your data.
Syntax
Python provides two main ways to calculate descriptive statistics:
- statistics module: Use functions like mean(data), median(data), mode(data), variance(data), and stdev(data).
- pandas library: Use DataFrame.describe() to get count, mean, std, min, quartiles, and max in one call.
Here, data is a list or array of numbers.
```python
import statistics

data = [10, 20, 20, 40, 50, 50, 50, 60, 70]

mean_value = statistics.mean(data)
median_value = statistics.median(data)
mode_value = statistics.mode(data)
variance_value = statistics.variance(data)
std_dev_value = statistics.stdev(data)

# Using pandas
import pandas as pd

df = pd.DataFrame(data, columns=['values'])
description = df.describe()
```
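One detail worth knowing about the functions above: statistics.variance() and statistics.stdev() compute the sample versions (dividing by n - 1), while statistics.pvariance() and statistics.pstdev() treat the data as a whole population (dividing by n). The sketch below shows the difference on a small made-up list; the scores values are an assumption chosen for illustration, not data from this article.

```python
import statistics

# Illustrative data (an assumption for this sketch); its mean is exactly 5
scores = [2, 4, 4, 4, 5, 5, 7, 9]

sample_var = statistics.variance(scores)       # divides by n - 1
population_var = statistics.pvariance(scores)  # divides by n
sample_sd = statistics.stdev(scores)           # square root of sample_var
population_sd = statistics.pstdev(scores)      # square root of population_var

print(f"Sample variance:     {sample_var:.3f}")
print(f"Population variance: {population_var:.3f}")
print(f"Sample std dev:      {sample_sd:.3f}")
print(f"Population std dev:  {population_sd:.3f}")
```

The sample figures are always slightly larger than the population ones. pandas describe() reports the sample standard deviation, so its std row lines up with stdev() rather than pstdev().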
Example
This example shows how to calculate mean, median, mode, variance, and standard deviation using the statistics module, and how to get a summary using pandas.
```python
import statistics
import pandas as pd

data = [10, 20, 20, 40, 50, 50, 50, 60, 70]

mean_value = statistics.mean(data)
median_value = statistics.median(data)
mode_value = statistics.mode(data)
variance_value = statistics.variance(data)
std_dev_value = statistics.stdev(data)

# Round the non-integer results to two decimals for readability
print(f"Mean: {mean_value:.2f}")
print(f"Median: {median_value}")
print(f"Mode: {mode_value}")
print(f"Variance: {variance_value:.2f}")
print(f"Standard Deviation: {std_dev_value:.2f}")

# Using pandas
df = pd.DataFrame(data, columns=['values'])
print("\nPandas describe():")
print(df.describe())
```
Output
```
Mean: 41.11
Median: 50
Mode: 50
Variance: 411.11
Standard Deviation: 20.28

Pandas describe():
          values
count   9.000000
mean   41.111111
std    20.275875
min    10.000000
25%    20.000000
50%    50.000000
75%    50.000000
max    70.000000
```
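If pandas is not installed, most of the describe() figures can be approximated with the standard library alone. The sketch below is one way to do it, assuming Python 3.8 or later (for statistics.quantiles). The helper name describe_list is hypothetical and not part of any library. method="inclusive" is used because, for this data, it yields the same quartiles as the pandas output above.

```python
import statistics

def describe_list(values):
    """Return a dict of summary statistics similar to pandas' describe().

    Hypothetical helper for illustration; needs at least two data points
    because stdev() and quantiles() both require them.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }

data = [10, 20, 20, 40, 50, 50, 50, 60, 70]
summary = describe_list(data)

# Print each statistic with two decimals, one per line
for name, value in summary.items():
    print(f"{name:<6}{value:>10.2f}")
```

Other quantile methods can give slightly different quartiles on the same data, so treat the match with pandas as a property of this method choice rather than a general guarantee.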
Common Pitfalls
Common mistakes when calculating descriptive statistics in Python include:
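- Expecting mode() to report every mode. On Python 3.8 and later it returns only the first mode encountered when several values tie; on Python 3.7 and earlier it raised StatisticsError instead. Use multimode() to get all of them.
- Passing empty data raises StatisticsError, and passing non-numeric data to numeric functions such as mean() raises TypeError.
- For large datasets, the statistics module can be slower than pandas.
- Not importing required modules before use.
Always check your data type and handle exceptions if needed.
```python
import statistics

data = [1, 2, 2, 3, 3]  # 2 and 3 are both modes

# Python 3.8+: mode() returns the first mode encountered (2 here).
# Python 3.7 and earlier: mode() raised StatisticsError for this data.
try:
    mode_value = statistics.mode(data)
except statistics.StatisticsError:
    mode_value = "No unique mode"

print(f"Mode: {mode_value}")
```
Output (Python 3.8+)
```
Mode: 2
```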
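When the data may have several modes, or may be empty, statistics.multimode() (available from Python 3.8) can be a safer choice than mode(). A minimal sketch, using made-up lists for illustration:

```python
import statistics

# multimode() returns every tied mode, in the order first encountered
tied = statistics.multimode([1, 2, 2, 3, 3])
print(tied)    # → [2, 3]

# With a single most common value, the list has one element
single = statistics.multimode([10, 20, 20])
print(single)  # → [20]

# Empty data gives an empty list instead of raising an exception
empty = statistics.multimode([])
print(empty)   # → []

# mean() has no such fallback, so guard against empty data explicitly
try:
    average = statistics.mean([])
except statistics.StatisticsError:
    average = None
print(average)  # → None
```

Returning None for an empty mean is only one possible convention; raising a clearer error or skipping the calculation may suit some programs better.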
Quick Reference
| Function | Description |
|---|---|
| statistics.mean(data) | Calculates the average of numbers |
| statistics.median(data) | Finds the middle value |
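| statistics.mode(data) | Finds the most common value (the first one encountered if several tie, Python 3.8+) |
| statistics.variance(data) | Sample variance: spread of the data, dividing by n - 1 |
| statistics.stdev(data) | Sample standard deviation: square root of the sample variance |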
| pandas.DataFrame.describe() | Summary stats: count, mean, std, min, quartiles, max |
Key Takeaways
Use Python's statistics module for simple descriptive statistics on small datasets.
Use pandas DataFrame.describe() for a quick and comprehensive summary of numeric data.
Use multimode() when your data may have several modes; mode() returns only the first one on Python 3.8+ and raised StatisticsError on earlier versions.
Ensure your data is numeric and non-empty before calculating statistics.
Pandas is more efficient for large datasets and offers more descriptive stats at once.