
How to Use describe() in pandas for Data Summary

Use the describe() method in pandas on a DataFrame or Series to get summary statistics like count, mean, min, max, and quartiles. It helps quickly understand the distribution and key metrics of your data columns.

📐

Syntax

The describe() method is called on a pandas DataFrame or Series. It returns summary statistics for numeric columns by default.

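DataFrame.describe(percentiles=None, include=None, exclude=None)
percentiles: List of percentiles to include, given as fractions between 0 and 1 (default: 0.25, 0.5, 0.75).
include: Data types to include (e.g., 'all', 'object', 'number').
exclude: Data types to exclude.
datetime_is_numeric: Available only in pandas 1.1 through 1.5 (default False). It was removed in pandas 2.0, where datetime columns are always summarized as numeric.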

python

df.describe(percentiles=None, include=None, exclude=None)
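As a minimal sketch of the percentiles parameter, the snippet below asks for the 10th, 50th, and 90th percentiles instead of the default quartiles. The `score` column and its values are hypothetical, made up only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({"score": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]})

# Percentiles are passed as fractions between 0 and 1
summary = df.describe(percentiles=[0.1, 0.5, 0.9])
print(summary)

# Individual statistics can be read back by their row label
p10 = summary.loc["10%", "score"]
p90 = summary.loc["90%", "score"]
print(p10, p90)
```

The row labels in the result follow the requested percentiles ('10%', '50%', '90%'), so they replace the usual '25%' and '75%' rows.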

💻

Example

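This example calls describe() on a DataFrame that has two numeric columns (age, salary) and one text column (department). The result contains count, mean, std, min, quartiles, and max for the numeric columns; department is left out because describe() summarizes only numeric columns by default.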

python

import pandas as pd

data = {
    'age': [25, 30, 22, 40, 28],
    'salary': [50000, 60000, 45000, 80000, 52000],
    'department': ['HR', 'IT', 'IT', 'Finance', 'HR']
}
df = pd.DataFrame(data)

summary = df.describe()
print(summary)

Output

             age        salary
count   5.000000      5.000000
mean   29.000000  57400.000000
std     6.855655  13740.451230
min    22.000000  45000.000000
25%    25.000000  50000.000000
50%    28.000000  52000.000000
75%    30.000000  60000.000000
max    40.000000  80000.000000
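describe() also works on a single Series. The sketch below reuses the department and age values from the example above as standalone Series (the variable names are chosen here for illustration). A text Series reports count, unique, top, and freq, while a numeric Series reports the usual numeric statistics.

```python
import pandas as pd

# Same values as the 'department' and 'age' columns in the example above
departments = pd.Series(['HR', 'IT', 'IT', 'Finance', 'HR'], name='department')
ages = pd.Series([25, 30, 22, 40, 28], name='age')

# Text data: count, unique, top (most frequent value), freq (its count)
dept_summary = departments.describe()
print(dept_summary)

# Numeric data: count, mean, std, min, quartiles, max
age_summary = ages.describe()
print(age_summary)
```

Here 'HR' and 'IT' both appear twice, so which one is reported as top should not be relied on.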

⚠️

Common Pitfalls

By default, describe() only summarizes numeric columns. Non-numeric columns are ignored unless you specify include='all'.
With include='all', non-numeric columns report count, unique, top, and freq instead of the numeric statistics; any statistic that does not apply to a column appears as NaN.
Passing an unrecognized data type to include or exclude raises an error.

python

import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie'], 'age': [25, 30, 35]}
df = pd.DataFrame(data)

# Wrong: Only numeric columns summarized
print(df.describe())

# Right: Include all columns
print(df.describe(include='all'))

Output

             age
count   3.000000
mean   30.000000
std     5.000000
min    25.000000
25%    27.500000
50%    30.000000
75%    32.500000
max    35.000000

         name        age
count       3   3.000000
unique      3        NaN
top     Alice        NaN
freq        1        NaN
mean      NaN  30.000000
std       NaN   5.000000
min       NaN  25.000000
25%       NaN  27.500000
50%       NaN  30.000000
75%       NaN  32.500000
max       NaN  35.000000
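The include and exclude pitfalls above can be sketched as follows, using the same name/age DataFrame. The string 'not_a_dtype' is a deliberately invalid, made-up value, and the exact exception type may vary between pandas versions, so the sketch catches both TypeError and ValueError.

```python
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie'], 'age': [25, 30, 35]}
df = pd.DataFrame(data)

# Keep only numeric columns (same result as the default here)
numeric_only = df.describe(include='number')
print(numeric_only)

# Drop numeric columns, leaving only the text column
non_numeric = df.describe(exclude='number')
print(non_numeric)

# An unrecognized dtype name is rejected instead of being silently ignored
error_message = None
try:
    df.describe(include='not_a_dtype')  # made-up, invalid dtype name
except (TypeError, ValueError) as exc:
    error_message = f"{type(exc).__name__}: {exc}"
print(error_message)
```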

📊

Quick Reference

Parameter	Description	Default
percentiles	List of percentiles to include in output	[0.25, 0.5, 0.75]
include	Data types to include (e.g., 'all', 'number', 'object')	None (numeric only)
exclude	Data types to exclude	None
datetime_is_numeric	Treat datetime columns as numeric (pandas 1.1 to 1.5 only; removed in 2.0, where this is always the case)	False

✅

Key Takeaways

Use describe() to quickly get summary statistics of numeric data in pandas.

Add include='all' to summarize all columns including categorical data.

The output includes count, mean, std, min, quartiles, and max for numeric columns.

Pass only valid data types to include and exclude; unrecognized ones raise errors.

describe() helps you understand how your data is distributed and spot anomalies quickly.
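One possible way to turn the summary into a quick anomaly check is to build outlier bounds from the 25% and 75% rows. The sketch below assumes the common 1.5 × IQR rule of thumb, which is a choice made here for illustration and not something describe() does for you. It reuses the age and salary values from the earlier example.

```python
import pandas as pd

df = pd.DataFrame({
    'age': [25, 30, 22, 40, 28],
    'salary': [50000, 60000, 45000, 80000, 52000],
})

summary = df.describe()

# Quartiles come straight from the describe() output
q1 = summary.loc['25%']
q3 = summary.loc['75%']
iqr = q3 - q1

# Assumed rule of thumb: flag values more than 1.5 * IQR outside the quartiles
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# True wherever a value falls outside its column's bounds
outlier_mask = (df < lower) | (df > upper)
outlier_rows = df[outlier_mask.any(axis=1)]
print(outlier_rows)
```

With this data, the row with age 40 and salary 80000 is the only one flagged. The threshold is a convention, so adjust the multiplier to suit your data.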