How to Use describe() in pandas for Data Summary
Use the `describe()` method in pandas on a DataFrame or Series to get summary statistics such as count, mean, standard deviation, min, max, and quartiles. It gives a quick view of the distribution and key metrics of your data columns.

Syntax
The `describe()` method is called on a pandas DataFrame or Series. By default it returns summary statistics for numeric columns only.
```python
df.describe(percentiles=None, include=None, exclude=None)
```
- `percentiles`: List of percentiles to include, given as fractions between 0 and 1 (default: 25%, 50%, 75%).
- `include`: Data types to include (e.g., 'all', 'object', 'number').
- `exclude`: Data types to exclude.
- `datetime_is_numeric`: Accepted only in pandas 1.1 through 1.5, where it controlled whether datetime columns were treated as numeric (default False). It was removed in pandas 2.0, so passing it there raises a TypeError.
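As a minimal sketch of the `percentiles` argument, the snippet below asks for the 10th, 50th, and 90th percentiles instead of the default quartiles. The `scores` Series and its values are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the percentiles argument.
scores = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], name="score")

# Percentiles are passed as fractions between 0 and 1.
summary = scores.describe(percentiles=[0.1, 0.5, 0.9])
print(summary)
```

The quartile rows (25% and 75%) are replaced by the requested 10% and 90% rows; the other statistics are unchanged.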
Example
This example shows how to use `describe()` on a DataFrame with numeric and categorical data. It returns count, mean, std, min, quartiles, and max for the numeric columns. Because `include` is not set, the non-numeric `department` column is left out of the result.
```python
import pandas as pd

data = {
    'age': [25, 30, 22, 40, 28],
    'salary': [50000, 60000, 45000, 80000, 52000],
    'department': ['HR', 'IT', 'IT', 'Finance', 'HR']
}
df = pd.DataFrame(data)

summary = df.describe()
print(summary)
```
Output
```
             age        salary
count   5.000000      5.000000
mean   29.000000  57400.000000
std     6.855655  13740.451230
min    22.000000  45000.000000
25%    25.000000  50000.000000
50%    28.000000  52000.000000
75%    30.000000  60000.000000
max    40.000000  80000.000000
```
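The `std` row is the sample standard deviation: pandas divides by n - 1 by default. The sketch below reuses the `age` values from the example, calls `describe()` on a single Series, and compares the reported value with `Series.std()`. The variable names are illustrative.

```python
import pandas as pd

age = pd.Series([25, 30, 22, 40, 28], name="age")

age_summary = age.describe()       # describe() also works on a single Series
sample_std = age.std()             # ddof=1 by default (sample standard deviation)
population_std = age.std(ddof=0)   # divides by n instead of n - 1

print(age_summary["std"], sample_std, population_std)
```

The first two printed values match, while the population figure is smaller. This is worth remembering when comparing `describe()` output with tools that default to the population formula.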
Common Pitfalls
- By default, `describe()` only summarizes numeric columns. Non-numeric columns are ignored unless you specify `include='all'`.
- Using `include='all'` reports different statistics for categorical data (count, unique, top, and freq). Rows that do not apply to a column are filled with NaN.
- Passing invalid data types to `include` or `exclude` raises an error.
```python
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie'], 'age': [25, 30, 35]}
df = pd.DataFrame(data)

# Default: only numeric columns are summarized
print(df.describe())

# Include all columns, numeric and non-numeric
print(df.describe(include='all'))
```
Output
```
             age
count   3.000000
mean   30.000000
std     5.000000
min    25.000000
25%    27.500000
50%    30.000000
75%    32.500000
max    35.000000

         name        age
count       3   3.000000
unique      3        NaN
top     Alice        NaN
freq        1        NaN
mean      NaN  30.000000
std       NaN   5.000000
min       NaN  25.000000
25%       NaN  27.500000
50%       NaN  30.000000
75%       NaN  32.500000
max       NaN  35.000000
```
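Besides `'all'`, `include` and `exclude` also accept lists of dtype selectors. The sketch below assumes the same small `name`/`age` frame and uses the `'number'` selector to split the summary into a numeric part and a non-numeric part, which avoids the NaN-padded mixed table.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'], 'age': [25, 30, 35]})

# Numeric columns only (same result as the default here, but explicit).
numeric_summary = df.describe(include=['number'])

# Everything except numeric columns: count, unique, top, freq.
text_summary = df.describe(exclude=['number'])

print(numeric_summary)
print(text_summary)
```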
Quick Reference
| Parameter | Description | Default |
|---|---|---|
| percentiles | List of percentiles to include in output | [0.25, 0.5, 0.75] |
| include | Data types to include (e.g., 'all', 'number', 'object') | None (numeric only) |
| exclude | Data types to exclude | None |
| datetime_is_numeric | Treat datetime columns as numeric (pandas 1.1 to 1.5 only; removed in pandas 2.0) | False |
Key Takeaways
- Use `describe()` to quickly get summary statistics of numeric data in pandas.
- Add `include='all'` to summarize all columns, including categorical data.
- The output includes count, mean, std, min, quartiles, and max for numeric columns.
- Be careful with the `include` and `exclude` parameters to avoid errors.
- It helps you understand data distribution and detect anomalies fast.
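One way to act on the anomaly-detection point is the common 1.5 × IQR rule, computed from the quartiles that `describe()` returns. This is a sketch of one heuristic, not something `describe()` does for you. The salary values are taken from the earlier example, and the 1.5 multiplier is a conventional choice rather than a pandas setting.

```python
import pandas as pd

salary = pd.Series([50000, 60000, 45000, 80000, 52000], name="salary")
stats = salary.describe()

# Interquartile range from the 25% and 75% rows of the summary.
iqr = stats["75%"] - stats["25%"]
lower = stats["25%"] - 1.5 * iqr
upper = stats["75%"] + 1.5 * iqr

# Values outside [lower, upper] are flagged as potential outliers.
outliers = salary[(salary < lower) | (salary > upper)]
print(outliers.tolist())
```

With these values the bounds are 35000 and 75000, so only the 80000 salary is flagged.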