
How to Use describe() in pandas for Data Summary

Use the describe() method in pandas on a DataFrame or Series to get summary statistics like count, mean, min, max, and quartiles. It helps quickly understand the distribution and key metrics of your data columns.

Syntax

The describe() method is called on a pandas DataFrame or Series. It returns summary statistics for numeric columns by default.

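  • DataFrame.describe(percentiles=None, include=None, exclude=None)
  • percentiles: List of percentiles to include, given as fractions between 0 and 1 (default: 25%, 50%, 75%).
  • include: Data types to include (e.g., 'all', 'object', 'number').
  • exclude: Data types to exclude.
  • datetime_is_numeric: Accepted only in pandas 1.1 through 1.5 (default False). It was removed in pandas 2.0, where datetime columns are always summarized as numeric data, so passing it there raises a TypeError.
python
df.describe(percentiles=None, include=None, exclude=None)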
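As a minimal sketch of the percentiles parameter, the snippet below requests the 10th, 50th, and 90th percentiles instead of the default quartiles. The scores DataFrame and its values are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
scores = pd.DataFrame({"score": [10, 20, 30, 40, 50]})

# Percentiles are passed as fractions between 0 and 1.
summary = scores.describe(percentiles=[0.1, 0.5, 0.9])
print(summary)

# Individual statistics can be read back by row label and column name.
p90 = summary.loc["90%", "score"]
print("90th percentile:", p90)
```

The requested percentiles appear as rows labelled 10%, 50%, and 90%, between the min and max rows.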

Example

This example calls describe() on a DataFrame with two numeric columns (age, salary) and one categorical column (department). The result contains count, mean, std, min, quartiles, and max for the numeric columns; department is left out by default.

python
import pandas as pd

data = {
    'age': [25, 30, 22, 40, 28],
    'salary': [50000, 60000, 45000, 80000, 52000],
    'department': ['HR', 'IT', 'IT', 'Finance', 'HR']
}
df = pd.DataFrame(data)

summary = df.describe()
print(summary)
Output
             age        salary
count   5.000000      5.000000
mean   29.000000  57400.000000
std     6.855655  13740.451230
min    22.000000  45000.000000
25%    25.000000  50000.000000
50%    28.000000  52000.000000
75%    30.000000  60000.000000
max    40.000000  80000.000000
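Since describe() can also be called on a Series, here is a short sketch that summarizes single columns of the same data. It rebuilds the DataFrame so that it runs on its own; the variable names age_summary and dept_summary are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [25, 30, 22, 40, 28],
    "salary": [50000, 60000, 45000, 80000, 52000],
    "department": ["HR", "IT", "IT", "Finance", "HR"],
})

# Numeric Series: count, mean, std, min, quartiles, max.
age_summary = df["age"].describe()
print(age_summary)
print("Median age:", age_summary["50%"])

# Non-numeric Series: count, unique, top, freq.
dept_summary = df["department"].describe()
print(dept_summary)
```

HR and IT both appear twice here, so freq is 2, but which of the two is reported as top should not be relied on when values are tied.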

Common Pitfalls

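  • By default, describe() summarizes only the numeric columns of a mixed DataFrame. Non-numeric columns are left out unless you request them, for example with include='all'.
  • With include='all', non-numeric columns report count, unique, top, and freq instead. Statistics that do not apply to a column appear as NaN.
  • Passing an unrecognized data type to include or exclude raises an error, and combining include='all' with exclude raises a ValueError.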
python
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie'], 'age': [25, 30, 35]}
df = pd.DataFrame(data)

# Default: only the numeric column (age) is summarized
print(df.describe())

# include='all': the name column is summarized as well
print(df.describe(include='all'))
Output
             age
count   3.000000
mean   30.000000
std     5.000000
min    25.000000
25%    27.500000
50%    30.000000
75%    32.500000
max    35.000000

         name        age
count       3   3.000000
unique      3        NaN
top     Alice        NaN
freq        1        NaN
mean      NaN  30.000000
std       NaN   5.000000
min       NaN  25.000000
25%       NaN  27.500000
50%       NaN  30.000000
75%       NaN  32.500000
max       NaN  35.000000
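The include and exclude pitfalls can be sketched as follows, reusing the same small DataFrame. The variable names are illustrative, and the error shown is the one pandas raises when include='all' is combined with exclude.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob", "Charlie"], "age": [25, 30, 35]})

# Numeric columns only (same result as the default for this frame).
numeric_summary = df.describe(include=["number"])

# Everything except numeric columns, which leaves only "name" here.
text_summary = df.describe(exclude=["number"])

# include='all' cannot be combined with exclude; pandas raises ValueError.
try:
    df.describe(include="all", exclude=["number"])
    conflict_error = None
except ValueError as err:
    conflict_error = str(err)

print(numeric_summary.columns.tolist())
print(text_summary.columns.tolist())
print("Conflict error:", conflict_error)
```

Selecting by the broad 'number' type avoids having to list each numeric dtype separately.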

Quick Reference

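Parameter           | Description                                                                | Default
--------------------|----------------------------------------------------------------------------|--------------------
percentiles         | List of percentiles to include in output                                   | [0.25, 0.5, 0.75]
include             | Data types to include (e.g., 'all', 'number', 'object')                    | None (numeric only)
exclude             | Data types to exclude                                                      | None
datetime_is_numeric | Treat datetime columns as numeric (pandas 1.1 to 1.5 only; removed in 2.0) | False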

Key Takeaways

  • Use describe() to quickly get summary statistics of numeric data in pandas.
  • Add include='all' to summarize all columns, including categorical data.
  • The output includes count, mean, std, min, quartiles, and max for numeric columns.
  • Pass only valid data types to include and exclude, and do not combine include='all' with exclude; both mistakes raise errors.
  • The summary gives a fast view of each column's distribution and makes unusual values, such as an extreme min or max, easy to spot.
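One way to act on the summary when looking for anomalies is to reuse the quartiles that describe() returns. The sketch below applies the common 1.5 × IQR rule of thumb, which is an assumption added here and not something describe() does by itself. The salary values are hypothetical.

```python
import pandas as pd

# Hypothetical salaries with one value far above the rest.
salaries = pd.Series([50000, 60000, 45000, 80000, 52000, 250000], name="salary")

stats = salaries.describe()

# Interquartile range from the 25% and 75% rows of the summary.
iqr = stats["75%"] - stats["25%"]
lower = stats["25%"] - 1.5 * iqr
upper = stats["75%"] + 1.5 * iqr

# Values outside [lower, upper] are flagged as potential outliers.
outliers = salaries[(salaries < lower) | (salaries > upper)]
print(outliers)
```

The 1.5 multiplier is only a convention; a different threshold may suit other data better.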