What is describe() for statistics in Data Analysis Python?

Introduction

The pandas describe() method returns summary statistics for your data, such as the count, mean, standard deviation, minimum, maximum, and quartiles. It helps you understand your data's main features without looking at every value.

You want to see the average, minimum, and maximum of a dataset fast.

You need to check how spread out your data values are.

You want to find out how many non-missing values each column has. A count lower than the number of rows signals missing values.

You want a quick overview before making graphs or deeper analysis.

Syntax

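DataFrame.describe(percentiles=None, include=None, exclude=None)

Older tutorials also list a datetime_is_numeric parameter. It existed in pandas 1.1 through 1.5 and was removed in pandas 2.0, where datetime columns are always summarized as numeric data.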

describe() works on pandas DataFrames or Series.

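The include and exclude parameters select columns by data type, not by name. To summarize specific columns, select them first, for example df[['age', 'height']].describe().

The percentiles parameter sets which percentiles are reported. The default is the 25th, 50th, and 75th.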
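As a minimal sketch of how these parameters behave, the snippet below builds a small made-up DataFrame (the column names and values are illustrative assumptions, not real data). It asks for custom percentiles and then filters the summary by data type.

```python
import pandas as pd

# Hypothetical example data: two numeric columns and one text column
df = pd.DataFrame({
    "age": [25, 30, 22, 40, 28],
    "height": [175.0, 180.0, 168.0, 190.0, 172.0],
    "city": ["Oslo", "Lima", "Oslo", "Pune", "Lima"],
})

# Report the 10th, 50th, and 90th percentiles instead of the quartiles
custom = df.describe(percentiles=[0.1, 0.5, 0.9])

# Keep only columns with a numeric data type
numeric_only = df.describe(include=["number"])

# Drop numeric columns, which leaves the text column here
non_numeric = df.describe(exclude=["number"])

print(custom)
print(numeric_only)
print(non_numeric)
```

Percentiles are given as fractions between 0 and 1, and they appear in the result as row labels such as 10% and 90%.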

Examples

Shows summary statistics for all numeric columns in the DataFrame df.

df.describe()

Shows summary statistics for the single column age.

df['age'].describe()

Shows summary statistics for all columns, including non-numeric ones.

df.describe(include='all')
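To see how the three calls differ, here is a small sketch on a made-up DataFrame with one numeric and one text column (the data is an assumption for illustration only). With include='all', statistics that do not apply to a column are filled with NaN.

```python
import pandas as pd

# Hypothetical example data with one numeric and one text column
df = pd.DataFrame({
    "age": [25, 30, 22, 40, 28],
    "city": ["Oslo", "Lima", "Oslo", "Pune", "Lima"],
})

default_summary = df.describe()            # numeric columns only
age_summary = df["age"].describe()         # a Series of statistics for one column
full_summary = df.describe(include="all")  # numeric and non-numeric together

print(default_summary.columns.tolist())    # the text column is left out
print(age_summary["mean"])
print(full_summary)                        # mean is NaN for city, unique is NaN for age
```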

Sample Program

This code creates a small table of ages, heights, and weights. Then it uses describe() to get the count, mean, std (standard deviation, a measure of spread), min, max, and quartiles for each column.

import pandas as pd

# Create a simple DataFrame
data = {'age': [25, 30, 22, 40, 28],
        'height': [175, 180, 168, 190, 172],
        'weight': [70, 80, 60, 90, 65]}
df = pd.DataFrame(data)

# Use describe() to get summary statistics
summary = df.describe()
print(summary)

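Output

The result is a table with one column per variable (age, height, weight) and eight rows: count, mean, std, min, 25%, 50%, 75%, and max. For age, for example, count is 5, mean is 29, min is 22, the median (50%) is 28, and max is 40.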
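The summary is itself a DataFrame, so individual statistics can be read from it by row and column label. The sketch below reuses the sample data from the program above. The variable names mean_age, median_height, and weight_range are only illustrative.

```python
import pandas as pd

data = {'age': [25, 30, 22, 40, 28],
        'height': [175, 180, 168, 190, 172],
        'weight': [70, 80, 60, 90, 65]}
df = pd.DataFrame(data)
summary = df.describe()

# Rows are statistics, columns are the original column names
mean_age = summary.loc['mean', 'age']
median_height = summary.loc['50%', 'height']
weight_range = summary.loc['max', 'weight'] - summary.loc['min', 'weight']

print(mean_age, median_height, weight_range)

# Transposing gives one row per variable, which can be easier to scan
print(summary.T)
```

Transposing is a matter of taste. It tends to help when the DataFrame has many columns.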

Important Notes

describe() skips missing values (NaN) when computing each statistic, so count reports only the non-missing entries.

For non-numeric data, describe() shows count, unique (the number of distinct values), top (the most common value), and freq (how often the top value occurs).
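Both notes can be checked with a short sketch. The two Series below are made-up examples, each with one missing entry.

```python
import pandas as pd

# Hypothetical data: one missing number and one missing label
scores = pd.Series([10.0, 20.0, float("nan"), 30.0])
colors = pd.Series(["red", "blue", "red", None])

score_stats = scores.describe()
color_stats = colors.describe()

print(score_stats)   # count reflects the three non-missing values
print(color_stats)   # rows are count, unique, top, freq

# describe() does not report missing values directly; count them like this
missing_scores = scores.isna().sum()
print(missing_scores)
```

Comparing count with len(scores) is another way to spot missing values in a column.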

Summary

describe() quickly summarizes your data's main statistics.

It works on numeric and non-numeric data with different outputs.

Use it first to understand your data before deeper analysis.