What is the exploratory data analysis workflow in Pandas?


Exploratory data analysis workflow in Pandas


Introduction

Exploratory data analysis helps us understand data by looking at it in many ways. It shows patterns, problems, and important details before we do more work.

It is useful in these situations:

When you get a new dataset and want to know what it contains.

Before building a model, to check data quality and find missing values.

When you want to find interesting trends or unusual points in your data.

When you want to summarize data quickly with statistics and charts.

When you need to decide how to clean or transform data for better results.

Syntax

Pandas

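import pandas as pd
import matplotlib.pyplot as plt

# 'file.csv' and 'column' are placeholders for your own file and column name
df = pd.read_csv('file.csv')

# Step 1: Look at data
print(df.head())

# Step 2: Check data info
# info() prints its report itself and returns None, so do not wrap it in print()
df.info()

# Step 3: Summary statistics
print(df.describe())

# Step 4: Check missing values
print(df.isnull().sum())

# Step 5: Visualize data (example)
df['column'].hist()
plt.show()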

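Use head() to preview the first rows (five by default).

info() shows each column's data type and non-null count, which reveals missing values. It prints its report directly and returns None, so call it without print().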
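As a small illustration of what info() reports, the sketch below reads the same facts (data types and non-null counts) through df.dtypes and df.count(). The DataFrame, its column names, and its values are made up for this example and are not part of the lesson's data.

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({
    'city': ['Oslo', 'Lima', None],
    'temp': [3.5, None, 21.0],
})

# info() writes its report to the screen and returns None
df.info()

# The same facts as objects you can inspect or reuse
dtypes = df.dtypes              # data type of each column
non_null = df.count()           # non-null entries per column
missing = len(df) - non_null    # missing entries per column

print(dtypes)
print(non_null)
print(missing)
```

Reading the counts this way is handy when you want to act on them in code, for example to list only the columns that have gaps.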

Examples

Shows the first 10 rows of the data to get a bigger preview.

Pandas

df.head(10)

Gives summary statistics for all columns, including non-numeric ones.

Pandas

df.describe(include='all')
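To see what include='all' changes, here is a minimal sketch on a made-up two-column table (the names and ages are assumptions for illustration). Text columns get count, unique, top, and freq, numeric columns get mean, std, and quartiles, and any statistic that does not apply to a column shows up as NaN.

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Alice'],
    'age': [25, 30, 35],
})

summary = df.describe(include='all')
print(summary)

# Pick out single statistics by row label and column name
n_unique_names = summary.loc['unique', 'name']   # distinct names
most_common = summary.loc['top', 'name']         # most frequent name
mean_age = summary.loc['mean', 'age']            # average age

print(n_unique_names, most_common, mean_age)
```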

Counts how many missing values are in each column.

Pandas

df.isnull().sum()
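Raw counts are easier to judge next to the share of rows they represent. The sketch below, on made-up data, pairs isnull().sum() with isnull().mean(), which gives the fraction of missing values because the mean of a True/False column is the fraction of True entries. The report table is just one possible layout.

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({
    'age': [25, 30, None, None],
    'salary': [50000, None, 70000, 80000],
})

missing_count = df.isnull().sum()    # missing values per column
missing_share = df.isnull().mean()   # fraction of rows missing per column

# Combine both views and list the worst columns first
report = pd.DataFrame({'missing': missing_count, 'share': missing_share})
report = report.sort_values('share', ascending=False)
print(report)
```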

Draws a histogram to see the distribution of the 'age' column.

Pandas

df['age'].hist()
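If you want the numbers behind a histogram rather than the picture, value_counts(bins=...) gives a text version: it splits the range into equal-width bins and counts the values in each. This is a rough sketch on made-up ages. The choice of 3 bins is an assumption made here for a tiny dataset, whereas hist() uses 10 bins by default.

```python
import pandas as pd

# Hypothetical ages, including one missing value
ages = pd.Series([25, 30, 35, None, 40], name='age')

# Equal-width bins over the observed range; missing values are left out.
# sort=False keeps the bins in order from lowest to highest.
binned = ages.value_counts(bins=3, sort=False)
print(binned)
```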

Sample Program

This code creates a small table with some missing values. It then shows the first rows, data info, summary stats, missing values count, and draws a histogram for the age column.

Pandas

import pandas as pd
import matplotlib.pyplot as plt

# Create a small sample dataset
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'age': [25, 30, 35, None, 40],
    'salary': [50000, 60000, 70000, 80000, None]
}

df = pd.DataFrame(data)

# Step 1: Look at first rows
print('First rows:')
print(df.head())

# Step 2: Data info
print('\nData info:')
df.info()

# Step 3: Summary statistics
print('\nSummary statistics:')
print(df.describe())

# Step 4: Missing values
print('\nMissing values per column:')
print(df.isnull().sum())

# Step 5: Visualize age distribution
print('\nShowing histogram for age column...')
df['age'].hist()
plt.title('Age Distribution')
plt.xlabel('Age')
plt.ylabel('Count')
plt.show()


Important Notes

Always check for missing values early to decide how to handle them.

Use visualizations to see data shape and spot outliers easily.

Summary statistics help understand data spread and central values.
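The three notes above can be tried out on a small made-up table. The sketch below compares mean and median, flags outliers with the common 1.5 × IQR rule, and fills a missing age with the median. Both the IQR rule and median filling are choices made for this example, not the only options, and the data is hypothetical.

```python
import pandas as pd

# Hypothetical data: one missing age and one unusually high salary
df = pd.DataFrame({
    'age': [25, 30, 35, None, 40],
    'salary': [50000, 60000, 70000, 80000, 500000],
})

# Central values: the median is pulled less by extreme values than the mean
mean_salary = df['salary'].mean()
median_salary = df['salary'].median()
print(mean_salary, median_salary)

# Flag salaries more than 1.5 * IQR outside the middle 50% of values
q1 = df['salary'].quantile(0.25)
q3 = df['salary'].quantile(0.75)
iqr = q3 - q1
is_outlier = (df['salary'] < q1 - 1.5 * iqr) | (df['salary'] > q3 + 1.5 * iqr)
outliers = df[is_outlier]
print(outliers)

# One possible way to handle the missing age: fill it with the median age
age_filled = df['age'].fillna(df['age'].median())
print(age_filled)
```

Whether to fill, drop, or keep missing values and outliers depends on the dataset and the question you are asking, so treat these steps as a starting point.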

Summary

Exploratory data analysis helps you know your data well before using it.

Look at data samples, info, statistics, missing values, and charts step-by-step.

This process guides better decisions for cleaning and modeling data.