
Exploratory data analysis workflow in Pandas

Introduction

Exploratory data analysis (EDA) is the first look at a dataset. You inspect sample rows, data types, summary statistics, and plots to find patterns, quality problems, and notable details before cleaning or modeling.

EDA is useful in these situations:
When you get a new dataset and want to know what it contains.
Before building a model, to check data quality and find missing values.
To find interesting trends or unusual points in your data.
When you want to summarize data quickly with statistics and charts.
To decide how to clean or transform data for better results.
Syntax
Pandas
import pandas as pd

df = pd.read_csv('file.csv')

# Step 1: Look at data
print(df.head())

# Step 2: Check data info
df.info()  # info() prints its report itself and returns None, so no print() is needed

# Step 3: Summary statistics
print(df.describe())

# Step 4: Check missing values
print(df.isnull().sum())

# Step 5: Visualize data (replace 'column' with a numeric column; plotting requires matplotlib)
df['column'].hist()

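Use head() to see the first rows (five by default) and get a quick look.

info() shows each column's data type and non-null count, so missing values appear as counts lower than the total number of rows.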
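
The relationship between info() and missing values can be sketched as follows. This is a minimal illustration on a made-up three-row table (the column names `age` and `city` are assumptions for the example): it computes the same per-column types and non-null counts that info() reports, then derives the missing counts from them.

```python
import pandas as pd

# Hypothetical three-row table with one missing value per column
df = pd.DataFrame({
    "age": [25, 30, None],
    "city": ["Oslo", None, "Lima"],
})

# The two pieces of information that info() reports per column
dtypes = df.dtypes            # data type of each column
non_null = df.notna().sum()   # number of non-missing values per column

# Missing values = total rows minus non-null count
missing = len(df) - non_null

print(dtypes)
print(non_null)
print(missing)
```

The `missing` result here matches what `df.isnull().sum()` returns; info() just presents the complementary non-null count.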

Examples
Shows the first 10 rows of the data to get a bigger preview.
Pandas
df.head(10)
Gives summary statistics for all columns, including non-numeric ones.
Pandas
df.describe(include='all')
Counts how many missing values are in each column.
Pandas
df.isnull().sum()
Draws a histogram to see the distribution of the 'age' column.
Pandas
df['age'].hist()
Sample Program

This code creates a small table with some missing values. It then shows the first rows, data info, summary stats, missing values count, and draws a histogram for the age column.

Pandas
import pandas as pd
import matplotlib.pyplot as plt

# Create a small sample dataset
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'age': [25, 30, 35, None, 40],
    'salary': [50000, 60000, 70000, 80000, None]
}

df = pd.DataFrame(data)

# Step 1: Look at first rows
print('First rows:')
print(df.head())

# Step 2: Data info
print('\nData info:')
df.info()

# Step 3: Summary statistics
print('\nSummary statistics:')
print(df.describe())

# Step 4: Missing values
print('\nMissing values per column:')
print(df.isnull().sum())

# Step 5: Visualize age distribution
print('\nShowing histogram for age column...')
df['age'].hist()
plt.title('Age Distribution')
plt.xlabel('Age')
plt.ylabel('Count')
plt.show()
Important Notes

Always check for missing values early to decide how to handle them.
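
As a rough sketch of what "deciding how to handle them" can look like, the code below reuses the sample table from the program above and compares two common options: dropping incomplete rows and filling gaps with the column median. Choosing the median is an assumption made for illustration; the right strategy depends on the data.

```python
import pandas as pd

# Same sample data as in the program above
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "age": [25, 30, 35, None, 40],
    "salary": [50000, 60000, 70000, 80000, None],
})

# Fraction of missing values per column helps judge how serious the gaps are
missing_share = df.isnull().mean()

# Option 1: drop every row that has at least one missing value
dropped = df.dropna()

# Option 2: fill each numeric gap with that column's median (illustrative choice)
filled = df.fillna({
    "age": df["age"].median(),
    "salary": df["salary"].median(),
})

print(missing_share)
print(len(df), "rows before,", len(dropped), "rows after dropna()")
print(filled)
```

Dropping rows is simple but loses two of the five records here, while filling keeps every row at the cost of inventing plausible values.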

Use visualizations to see data shape and spot outliers easily.
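
A numeric check can back up what a plot suggests. The sketch below applies the common 1.5 × IQR rule, the same rule a box plot uses to draw its whiskers by default, to a made-up age series. The data and the 1.5 multiplier are assumptions for illustration, not a universal definition of an outlier.

```python
import pandas as pd

# Hypothetical ages with one suspicious value
ages = pd.Series([25, 30, 35, 40, 120], name="age")

# Interquartile range: distance between the 25th and 75th percentiles
q1 = ages.quantile(0.25)
q3 = ages.quantile(0.75)
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are flagged as potential outliers
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = ages[(ages < lower) | (ages > upper)]

print("Bounds:", lower, upper)
print(outliers)
```

Flagged values are candidates for a closer look, not automatic deletions; an age of 120 may be a typo or a real record.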

Summary statistics help you understand the spread and central values of the data.
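
To make "spread" and "central values" concrete, here is a small sketch that computes them directly for the salary values from the sample program. The grouping into `center` and `spread` dictionaries is only an illustrative layout; describe() reports most of the same numbers in one call.

```python
import pandas as pd

# Salary values from the sample program, including the missing entry
salary = pd.Series([50000, 60000, 70000, 80000, None], name="salary")

# Central values: pandas skips missing entries by default
center = {
    "mean": salary.mean(),
    "median": salary.median(),
}

# Spread: standard deviation, range, and interquartile range
spread = {
    "std": salary.std(),
    "min": salary.min(),
    "max": salary.max(),
    "iqr": salary.quantile(0.75) - salary.quantile(0.25),
}

print(center)
print(spread)
print(salary.describe())
```

When the mean and median are close, as they are here, the distribution is roughly symmetric; a large gap between them often points to skew or outliers.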

Summary

Exploratory data analysis helps you know your data well before using it.

Look at data samples, info, statistics, missing values, and charts step-by-step.

This process guides better decisions for cleaning and modeling data.
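
The non-visual steps can also be bundled into one helper for reuse. The function below is a hypothetical convenience wrapper (the name `quick_overview` and its return layout are assumptions, not part of pandas) that collects the preview, types, statistics, and missing counts in a dictionary.

```python
import pandas as pd


def quick_overview(df: pd.DataFrame, n: int = 5) -> dict:
    """Collect the basic EDA views of a DataFrame in one dictionary."""
    return {
        "preview": df.head(n),           # first n rows
        "dtypes": df.dtypes,             # data type per column
        "stats": df.describe(),          # summary statistics for numeric columns
        "missing": df.isnull().sum(),    # missing values per column
    }


# Usage with the sample data from the program above
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "age": [25, 30, 35, None, 40],
    "salary": [50000, 60000, 70000, 80000, None],
})

overview = quick_overview(df)
for title, view in overview.items():
    print(f"\n{title}:")
    print(view)
```

Plotting is left out on purpose so the helper stays usable in scripts and environments without a display.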