Exploratory data analysis (EDA) helps us understand a dataset by examining it from several angles. It reveals patterns, data-quality problems, and important details before we move on to cleaning or modeling.
Exploratory data analysis workflow in Pandas
Introduction
Use this workflow in situations like these:
When you get a new dataset and want to know what it contains.
Before building a model, to check data quality and find missing values.
To find interesting trends or unusual values (outliers) in your data.
When you want to summarize data quickly with statistics and charts.
To decide how to clean or transform the data for better results.
Syntax
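```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('file.csv')  # replace 'file.csv' with your own file

# Step 1: Look at the first rows
print(df.head())

# Step 2: Check column types and non-null counts
# (info() prints its own report and returns None, so don't wrap it in print())
df.info()

# Step 3: Summary statistics
print(df.describe())

# Step 4: Count missing values per column
print(df.isnull().sum())

# Step 5: Visualize data (example; 'column' stands for a numeric column name)
df['column'].hist()
plt.show()
```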
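head() shows the first rows (five by default) for a quick look at the data.
info() shows each column's data type and non-null count. A non-null count lower than the number of rows means that column has missing values.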
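To see what these two calls return, here is a minimal sketch on a small hypothetical DataFrame (the columns `x` and `label` are made up for illustration). It captures the info() report through its `buf` parameter instead of printing it, and derives the missing-value counts from the non-null counts.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy data: one missing value in column 'x'
df = pd.DataFrame({'x': [1.0, np.nan, 3.0], 'label': ['a', 'b', 'c']})

# head(n) returns a new DataFrame holding the first n rows
preview = df.head(2)
print(preview)

# info() writes its report to a buffer (stdout by default) and returns None
buf = io.StringIO()
result = df.info(buf=buf)
info_text = buf.getvalue()
print(info_text)

# Missing values per column = number of rows minus the non-null count
missing = len(df) - df.count()
print(missing)
```

Because info() returns None, `print(df.info())` would print an extra `None` after the report. Calling `df.info()` on its own avoids that.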
Examples
Shows the first 10 rows of the data for a bigger preview.
```python
df.head(10)
```
Gives summary statistics for all columns, including non-numeric ones.
```python
df.describe(include='all')
```
Counts how many missing values are in each column.
```python
df.isnull().sum()
```
Draws a histogram to see the distribution of the 'age' column.
```python
df['age'].hist()
```
Sample Program
This code creates a small table with some missing values. It then prints the first rows, the data info, summary statistics, and the number of missing values per column, and finally draws a histogram of the age column.
```python
import pandas as pd
import matplotlib.pyplot as plt

# Create a small sample dataset
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'age': [25, 30, 35, None, 40],
    'salary': [50000, 60000, 70000, 80000, None]
}
df = pd.DataFrame(data)

# Step 1: Look at first rows
print('First rows:')
print(df.head())

# Step 2: Data info
print('\nData info:')
df.info()

# Step 3: Summary statistics
print('\nSummary statistics:')
print(df.describe())

# Step 4: Missing values
print('\nMissing values per column:')
print(df.isnull().sum())

# Step 5: Visualize age distribution
print('\nShowing histogram for age column...')
df['age'].hist()
plt.title('Age Distribution')
plt.xlabel('Age')
plt.ylabel('Count')
plt.show()
```
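The sample program calls describe() with no arguments, so only the numeric columns (age and salary) are summarized. The sketch below reuses the same data with include='all', as in the Examples section, so the text column name is summarized too. It skips plotting so it runs without a display.

```python
import pandas as pd

# Same sample data as the program above
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'age': [25, 30, 35, None, 40],
    'salary': [50000, 60000, 70000, 80000, None],
}
df = pd.DataFrame(data)

numeric_summary = df.describe()            # numeric columns only
full_summary = df.describe(include='all')  # also adds 'unique', 'top', 'freq' rows
missing = df.isnull().sum()                # missing values per column

print(numeric_summary)
print(full_summary)
print(missing)
```

Note that the count row already reflects missing data: age and salary each show a count of 4 out of 5 rows.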
Important Notes
Always check for missing values early to decide how to handle them.
Use visualizations to see the shape of each distribution and to spot outliers easily.
Summary statistics help you understand the spread of the data and its central values, such as the mean and median.
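Once you know where values are missing, two common options are to drop incomplete rows or to fill the gaps with a typical value. The sketch below compares both on the sample data. It assumes that filling age and salary with each column's median is acceptable; whether it is depends on your data and goals.

```python
import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'age': [25, 30, 35, None, 40],
    'salary': [50000, 60000, 70000, 80000, None],
}
df = pd.DataFrame(data)

# Option 1: drop every row that has at least one missing value
dropped = df.dropna()

# Option 2: fill numeric gaps with each column's median
# (the median is less affected by outliers than the mean)
numeric_cols = ['age', 'salary']
filled = df.copy()
filled[numeric_cols] = filled[numeric_cols].fillna(filled[numeric_cols].median())

print(len(df), len(dropped), len(filled))
print(filled)
```

Dropping keeps only 3 of the 5 rows here, while filling keeps all 5. On a small dataset, that loss is often the deciding factor.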
Summary
Exploratory data analysis helps you know your data well before using it.
Look at data samples, info, statistics, missing values, and charts step-by-step.
This process guides better decisions for cleaning and modeling data.