Exploratory Data Analysis (EDA) template in Python

Introduction

Exploratory Data Analysis helps us understand data by summarizing and visualizing it. It shows patterns, errors, and important features before deeper analysis.

Typical situations where EDA is useful:

When you get a new dataset and want to know what it contains.

Before building a model, to check data quality and relationships.

To find missing or strange values in your data.

To compare groups or categories in your data.

To decide which features are important for your analysis.
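Two of the uses above, spotting strange values and comparing groups, can be sketched with pandas alone. The DataFrame and its column names (`group`, `value`) are made up for illustration and are not part of the original tutorial.

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [1.0, None, 3.0, 4.0, 100.0],
})

# How many rows fall into each category
counts = df["group"].value_counts()

# Compare groups: non-missing count, mean and median of 'value' per group
by_group = df.groupby("group")["value"].agg(["count", "mean", "median"])

print(counts)
print(by_group)
```

A large gap between the mean and the median of a group, as in group `b` here, is often a hint that an extreme value deserves a closer look.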

Syntax

Data Analysis Python

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load data ('file.csv' is a placeholder path)
df = pd.read_csv('file.csv')

# Basic info (df.info() prints directly and returns None)
df.info()
print(df.describe())

# Check missing values
print(df.isnull().sum())

# Visualize distributions
sns.histplot(df['column'])
plt.show()

# Visualize relationships
sns.scatterplot(x='col1', y='col2', data=df)
plt.show()

Use pandas for data handling and seaborn/matplotlib for plots.

Replace 'df' and column names with your actual data names.
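If the same checks are run on many datasets, the non-plotting part of the template could be wrapped in a small helper. The function name `quick_overview` and the example columns are hypothetical; this is one possible sketch, not a standard pandas API.

```python
import pandas as pd


def quick_overview(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype, missing count and unique count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),  # NaN is not counted as a unique value
    })


# Usage with a small made-up DataFrame
example = pd.DataFrame({"a": [1, None, 3], "b": ["x", "x", "y"]})
overview = quick_overview(example)
print(overview)
```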

Examples

Shows the first 5 rows to get a quick look at the data.

Data Analysis Python

print(df.head())

Gives summary statistics like mean, min, max for numeric columns.

Data Analysis Python

print(df.describe())

Shows distribution and outliers of 'value' for each 'category'.

Data Analysis Python

sns.boxplot(x='category', y='value', data=df)
plt.show()

Displays correlation between numeric columns to find relationships.

Data Analysis Python

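# numeric_only=True skips text columns, which would otherwise raise an error in pandas 2.x
sns.heatmap(df.corr(numeric_only=True), annot=True)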
plt.show()
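Correlation is only defined for numeric columns, so it can help to select them explicitly before computing the matrix. The sketch below uses `select_dtypes` on a made-up DataFrame; the column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: two numeric columns and one text column
df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],
    "label": ["a", "b", "c", "d"],
})

# Keep only numeric columns, then compute pairwise Pearson correlation
numeric = df.select_dtypes(include="number")
corr = numeric.corr()

print(corr)
```

The resulting `corr` DataFrame can be passed to `sns.heatmap` in the same way as in the example above.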

Sample Program

This code builds a small in-memory dataset, prints basic info and summary statistics, checks for missing values, and draws three plots: the age distribution, salary by department, and a correlation heatmap.

Data Analysis Python

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Sample data
data = {
    'Age': [25, 30, 22, 40, 28, 35, 23, 31],
    'Salary': [50000, 60000, 45000, 80000, 52000, 75000, 48000, 62000],
    'Department': ['HR', 'IT', 'HR', 'IT', 'Finance', 'Finance', 'HR', 'IT']
}

# Create DataFrame
df = pd.DataFrame(data)

# Basic info (df.info() prints directly and returns None)
df.info()

# Summary statistics
print(df.describe())

# Check missing values
print(df.isnull().sum())

# Distribution of Age
sns.histplot(df['Age'], bins=5)
plt.title('Age Distribution')
plt.show()

# Boxplot Salary by Department
sns.boxplot(x='Department', y='Salary', data=df)
plt.title('Salary by Department')
plt.show()

# Correlation heatmap (numeric columns only; 'Department' is text)
sns.heatmap(df.corr(numeric_only=True), annot=True)
plt.title('Correlation Matrix')
plt.show()
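The plots in the sample program can be backed up with numbers. As a rough sketch using the same sample data, a grouped summary gives the figures behind the box plot, and `Series.corr` gives the single value behind the Age/Salary cell of the heatmap.

```python
import pandas as pd

# Same sample data as in the program above
df = pd.DataFrame({
    "Age": [25, 30, 22, 40, 28, 35, 23, 31],
    "Salary": [50000, 60000, 45000, 80000, 52000, 75000, 48000, 62000],
    "Department": ["HR", "IT", "HR", "IT", "Finance", "Finance", "HR", "IT"],
})

# Numeric counterpart of the box plot: salary statistics per department
salary_summary = df.groupby("Department")["Salary"].agg(["mean", "median", "min", "max"])

# Numeric counterpart of the heatmap: Pearson correlation of Age and Salary
age_salary_corr = df["Age"].corr(df["Salary"])

print(salary_summary)
print(age_salary_corr)
```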

Important Notes

Always check for missing values before analysis.
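Raw counts from `isnull().sum()` are easier to judge as a share of all rows. A minimal sketch, assuming a made-up DataFrame with `age` and `city` columns:

```python
import pandas as pd

# Hypothetical data with gaps
df = pd.DataFrame({
    "age": [25, None, 22, 40],
    "city": ["Paris", None, None, "Rome"],
})

# isna() yields True/False per cell; the mean of booleans is the missing fraction
missing_share = df.isna().mean().sort_values(ascending=False)

print(missing_share)
```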

Visualizations help spot patterns and outliers easily.
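Outliers seen in a box plot can also be listed in code. The sketch below applies the common 1.5 × IQR rule to a made-up series; the threshold of 1.5 is a convention, not a requirement, and the data is for illustration only.

```python
import pandas as pd

# Hypothetical values with one obvious outlier
values = pd.Series([1, 2, 3, 4, 100], name="value")

# Interquartile range: spread of the middle 50% of the data
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Points beyond 1.5 * IQR from the quartiles are flagged as outliers
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]

print(outliers)
```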

On large datasets, explore a small random sample first to understand the data before running expensive computations.
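One way to take such a sample is `DataFrame.sample`. The DataFrame below is generated only to stand in for a large dataset, and the sample size of 500 is an arbitrary choice.

```python
import pandas as pd

# Stand-in for a large dataset (generated here for illustration)
big = pd.DataFrame({
    "id": range(10_000),
    "value": [i % 7 for i in range(10_000)],
})

# Draw 500 rows at random; random_state makes the draw repeatable
sample = big.sample(n=500, random_state=0)

print(sample.describe())
```

Once the steps work on the sample, the same code can be run on the full DataFrame.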

Summary

EDA helps you understand data quickly and clearly.

Use simple commands to get info, stats, and visuals.

Check data quality and relationships before deeper analysis.