Exploratory Data Analysis helps us understand data by summarizing and visualizing it. It shows patterns, errors, and important features before deeper analysis.
Exploratory Data Analysis (EDA) template in Data Analysis Python
Introduction
EDA is most useful in these situations:
When you get a new dataset and want to know what it contains.
Before building a model, to check data quality and relationships.
When you need to find missing or unusual values in your data.
When you want to compare groups or categories in your data.
When you need to decide which features matter for your analysis.
Syntax
Data Analysis Python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load data (replace 'file.csv' with the path to your file)
df = pd.read_csv('file.csv')

# Basic info (df.info() prints its report directly and returns None)
df.info()
print(df.describe())

# Check missing values per column
print(df.isnull().sum())

# Visualize the distribution of one column
sns.histplot(df['column'])
plt.show()

# Visualize the relationship between two columns
sns.scatterplot(x='col1', y='col2', data=df)
plt.show()
Use pandas for data handling and seaborn/matplotlib for plots.
Replace 'df' and column names with your actual data names.
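The basic-info and missing-value steps of the template can be folded into one per-column table. The sketch below is only one possible way to do that: `quick_overview` is a hypothetical helper name, not part of pandas, and the small `example` frame is made up for illustration.

```python
import pandas as pd


def quick_overview(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, missing share, unique count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(1),
        "unique": df.nunique(),  # NaN is not counted as a value
    })


# Made-up data, only to show the call
example = pd.DataFrame({
    "col1": [1.0, 2.0, None, 4.0],
    "col2": ["a", "b", "b", None],
})

overview = quick_overview(example)
print(overview)
```

Calling `quick_overview(df)` on your own DataFrame gives a compact table that is easier to scan than separate `info()` and `isnull().sum()` outputs when there are many columns.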
Examples
Shows the first 5 rows to get a quick look at the data.
Data Analysis Python
print(df.head())
Gives summary statistics like mean, min, max for numeric columns.
Data Analysis Python
print(df.describe())
Shows distribution and outliers of 'value' for each 'category'.
Data Analysis Python
sns.boxplot(x='category', y='value', data=df)
plt.show()
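A boxplot marks points as outliers when they fall more than 1.5 times the interquartile range (IQR) outside the box, assuming the default `whis=1.5` setting is left unchanged. The same rule can be applied numerically to list the flagged rows. This is a rough sketch on made-up data; the column names `category` and `value` simply mirror the placeholders used in the boxplot example.

```python
import pandas as pd

# Made-up data: group A contains one unusually large value
df = pd.DataFrame({
    "category": ["A"] * 6 + ["B"] * 6,
    "value": [10, 11, 12, 13, 14, 40, 20, 21, 22, 23, 24, 25],
})

# Quartiles per category (each result is a Series indexed by category)
grouped = df.groupby("category")["value"]
q1 = grouped.quantile(0.25)
q3 = grouped.quantile(0.75)
iqr = q3 - q1

# Look up each row's bounds from its own category
lower = df["category"].map(q1 - 1.5 * iqr)
upper = df["category"].map(q3 + 1.5 * iqr)

# A row is flagged when its value lies outside its category's bounds
df["is_outlier"] = (df["value"] < lower) | (df["value"] > upper)
print(df[df["is_outlier"]])
```

Listing the flagged rows this way makes it possible to inspect or filter them, which the plot alone does not allow.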
Displays correlation between numeric columns to find relationships.
Data Analysis Python
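# numeric_only=True skips text columns, which would otherwise raise an error in pandas 2.0+
sns.heatmap(df.corr(numeric_only=True), annot=True)
plt.show()
Sample Program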
This code builds a small dataset in memory, shows basic info and stats, checks for missing data, and creates three plots: age distribution, salary by department, and a correlation matrix of the numeric columns.
Data Analysis Python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Sample data
data = {
    'Age': [25, 30, 22, 40, 28, 35, 23, 31],
    'Salary': [50000, 60000, 45000, 80000, 52000, 75000, 48000, 62000],
    'Department': ['HR', 'IT', 'HR', 'IT', 'Finance', 'Finance', 'HR', 'IT']
}

# Create DataFrame
df = pd.DataFrame(data)

# Basic info (df.info() prints its report directly and returns None)
df.info()

# Summary statistics
print(df.describe())

# Check missing values
print(df.isnull().sum())

# Distribution of Age
sns.histplot(df['Age'], bins=5)
plt.title('Age Distribution')
plt.show()

# Boxplot Salary by Department
sns.boxplot(x='Department', y='Salary', data=df)
plt.title('Salary by Department')
plt.show()

# Correlation heatmap (numeric columns only; 'Department' is text)
sns.heatmap(df.corr(numeric_only=True), annot=True)
plt.title('Correlation Matrix')
plt.show()
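The plots in the sample program can be backed up with the numbers behind them. As a possible follow-up, the sketch below rebuilds the same sample data and prints per-department salary statistics and the correlation table. The variable names `salary_by_dept` and `corr` are chosen here for illustration, and `numeric_only=True` assumes pandas 1.5 or newer.

```python
import pandas as pd

# Same sample data as in the program above
df = pd.DataFrame({
    "Age": [25, 30, 22, 40, 28, 35, 23, 31],
    "Salary": [50000, 60000, 45000, 80000, 52000, 75000, 48000, 62000],
    "Department": ["HR", "IT", "HR", "IT", "Finance", "Finance", "HR", "IT"],
})

# Numbers behind the boxplot: group size, mean and median salary per department
salary_by_dept = df.groupby("Department")["Salary"].agg(["count", "mean", "median"])
print(salary_by_dept)

# Numbers behind the heatmap: pairwise correlation of the numeric columns
corr = df.corr(numeric_only=True)
print(corr.round(2))
```

With only eight rows, these figures describe the toy data and nothing more. On a real dataset the same two calls give a quick check that what the plots suggest is also visible in the numbers.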
Important Notes
Always check for missing values before analysis.
Visualizations help spot patterns and outliers easily.
Use small samples first to understand data before big computations.
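The first and last notes can be combined: draw a small random sample, then check missing values and summary statistics on it before running anything expensive on the full data. The sketch below uses a synthetic DataFrame named `big` as a stand-in for a large dataset; the sample size of 500 and the injected missing values are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a large dataset
rng = np.random.default_rng(0)
big = pd.DataFrame({
    "value": rng.normal(size=10_000),
    "group": rng.choice(["a", "b", "c"], size=10_000),
})

# Inject missing values into every 50th row so there is something to find
big.loc[big.index % 50 == 0, "value"] = np.nan

# Work on a reproducible random sample first
sample = big.sample(n=500, random_state=0)

# Share of missing values per column in the sample
missing_share = sample.isna().mean()
print(missing_share)

# Summary statistics computed on the sample only
print(sample.describe())
```

A random sample only estimates the missing-value share, so rare problems can be missed. Once the sample looks reasonable, rerun `isna().sum()` on the full data.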
Summary
EDA helps you understand data quickly and clearly.
Use simple commands to get info, stats, and visuals.
Check data quality and relationships before deeper analysis.