
Exploratory data analysis in ML Python

Introduction

Exploratory data analysis (EDA) means inspecting a dataset with summaries and plots before building models. It reveals patterns, data-quality problems, and relationships between columns that affect later modeling choices.

Use exploratory data analysis:

When you get a new dataset and want to know what it contains.
Before cleaning data, to find missing or implausible values.
To find relationships between columns.
To decide which features are likely to matter for a model.
To check whether the data matches your expectations or needs fixing.
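Two of the uses above, finding relationships and getting a first hint about useful features, can be sketched with a correlation matrix. The snippet below is a minimal illustration, not a definitive feature-selection method: the DataFrame and its column names (`hours_studied`, `exam_score`, `shoe_size`) are made up for this example, and Pearson correlation only captures linear relationships.

```python
import pandas as pd

# Hypothetical toy data; the column names are invented for illustration
df = pd.DataFrame({
    'hours_studied': [1, 2, 3, 4, 5],
    'exam_score': [52, 58, 65, 70, 78],
    'shoe_size': [42, 38, 41, 39, 40],
})

# Pearson correlation between every pair of numeric columns
corr = df.corr()
print(corr.round(2))

# Rank the other columns by absolute correlation with the assumed target
ranking = (
    corr['exam_score']
    .drop('exam_score')
    .abs()
    .sort_values(ascending=False)
)
print(ranking)
```

Here `hours_studied` ranks well above `shoe_size`. A weak correlation does not prove a column is useless, so treat the ranking as a starting point for closer inspection.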
Syntax
ML Python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load data
data = pd.read_csv('data.csv')

# Show first rows
print(data.head())

# Summary statistics
print(data.describe())

# Check missing values
print(data.isnull().sum())

# Visualize one column's distribution ('column' is a placeholder for a real column name)
sns.histplot(data['column'])
plt.show()

Use head() to see the first few rows quickly (five by default).

describe() reports the count, mean, standard deviation, minimum, maximum, and quartiles of each numeric column.
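By default, describe() skips text columns when the DataFrame also has numeric ones. The sketch below, which uses a small made-up DataFrame, shows two common ways to look at text columns as well: `describe(include='all')` and `value_counts()`.

```python
import pandas as pd

# Small made-up dataset with one numeric and one text column
df = pd.DataFrame({
    'Age': [25, 30, 22, 40],
    'Department': ['Sales', 'IT', 'Sales', 'HR'],
})

# Default behaviour: only the numeric column is summarized
numeric_only = df.describe()
print(numeric_only.columns.tolist())

# include='all' adds count, unique, top, and freq for text columns
everything = df.describe(include='all')
print(everything.columns.tolist())

# How often each category appears in the text column
dept_counts = df['Department'].value_counts()
print(dept_counts)
```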

Examples
Shows the first 10 rows of the dataset to get a quick look.
ML Python
data.head(10)
Gives summary statistics (such as mean, min, and max) for the numeric columns.
ML Python
data.describe()
Counts how many missing values are in each column.
ML Python
data.isnull().sum()
Draws a box plot to see data spread and outliers.
ML Python
sns.boxplot(x=data['column'])
plt.show()
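A box plot conventionally draws points beyond 1.5 times the interquartile range (IQR) from the box as outliers. The same rule can be applied numerically to list those values. This is a minimal sketch on made-up numbers, where 95 is planted as an obvious outlier. The 1.5 multiplier is a convention, not a fixed law.

```python
import pandas as pd

# Made-up values; 95 is deliberately far from the rest
values = pd.Series([22, 25, 28, 30, 35, 40, 95], name='Age')

# Quartiles and interquartile range
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Fences at 1.5 * IQR below Q1 and above Q3
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Values outside the fences are flagged as potential outliers
outliers = values[(values < lower) | (values > upper)]
print(f'Q1={q1}, Q3={q3}, fences=({lower}, {upper})')
print(outliers)
```

A flagged value is not necessarily an error. Check whether it is a typo, a unit mix-up, or a genuine extreme before removing it.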
Sample Program

This code creates a small dataset, prints its first rows, summary statistics, and missing-value counts, and then draws a histogram of ages.

ML Python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Create a small sample dataset
data = pd.DataFrame({
    'Age': [25, 30, 22, 40, 28, None, 35],
    'Salary': [50000, 60000, 45000, 80000, 52000, 58000, None],
    'Department': ['Sales', 'IT', 'Sales', 'HR', 'IT', 'HR', 'Sales']
})

# Show first rows
print('First rows:')
print(data.head())

# Summary statistics
print('\nSummary statistics:')
print(data.describe())

# Missing values
print('\nMissing values:')
print(data.isnull().sum())

# Visualize Age distribution
sns.histplot(data['Age'], kde=True)
plt.title('Age Distribution')
plt.show()
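As a possible next step on the same sample dataset, the sketch below expresses missing values as a percentage of rows and compares group averages by department. It recreates the DataFrame so that it runs on its own. These follow-ups are suggestions rather than a required part of the program above.

```python
import pandas as pd

# Same sample dataset as in the program above
data = pd.DataFrame({
    'Age': [25, 30, 22, 40, 28, None, 35],
    'Salary': [50000, 60000, 45000, 80000, 52000, 58000, None],
    'Department': ['Sales', 'IT', 'Sales', 'HR', 'IT', 'HR', 'Sales']
})

# Share of missing values per column, as a percentage of rows
missing_pct = data.isnull().mean() * 100
print(missing_pct.round(1))

# Mean Age and Salary per department (missing values are skipped)
by_dept = data.groupby('Department')[['Age', 'Salary']].mean()
print(by_dept)
```

With only seven rows, the group means are purely illustrative. On real data, check the size of each group before comparing averages.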
Important Notes

Always check for missing or strange values early.

Visualizations help see patterns that numbers alone may hide.

Start with simple statistics before moving on to complex analysis.
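The first note, checking for strange values early, can go beyond counting missing entries. The sketch below looks for duplicated rows and for values outside a plausible range. The data is made up, and the range of 0 to 120 for Age is an assumption chosen for this example. Pick limits that fit your own data.

```python
import pandas as pd

# Made-up data containing one duplicated row and two implausible ages
df = pd.DataFrame({
    'Age': [25, -3, 30, 30, 210],
    'Salary': [50000, 48000, 60000, 60000, 55000],
})

# Count rows that exactly repeat an earlier row
n_duplicates = df.duplicated().sum()
print('Duplicated rows:', n_duplicates)

# Rows whose Age falls outside the assumed plausible range of 0 to 120
implausible = df[~df['Age'].between(0, 120)]
print(implausible)
```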

Summary

Exploratory data analysis helps you understand your data clearly.

Look at data samples, summary stats, and missing values first.

Use charts to find patterns and problems easily.