Handling missing values in ML Python

Introduction

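Missing values can cause errors or misleading results in machine learning: many scikit-learn estimators raise an error when the input contains NaN. Handling missing values before training helps models learn from all the available data and make better predictions.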

Typical situations where you need to handle missing values:

When your dataset has empty or unknown entries in some columns.

When you want to prepare data before training a model.

When you want to avoid errors caused by missing data during analysis.

When you want to keep as many rows as possible instead of dropping them.

When you want to fill missing data with reasonable guesses.
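Before choosing a fix, it helps to see how much data is missing and how many rows would be lost by dropping incomplete ones. The following is a minimal sketch using pandas; the DataFrame and its column names (`age`, `income`) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; column names are illustrative only
df = pd.DataFrame({
    "age": [25, np.nan, 40, 31],
    "income": [50000, 62000, np.nan, np.nan],
})

# Count missing entries per column
missing_per_column = df.isna().sum()
print(missing_per_column)

# Dropping every row that has a missing value keeps only complete rows
rows_after_drop = len(df.dropna())
print("Rows before:", len(df), "rows after dropna:", rows_after_drop)
```

In this toy case only one of the four rows is complete, which is why filling values is often preferred over dropping rows.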

Syntax

ML Python

from sklearn.impute import SimpleImputer

imputer = SimpleImputer(strategy='mean')
X_imputed = imputer.fit_transform(X)

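SimpleImputer replaces the missing values (np.nan by default) in each column using the chosen strategy: 'mean', 'median', 'most_frequent', or 'constant'.

fit_transform learns the fill values and applies them in one step, which is fine for a single dataset. With a train/test split, fit the imputer on the training data only, then call transform on both the training and the test data.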
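The fit-then-transform workflow can be sketched as follows. The arrays `X_train` and `X_test` are small hypothetical examples, not data from this lesson.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical train/test arrays for illustration
X_train = np.array([[1.0, 10.0], [3.0, np.nan], [np.nan, 30.0]])
X_test = np.array([[np.nan, 50.0], [5.0, np.nan]])

imputer = SimpleImputer(strategy="mean")

# Learn the column means from the training data only, then fill it
X_train_imputed = imputer.fit_transform(X_train)

# Reuse the training means on the test data (no refitting)
X_test_imputed = imputer.transform(X_test)

# statistics_ holds the fill value learned for each column
print(imputer.statistics_)
print(X_test_imputed)
```

The test set is filled with the means learned from the training set (2 and 20 here), not with its own means.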

Examples

Replace missing values with the mean of each column.

ML Python

imputer = SimpleImputer(strategy='mean')
X_imputed = imputer.fit_transform(X)

Replace missing values with the median of each column.

ML Python

imputer = SimpleImputer(strategy='median')
X_imputed = imputer.fit_transform(X)

Replace missing values with the most common value in each column.

ML Python

imputer = SimpleImputer(strategy='most_frequent')
X_imputed = imputer.fit_transform(X)

Replace missing values with zero.

ML Python

imputer = SimpleImputer(strategy='constant', fill_value=0)
X_imputed = imputer.fit_transform(X)

Sample Program

This program shows how to replace missing values with the mean of each column using SimpleImputer.

ML Python

import numpy as np
from sklearn.impute import SimpleImputer

# Sample data with missing values (np.nan)
X = np.array([[1, 2], [np.nan, 3], [7, np.nan], [np.nan, np.nan]])

# Create imputer to fill missing values with mean
imputer = SimpleImputer(strategy='mean')

# Fit imputer on data and transform
X_imputed = imputer.fit_transform(X)

print("Original data:\n", X)
print("\nData after imputing missing values with mean:\n", X_imputed)

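Output: the column means are computed from the observed values only, giving 4 for the first column (from 1 and 7) and 2.5 for the second (from 2 and 3). The imputed array is therefore [[1, 2], [4, 3], [7, 2.5], [4, 2.5]].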

Important Notes

Always fit the imputer only on training data to avoid data leakage.
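One way to make this hard to get wrong is to put the imputer and the model in a scikit-learn pipeline, so the imputer is fitted only on whatever data is passed to fit. This is a minimal sketch; the data and the choice of LinearRegression are assumptions made for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training and test data
X_train = np.array([[1.0], [2.0], [np.nan], [4.0]])
y_train = np.array([2.0, 4.0, 6.0, 8.0])
X_test = np.array([[np.nan], [3.0]])

# The pipeline fits the imputer on X_train, then trains the model
model = make_pipeline(SimpleImputer(strategy="median"), LinearRegression())
model.fit(X_train, y_train)

# At prediction time the imputer only transforms; it is not refitted
predictions = model.predict(X_test)
print(predictions)
```

The same pipeline object can be passed to cross-validation utilities, which then refit the imputer inside each training fold.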

Choose the strategy based on the data type and distribution: the median is more robust than the mean when a column is skewed or contains outliers.
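To illustrate how the distribution affects the choice, the sketch below fills the same column with the mean and with the median. The values are invented, with 100 acting as an outlier.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical column with one outlier (100) and one missing value
X = np.array([[1.0], [2.0], [3.0], [100.0], [np.nan]])

mean_filled = SimpleImputer(strategy="mean").fit_transform(X)
median_filled = SimpleImputer(strategy="median").fit_transform(X)

# The mean is pulled toward the outlier; the median is not
print("Mean fill:", mean_filled[4, 0])
print("Median fill:", median_filled[4, 0])
```

Here the mean fill is 26.5, far from most observed values, while the median fill is 2.5.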

For categorical data, use the 'most_frequent' or 'constant' strategy, since 'mean' and 'median' only work on numeric data.
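A minimal sketch of both options on a string column follows. The colour values and the placeholder label "missing" are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical categorical column stored as Python objects
X = np.array([["red"], ["blue"], [np.nan], ["red"]], dtype=object)

# Fill with the most common category in the column
most_frequent = SimpleImputer(strategy="most_frequent")
X_most_frequent = most_frequent.fit_transform(X)

# Fill with an explicit placeholder category instead
constant = SimpleImputer(strategy="constant", fill_value="missing")
X_constant = constant.fit_transform(X)

print(X_most_frequent.ravel())
print(X_constant.ravel())
```

A placeholder category keeps the fact that a value was missing visible to the model, while 'most_frequent' blends it into the dominant class.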

Summary

Missing values can cause problems in machine learning models.

SimpleImputer helps fill missing values with mean, median, or other strategies.

Always fit on training data and transform both training and test sets.