
Principal Component Analysis (PCA) in ML Python

Introduction
PCA helps us simplify complex data by turning many features into fewer ones while keeping the most important information.
When you have many features and want to reduce them to understand data better.
Before training a model to speed up learning and reduce noise.
To visualize high-dimensional data in 2D or 3D plots.
When you want to remove redundant or correlated features.
To compress data while keeping most of its meaning.
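The visualization use case above can be sketched like this, assuming scikit-learn and matplotlib are installed (the output filename is just an example):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs anywhere
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
X_2d = PCA(n_components=2).fit_transform(iris.data)  # 4 features -> 2

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=iris.target)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.savefig("iris_pca.png")
```

Each point is one flower, colored by species, plotted using only the 2 new features PCA produced.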
Syntax
ML Python
from sklearn.decomposition import PCA

pca = PCA(n_components=k)
X_reduced = pca.fit_transform(X)
n_components=k means you keep the k main components after reduction.
fit_transform learns the main directions (the principal components) from your data and projects the data onto them.
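If you split your data, call fit_transform on the training set only and reuse the learned directions on new data with transform. A minimal sketch with random data (the array names are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 10))  # 100 samples, 10 features
X_test = rng.normal(size=(20, 10))

pca = PCA(n_components=3)
X_train_red = pca.fit_transform(X_train)  # learn directions from training data
X_test_red = pca.transform(X_test)        # apply the same directions to new data

print(X_train_red.shape, X_test_red.shape)  # (100, 3) (20, 3)
```

This keeps the test data from influencing the directions PCA chooses.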
Examples
Keep the 2 main components of the original data.
ML Python
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
Keep enough features to explain 95% of the data variance.
ML Python
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
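With a fractional n_components, scikit-learn keeps the smallest number of components whose explained variance sums to at least that fraction. You can inspect the choice afterwards; shown here on the Iris data:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(pca.n_components_)  # number of components actually kept
```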
Keep all components and see how much of the variance each one explains.
ML Python
pca = PCA()
X_reduced = pca.fit_transform(X)
print(pca.explained_variance_ratio_)
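The per-component ratios are often easier to read as a running total; a short sketch using numpy's cumsum:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA().fit(X)

cumulative = np.cumsum(pca.explained_variance_ratio_)
for i, total in enumerate(cumulative, start=1):
    print(f"{i} components explain {total:.1%} of the variance")
```

The running total helps you pick the smallest number of components that keeps enough information.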
Sample Program
This code loads the Iris dataset, applies PCA to reduce from 4 features to 2, and prints the results.
ML Python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Load sample data
iris = load_iris()
X = iris.data

# Create PCA to keep 2 components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Show shape before and after
print(f'Original shape: {X.shape}')
print(f'Reduced shape: {X_reduced.shape}')

# Show explained variance ratio
print('Explained variance ratio:', pca.explained_variance_ratio_)

# Show first 5 transformed samples
print('First 5 samples after PCA:')
print(X_reduced[:5])
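To get a rough sense of how much information the 2 components keep, you can map them back to the original 4 features with inverse_transform and measure the error (a sketch, not part of the original program):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

X_back = pca.inverse_transform(X_reduced)  # approximate reconstruction
error = np.mean((X - X_back) ** 2)
print(f"Mean squared reconstruction error: {error:.4f}")
```

A small error means the dropped components carried little information.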
Important Notes
PCA is sensitive to feature scale; standardize your features (for example with StandardScaler) before applying it.
The components are new features that are linear combinations of the original ones.
Explained variance ratio shows how much of the data's variance each component captures.
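The scaling note above can be sketched with a Pipeline that standardizes features before PCA (a common pattern, not the only way to scale):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = load_iris().data
pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
X_reduced = pipeline.fit_transform(X)

print(X_reduced.shape)  # (150, 2)
```

The pipeline applies the same scaling and projection together, so you cannot forget the scaling step when transforming new data.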
Summary
PCA reduces many features into fewer main features while keeping important info.
Use PCA to simplify data, speed up models, or visualize high-dimensional data.
The explained variance ratio tells how much each new feature matters.