How to Use PCA in sklearn with Python: Simple Guide

Use sklearn.decomposition.PCA to create a PCA object, then call fit_transform (or fit followed by transform) on your data to reduce its dimensions. Set n_components to control how many principal components to keep. Each component is a linear combination of the original features, not a subset of them.

📐

Syntax

The basic syntax to use PCA in sklearn is:

PCA(n_components): Create a PCA object specifying how many components to keep.
fit(X): Learn the principal components from data X.
transform(X): Apply the dimensionality reduction to X.
fit_transform(X): Combine fit and transform in one step.

python

from sklearn.decomposition import PCA

# X is assumed to be an array-like of shape (n_samples, n_features)
pca = PCA(n_components=2)     # keep 2 principal components
pca.fit(X)                    # learn components from data X
X_reduced = pca.transform(X)  # project X onto those components

# Or simply:
X_reduced = pca.fit_transform(X)
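
One reason fit and transform exist as separate methods is that components learned on one dataset can be applied to another. The sketch below illustrates this with a train/test split. The synthetic data and the variable names are assumptions made for illustration, not part of the original guide.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only: 100 samples, 5 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

pca = PCA(n_components=2)
X_train_reduced = pca.fit_transform(X_train)  # learn components on training data only
X_test_reduced = pca.transform(X_test)        # reuse the same components on new data

print(X_train_reduced.shape, X_test_reduced.shape)
```

Calling fit_transform on the test set instead would learn a different set of components, so the two reduced datasets would no longer share the same axes.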

💻

Example

This example shows how to reduce a 4-feature dataset to 2 principal components using PCA from sklearn.

python

from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

# Load sample data
data = load_iris()
X = data.data  # 150 samples, 4 features

# Create PCA object to keep 2 components
pca = PCA(n_components=2)

# Fit PCA and transform data
X_pca = pca.fit_transform(X)

# Print shape before and after
print(f"Original shape: {X.shape}")
print(f"Reduced shape: {X_pca.shape}")

# Print explained variance ratio
print(f"Explained variance ratio: {pca.explained_variance_ratio_}")

Output

Original shape: (150, 4)
Reduced shape: (150, 2)
Explained variance ratio: [0.92461872 0.05306648]
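
To see how much variance is retained as components are added, one option is to fit PCA without limiting n_components and accumulate the ratios. This is a minimal sketch on the same Iris data; the names pca_full and cumulative are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# With n_components left unset, PCA keeps min(n_samples, n_features) components
pca_full = PCA().fit(X)

# Running total of the variance fraction explained by the first k components
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
for k, total in enumerate(cumulative, start=1):
    print(f"{k} component(s): {total:.3f} of variance retained")
```

The smallest k whose running total meets your target (for example 0.95) is a reasonable starting point for n_components.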

⚠️

Common Pitfalls

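Not scaling data: PCA finds the directions of greatest variance, so a feature measured on a much larger scale can dominate the components. Use StandardScaler before PCA when features have different units or ranges.
Choosing the wrong n_components: Too few components discard important information; too many keep noise and give little reduction.
Confusing fit and fit_transform: fit only learns the components and returns the fitted PCA object; it does not return reduced data. Use fit_transform to learn and reduce in one step, and use transform alone on new data so it is projected onto the components already learned.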

python

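from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = load_iris().data  # example data; any numeric feature matrix works

# Risky: PCA without scaling (large-scale features can dominate)
pca = PCA(n_components=2)
pca.fit(X)  # X not scaled

# Safer: standardize each feature to zero mean and unit variance first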
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
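
The scale-then-reduce steps above can also be chained so that they are always applied together and in the same order. The following is a sketch using scikit-learn's make_pipeline on the Iris data; the variable names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = load_iris().data

# Chain scaling and PCA so both are fitted together and applied in order
pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
X_pca = pipeline.fit_transform(X)

# make_pipeline names each step after its lowercased class name
fitted_pca = pipeline.named_steps["pca"]
print(X_pca.shape)
print(fitted_pca.explained_variance_ratio_)
```

Calling pipeline.transform on new data then reuses both the fitted scaler and the fitted components, which helps avoid scaling and PCA getting out of sync.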

📊

Quick Reference

Name	Description
n_components	Number of principal components to keep (int), or the fraction of variance to retain (float between 0 and 1)
fit(X)	Compute principal components from data X
transform(X)	Apply dimensionality reduction to X
fit_transform(X)	Fit PCA and transform X in one step
explained_variance_ratio_	Fraction of total variance explained by each kept component
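
As a sketch of the float form of n_components, the example below asks PCA to keep enough components to explain at least 95% of the variance in the Iris data. The 0.95 threshold is an arbitrary illustration, and svd_solver="full" is set explicitly on the assumption that your scikit-learn version requires the full solver for fractional values.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# A float in (0, 1) requests enough components to reach that variance fraction
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X)

print(pca.n_components_)                    # number of components actually kept
print(pca.explained_variance_ratio_.sum())  # variance fraction they explain
```

The fitted attribute n_components_ reports how many components were selected, which is useful when the count is not fixed in advance.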

✅

Key Takeaways

Create a PCA object and set n_components to the number of components you want to keep.

Use fit_transform to both learn and apply PCA to your data in one step.

Scale your data before PCA when features have different units or ranges, so that no single feature dominates the components.

Check explained_variance_ratio_ to see what fraction of the variance your components retain.

Choosing the right number of components balances data simplification and information loss.