
How to Use PCA in sklearn with Python: Simple Guide

Use sklearn.decomposition.PCA to create a PCA object, then call fit_transform on your data (or fit followed by transform) to reduce its dimensions. The n_components parameter controls how many principal components to keep.
📐

Syntax

The basic syntax to use PCA in sklearn is:

  • PCA(n_components): Create a PCA object specifying how many components to keep.
  • fit(X): Learn the principal components from data X.
  • transform(X): Apply the dimensionality reduction to X.
  • fit_transform(X): Combine fit and transform in one step.
python
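import numpy as np
from sklearn.decomposition import PCA

# X: array of shape (n_samples, n_features); random placeholder data here
X = np.random.default_rng(0).normal(size=(100, 5))

pca = PCA(n_components=2)     # keep 2 principal components
pca.fit(X)                    # learn components from data X
X_reduced = pca.transform(X)  # project X onto those components

# Or simply:
X_reduced = pca.fit_transform(X)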
💻

Example

This example shows how to reduce a 4-feature dataset to 2 principal components using PCA from sklearn.

python
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

# Load sample data
data = load_iris()
X = data.data  # 150 samples, 4 features

# Create PCA object to keep 2 components
pca = PCA(n_components=2)

# Fit PCA and transform data
X_pca = pca.fit_transform(X)

# Print shape before and after
print(f"Original shape: {X.shape}")
print(f"Reduced shape: {X_pca.shape}")

# Print explained variance ratio
print(f"Explained variance ratio: {pca.explained_variance_ratio_}")
Output
Original shape: (150, 4)
Reduced shape: (150, 2)
Explained variance ratio: [0.92461872 0.05306648]
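
The two ratios in the output sum to about 0.978, so the first two components retain roughly 98% of the variance in this dataset. As a minimal sketch of how to inspect that running total for every possible number of components, the snippet below fits PCA with all components kept and accumulates the ratios. The variable names pca_full and cumulative are illustrative choices, not part of sklearn.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # 150 samples, 4 features

# Fit with every component kept (n_components is left at its default)
pca_full = PCA().fit(X)

# Running total of the variance explained by the first k components
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

for k, total in enumerate(cumulative, start=1):
    print(f"{k} component(s): {total:.4f} of total variance")
```

Reading this list from the top shows the smallest number of components that reaches a chosen variance target.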
⚠️

Common Pitfalls

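  • Not scaling data: PCA picks the directions of largest variance, so features measured on larger scales dominate the components. Use StandardScaler before PCA when features have different units or ranges.
  • Choosing the wrong n_components: Too few components discard useful information; too many retain noise and give little reduction.
  • Confusing fit and fit_transform: fit only learns the components and returns the fitted PCA object; fit_transform learns them and returns the reduced data in one step. For new data, call transform on an already fitted PCA.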
python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = load_iris().data  # same data as the example above

# Wrong way: PCA without scaling (large-scale features dominate)
pca = PCA(n_components=2)
pca.fit(X)  # X not scaled

# Right way: scale first
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
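
The difference between fit_transform and transform matters most when there is held-out data. The sketch below assumes a simple train/test split (the split itself and the names X_train and X_test are illustrative, not from the text above): the scaler and PCA are fitted on the training rows only, then reused unchanged on the test rows.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = load_iris().data
X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

scaler = StandardScaler()
pca = PCA(n_components=2)

# Learn scaling statistics and components from the training data only
X_train_pca = pca.fit_transform(scaler.fit_transform(X_train))

# Reuse the fitted scaler and PCA on new data: transform, not fit_transform
X_test_pca = pca.transform(scaler.transform(X_test))

print(X_train_pca.shape)  # (120, 2)
print(X_test_pca.shape)   # (30, 2)
```

Calling fit_transform on the test rows instead would learn a different set of components, so the two reduced arrays would no longer share the same axes.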
📊

Quick Reference

Name                        Description
n_components                Number of principal components to keep (int), or the fraction of variance to retain (float between 0 and 1)
fit(X)                      Compute principal components from data X
transform(X)                Apply dimensionality reduction to X
fit_transform(X)            Fit PCA and transform X in one step
explained_variance_ratio_   Fraction of total variance explained by each kept component
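
When n_components is a float between 0 and 1, PCA chooses the number of components itself. A minimal sketch, assuming a 95% variance target on the standardized iris data (the 0.95 threshold is an arbitrary illustration); the fitted attribute n_components_ reports how many components were actually kept.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_iris().data)

# A float in (0, 1) asks PCA for the fewest components whose
# cumulative explained variance reaches that fraction
pca = PCA(n_components=0.95)
X_pca = pca.fit_transform(X_scaled)

print(f"Components kept: {pca.n_components_}")
print(f"Variance retained: {pca.explained_variance_ratio_.sum():.3f}")
```

This avoids guessing an integer up front, at the cost of not knowing the output width until after fitting.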

Key Takeaways

  • Create a PCA object and set n_components to the number of components you want; left unset, PCA keeps all components.
  • Use fit_transform to learn and apply PCA to your data in one step.
  • Scale your data before PCA when features are on different scales.
  • Check explained_variance_ratio_ to see what fraction of the variance your components retain.
  • Choosing the number of components is a trade-off between simplifying the data and losing information.