
How to Use PCA in Python with sklearn: Simple Guide

Use sklearn.decomposition.PCA to perform Principal Component Analysis (PCA) in Python. Call fit() to learn the principal components from your data, then transform() to project the data onto them, or call fit_transform() to do both in one step.

📐

Syntax

The basic syntax to use PCA in Python with sklearn is:

PCA(n_components): Create a PCA object specifying how many principal components to keep.
fit(X): Learn the principal components from data X.
transform(X): Apply the dimensionality reduction to X.
fit_transform(X): Fit PCA and transform X in one step.

python

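import numpy as np
from sklearn.decomposition import PCA

# Placeholder data: 100 samples, 5 features (replace with your own array)
X = np.random.default_rng(0).normal(size=(100, 5))

pca = PCA(n_components=2)     # keep 2 principal components
pca.fit(X)                    # learn components from data X
X_reduced = pca.transform(X)  # project X onto the 2 components

# Or combine fit and transform
X_reduced = pca.fit_transform(X)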
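
Keeping fit() and transform() separate matters when you have data the model should not learn from, such as a test set. The sketch below assumes synthetic data and an 80/20 split (both chosen only for illustration): the components are learned from the training rows and then reused on the held-out rows.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): 100 samples, 5 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

pca = PCA(n_components=2)
X_train_reduced = pca.fit_transform(X_train)  # learn components from training data only
X_test_reduced = pca.transform(X_test)        # reuse the same components on unseen data

print(X_train_reduced.shape, X_test_reduced.shape)  # (80, 2) (20, 2)
```

Calling fit_transform() on the test set instead would learn a different set of components, so the two reduced arrays would no longer share the same axes.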

💻

Example

This example shows how to apply PCA to the Iris dataset to reduce its 4 features to 2 principal components and print the transformed data.

python

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Load Iris dataset
iris = load_iris()
X = iris.data

# Create PCA object to reduce to 2 components
pca = PCA(n_components=2)

# Fit PCA and transform data
X_pca = pca.fit_transform(X)

# Print first 5 rows of transformed data
print(X_pca[:5])

Output

[[-2.68412563  0.31939725]
 [-2.71414169 -0.17700123]
 [-2.88899057 -0.14494943]
 [-2.74534286 -0.31829898]
 [-2.72871654  0.32675451]]
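
To see what the fitted model learned, you can inspect its components_ attribute. The following is a small sketch on the same Iris data; the variable names and the rounding are choices made here for readability. Each row of components_ is one principal component, written as weights on the original features.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
X = iris.data

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

print(X.shape, '->', X_pca.shape)  # (150, 4) -> (150, 2)
print(pca.components_.shape)       # (2, 4): 2 components, 4 original features

# Show how strongly each original feature contributes to each component
for i, component in enumerate(pca.components_, start=1):
    weights = {name: round(float(w), 3) for name, w in zip(iris.feature_names, component)}
    print(f"PC{i}:", weights)
```

A large absolute weight means that feature strongly influences the component; the sign only indicates direction and can differ between library versions.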

⚠️

Common Pitfalls

Common mistakes when using PCA include:

Not scaling data before PCA: PCA looks for directions of maximum variance, so features measured on larger scales dominate the components.
Choosing too many or too few components without checking explained variance.
Applying PCA on categorical data without encoding.

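Standardize your features (e.g., with StandardScaler) before PCA whenever they are measured in different units or ranges; otherwise the components mostly reflect whichever feature has the largest variance.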

python

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = load_iris().data  # same Iris features as in the example above

# Wrong way: PCA without scaling
pca = PCA(n_components=2)
X_pca_wrong = pca.fit_transform(X)  # high-variance features dominate the components

# Right way: scale then PCA
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # zero mean, unit variance per feature
pca = PCA(n_components=2)
X_pca_right = pca.fit_transform(X_scaled)

print('Explained variance ratio:', pca.explained_variance_ratio_)

Output

Explained variance ratio: [0.72962445 0.22850762]
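
The second pitfall, picking the number of components without checking explained variance, can be handled by looking at the cumulative explained variance. The sketch below assumes a 95% target, which is a common rule of thumb rather than a fixed requirement, and shows two ways to find the matching number of components.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data
X_scaled = StandardScaler().fit_transform(X)

# Fit with all components to see how variance accumulates
pca_full = PCA().fit(X_scaled)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
print('Cumulative explained variance:', cumulative.round(3))

# Smallest number of components reaching the (assumed) 95% target
n_needed = int(np.searchsorted(cumulative, 0.95) + 1)
print('Components needed for 95%:', n_needed)

# Shortcut: a float between 0 and 1 asks PCA to pick that number itself
pca_95 = PCA(n_components=0.95).fit(X_scaled)
print('Chosen by PCA:', pca_95.n_components_)
```

Passing a float to n_components is convenient in pipelines, while the manual cumulative sum is useful when you want to plot or report the trade-off.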

📊

Quick Reference

Method / attribute	Description
PCA(n_components)	Create a PCA model that keeps the given number of components
fit(X)	Compute principal components from data X
transform(X)	Apply dimensionality reduction to X
fit_transform(X)	Fit PCA and transform X in one step
explained_variance_ratio_	Attribute (available after fitting): fraction of total variance explained by each component

✅

Key Takeaways

Scale your data before applying PCA when features have different units or ranges, so that no single feature dominates the components.

Use PCA to reduce data dimensions by keeping the most important components.

Check explained variance ratio to decide how many components to keep.

Use fit_transform() to fit PCA and reduce data in one step.

PCA works only on numerical data, so encode categorical features first.
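
As a rough illustration of the last point, the sketch below one-hot encodes a categorical column and scales the numeric columns before PCA. The DataFrame and its column names are hypothetical, invented for this example, and one-hot encoding is only one possible choice; whether PCA on encoded categories is meaningful depends on your data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data, invented for this example
df = pd.DataFrame({
    'height_cm': [170.0, 165.0, 180.0, 175.0, 160.0, 185.0],
    'weight_kg': [65.0, 59.0, 82.0, 74.0, 52.0, 90.0],
    'color': ['red', 'blue', 'red', 'green', 'blue', 'green'],
})

# Scale numeric columns and one-hot encode the categorical one
preprocess = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), ['height_cm', 'weight_kg']),
        ('cat', OneHotEncoder(), ['color']),
    ],
    sparse_threshold=0,  # always return a dense array, which PCA expects
)

# Chain preprocessing and PCA so both are fitted together
model = make_pipeline(preprocess, PCA(n_components=2))
X_reduced = model.fit_transform(df)

print(X_reduced.shape)  # (6, 2)
```

Wrapping the steps in a pipeline also ensures that the same scaling and encoding are applied when you later call model.transform() on new rows.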