How to Use TruncatedSVD in sklearn with Python

Use TruncatedSVD from sklearn.decomposition to reduce the number of features in your data by setting n_components to the number of dimensions you want to keep. Fit the model on your data with fit() and apply it with transform() to get the reduced representation, or do both at once with fit_transform().

📐

Syntax

The TruncatedSVD constructor takes these main parameters:

n_components: Number of dimensions to keep after reduction (default 2).
algorithm: SVD solver to use, either 'randomized' (the default) or 'arpack'.
n_iter: Number of iterations for the randomized solver (default 5); it is not used by 'arpack'.
random_state: Seed for reproducible results.

Use fit() to learn the components from the data and transform() to apply the reduction.

python

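import numpy as np
from sklearn.decomposition import TruncatedSVD

X = np.random.rand(10, 5)  # placeholder; replace with your data matrix

svd = TruncatedSVD(n_components=2, algorithm='randomized', n_iter=5, random_state=42)
svd.fit(X)  # learn the components from X
X_reduced = svd.transform(X)  # project X onto the learned components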

💻

Example

This example shows how to reduce a 5-dimensional dataset to 2 dimensions using TruncatedSVD. It prints the original shape and the reduced shape.

python

from sklearn.decomposition import TruncatedSVD
import numpy as np

# Create sample data with 10 samples and 5 features (seeded so runs are repeatable)
rng = np.random.default_rng(42)
X = rng.random((10, 5))

# Initialize TruncatedSVD to reduce to 2 components
svd = TruncatedSVD(n_components=2, random_state=42)

# Fit and transform the data
X_reduced = svd.fit_transform(X)

print('Original shape:', X.shape)
print('Reduced shape:', X_reduced.shape)
print('Reduced data:\n', X_reduced)

Output

Original shape: (10, 5)
Reduced shape: (10, 2)
Reduced data:
 (a 10 x 2 array of floats; the exact values depend on the generated sample data)
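After fitting, it can help to check how much information the kept components retain. The following is a minimal sketch that assumes the same kind of random sample data as the example above; the variable names and the seed are illustrative choices, not part of the original example. It reads the fitted attributes components_, singular_values_, and explained_variance_ratio_.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Hypothetical sample data: 10 samples, 5 features, seeded for repeatability
rng = np.random.default_rng(0)
X = rng.random((10, 5))

svd = TruncatedSVD(n_components=2, random_state=42)
X_reduced = svd.fit_transform(X)

# components_ has one row per kept component and one column per original feature
print('components_ shape:', svd.components_.shape)

# Singular values of the kept components
print('Singular values:', svd.singular_values_)

# Fraction of the total variance captured by each kept component
print('Explained variance ratio:', svd.explained_variance_ratio_)
print('Total variance kept:', svd.explained_variance_ratio_.sum())
```

If the total variance kept is too low for your use case, increasing n_components is the usual first adjustment.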

⚠️

Common Pitfalls

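Using TruncatedSVD on dense data when PCA might be more appropriate. TruncatedSVD does not center the data, so its results differ from PCA's unless the columns already have zero mean.
Setting n_components greater than the number of features, which raises a ValueError. With algorithm='arpack', n_components must be strictly less than the number of features.
Calling transform() before fit(), which raises a NotFittedError.
Treating TruncatedSVD as a preprocessing step like StandardScaler. It does not center or scale features, so scale the data separately if you need to.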

Always check your data shape and fit the model before calling transform().

python

from sklearn.decomposition import TruncatedSVD
import numpy as np

X = np.random.rand(10, 3)
svd = TruncatedSVD(n_components=5)  # Wrong: n_components > features

try:
    svd.fit(X)
except ValueError as e:
    print('Error:', e)

# Correct usage
svd_correct = TruncatedSVD(n_components=2)
svd_correct.fit(X)
X_reduced = svd_correct.transform(X)
print('Reduced shape:', X_reduced.shape)

Output

Error: n_components(5) must be <= n_features(3).
Reduced shape: (10, 2)

(The exact wording of the error message varies between scikit-learn versions.)
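The first pitfall in the list above, choosing between TruncatedSVD and PCA on dense data, comes down to centering. The sketch below is an illustration under assumed sample data (random values shifted away from zero, with a hypothetical seed). It compares PCA with TruncatedSVD on raw and on manually centered data. Absolute values are compared because the sign of each component is arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD

# Hypothetical dense data whose column means are far from zero
rng = np.random.default_rng(0)
X = rng.random((10, 5)) + 5.0

# PCA centers the data internally; TruncatedSVD does not
X_centered = X - X.mean(axis=0)

pca_out = PCA(n_components=2, svd_solver='full').fit_transform(X)
svd_raw = TruncatedSVD(n_components=2, algorithm='arpack',
                       random_state=0).fit_transform(X)
svd_centered = TruncatedSVD(n_components=2, algorithm='arpack',
                            random_state=0).fit_transform(X_centered)

# Compare magnitudes only, since each component may be flipped in sign
matches_raw = np.allclose(np.abs(pca_out), np.abs(svd_raw), atol=1e-6)
matches_centered = np.allclose(np.abs(pca_out), np.abs(svd_centered), atol=1e-6)

print('TruncatedSVD on raw data matches PCA:', matches_raw)
print('TruncatedSVD on centered data matches PCA:', matches_centered)
```

On dense data where centering is cheap, PCA handles this step for you. TruncatedSVD is the better fit when centering would be impractical, as with large sparse matrices.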

📊

Quick Reference

TruncatedSVD Quick Reference:

n_components: Number of dimensions to keep.
fit(X): Learn components from data X.
transform(X): Apply dimensionality reduction.
fit_transform(X): Fit and transform in one step.
Works on both sparse and dense input. Prefer it over PCA for sparse data: it does not center the data, so it can operate on scipy.sparse matrices directly.
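To illustrate the sparse case, here is a minimal sketch that assumes a randomly generated scipy.sparse matrix as a stand-in for real sparse data such as a document-term matrix. The shape, density, and seeds are hypothetical values chosen for the example.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.decomposition import TruncatedSVD

# Hypothetical sparse data: 100 samples, 50 features, about 5% non-zero entries
X_sparse = sp.random(100, 50, density=0.05, format='csr', random_state=0)

# The sparse matrix is passed in directly, with no conversion to a dense array
svd = TruncatedSVD(n_components=10, random_state=42)
X_reduced = svd.fit_transform(X_sparse)

print('Input is sparse:', sp.issparse(X_sparse))
print('Input shape:', X_sparse.shape)
print('Reduced shape:', X_reduced.shape)
print('Output is a dense ndarray:', isinstance(X_reduced, np.ndarray))
```

The reduced output is a regular dense NumPy array, which is usually small enough to hold in memory because it has only n_components columns.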

✅

Key Takeaways

TruncatedSVD reduces data dimensions by decomposing the matrix into fewer components.

Always set n_components less than or equal to the number of features in your data (strictly less when using algorithm='arpack').

Fit the TruncatedSVD model before transforming data to avoid errors.

Use TruncatedSVD for sparse data, such as TF-IDF or count matrices, where the centering that PCA performs would be impractical.

Check the shape of your data before and after transformation to confirm reduction.