How to Use TruncatedSVD in sklearn with Python
Use TruncatedSVD from sklearn.decomposition to reduce the number of features in your data by specifying the number of components. Fit the model on your data with fit() and transform it using transform() to get the reduced representation.
Syntax
The basic syntax to use TruncatedSVD is:
- n_components: Number of dimensions to keep after reduction (default 2).
- algorithm: SVD solver to use, either 'randomized' (the default) or 'arpack'.
- n_iter: Number of iterations for the randomized solver (default 5).
- random_state: Seed for reproducible results.
Use fit() to learn the components from data and transform() to apply the reduction.
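```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

X = np.random.rand(10, 5)  # X is your data matrix (10 samples, 5 features)

svd = TruncatedSVD(n_components=2, algorithm='randomized', n_iter=5, random_state=42)
svd.fit(X)                    # learn the components from X
X_reduced = svd.transform(X)  # project X onto the 2 components
```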
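The following sketch shows the effect of the algorithm parameter. It fits the same small matrix with both solvers and compares their singular_values_ against NumPy's exact SVD. The seeded 10-sample, 5-feature matrix is assumed example data. On large matrices the randomized solver is an approximation, so agreement there may be looser.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Assumed example data: a seeded random matrix with 10 samples and 5 features
rng = np.random.default_rng(0)
X = rng.random((10, 5))

# Same number of components, two different solvers
svd_randomized = TruncatedSVD(n_components=2, algorithm="randomized",
                              n_iter=5, random_state=42).fit(X)
svd_arpack = TruncatedSVD(n_components=2, algorithm="arpack",
                          random_state=42).fit(X)

# Reference: the two largest singular values from NumPy's full SVD
reference = np.linalg.svd(X, compute_uv=False)[:2]

# Sort before comparing so the check does not depend on the order a solver reports
randomized_ok = np.allclose(np.sort(svd_randomized.singular_values_), np.sort(reference))
arpack_ok = np.allclose(np.sort(svd_arpack.singular_values_), np.sort(reference))
print("randomized matches NumPy:", randomized_ok)
print("arpack matches NumPy:", arpack_ok)
```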
Example
This example shows how to reduce a 5-dimensional dataset to 2 dimensions using TruncatedSVD. It prints the original shape, the reduced shape, and the reduced data.
```python
from sklearn.decomposition import TruncatedSVD
import numpy as np

# Create sample data with 10 samples and 5 features
X = np.random.rand(10, 5)

# Initialize TruncatedSVD to reduce to 2 components
svd = TruncatedSVD(n_components=2, random_state=42)

# Fit and transform the data
X_reduced = svd.fit_transform(X)

print('Original shape:', X.shape)
print('Reduced shape:', X_reduced.shape)
print('Reduced data:\n', X_reduced)
```
Output (the reduced values depend on the randomly generated X, so they will differ on each run)
Original shape: (10, 5)
Reduced shape: (10, 2)
Reduced data:
[[0.99193657 0.07379994]
[0.72992731 0.05426748]
[0.90088217 0.06867903]
[0.86518511 0.05147017]
[0.79622444 0.05402243]
[0.81996444 0.05192703]
[0.82904203 0.05487744]
[0.85827644 0.05400266]
[0.84938215 0.05498303]
[0.87402852 0.05496937]]
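The example above calls fit_transform(). This minimal sketch checks that calling fit() followed by transform() gives the same reduced data, up to floating-point error. It uses a seeded generator, an assumption not in the original, so the numbers repeat across runs.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Seeded data (assumption) so the reduced values are reproducible
rng = np.random.default_rng(42)
X = rng.random((10, 5))

# One step: fit and transform together
X_one_step = TruncatedSVD(n_components=2, random_state=42).fit_transform(X)

# Two steps: fit() returns the fitted estimator, then transform() projects X
X_two_step = TruncatedSVD(n_components=2, random_state=42).fit(X).transform(X)

print("Same result:", np.allclose(X_one_step, X_two_step))
```

Fixing random_state makes the randomized solver deterministic. This is why two separately constructed models end up with the same components here.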
Common Pitfalls
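- Using TruncatedSVD on dense data when PCA might be more appropriate. TruncatedSVD does not center the data first, so on dense data its output differs from PCA's.
- Setting n_components greater than the number of features, which raises an error. With algorithm='arpack', n_components must be strictly less than the number of features.
- Calling transform() before fitting the model.
- Confusing TruncatedSVD with StandardScaler or other preprocessing steps. TruncatedSVD reduces the number of dimensions; it does not scale or standardize features.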
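The first pitfall comes down to centering. PCA subtracts the column means before decomposing, while TruncatedSVD works on the data as given. This minimal sketch uses a small seeded random matrix as assumed data. After centering X by hand, TruncatedSVD should reproduce PCA's output up to the sign of each component; on the raw X it does not.

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD

# Assumed example data: 10 samples, 5 features
rng = np.random.default_rng(0)
X = rng.random((10, 5))

# PCA centers X internally before computing the SVD
pca_out = PCA(n_components=2, svd_solver="full").fit_transform(X)

# TruncatedSVD on the raw (uncentered) data
svd_raw = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

# TruncatedSVD on data that we center ourselves
X_centered = X - X.mean(axis=0)
svd_centered = TruncatedSVD(n_components=2, random_state=0).fit_transform(X_centered)

# Each component is only defined up to a sign flip, so compare absolute values
matches_centered = np.allclose(np.abs(svd_centered), np.abs(pca_out))
matches_raw = np.allclose(np.abs(svd_raw), np.abs(pca_out))
print("Centered TruncatedSVD matches PCA:", matches_centered)
print("Raw TruncatedSVD matches PCA:", matches_raw)
```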
Always check your data shape and fit the model before calling transform().
```python
from sklearn.decomposition import TruncatedSVD
import numpy as np

X = np.random.rand(10, 3)  # 10 samples, 3 features

# Wrong: n_components (5) is greater than the number of features (3)
svd = TruncatedSVD(n_components=5)
try:
    svd.fit(X)
except ValueError as e:
    print('Error:', e)

# Correct usage: n_components (2) does not exceed the number of features
svd_correct = TruncatedSVD(n_components=2)
svd_correct.fit(X)
X_reduced = svd_correct.transform(X)
print('Reduced shape:', X_reduced.shape)
```
Output (the wording of the error message depends on your scikit-learn version; it states that n_components must not exceed the number of features)
Error: <ValueError message>
Reduced shape: (10, 2)
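Transforming before fitting is a separate pitfall. In that case scikit-learn raises sklearn.exceptions.NotFittedError when transform() is called on an unfitted estimator. This sketch uses assumed random data: it catches that error and then fits the model properly.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.exceptions import NotFittedError

# Assumed example data: 10 samples, 3 features
X = np.random.default_rng(0).random((10, 3))

svd = TruncatedSVD(n_components=2)
raised = False
try:
    svd.transform(X)  # no fit() yet, so there are no components to project onto
except NotFittedError as e:
    raised = True
    print("Error:", type(e).__name__)

# fit() returns the fitted estimator, so transform() can be chained
X_reduced = svd.fit(X).transform(X)
print("Reduced shape:", X_reduced.shape)
```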
Quick Reference
TruncatedSVD Quick Reference:
- n_components: Number of dimensions to keep.
- fit(X): Learn components from data X.
- transform(X): Apply dimensionality reduction.
- fit_transform(X): Fit and transform in one step.
- Works on dense data and directly on scipy.sparse matrices, because it does not center the data first.
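TruncatedSVD does not center its input, so it can take a scipy.sparse matrix directly. This minimal sketch assumes a randomly generated CSR matrix with roughly 5% non-zero entries. The reduced output comes back as a dense NumPy array with one column per component.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD

# Assumed data: a 100 x 50 random matrix with about 95% of entries set to zero
rng = np.random.default_rng(0)
dense = rng.random((100, 50))
dense[dense < 0.95] = 0.0
X_sparse = sparse.csr_matrix(dense)
print("Non-zero entries:", X_sparse.nnz, "of", 100 * 50)

# Pass the sparse matrix straight to TruncatedSVD
svd = TruncatedSVD(n_components=5, random_state=0)
X_reduced = svd.fit_transform(X_sparse)
print("Reduced shape:", X_reduced.shape)
print("Output type:", type(X_reduced).__name__)
```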
Key Takeaways
TruncatedSVD reduces data dimensions by keeping only the top n_components singular vectors of the data matrix, without centering the data first.
Always set n_components less than or equal to the number of features in your data (strictly less when algorithm='arpack').
Fit the TruncatedSVD model before transforming data to avoid errors.
Use TruncatedSVD for sparse data or when PCA is not applicable.
Check the shape of your data before and after transformation to confirm reduction.