How to Use LDA in sklearn with Python: Syntax and Example
Use LinearDiscriminantAnalysis from sklearn.discriminant_analysis to create an LDA model in Python. Fit the model with fit(X, y), then call predict(X) to classify data.
Syntax
The main steps to use LDA in sklearn are:
- from sklearn.discriminant_analysis import LinearDiscriminantAnalysis: Import the LDA class.
- lda = LinearDiscriminantAnalysis(): Create an LDA model instance.
- lda.fit(X, y): Train the model with features X and labels y.
- lda.predict(X_new): Predict labels for new data X_new.
```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Create LDA model
lda = LinearDiscriminantAnalysis()

# Fit model with training data
lda.fit(X, y)

# Predict new data
predictions = lda.predict(X_new)
```
Example
This example shows how to train an LDA model on the Iris dataset, predict the species of held-out test samples, and measure the accuracy of those predictions.
```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create and train LDA model
lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)

# Predict on test data
y_pred = lda.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Test Accuracy: {accuracy:.2f}")
```
Output
Test Accuracy: 1.00
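The example above reports an accuracy score but does not show individual predictions. The following is a minimal sketch of how class indices could be mapped back to species names for new samples. The two measurement rows in new_samples are made-up values chosen for illustration, and the model is refit on the full dataset only to keep the sketch short.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = load_iris()
lda = LinearDiscriminantAnalysis().fit(iris.data, iris.target)

# Hypothetical new flowers (illustrative values, not from the original example):
# sepal length, sepal width, petal length, petal width, all in cm
new_samples = [[5.0, 3.4, 1.5, 0.2], [6.7, 3.0, 5.2, 2.3]]

# predict returns integer class indices; target_names maps them to species
labels = lda.predict(new_samples)
species = iris.target_names[labels]

# predict_proba returns one probability per class for each sample
probabilities = lda.predict_proba(new_samples)

for name, probs in zip(species, probabilities):
    print(name, probs.round(3))
```

Looking at predict_proba alongside the predicted label gives a rough sense of how confident the model is about each sample.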
Common Pitfalls
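- Feature scaling: LDA assumes that features are normally distributed within each class and that all classes share one covariance matrix. It does not strictly require scaling, but features on extremely different scales can still cause numerical problems.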
- Using LDA for regression: LDA is for classification, not regression tasks.
- Incorrect input shapes: Features X must be a 2D array and labels y a 1D array.
- Ignoring class imbalance: LDA can be sensitive to imbalanced classes, which may reduce accuracy.
```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
import numpy as np

# Toy data: two samples per class
X = np.array([[1, 2], [2, 1], [6, 5], [7, 7]])

# Wrong: y is a 2D array (one-hot encoded, one column per class)
# Note: a single-column y of shape (n_samples, 1) is typically only flattened
# with a DataConversionWarning rather than rejected
y_wrong = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])  # Should be 1D

lda = LinearDiscriminantAnalysis()

# This will raise an error
try:
    lda.fit(X, y_wrong)
except ValueError as e:
    print(f"Error: {e}")

# Correct y shape: one class label per sample
y_correct = np.array([0, 0, 1, 1])
lda.fit(X, y_correct)
print("Model trained successfully with correct y shape.")
```
Output
Error: y should be a 1d array, got an array of shape (4, 2) instead.
Model trained successfully with correct y shape.
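For the class imbalance pitfall listed above, one possible first step is to count the samples per class and inspect the class priors the model ends up using. The sketch below is an illustration under assumptions: the imbalance is created artificially by dropping most samples of one Iris class, and passing uniform values to the priors parameter is just one option to consider, not a recommended fix for every dataset.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Artificial imbalance (for illustration only): keep every sample of
# classes 0 and 1, but only the first 10 samples of class 2
keep = np.concatenate([np.where(y != 2)[0], np.where(y == 2)[0][:10]])
X_imb, y_imb = X[keep], y[keep]

# Count samples per class to detect the imbalance
counts = np.bincount(y_imb)
print("Samples per class:", counts)

# By default, LDA estimates priors from the class proportions in the data
lda_default = LinearDiscriminantAnalysis().fit(X_imb, y_imb)
print("Estimated priors:", lda_default.priors_.round(3))

# Explicit uniform priors treat all classes as equally likely a priori
lda_uniform = LinearDiscriminantAnalysis(priors=[1 / 3, 1 / 3, 1 / 3]).fit(X_imb, y_imb)
print("Uniform priors:", lda_uniform.priors_.round(3))
```

Whether uniform priors help depends on how the classes are distributed in the data the model will see later, so it is worth comparing both variants on a held-out set.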
Quick Reference
Key points to remember when using LDA in sklearn:
- Import with from sklearn.discriminant_analysis import LinearDiscriminantAnalysis.
- Create model: lda = LinearDiscriminantAnalysis().
- Train with lda.fit(X, y).
- Predict with lda.predict(X_new).
- Use accuracy_score to check performance.
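A single train/test split can give an optimistic or pessimistic accuracy depending on which samples land in the test set. As a complement to accuracy_score, the sketch below uses cross_val_score, which is not part of the steps above and is shown only as one possible way to get a more stable estimate. The choice of cv=5 is an assumption made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation; for classifiers the default score is accuracy
scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5)

print("Accuracy per fold:", scores.round(2))
print(f"Mean accuracy: {scores.mean():.2f}")
```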
Key Takeaways
Use LinearDiscriminantAnalysis from sklearn to perform classification with LDA.
Fit the model with 2D feature data and 1D label array using fit(X, y).
Predict new data labels with predict(X_new) after training.
Ensure labels are 1D arrays to avoid errors during fitting.
LDA is a classification method, not a regression method, and can be sensitive to class imbalance.