
How to Use LDA in sklearn with Python: Syntax and Example

Use LinearDiscriminantAnalysis from sklearn.discriminant_analysis to create an LDA model in Python. Fit the model with fit(X, y) and predict with predict(X) to classify data.

Syntax

The main steps to use LDA in sklearn are:

  • from sklearn.discriminant_analysis import LinearDiscriminantAnalysis: Import the LDA class.
  • lda = LinearDiscriminantAnalysis(): Create an LDA model instance.
  • lda.fit(X, y): Train the model with features X and labels y.
  • lda.predict(X_new): Predict labels for new data X_new.
python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Create LDA model
lda = LinearDiscriminantAnalysis()

# Fit model with training data
lda.fit(X, y)

# Predict new data
predictions = lda.predict(X_new)
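
The snippet above leaves X, y, and X_new undefined. A minimal runnable sketch, assuming a small made-up two-class dataset (the arrays are hypothetical and chosen only so the classes are well separated), might look like this:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical toy data: six samples, two features, two well-separated classes
X = np.array([[1.0, 2.0], [2.0, 1.0], [1.5, 1.8],
              [6.0, 7.0], [7.0, 6.5], [6.5, 7.5]])
y = np.array([0, 0, 0, 1, 1, 1])

# Hypothetical new samples, one close to each cluster
X_new = np.array([[1.2, 1.9], [6.8, 7.1]])

# Create and fit the model, then predict labels for the new samples
lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

predictions = lda.predict(X_new)
print(predictions)
```

Each new sample should be assigned the label of the cluster it sits next to.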

Example

This example trains an LDA model on the Iris dataset, predicts the species of held-out test samples, and reports the accuracy.

python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create and train LDA model
lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)

# Predict on test data
y_pred = lda.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Test Accuracy: {accuracy:.2f}")
Output
Test Accuracy: 1.00
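
The evaluation above predicts a whole test set at once. To classify one new flower, a sketch along the following lines may help. The measurements in sample are hypothetical, and the model is fitted on the full dataset purely for brevity. Note that predict expects a 2D array, so the single sample is reshaped first.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = load_iris()
lda = LinearDiscriminantAnalysis()
lda.fit(iris.data, iris.target)

# Hypothetical measurements in cm:
# sepal length, sepal width, petal length, petal width
sample = np.array([5.0, 3.4, 1.5, 0.2])

# predict expects shape (n_samples, n_features), so reshape to (1, 4)
sample_2d = sample.reshape(1, -1)

label = lda.predict(sample_2d)[0]
probabilities = lda.predict_proba(sample_2d)[0]

# Map the integer label back to the species name
print("Predicted species:", iris.target_names[label])
for name, p in zip(iris.target_names, probabilities):
    print(f"  P({name}) = {p:.3f}")
```

predict_proba returns one probability per class, which gives a rough sense of how confident the model is in the predicted species.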

Common Pitfalls

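  • Not scaling data: LDA does not strictly require feature scaling, but it does assume that each class is roughly normally distributed with a covariance matrix shared across classes. Very different feature scales can still affect results in some settings, such as when shrinkage is used.
  • Using LDA for regression: LDA is a classification method and is not suited to regression tasks.
  • Incorrect input shapes: Features X must be a 2D array of shape (n_samples, n_features), and labels y must be a 1D array of shape (n_samples,).
  • Ignoring class imbalance: By default, LDA estimates class priors from the class frequencies in the training data, so predictions can lean toward the majority class and overall accuracy can hide poor minority-class performance. The priors parameter can override this.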
python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
import numpy as np

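# Six samples, two features, two classes
X = np.array([[1.0, 2.0], [2.0, 1.0], [1.5, 1.8],
              [6.0, 7.0], [7.0, 6.5], [6.5, 7.5]])

# Wrong: one-hot encoded labels with shape (6, 2); fit expects shape (6,).
# Note: a column vector of shape (6, 1) is typically only flattened with a
# DataConversionWarning rather than rejected.
y_wrong = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]])

lda = LinearDiscriminantAnalysis()

# This raises a ValueError
try:
    lda.fit(X, y_wrong)
except ValueError as e:
    print(f"Error: {e}")

# Correct: 1D array of class labels
y_correct = np.array([0, 0, 0, 1, 1, 1])
lda.fit(X, y_correct)
print("Model trained successfully with correct y shape.")
Output
Error: y should be a 1d array, got an array of shape (6, 2) instead.
Model trained successfully with correct y shape.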
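
The class imbalance pitfall listed above can be explored with a small experiment. The sketch below assumes a synthetic dataset from make_classification with roughly a 90/10 class split, and compares the default priors (estimated from the training labels) with uniform priors passed through the priors parameter. The dataset, split sizes, and the choice of uniform priors are illustrative assumptions, not a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: about 90% class 0 and 10% class 1 (illustrative)
X, y = make_classification(
    n_samples=1000, n_features=4, n_informative=2, n_redundant=0,
    weights=[0.9, 0.1], random_state=0,
)

# stratify=y keeps the class ratio similar in the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Default: priors are estimated from the class frequencies in y_train
lda_default = LinearDiscriminantAnalysis().fit(X_train, y_train)

# Assumption for illustration: treat both classes as equally likely
lda_uniform = LinearDiscriminantAnalysis(priors=[0.5, 0.5]).fit(X_train, y_train)

# Recall on the minority class (label 1) is more telling than accuracy here
recall_default = recall_score(y_test, lda_default.predict(X_test), pos_label=1)
recall_uniform = recall_score(y_test, lda_uniform.predict(X_test), pos_label=1)
print(f"Minority recall, default priors: {recall_default:.2f}")
print(f"Minority recall, uniform priors: {recall_uniform:.2f}")
```

Uniform priors shift the decision boundary toward the majority class, so minority recall should not decrease, usually at the cost of more false positives. Whether that trade-off is worthwhile depends on the application.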

Quick Reference

Key points to remember when using LDA in sklearn:

  • Import with from sklearn.discriminant_analysis import LinearDiscriminantAnalysis.
  • Create model: lda = LinearDiscriminantAnalysis().
  • Train with lda.fit(X, y).
  • Predict with lda.predict(X_new).
  • Use accuracy_score to check performance.
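
A single train/test split can give an optimistic or pessimistic accuracy depending on which samples land in the test set. As a complementary check, the sketch below uses cross_val_score with 5 folds on the Iris dataset. The fold count is an arbitrary choice for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation; for classifiers the default score is accuracy
scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5)

print("Fold accuracies:", scores.round(2))
print(f"Mean accuracy: {scores.mean():.2f}")
```

The spread of the fold accuracies gives a rough idea of how much the result depends on the particular split.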

Key Takeaways

Use LinearDiscriminantAnalysis from sklearn to perform classification with LDA.
Fit the model with a 2D feature array and a 1D label array using fit(X, y).
Predict new data labels with predict(X_new) after training.
Ensure labels are 1D arrays to avoid errors during fitting.
LDA is a classification method, not a regression method, and can be sensitive to class imbalance.