How to Use Logistic Regression with sklearn in Python
Use LogisticRegression from sklearn.linear_model by creating a model instance, fitting it on training data with fit(), and then predicting with predict(). This trains the model to classify data based on input features.
Syntax
The basic syntax to use logistic regression in sklearn involves importing the class, creating an instance, fitting the model to data, and making predictions.
- LogisticRegression(): Creates the logistic regression model.
- fit(X_train, y_train): Trains the model on features X_train and labels y_train.
- predict(X_test): Predicts labels for new data X_test.
```python
from sklearn.linear_model import LogisticRegression

# X_train, y_train, and X_test are assumed to be defined already
model = LogisticRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```
Example
This example shows how to train a logistic regression model on a simple dataset and predict the class labels.
```python
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X = iris.data
y = (iris.target == 0).astype(int)  # Binary classification: class 0 vs others

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create and train model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Predict
predictions = model.predict(X_test)

# Evaluate
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.2f}")
```
Output
Accuracy: 1.00
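The example above reduces iris to a binary problem. As a minimal sketch of the multi-class case, the same estimator can be fit on all three iris classes. This assumes a reasonably recent scikit-learn release where the default solver is 'lbfgs'. The names multi_model and multi_accuracy are introduced here for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Keep all three iris classes instead of collapsing them to two
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Assumption: default solver settings are left untouched
multi_model = LogisticRegression(max_iter=200)
multi_model.fit(X_train, y_train)

# classes_ lists the labels the model learned
print(multi_model.classes_)

# predict_proba returns one column per class
print(multi_model.predict_proba(X_test[:3]).shape)

multi_accuracy = multi_model.score(X_test, y_test)
print(f"Accuracy: {multi_accuracy:.2f}")
```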
Common Pitfalls
Common mistakes when using logistic regression in sklearn include:
- Not scaling features when needed, which can slow convergence.
- Leaving max_iter at its default of 100 when the solver needs more iterations, so the model does not converge.
- Confusing predict() (class labels) with predict_proba() (probabilities).
- Using logistic regression for multi-class problems without checking that the chosen solver and multi-class setting support them.
```python
from sklearn.linear_model import LogisticRegression

# Wrong: default max_iter too low for some data
model = LogisticRegression()
model.fit(X_train, y_train)  # May warn about convergence

# Right: increase max_iter
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
```
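Raising max_iter is one remedy. The scaling pitfall from the list above can be addressed separately. The sketch below is one possible approach, not the only one: it standardizes features with StandardScaler inside a pipeline before fitting. The breast cancer dataset and the name scaled_model are choices made for this illustration, picked because the features cover very different ranges.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative dataset whose features span very different ranges
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# The pipeline fits the scaler on the training data only,
# then feeds the standardized features to the model
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
scaled_model.fit(X_train, y_train)

# n_iter_ reports how many iterations the solver actually used
n_iter = scaled_model.named_steps["logisticregression"].n_iter_
scaled_accuracy = scaled_model.score(X_test, y_test)
print(f"Iterations used: {n_iter[0]}, accuracy: {scaled_accuracy:.2f}")
```

Keeping the scaler inside the pipeline means the same transformation is applied automatically when calling predict() or score() on new data.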
Quick Reference
| Method/Parameter | Description |
|---|---|
| LogisticRegression() | Create logistic regression model instance |
| fit(X, y) | Train model on features X and labels y |
| predict(X) | Predict class labels for X |
| predict_proba(X) | Predict class probabilities for X |
| max_iter | Maximum iterations for solver to converge (default 100) |
| solver | Algorithm to use (e.g., 'lbfgs', 'liblinear') |
| multi_class | 'auto', 'ovr', or 'multinomial' for multi-class handling (deprecated since scikit-learn 1.5) |
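To make the difference between the predict(X) and predict_proba(X) rows concrete, here is a small sketch that reuses the binary iris target from the example (class 0 vs others). The variables sample and recovered are names introduced only for this illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
y = (y == 0).astype(int)  # Binary target: class 0 vs others

model = LogisticRegression(max_iter=200).fit(X, y)

sample = X[:5]
labels = model.predict(sample)       # shape (5,): one class label per row
proba = model.predict_proba(sample)  # shape (5, 2): one column per class

# Columns of proba follow the order of model.classes_
print(model.classes_)
print(labels)
print(proba.round(3))

# Picking the most probable class reproduces the labels from predict()
recovered = model.classes_[np.argmax(proba, axis=1)]
print(np.array_equal(labels, recovered))
```

Working with predict_proba() directly is useful when a decision threshold other than the most probable class is needed.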
Key Takeaways
Import LogisticRegression from sklearn.linear_model to create the model.
Always fit the model with training data before predicting.
Increase max_iter if the model does not converge.
Use predict() for class labels and predict_proba() for probabilities.
Scale features if convergence is slow or data varies widely.