How to Use Logistic Regression with sklearn in Python

To use logistic regression in scikit-learn, import LogisticRegression from sklearn.linear_model, create a model instance, train it on labeled data with fit(), and call predict() to classify new samples from their input features.

Syntax

The basic syntax to use logistic regression in sklearn involves importing the class, creating an instance, fitting the model to data, and making predictions.

LogisticRegression(): Creates the logistic regression model.
fit(X_train, y_train): Trains the model on features X_train and labels y_train.
predict(X_test): Predicts labels for new data X_test.

python

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

Example

This example shows how to train a logistic regression model on a simple dataset and predict the class labels.

python

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X = iris.data
y = (iris.target == 0).astype(int)  # Binary classification: class 0 vs others

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create and train model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Predict
predictions = model.predict(X_test)

# Evaluate
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.2f}")

Output

Accuracy: 1.00
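
The example above only calls predict(). As a minimal sketch on the same binary setup, the snippet below contrasts predict() with predict_proba(). The 0.5 threshold check is an illustration added here, not part of the original example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Same binary setup as the example: class 0 vs. the rest
iris = load_iris()
X = iris.data
y = (iris.target == 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

labels = model.predict(X_test)       # shape (n_samples,), values 0 or 1
probs = model.predict_proba(X_test)  # shape (n_samples, 2), columns ordered as model.classes_

# Each row holds one probability per class, and the row sums to 1
print(probs[:3].round(3))

# For a binary model, predict() agrees with thresholding the class-1 probability at 0.5
manual_labels = (probs[:, 1] > 0.5).astype(int)
print(np.array_equal(labels, manual_labels))
```

Use predict_proba() when you need a confidence score or a decision threshold other than 0.5.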

Common Pitfalls

Common mistakes when using logistic regression in sklearn include:

Not scaling features when needed, which can slow convergence.
Leaving max_iter at its default of 100, which can be too low for the solver to converge on some data.
Confusing predict() (class labels) with predict_proba() (probabilities).
Choosing a solver that does not fit a multi-class problem: the default 'lbfgs' solver handles multi-class targets directly, while 'liblinear' is limited to one-vs-rest.

python

from sklearn.linear_model import LogisticRegression

# Wrong: the default max_iter=100 is too low for some data
model = LogisticRegression()
model.fit(X_train, y_train)  # May emit a ConvergenceWarning

# Right: increase max_iter
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
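
Raising max_iter is one fix. Scaling the features, the first pitfall in the list, is another. The sketch below puts StandardScaler in front of the model with make_pipeline. The breast cancer dataset is an assumption chosen for illustration because its features span very different ranges. It does not appear in the original example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative dataset (assumption): 30 numeric features on very different scales
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# The scaler is fitted on the training data only, then applied before the model
scaled_model = make_pipeline(StandardScaler(), LogisticRegression())
scaled_model.fit(X_train, y_train)

# n_iter_ reports how many iterations the solver actually used
n_iter = scaled_model.named_steps["logisticregression"].n_iter_
print(f"Iterations used: {n_iter[0]}")
print(f"Test accuracy: {scaled_model.score(X_test, y_test):.2f}")
```

Keeping the scaler inside the pipeline means the same transformation is applied automatically when you call predict() or score() on new data.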

Quick Reference

Method/Parameter	Description
LogisticRegression()	Create logistic regression model instance
fit(X, y)	Train model on features X and labels y
predict(X)	Predict class labels for X
predict_proba(X)	Predict class probabilities for X
max_iter	Maximum iterations for solver to converge (default 100)
solver	Algorithm to use (e.g., 'lbfgs', 'liblinear')
multi_class	'auto', 'ovr', or 'multinomial' for multi-class handling (deprecated since scikit-learn 1.5)
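
As a rough sketch of how these parameters fit together on a multi-class problem, the snippet below trains on all three iris classes with the 'lbfgs' solver. It deliberately leaves multi_class unset and relies on the solver's default multi-class behavior. The stratify argument is an addition made here for a balanced split.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# All three iris classes: 0, 1, 2
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# 'lbfgs' is the default solver; it is spelled out here for clarity
model = LogisticRegression(solver="lbfgs", max_iter=200)
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)  # one column per class

print(model.classes_)
print(proba.shape)
print(f"Accuracy: {model.score(X_test, y_test):.2f}")
```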

Key Takeaways

Import LogisticRegression from sklearn.linear_model to create the model.

Always fit the model with training data before predicting.

Increase max_iter if the model does not converge.

Use predict() for class labels and predict_proba() for probabilities.

Scale features if convergence is slow or the features differ widely in scale.