0
0
MlopsConceptBeginner · 3 min read

What is Logistic Regression in Python with sklearn

Logistic regression in Python is a method to predict categories using the sklearn library. It models the probability of an event by fitting data to a logistic curve, making it useful for classification problems.
⚙️

How It Works

Logistic regression is like a smart decision maker that guesses yes or no answers based on input data. Imagine you want to decide if an email is spam or not. Logistic regression looks at features like words in the email and calculates the chance it is spam.

It uses a special curve called the logistic function that squeezes any number into a value between 0 and 1. This value can be seen as a probability. If the probability is above a certain point, it says "yes"; otherwise, "no." This way, it turns numbers into clear categories.

💻

Example

This example shows how to use logistic regression in Python with sklearn to classify simple data points.

python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

# Load iris dataset
iris = load_iris()
X = iris.data
y = (iris.target == 0).astype(int)  # Binary target: 1 if class 0, else 0

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create and train logistic regression model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Predict on test data
predictions = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.2f}")
Output
Accuracy: 1.00
🎯

When to Use

Use logistic regression when you want to predict categories with two or more classes, especially when the output is yes/no or true/false. It works well when the relationship between input features and the outcome is roughly linear in terms of log-odds.

Common real-world uses include spam detection, medical diagnosis (e.g., disease or no disease), and customer churn prediction (will a customer leave or stay).

Key Points

  • Logistic regression predicts probabilities for classification tasks.
  • It uses a logistic function to map predictions between 0 and 1.
  • It is simple, fast, and interpretable.
  • Works best for linearly separable data.
  • Implemented easily in Python with sklearn.linear_model.LogisticRegression.

Key Takeaways

Logistic regression predicts category probabilities using a logistic curve.
It is ideal for binary classification problems like yes/no decisions.
Python's sklearn library provides a simple way to train and use logistic regression.
It works best when data can be separated by a line or plane in feature space.
Accuracy and interpretability make it a popular choice for many real-world tasks.