What is Logistic Regression in Python with sklearn
Logistic regression in Python is a method to predict categories using the sklearn library. It models the probability of an event by fitting data to a logistic curve, making it useful for classification problems.How It Works
Logistic regression is like a smart decision maker that guesses yes or no answers based on input data. Imagine you want to decide if an email is spam or not. Logistic regression looks at features like words in the email and calculates the chance it is spam.
It uses a special curve called the logistic function that squeezes any number into a value between 0 and 1. This value can be seen as a probability. If the probability is above a certain point, it says "yes"; otherwise, "no." This way, it turns numbers into clear categories.
Example
This example shows how to use logistic regression in Python with sklearn to classify simple data points.
from sklearn.linear_model import LogisticRegression from sklearn.model_selection import train_test_split from sklearn.datasets import load_iris from sklearn.metrics import accuracy_score # Load iris dataset iris = load_iris() X = iris.data y = (iris.target == 0).astype(int) # Binary target: 1 if class 0, else 0 # Split data into train and test sets X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42) # Create and train logistic regression model model = LogisticRegression(max_iter=200) model.fit(X_train, y_train) # Predict on test data predictions = model.predict(X_test) # Calculate accuracy accuracy = accuracy_score(y_test, predictions) print(f"Accuracy: {accuracy:.2f}")
When to Use
Use logistic regression when you want to predict categories with two or more classes, especially when the output is yes/no or true/false. It works well when the relationship between input features and the outcome is roughly linear in terms of log-odds.
Common real-world uses include spam detection, medical diagnosis (e.g., disease or no disease), and customer churn prediction (will a customer leave or stay).
Key Points
- Logistic regression predicts probabilities for classification tasks.
- It uses a logistic function to map predictions between 0 and 1.
- It is simple, fast, and interpretable.
- Works best for linearly separable data.
- Implemented easily in Python with
sklearn.linear_model.LogisticRegression.