Classification vs Regression in Python: Key Differences and Examples

Classification predicts discrete labels or categories; regression predicts continuous numeric values. The two tasks use different sklearn estimators and evaluation metrics, and they suit different kinds of problems.

Quick Comparison

Here is a quick side-by-side comparison of classification and regression tasks in machine learning.

Aspect	Classification	Regression
Output Type	Discrete labels (e.g., 'spam', 'not spam')	Continuous values (e.g., price, temperature)
Goal	Assign input to a category	Predict a numeric value
Common Algorithms	Logistic Regression, Decision Trees, SVM	Linear Regression, Decision Trees, SVR
Evaluation Metrics	Accuracy, Precision, Recall	Mean Squared Error, R² Score
Example Use Case	Email spam detection	House price prediction

Key Differences

Classification predicts which category or class an input belongs to. For example, deciding if an email is spam or not is classification because the output is a label. The model learns patterns to separate data into distinct groups.

Regression predicts a continuous number. For example, estimating house prices based on features like size and location is regression because the output is a number that can take any value. The model learns to fit a curve or line to the data points.
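
To make the contrast in output types concrete, here is a minimal sketch that fits a decision tree to the same Iris features twice, once as a classifier and once as a regressor. The feature choice, `max_depth=3`, and the variable names are illustrative assumptions rather than a recommended setup.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

iris = load_iris()
X = iris.data[:, :2]          # sepal length and width (illustrative choice)
y_class = iris.target         # discrete species labels: 0, 1, 2
y_reg = iris.data[:, 2]       # continuous target: petal length in cm

# Same family of algorithm, two different estimators
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y_class)
reg = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y_reg)

class_pred = clf.predict(X[:5])   # values drawn from the known label set
reg_pred = reg.predict(X[:5])     # floating-point estimates

print("Classifier output:", class_pred, class_pred.dtype)
print("Regressor output:", reg_pred, reg_pred.dtype)
```

The classifier can only return labels it saw during training, while the regressor returns floating-point numbers.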

In sklearn, the two tasks use different estimators and metrics. Classification metrics such as accuracy, precision, and recall measure how often the model assigns the correct label. Regression metrics such as mean squared error and R² measure how close the predicted numbers are to the actual values.
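
The metrics named above can be checked by hand on a few values. The sketch below uses small made-up arrays (hypothetical data, chosen only so the arithmetic is easy to follow) and the corresponding functions from `sklearn.metrics`.

```python
from sklearn.metrics import (
    accuracy_score,
    mean_squared_error,
    precision_score,
    r2_score,
    recall_score,
)

# Hypothetical classification labels (1 = positive class)
y_true_cls = [1, 0, 1, 1, 0, 1]
y_pred_cls = [1, 0, 0, 1, 1, 1]

acc = accuracy_score(y_true_cls, y_pred_cls)    # 4 of 6 predictions correct
prec = precision_score(y_true_cls, y_pred_cls)  # 3 true positives / 4 predicted positives
rec = recall_score(y_true_cls, y_pred_cls)      # 3 true positives / 4 actual positives
print(f"Accuracy: {acc:.2f}, Precision: {prec:.2f}, Recall: {rec:.2f}")
# Accuracy: 0.67, Precision: 0.75, Recall: 0.75

# Hypothetical regression targets
y_true_reg = [3.0, 5.0, 7.0, 9.0]
y_pred_reg = [2.5, 5.0, 7.5, 9.0]

mse = mean_squared_error(y_true_reg, y_pred_reg)  # (0.25 + 0 + 0.25 + 0) / 4
r2 = r2_score(y_true_reg, y_pred_reg)             # 1 - 0.5 / 20
print(f"MSE: {mse:.3f}, R2: {r2:.3f}")
# MSE: 0.125, R2: 0.975
```

Classification metrics count right and wrong labels, whereas regression metrics measure the size of the numeric error.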

Code Comparison

Example of classification using sklearn: predict whether a flower is Iris-setosa based on its petal length and width.

python

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X = iris.data[:, 2:]  # petal length and width
y = (iris.target == 0).astype(int)  # 1 if Iris-setosa else 0

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train model
clf = LogisticRegression()
clf.fit(X_train, y_train)

# Predict
y_pred = clf.predict(X_test)

# Evaluate
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

Output

Accuracy: 1.00
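
Accuracy alone hides which kinds of mistakes a classifier makes. As a hedged follow-up to the example above, this sketch rebuilds the same model and looks at the confusion matrix and the class probabilities behind each label. The 0.5 cutoff shown is how a binary `LogisticRegression` turns probabilities into labels by default; the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

iris = load_iris()
X = iris.data[:, 2:]                      # petal length and width
y = (iris.target == 0).astype(int)        # 1 if Iris-setosa else 0
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = LogisticRegression().fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Column 1 holds the estimated probability of class 1 (setosa)
proba_setosa = clf.predict_proba(X_test)[:, 1]
labels_from_proba = (proba_setosa > 0.5).astype(int)
print(np.array_equal(labels_from_proba, y_pred))
```

The final label is still discrete even though the model computes a continuous probability internally, which is why logistic regression counts as a classification algorithm despite its name.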

Regression Equivalent

Example of regression using sklearn to predict the petal length of an Iris flower based on sepal length and width.

python

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load data
iris = load_iris()
X = iris.data[:, :2]  # sepal length and width
y = iris.data[:, 2]   # petal length

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train model
reg = LinearRegression()
reg.fit(X_train, y_train)

# Predict
y_pred = reg.predict(X_test)

# Evaluate
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")

Output

Mean Squared Error: 0.04
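
Mean squared error is expressed in squared units, so it is often reported alongside its square root (RMSE) and the R² score. The sketch below repeats the regression setup above and adds both. The printed values depend on the split, so none are quoted here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

iris = load_iris()
X = iris.data[:, :2]   # sepal length and width
y = iris.data[:, 2]    # petal length
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

reg = LinearRegression().fit(X_train, y_train)
y_pred = reg.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)              # same units as the target (cm)
r2 = r2_score(y_test, y_pred)    # 1.0 would be a perfect fit

print(f"MSE: {mse:.2f}  RMSE: {rmse:.2f}  R2: {r2:.2f}")

# For regressors, .score() returns the same R² value
print(np.isclose(r2, reg.score(X_test, y_test)))
```

RMSE is usually easier to interpret than MSE because it can be compared directly with the scale of the target values.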

When to Use Which

Choose classification when your goal is to assign inputs into categories or classes, such as detecting fraud, recognizing images, or filtering emails. Use regression when you need to predict continuous values like prices, temperatures, or sales amounts. The choice depends on whether your output is a label or a number.
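
One rough way to apply this rule in code is to inspect the target values themselves. The sketch below uses sklearn's `type_of_target` utility; the `suggest_task` helper is a hypothetical name introduced here for illustration, and the mapping it applies is a heuristic rather than a rule enforced by the library.

```python
from sklearn.utils.multiclass import type_of_target


def suggest_task(y):
    """Hypothetical helper: suggest a task type from the target values."""
    kind = type_of_target(y)
    if kind in ("binary", "multiclass"):
        return "classification"
    if kind == "continuous":
        return "regression"
    return f"unclear ({kind})"


print(suggest_task(["spam", "not spam", "spam"]))  # classification
print(suggest_task([0, 1, 2, 1]))                  # classification
print(suggest_task([199.5, 250.0, 310.75]))        # regression
```

Integer-valued numeric targets such as counts or whole-number prices are reported as `multiclass`, so the suggestion should be checked against what the numbers actually mean.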

Key Takeaways

Classification predicts categories; regression predicts continuous numbers.

Use accuracy, precision, and recall to evaluate classification models.

Use mean squared error and R² score to evaluate regression models.

Choose classification for label prediction tasks and regression for numeric prediction tasks.

The two tasks use different sklearn models, each suited to its output type.