Classification vs Regression in Python: Key Differences and Examples
In classification, the model predicts discrete labels or categories; in regression, it predicts continuous numeric values. The two tasks use different estimators and evaluation metrics in scikit-learn (`sklearn`) and suit different kinds of problems.
Quick Comparison
Here is a quick side-by-side comparison of classification and regression tasks in machine learning.
| Aspect | Classification | Regression |
|---|---|---|
| Output Type | Discrete labels (e.g., 'spam', 'not spam') | Continuous values (e.g., price, temperature) |
| Goal | Assign input to a category | Predict a numeric value |
| Common Algorithms | Logistic Regression, Decision Trees, SVM | Linear Regression, Decision Trees, SVR |
| Evaluation Metrics | Accuracy, Precision, Recall | Mean Squared Error, R² Score |
| Example Use Case | Email spam detection | House price prediction |
Key Differences
Classification predicts which category or class an input belongs to. For example, deciding if an email is spam or not is classification because the output is a label. The model learns patterns to separate data into distinct groups.
Regression predicts a continuous number. For example, estimating house prices based on features like size and location is regression because the output is a number that can take any value within a range. The model learns to fit a curve or line to the data points.
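To make the contrast concrete, here is a minimal sketch, assuming scikit-learn is installed. It fits the same algorithm family (a decision tree) once as a classifier and once as a regressor on the Iris data, so the only thing that changes is the kind of target. The variable names and the `max_depth=3` setting are illustrative choices, not part of the article's own examples.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

iris = load_iris()
X_sepal = iris.data[:, :2]  # sepal length and width

# Classification: the target is the species label (0, 1 or 2)
tree_clf = DecisionTreeClassifier(max_depth=3, random_state=0)
tree_clf.fit(X_sepal, iris.target)
label_pred = tree_clf.predict(X_sepal[:3])

# Regression: the target is petal length, a measurement in cm
tree_reg = DecisionTreeRegressor(max_depth=3, random_state=0)
tree_reg.fit(X_sepal, iris.data[:, 2])
value_pred = tree_reg.predict(X_sepal[:3])

print("Classifier output:", label_pred, label_pred.dtype)  # integer class labels
print("Regressor output:", value_pred, value_pred.dtype)   # floating-point values
```

The classifier can only return one of the class labels it saw during training, while the regressor returns floating-point numbers.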
In sklearn, classification and regression use different estimator classes and metrics, even when the underlying algorithm family is shared (decision trees, for example, come in both a classifier and a regressor variant). Classification uses metrics like accuracy and precision to measure how well the model assigns correct labels. Regression uses metrics like mean squared error to measure how close the predicted numbers are to the actual values.
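The following sketch shows what these metrics compute, written out by hand in plain Python rather than with sklearn's functions. The label and value lists are made-up toy data, chosen only so the arithmetic is easy to follow.

```python
# Toy labels (1 = spam, 0 = not spam); made up for illustration
y_true_labels = [1, 0, 1, 1, 0, 1]
y_pred_labels = [1, 0, 0, 1, 1, 1]

pairs = list(zip(y_true_labels, y_pred_labels))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
precision = tp / (tp + fp)  # of the predicted positives, how many were right
recall = tp / (tp + fn)     # of the actual positives, how many were found

# Toy numeric targets; made up for illustration
y_true_values = [3.0, 5.0, 7.0, 9.0]
y_pred_values = [2.5, 5.0, 8.0, 9.5]

squared_errors = [(t - p) ** 2 for t, p in zip(y_true_values, y_pred_values)]
mse = sum(squared_errors) / len(squared_errors)
mean_true = sum(y_true_values) / len(y_true_values)
ss_tot = sum((t - mean_true) ** 2 for t in y_true_values)
r2 = 1 - sum(squared_errors) / ss_tot  # 1.0 would be a perfect fit

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
# accuracy=0.67 precision=0.75 recall=0.75
print(f"mse={mse:.3f} r2={r2:.3f}")
# mse=0.375 r2=0.925
```

Classification metrics count how often labels match, so a prediction is either right or wrong. Regression metrics measure distance, so a prediction can be slightly or badly off.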
Code Comparison
Example of classification using sklearn to predict whether a flower is Iris-setosa or not, based on petal length and width.
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X = iris.data[:, 2:]  # petal length and width
y = (iris.target == 0).astype(int)  # 1 if Iris-setosa else 0

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train model
clf = LogisticRegression()
clf.fit(X_train, y_train)

# Predict
y_pred = clf.predict(X_test)

# Evaluate
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
```
Regression Equivalent
Example of regression using sklearn to predict the petal length of an Iris flower based on sepal length and width.
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load data
iris = load_iris()
X = iris.data[:, :2]  # sepal length and width
y = iris.data[:, 2]   # petal length

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train model
reg = LinearRegression()
reg.fit(X_train, y_train)

# Predict
y_pred = reg.predict(X_test)

# Evaluate
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
```
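The regression example reports only mean squared error, which is in squared units (cm² here) and can be hard to interpret. As a hedged extension, the sketch below repeats the same setup and also reports the root mean squared error and the R² score listed in the comparison table. It uses `r2_score` from `sklearn.metrics`; the `rmse` variable is an addition for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Same data and split as the regression example
iris = load_iris()
X = iris.data[:, :2]  # sepal length and width
y = iris.data[:, 2]   # petal length
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

reg = LinearRegression().fit(X_train, y_train)
y_pred = reg.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
rmse = mse ** 0.5              # back in the target's units (cm)
r2 = r2_score(y_test, y_pred)  # share of variance explained; 1.0 is a perfect fit

print(f"RMSE: {rmse:.2f} cm")
print(f"R² score: {r2:.2f}")
```

For regressors, `reg.score(X_test, y_test)` returns the same R² value, so either call can be used.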
When to Use Which
Choose classification when your goal is to assign inputs to categories or classes, such as detecting fraud, recognizing images, or filtering emails. Use regression when you need to predict continuous values like prices, temperatures, or sales amounts. The choice depends on whether your output is a label or a number.
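This rule of thumb can be sketched as a small helper. `suggest_task` is a hypothetical function written for this illustration; it relies on sklearn's `type_of_target` utility to inspect the target values. Treat the result as a first guess only: integer-valued numeric targets (such as counts) are reported as class labels by `type_of_target`, even when regression would be the better fit.

```python
from sklearn.utils.multiclass import type_of_target


def suggest_task(y):
    """Hypothetical helper: guess the task from the target values."""
    kind = type_of_target(y)  # e.g. 'binary', 'multiclass', 'continuous'
    if kind.startswith("continuous"):
        return "regression"
    return "classification"


print(suggest_task(["spam", "not spam", "spam"]))  # classification
print(suggest_task([199.5, 310.25, 87.9]))         # regression
```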