The ROC curve shows how well a model can tell two groups apart, such as sick vs. healthy patients. AUC condenses that ability into a single number.
ROC curve and AUC in ML Python
Introduction
Use the ROC curve and AUC in situations like these:
When checking how well a model distinguishes between two classes, such as spam vs. not spam emails.
When comparing different models to pick the best one for classification tasks.
When you want to understand the trade-off between catching positives and avoiding false alarms.
When the classes are imbalanced, and accuracy alone is misleading.
When tuning the model's decision threshold to find the best balance for your needs (a common heuristic is sketched right after this list).
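On that last point, one widely used heuristic is to pick the threshold that maximizes TPR minus FPR (Youden's J statistic). A minimal sketch, assuming you already have true labels and predicted scores; the toy arrays below are made up for illustration:
ML Python
import numpy as np
from sklearn.metrics import roc_curve

# Toy labels and scores; in practice these come from your trained model.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_scores = np.array([0.2, 0.6, 0.55, 0.9, 0.1, 0.7])

fpr, tpr, thresholds = roc_curve(y_true, y_scores)

# Youden's J statistic: the threshold where TPR - FPR is largest.
best_idx = np.argmax(tpr - fpr)
print(f"Best threshold: {thresholds[best_idx]:.2f} "
      f"(TPR={tpr[best_idx]:.2f}, FPR={fpr[best_idx]:.2f})")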
Syntax
ML Python
from sklearn.metrics import roc_curve, auc

fpr, tpr, thresholds = roc_curve(true_labels, predicted_scores)
roc_auc = auc(fpr, tpr)
true_labels are the actual class labels (0 or 1).
predicted_scores are the model's predicted probabilities or scores for the positive class.
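If you only need the number and not the curve itself, scikit-learn's roc_auc_score computes the same binary AUC in a single call:
ML Python
from sklearn.metrics import roc_auc_score

# Equivalent to auc(fpr, tpr) for binary labels and scores.
roc_auc = roc_auc_score(true_labels, predicted_scores)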
Examples
Calculate ROC and AUC for a small example with 4 samples.
ML Python
from sklearn.metrics import roc_curve, auc

fpr, tpr, thresholds = roc_curve([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
roc_auc = auc(fpr, tpr)
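Printing the results shows the handful of points the curve passes through; for this input the values work out as in the comments below, giving an AUC of 0.75:
ML Python
print(fpr)      # [0.  0.  0.5 0.5 1. ]
print(tpr)      # [0.  0.5 0.5 1.  1. ]
print(roc_auc)  # 0.75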
Plot the ROC curve with AUC shown in the legend.
ML Python
import matplotlib.pyplot as plt

plt.plot(fpr, tpr, label=f'AUC = {roc_auc:.2f}')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.show()
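A common touch is to also draw the diagonal "chance" line, which represents a random classifier with AUC 0.5; adding this line before plt.legend() in the snippet above makes it easy to see how far the curve bows above randomness:
ML Python
# Baseline for a random classifier: the diagonal from (0, 0) to (1, 1).
plt.plot([0, 1], [0, 1], linestyle='--', color='gray', label='Chance (AUC = 0.50)')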
Sample Program
This program trains a simple model, calculates the ROC curve and AUC, then prints the false positive rates, true positive rates, and AUC value.
ML Python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc
import numpy as np

# Create a simple binary classification dataset
X, y = make_classification(n_samples=100, n_features=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train a logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Get predicted probabilities for the positive class
y_scores = model.predict_proba(X_test)[:, 1]

# Calculate ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_scores)

# Calculate AUC
roc_auc = auc(fpr, tpr)

print(f"False Positive Rates: {np.round(fpr, 2)}")
print(f"True Positive Rates: {np.round(tpr, 2)}")
print(f"AUC: {roc_auc:.3f}")
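Because AUC reduces each model to a single number, comparing classifiers is straightforward. A minimal sketch that continues from the program above; the choice of RandomForestClassifier here is just for illustration:
ML Python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Reuses X_train, X_test, y_train, y_test, and y_scores from the program above.
rf = RandomForestClassifier(random_state=42)
rf.fit(X_train, y_train)
rf_scores = rf.predict_proba(X_test)[:, 1]

print(f"Logistic regression AUC: {roc_auc_score(y_test, y_scores):.3f}")
print(f"Random forest AUC: {roc_auc_score(y_test, rf_scores):.3f}")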
Important Notes
The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) at every classification threshold; a worked computation of one such point follows this list.
AUC ranges from 0 to 1; 0.5 corresponds to random guessing, and values closer to 1 mean better separation of the classes.
Unlike plain accuracy, ROC and AUC are computed from per-class rates, so they remain informative when classes are imbalanced.
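To make the first note concrete, here is a hand computation of TPR = TP / (TP + FN) and FPR = FP / (FP + TN) at a single threshold of 0.5, using the same toy data as the earlier example; each threshold produces one such (FPR, TPR) point on the curve:
ML Python
import numpy as np

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

y_pred = (y_scores >= 0.5).astype(int)  # classify at threshold 0.5

tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
tn = np.sum((y_pred == 0) & (y_true == 0))  # true negatives

print(f"TPR = {tp / (tp + fn):.2f}")  # 1 of 2 positives caught -> 0.50
print(f"FPR = {fp / (fp + tn):.2f}")  # 0 of 2 negatives flagged -> 0.00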
Summary
The ROC curve shows how well a model separates the two classes across all thresholds.
AUC condenses the ROC curve into a single number, making models easy to compare.
Use ROC and AUC to evaluate classification models and tune their decision thresholds.