
ROC curve and AUC in ML Python

Introduction

The ROC (Receiver Operating Characteristic) curve shows how well a model separates two classes, such as sick vs. healthy. AUC (Area Under the Curve) condenses that separating ability into a single number.

When checking how well a model distinguishes between two classes, like spam vs. not spam emails.
When comparing different models to pick the best one for classification tasks.
When you want to understand the trade-off between catching positives and avoiding false alarms.
When the classes are imbalanced, and accuracy alone is misleading.
When tuning model thresholds to find the best balance for your needs.
Syntax
ML Python
from sklearn.metrics import roc_curve, auc

fpr, tpr, thresholds = roc_curve(true_labels, predicted_scores)
roc_auc = auc(fpr, tpr)

true_labels are the actual class labels (0 or 1).

predicted_scores are the model's predicted probabilities or scores for the positive class.

Examples
Calculate ROC and AUC for a small example with 4 samples.
ML Python
fpr, tpr, thresholds = roc_curve([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
roc_auc = auc(fpr, tpr)
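To see what roc_curve computes at each threshold, the true and false positive rates can be reproduced by hand. A small sketch for a single threshold (0.5, chosen arbitrarily for illustration), using the same four samples:

```python
import numpy as np

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

# Classify as positive when the score meets the threshold
threshold = 0.5
y_pred = y_scores >= threshold

tp = np.sum(y_pred & (y_true == 1))          # true positives
fp = np.sum(y_pred & (y_true == 0))          # false positives
tpr_manual = tp / np.sum(y_true == 1)        # TPR = TP / all positives
fpr_manual = fp / np.sum(y_true == 0)        # FPR = FP / all negatives

print(tpr_manual, fpr_manual)  # 0.5 0.0
```

roc_curve simply repeats this calculation at every distinct score, tracing out the full curve.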
Plot the ROC curve with AUC shown in the legend.
ML Python
import matplotlib.pyplot as plt
plt.plot(fpr, tpr, label=f'AUC = {roc_auc:.2f}')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.show()
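ROC plots often include the diagonal chance line, which corresponds to random guessing (AUC = 0.5) and gives the curve a visual baseline. A sketch building on the same toy data as above:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, for environments without a display
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

fpr, tpr, _ = roc_curve([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, label=f'AUC = {roc_auc:.2f}')
plt.plot([0, 1], [0, 1], linestyle='--', label='Chance (AUC = 0.50)')  # random-guess baseline
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()
plt.savefig('roc.png')  # or plt.show() in an interactive session
```

A curve hugging the top-left corner is far from the diagonal and indicates strong separation; a curve near the diagonal performs no better than chance.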
Sample Program

This program trains a simple model, calculates the ROC curve and AUC, then prints the false positive rates, true positive rates, and AUC value.

ML Python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc
import numpy as np

# Create a simple binary classification dataset
X, y = make_classification(n_samples=100, n_features=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train a logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Get predicted probabilities for the positive class
y_scores = model.predict_proba(X_test)[:, 1]

# Calculate ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_scores)

# Calculate AUC
roc_auc = auc(fpr, tpr)

print(f"False Positive Rates: {np.round(fpr, 2)}")
print(f"True Positive Rates: {np.round(tpr, 2)}")
print(f"AUC: {roc_auc:.3f}")
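When only the AUC value is needed, sklearn's roc_auc_score computes it directly from labels and scores, without building the curve first. Using the same four-sample toy data as the Examples section:

```python
from sklearn.metrics import roc_auc_score

# Equivalent to roc_curve followed by auc, in one call
auc_value = roc_auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc_value)  # 0.75
```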
Important Notes

The ROC curve plots the true positive rate against the false positive rate at different thresholds.

AUC ranges from 0 to 1; 0.5 corresponds to random guessing, and values closer to 1 mean better model performance.

ROC and AUC are far less misleading than accuracy when classes are imbalanced, though for heavily imbalanced data a precision-recall curve can be even more informative.

Summary

ROC curve shows how well a model separates classes at different thresholds.

AUC summarizes the ROC curve into one number to compare models easily.

Use ROC and AUC to evaluate and improve classification models.