
Probability calibration in ML Python - ML Experiment: Train & Evaluate

Experiment - Probability calibration
Problem: You have a classification model that predicts class probabilities, but these probabilities are not well calibrated. For example, when the model predicts a probability of 0.8, the positive class actually occurs only 60% of the time. The model is overconfident, so its probability outputs are misleading.
Current Metrics: Brier score loss 0.18, accuracy 85%. The calibration curve shows predicted probabilities systematically higher than the observed frequencies.
Issue: The model's predicted probabilities are not reliable, which can cause problems in any decision-making process that depends on accurate probability estimates.
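The Brier score is the mean squared difference between the predicted probability and the 0/1 outcome, so overconfidence shows up in it directly. A small worked example with ten hypothetical cases (six of them positive) compares a model that predicts 0.8 for every case with one that predicts the observed rate of 0.6:

```python
import numpy as np
from sklearn.metrics import brier_score_loss

# Hypothetical example: 10 cases, 6 of which are actually positive
y_true = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])

# Overconfident model: predicts 0.8 for every case, but only 60% are positive
overconfident = np.full(10, 0.8)
# Calibrated model: predicts the observed frequency, 0.6
calibrated = np.full(10, 0.6)

# Brier score = mean((p - y)^2); compute it by hand and compare with scikit-learn
manual_overconfident = np.mean((overconfident - y_true) ** 2)
manual_calibrated = np.mean((calibrated - y_true) ** 2)

print(f"overconfident: {manual_overconfident:.2f} (sklearn: {brier_score_loss(y_true, overconfident):.2f})")
print(f"calibrated:    {manual_calibrated:.2f} (sklearn: {brier_score_loss(y_true, calibrated):.2f})")
# overconfident: 0.28 (sklearn: 0.28)
# calibrated:    0.24 (sklearn: 0.24)
```

Moving the prediction from 0.8 down to the observed 0.6 lowers the score from 0.28 to 0.24: (6 x 0.2^2 + 4 x 0.8^2) / 10 = 0.28 versus (6 x 0.4^2 + 4 x 0.6^2) / 10 = 0.24.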
Your Task
Improve the probability calibration of the model so that predicted probabilities better reflect true likelihoods, aiming to reduce Brier score loss below 0.12 without reducing accuracy below 80%.
Do not change the model architecture or retrain the original classifier.
Use calibration techniques on the existing model's predicted probabilities.
Use only scikit-learn calibration methods.
Solution
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.metrics import brier_score_loss, accuracy_score
import matplotlib.pyplot as plt

# Create synthetic data
X, y = make_classification(n_samples=10000, n_features=20, random_state=42)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Split the held-out 30% in half: one half fits the calibrator, the other is the final test set.
# The calibrator must be fit on data the model was NOT trained on; calibrating on X_train
# would learn from the model's own, optimistic training-set predictions.
X_calib, X_test, y_calib, y_test = train_test_split(X_test, y_test, test_size=0.5, random_state=42)

# Train original model (none of the calibration steps below refit it)
model = LogisticRegression(max_iter=1000, random_state=42)
model.fit(X_train, y_train)

# Predict probabilities before calibration
probs_uncalibrated = model.predict_proba(X_test)[:, 1]

# Evaluate before calibration
brier_uncalibrated = brier_score_loss(y_test, probs_uncalibrated)
acc_uncalibrated = accuracy_score(y_test, model.predict(X_test))


def calibrate_prefit(fitted_model, method, X_cal, y_cal):
    """Helper (name chosen for this script): fit only a calibrator on top of an already-trained model."""
    try:
        # scikit-learn >= 1.6 deprecates cv='prefit'; wrapping the model in FrozenEstimator replaces it
        from sklearn.frozen import FrozenEstimator
        calibrator = CalibratedClassifierCV(FrozenEstimator(fitted_model), method=method)
    except ImportError:
        # Older scikit-learn: cv='prefit' means "model already fitted, fit only the calibrator"
        calibrator = CalibratedClassifierCV(fitted_model, method=method, cv='prefit')
    return calibrator.fit(X_cal, y_cal)


# Calibrate model using sigmoid (Platt scaling) on the held-out calibration split
calibrated_sigmoid = calibrate_prefit(model, 'sigmoid', X_calib, y_calib)
probs_calibrated_sigmoid = calibrated_sigmoid.predict_proba(X_test)[:, 1]
brier_sigmoid = brier_score_loss(y_test, probs_calibrated_sigmoid)
acc_sigmoid = accuracy_score(y_test, calibrated_sigmoid.predict(X_test))

# Calibrate model using isotonic regression on the same calibration split
calibrated_isotonic = calibrate_prefit(model, 'isotonic', X_calib, y_calib)
probs_calibrated_isotonic = calibrated_isotonic.predict_proba(X_test)[:, 1]
brier_isotonic = brier_score_loss(y_test, probs_calibrated_isotonic)
acc_isotonic = accuracy_score(y_test, calibrated_isotonic.predict(X_test))

# Plot calibration curves (all evaluated on the untouched test split)
plt.figure(figsize=(8, 6))
for probs, label in [(probs_uncalibrated, 'Uncalibrated'), (probs_calibrated_sigmoid, 'Sigmoid'), (probs_calibrated_isotonic, 'Isotonic')]:
    fraction_of_positives, mean_predicted_value = calibration_curve(y_test, probs, n_bins=10)
    plt.plot(mean_predicted_value, fraction_of_positives, marker='o', label=label)

plt.plot([0, 1], [0, 1], linestyle='--', color='gray')  # perfectly calibrated reference
plt.xlabel('Mean predicted probability')
plt.ylabel('Fraction of positives')
plt.title('Calibration Curves')
plt.legend()
plt.grid(True)
plt.show()

# Print metrics
print(f'Before calibration: Brier score loss = {brier_uncalibrated:.3f}, Accuracy = {acc_uncalibrated:.3f}')
print(f'Sigmoid calibration: Brier score loss = {brier_sigmoid:.3f}, Accuracy = {acc_sigmoid:.3f}')
print(f'Isotonic calibration: Brier score loss = {brier_isotonic:.3f}, Accuracy = {acc_isotonic:.3f}')
Applied probability calibration using Platt scaling (sigmoid) and isotonic regression on the existing trained model.
Used CalibratedClassifierCV with cv='prefit' to calibrate without retraining the original model.
Evaluated calibration improvement using Brier score loss and calibration curves.
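Conceptually, sigmoid (Platt) calibration fits a one-feature logistic regression that maps the model's raw scores to probabilities, and isotonic calibration fits a non-decreasing step function instead. The sketch below does both by hand with LogisticRegression and IsotonicRegression on a held-out calibration split, to show what the calibrator learns. It is an approximation for illustration, not a copy of CalibratedClassifierCV's internals (scikit-learn uses its own sigmoid-fitting routine, while this sketch uses a plain LogisticRegression with its default mild L2 penalty).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import brier_score_loss

# Same synthetic data as the solution script, split into train / calibration / test
X, y = make_classification(n_samples=10000, n_features=20, random_state=42)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.3, random_state=42)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)

# The "existing" model: trained once, never refit below
model = LogisticRegression(max_iter=1000, random_state=42).fit(X_train, y_train)

# Raw (uncalibrated) scores on the held-out calibration and test sets
s_cal = model.decision_function(X_cal)
s_test = model.decision_function(X_test)

# Platt-style scaling: a 1-feature logistic regression, p = 1 / (1 + exp(-(a*s + b)))
platt = LogisticRegression().fit(s_cal.reshape(-1, 1), y_cal)
p_sigmoid = platt.predict_proba(s_test.reshape(-1, 1))[:, 1]

# Isotonic calibration: a non-decreasing step function from score to probability
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds='clip').fit(s_cal, y_cal)
p_isotonic = iso.predict(s_test)

p_raw = model.predict_proba(X_test)[:, 1]
for name, p in [('raw', p_raw), ('sigmoid', p_sigmoid), ('isotonic', p_isotonic)]:
    print(f'{name:9s} Brier = {brier_score_loss(y_test, p):.4f}')
print(f'learned a = {platt.coef_[0, 0]:.3f}, b = {platt.intercept_[0]:.3f}')
print('isotonic map is non-decreasing:', bool(np.all(np.diff(iso.predict(np.sort(s_test))) >= 0)))
```

Because a logistic regression's own probabilities are already sigmoid(s), a learned slope near 1 and an intercept near 0 would suggest the model was reasonably well calibrated to begin with.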
Results Interpretation

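In the scenario described in the problem statement, the Brier score loss was 0.18 with 85% accuracy before calibration. After sigmoid calibration, the Brier score loss improved to 0.11 while accuracy stayed at 85%; isotonic calibration improved it further to 0.10, with 84% accuracy. Running the synthetic script above is likely to print different numbers: logistic regression optimizes log loss directly and is usually reasonably well calibrated already, so its three curves and scores will typically sit much closer together than in this scenario.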

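The calibration curves show that before calibration the predicted probabilities were overconfident: points sat below the diagonal, meaning the observed fraction of positives was lower than the mean predicted probability. After calibration, the curves align more closely with the diagonal, indicating better probability estimates.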
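Each point on a calibration curve comes from binning the predictions: within each bin, the mean predicted probability is compared with the observed fraction of positives. A minimal sketch of that computation for ten uniform bins, checked against calibration_curve; the probabilities are simulated (an assumption for illustration) so that the true event rate is 75% of the predicted probability, i.e. an overconfident model.

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
# Simulated overconfident model: the true event rate is only 0.75 * predicted probability
y_prob = rng.uniform(0.0, 1.0, size=5000)
y_true = (rng.uniform(0.0, 1.0, size=5000) < 0.75 * y_prob).astype(int)


def reliability_table(y_true, y_prob, n_bins=10):
    """Fraction of positives and mean predicted probability per uniform bin (empty bins dropped)."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each prediction to a bin using only the interior edges
    bin_ids = np.searchsorted(edges[1:-1], y_prob)
    counts = np.bincount(bin_ids, minlength=n_bins)
    sum_prob = np.bincount(bin_ids, weights=y_prob, minlength=n_bins)
    sum_true = np.bincount(bin_ids, weights=y_true, minlength=n_bins)
    nonempty = counts > 0
    return sum_true[nonempty] / counts[nonempty], sum_prob[nonempty] / counts[nonempty]


frac_pos, mean_pred = reliability_table(y_true, y_prob)
frac_pos_sk, mean_pred_sk = calibration_curve(y_true, y_prob, n_bins=10)

for p, f in zip(mean_pred, frac_pos):
    # For this overconfident model, observed falls below predicted, most clearly in the high bins
    print(f'predicted {p:.2f} -> observed {f:.2f}')
```

Plotting mean_pred against frac_pos gives the same points as the curves in the solution; a perfectly calibrated model would put every point on the diagonal.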

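Probability calibration remaps the predicted probabilities so that they better reflect true likelihoods. Both methods used here are monotonic (non-decreasing) maps, so the ranking of predictions is preserved (isotonic regression can merge nearby scores into ties), but predictions near the 0.5 threshold can move to the other side of it, which is why accuracy can shift slightly, as in the isotonic result above. Better-calibrated probabilities make the model's outputs more trustworthy, which is important for risk-sensitive decisions.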
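Because calibration applies a monotonic map to the model's output, metrics that depend only on the ordering of predictions are unaffected, while threshold-based metrics are not. The sketch below uses simulated scores and a hypothetical, hand-picked sigmoid (a = 0.5, b = -1.0, not fitted to anything) to show both effects: ROC AUC stays the same, while accuracy at the 0.5 threshold changes because that threshold now corresponds to a different raw score.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=2000)
# Hypothetical raw scores: positives centred at +1, negatives at -1
scores = rng.normal(loc=2.0 * y_true - 1.0, scale=1.5)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


p_raw = sigmoid(scores)              # uncalibrated probabilities
a, b = 0.5, -1.0                     # hypothetical Platt parameters with a > 0 (not fitted)
p_cal = sigmoid(a * scores + b)      # increasing remap of the same scores

auc_raw, auc_cal = roc_auc_score(y_true, p_raw), roc_auc_score(y_true, p_cal)
acc_raw = accuracy_score(y_true, (p_raw >= 0.5).astype(int))
acc_cal = accuracy_score(y_true, (p_cal >= 0.5).astype(int))

print(f'ROC AUC   raw={auc_raw:.4f}  calibrated={auc_cal:.4f}')   # same value: ranking unchanged
print(f'accuracy  raw={acc_raw:.4f}  calibrated={acc_cal:.4f}')   # differs: 0.5 now means score >= 2
```

A fitted calibrator usually moves the threshold far less than this exaggerated example, which is why accuracy typically changes by at most a point or so.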
Bonus Experiment
Try calibrating a different model type, such as a Random Forest classifier, and compare calibration results with logistic regression.
💡 Hint
Random forest probabilities are often poorly calibrated: because the forest averages votes across many trees, its predictions rarely reach 0 or 1. Use the same calibration methods and compare Brier score loss and calibration curves to see how much calibration helps.
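A minimal sketch of the bonus experiment, assuming the same synthetic data and splits as the solution script (recreated here so the block runs on its own). calibrate_prefit is a hypothetical helper name; it wraps the fitted model in FrozenEstimator on scikit-learn 1.6 and later and falls back to cv='prefit' on older releases. The printed scores depend on the data and library version, so no particular outcome is claimed here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import brier_score_loss

# Same synthetic data as the solution script; the held-out 30% is split into calibration and test halves
X, y = make_classification(n_samples=10000, n_features=20, random_state=42)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.3, random_state=42)
X_calib, X_test, y_calib, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)


def calibrate_prefit(fitted_model, method, X_cal, y_cal):
    """Hypothetical helper: fit only a calibrator on top of an already-fitted model."""
    try:
        from sklearn.frozen import FrozenEstimator  # available in scikit-learn >= 1.6
        calibrator = CalibratedClassifierCV(FrozenEstimator(fitted_model), method=method)
    except ImportError:
        calibrator = CalibratedClassifierCV(fitted_model, method=method, cv='prefit')
    return calibrator.fit(X_cal, y_cal)


models = {
    'logistic regression': LogisticRegression(max_iter=1000, random_state=42),
    'random forest': RandomForestClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)  # each model is trained once, on the training split only
    row = {'uncalibrated': brier_score_loss(y_test, model.predict_proba(X_test)[:, 1])}
    for method in ('sigmoid', 'isotonic'):
        calibrated = calibrate_prefit(model, method, X_calib, y_calib)
        row[method] = brier_score_loss(y_test, calibrated.predict_proba(X_test)[:, 1])
    results[name] = row
    print(name, {k: round(float(v), 4) for k, v in row.items()})
```

Passing the random forest's test probabilities to calibration_curve, as in the solution's plotting loop, shows whether its curve bends away from the diagonal more than the logistic regression's does.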

Practice

1. What is the main goal of probability calibration in machine learning?
easy
A. To adjust predicted probabilities to better reflect true likelihoods
B. To increase the accuracy of class labels
C. To reduce the size of the training dataset
D. To speed up the training process

Solution

  1. Step 1: Understand the purpose of probability calibration

    Probability calibration aims to make predicted probabilities match the actual chance of an event happening.
  2. Step 2: Differentiate from accuracy and training speed

    Accuracy relates to correct labels, not probability quality. Calibration focuses on probability quality, not dataset size or speed.
  3. Final Answer:

    To adjust predicted probabilities to better reflect true likelihoods -> Option A
  4. Quick Check:

    Calibration = Adjust probabilities [OK]
Hint: Calibration fixes probability quality, not accuracy or speed [OK]
Common Mistakes:
  • Confusing calibration with accuracy improvement
  • Thinking calibration changes dataset size
  • Assuming calibration speeds training
2. Which of the following is a common method used for probability calibration?
easy
A. K-means clustering
B. Gradient boosting
C. Platt scaling
D. Principal component analysis

Solution

  1. Step 1: Identify calibration methods

    Platt scaling is a sigmoid-based method commonly used to calibrate probabilities.
  2. Step 2: Exclude unrelated methods

    Gradient boosting is a model training technique, K-means is clustering, and PCA is dimensionality reduction; none of them is a calibration method.
  3. Final Answer:

    Platt scaling -> Option C
  4. Quick Check:

    Calibration method = Platt scaling [OK]
Hint: Remember Platt scaling for calibration, others are different tasks [OK]
Common Mistakes:
  • Confusing boosting with calibration
  • Mixing clustering or PCA with calibration
  • Choosing any popular ML method as calibration
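For reference, Platt scaling (method='sigmoid' in scikit-learn) maps a classifier's raw output f(x) to a probability with a two-parameter sigmoid, where A and B are fitted by maximum likelihood on data the classifier was not trained on:

```latex
P(y = 1 \mid x) = \frac{1}{1 + \exp\bigl(A\, f(x) + B\bigr)}
```

With A < 0 the curve increases with f(x), so the ordering of the scores is preserved.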
3. Given the following Python code using scikit-learn, what will be the output of calibrated_clf.predict_proba([[0.5, 1.5]])?
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV

X, y = make_classification(n_samples=100, n_features=2, n_informative=2, n_redundant=0, random_state=42)
clf = LogisticRegression().fit(X, y)
calibrated_clf = CalibratedClassifierCV(clf, method='sigmoid', cv='prefit')
calibrated_clf.fit(X, y)

probs = calibrated_clf.predict_proba([[0.5, 1.5]])
print(probs)
medium
A. A 2D array with calibrated probabilities for each class, e.g. [[0.3, 0.7]]
B. A single float value representing probability
C. An error because cv='prefit' requires a different fit method
D. A list of predicted class labels, e.g. [1]

Solution

  1. Step 1: Understand CalibratedClassifierCV output

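    With cv='prefit', CalibratedClassifierCV fits only the sigmoid calibrator on top of the already-trained model; it does not refit the model. predict_proba then returns calibrated probabilities with one column per class. (On scikit-learn 1.6 and later, cv='prefit' emits a deprecation warning and the suggested replacement is CalibratedClassifierCV(FrozenEstimator(clf)); the output format is the same.)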
  2. Step 2: Check predict_proba output format

    predict_proba returns a 2D array of shape (n_samples, n_classes), here (1, 2), not a single float or class labels.
  3. Final Answer:

    A 2D array with calibrated probabilities for each class, e.g. [[0.3, 0.7]] -> Option A
  4. Quick Check:

    predict_proba output = 2D array [OK]
Hint: predict_proba always returns a 2D array of class probabilities [OK]
Common Mistakes:
  • Expecting a single float instead of array
  • Confusing predict_proba with predict
  • Misunderstanding cv='prefit' usage
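A version-tolerant variant of the question's snippet, as a sketch: it passes n_redundant=0 so that two features are enough (make_classification's defaults need at least four), wraps the fitted model in FrozenEstimator where available (scikit-learn 1.6+) and otherwise falls back to cv='prefit', then checks the shape of the predict_proba output. Like the question, it calibrates on the training data purely to show the output format; in practice the calibrator should be fit on held-out data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV

X, y = make_classification(n_samples=100, n_features=2, n_informative=2, n_redundant=0, random_state=42)
clf = LogisticRegression().fit(X, y)

try:
    from sklearn.frozen import FrozenEstimator  # scikit-learn >= 1.6
    calibrated_clf = CalibratedClassifierCV(FrozenEstimator(clf), method='sigmoid')
except ImportError:
    calibrated_clf = CalibratedClassifierCV(clf, method='sigmoid', cv='prefit')
calibrated_clf.fit(X, y)

probs = calibrated_clf.predict_proba([[0.5, 1.5]])
print(probs.shape)                       # (1, 2): one row per sample, one column per class
print(np.round(probs.sum(axis=1), 6))    # each row sums to 1
```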
4. You tried to calibrate a classifier using CalibratedClassifierCV with cv=5, but got an error: "ValueError: Expected cv split to be a cross-validation generator or an iterable, got int instead." What is the likely cause?
medium
A. You passed cv=5 but the dataset has fewer than 5 samples
B. You passed an integer instead of a cross-validation splitter object
C. You used an unsupported calibration method
D. You forgot to fit the base classifier before calibration

Solution

  1. Step 1: Analyze the error message

    The error "Expected cv split to be a cross-validation generator or an iterable, got int instead." directly points to the cv parameter receiving an integer (5) where a splitter was expected.
  2. Step 2: Check CalibratedClassifierCV cv usage

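    CalibratedClassifierCV itself accepts an integer cv (cv=5 means stratified 5-fold cross-validation for a classifier), so this message indicates that the value is being checked by code that only accepts a splitter object or an iterable of (train, test) splits. Passing an explicit cross-validation object such as StratifiedKFold(5) works in both cases.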
  3. Step 3: Rule out unrelated causes

    Base-model fitting (D) only matters for cv='prefit', and dataset size (A) or calibration method (C) would not produce a "got int" message.
  4. Final Answer:

    You passed an integer instead of a cross-validation splitter object -> Option B
  5. Quick Check:

    Error 'got int instead' = cv type mismatch [OK]
Hint: a "got int" cv error means the receiving code wants a splitter object or an iterable of splits, so pass e.g. StratifiedKFold(5) [OK]
Common Mistakes:
  • Passing an integer to cv instead of a splitter object
  • Confusing cv parameter usage
  • Assuming calibration method causes error
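Both forms are accepted by CalibratedClassifierCV itself. The sketch below passes an explicit StratifiedKFold splitter (the form that also satisfies code requiring a splitter object) and a plain integer, and confirms that with the default ensemble behaviour one calibrated classifier is kept per fold. Note that in this cross-validation mode a clone of the base model is refit on each training fold.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Explicit splitter object: shuffled, stratified 5-fold cross-validation
splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
calibrated_splitter = CalibratedClassifierCV(LogisticRegression(max_iter=1000), method='sigmoid', cv=splitter)
calibrated_splitter.fit(X, y)

# Integer form: for a classifier, cv=5 is turned into a stratified 5-fold split
calibrated_int = CalibratedClassifierCV(LogisticRegression(max_iter=1000), method='sigmoid', cv=5)
calibrated_int.fit(X, y)

# With the default ensemble setting, one (model, calibrator) pair is stored per fold
print(len(calibrated_splitter.calibrated_classifiers_), len(calibrated_int.calibrated_classifiers_))
```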
5. You have a binary classifier that outputs probabilities but they are poorly calibrated. You want to improve calibration on a small dataset without losing model accuracy. Which approach is best?
hard
A. Discard probabilities and use only predicted labels
B. Retrain the model with more epochs to improve accuracy
C. Use isotonic regression calibration on a separate validation set
D. Apply Platt scaling calibration using cross-validation

Solution

  1. Step 1: Consider calibration methods for small datasets

    Platt scaling (sigmoid) is preferred for small datasets because it is less prone to overfitting than isotonic regression.
  2. Step 2: Use cross-validation to avoid losing accuracy

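    With cross-validation (for example cv=5), CalibratedClassifierCV fits the sigmoid on out-of-fold predictions, so every sample contributes to calibration without setting aside a separate validation set, which matters when data is scarce. This mode refits clones of the base model on each training fold, but because the sigmoid map is monotonic it preserves the ranking of predictions, so accuracy usually changes little.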
  3. Step 3: Evaluate other options

    Isotonic regression can overfit a small calibration set, retraining with more epochs does not target calibration and may not fix it, and discarding probabilities throws away useful information.
  4. Final Answer:

    Apply Platt scaling calibration using cross-validation -> Option D
  5. Quick Check:

    Small data calibration = Platt scaling + CV [OK]
Hint: For small data, prefer Platt scaling with CV for calibration [OK]
Common Mistakes:
  • Using isotonic regression on small data causing overfit
  • Retraining model instead of calibrating
  • Ignoring probability calibration importance
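A sketch for comparing both calibration methods on a small dataset with cross-validation, assuming a GaussianNB base model (chosen here only because its probabilities are often poorly calibrated) and a 300-sample synthetic dataset. Which method wins depends on the data and the split, so the script only prints the scores rather than asserting an outcome.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import brier_score_loss

# Small dataset: only a few hundred labelled samples
X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

base = GaussianNB().fit(X_train, y_train)
scores = {'uncalibrated': brier_score_loss(y_test, base.predict_proba(X_test)[:, 1])}

for method in ('sigmoid', 'isotonic'):
    # cv=5: each fold's calibrator is fit on predictions for samples its model was not trained on
    calibrated = CalibratedClassifierCV(GaussianNB(), method=method, cv=5).fit(X_train, y_train)
    scores[method] = brier_score_loss(y_test, calibrated.predict_proba(X_test)[:, 1])

for name, score in scores.items():
    print(f'{name:12s} Brier = {score:.4f}')
```

Rerunning with different random_state values gives a feel for how much more the isotonic score varies than the sigmoid score on data this small.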