Practice - 5 Tasks

Answer the questions below

1. Fill in the blank

easy

Complete the code to import the calibration class from scikit-learn's calibration module.

ML Python

from sklearn.calibration import [1]

A. CalibratedClassifierCV

B. train_test_split

C. StandardScaler

D. LogisticRegression

Common Mistakes

Importing unrelated classes like train_test_split or StandardScaler.

Confusing model classes with calibration classes.

Explanation

The CalibratedClassifierCV class is used for probability calibration in scikit-learn.

2. Fill in the blank

medium

Complete the code to create a calibrated classifier using the sigmoid method.

ML Python

calibrated_clf = CalibratedClassifierCV(estimator=clf, method=[1], cv='prefit')

A. 'polynomial'

B. 'isotonic'

C. 'linear'

D. 'sigmoid'

Common Mistakes

Using 'isotonic' when sigmoid is required.

Choosing methods not supported by CalibratedClassifierCV.

Explanation

The sigmoid method applies Platt scaling for probability calibration.
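
As a minimal sketch of the completed call in context, the snippet below calibrates a linear SVM with the sigmoid method. The synthetic dataset, the LinearSVC base model, and cv=3 are illustrative assumptions, not part of the exercise. The base model is passed positionally because the keyword was renamed from base_estimator to estimator in scikit-learn 1.2.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic binary problem (assumed data, for illustration only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# LinearSVC has no predict_proba of its own; the calibration wrapper adds one.
calibrated_clf = CalibratedClassifierCV(LinearSVC(dual=False), method="sigmoid", cv=3)
calibrated_clf.fit(X, y)

probs = calibrated_clf.predict_proba(X[:5])
print(probs.shape)  # (5, 2): one row per sample, one column per class
```

Swapping method="sigmoid" for method="isotonic" is the only change needed to try the other supported calibrator.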

3. Fill in the blank

hard

Fix the error in the code to correctly fit the calibrated classifier.

ML Python

calibrated_clf.[1](X_calib, y_calib)

A. predict

B. fit

C. predict_proba

D. score

Common Mistakes

Calling predict or predict_proba before fitting the model.

Using score method which only evaluates performance.

Explanation

The fit method trains the calibrated classifier on calibration data.
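
The sketch below shows one way X_calib and y_calib might come about: the base model is trained on one half of the data and the calibrator on the other. The dataset and the 50/50 split are assumptions. Recent scikit-learn releases (1.6 and later, as far as I know) deprecate cv='prefit' in favour of wrapping the fitted model in FrozenEstimator, so the code tries that first and falls back to cv='prefit' on older versions.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumed synthetic data, split into a training half and a calibration half.
X, y = make_classification(n_samples=400, n_features=5, random_state=0)
X_train, X_calib, y_train, y_calib = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)

# The base model only ever sees the training half.
clf = LogisticRegression().fit(X_train, y_train)

try:
    # Newer scikit-learn: freeze the fitted model so fit() does not retrain it.
    from sklearn.frozen import FrozenEstimator
    calibrated_clf = CalibratedClassifierCV(FrozenEstimator(clf), method="sigmoid")
except ImportError:
    # Older scikit-learn: cv='prefit' tells the wrapper the model is already fitted.
    calibrated_clf = CalibratedClassifierCV(clf, method="sigmoid", cv="prefit")

# fit() here learns only the calibration mapping, using the held-out half.
calibrated_clf.fit(X_calib, y_calib)
probs = calibrated_clf.predict_proba(X_calib)
```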

4. Fill in the blank

hard

Fill both blanks to create a dictionary of calibrated probabilities for each class.

ML Python

calibrated_probs = {cls: calibrated_clf.predict_proba(X)[:, [1]] for [2] in calibrated_clf.classes_}

C. cls

D. class_label

Common Mistakes

Using a fixed index like 0 or 1 instead of the loop variable.

Using a variable name not defined in the comprehension.

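Explanation

Both blanks take cls. The dictionary key already uses cls, so the loop variable must be cls as well; using class_label as the loop variable would leave cls undefined and raise a NameError. Indexing the probability columns with cls works only when the class labels are the integers 0, 1, ..., n_classes - 1; for other labels, index by each class's position in calibrated_clf.classes_ instead.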
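
Using the class label as a column index only works when the labels happen to be 0, 1, 2, and so on. A label-agnostic variant, sketched here with enumerate, pairs each label with its column position. The dataset and the 'ham'/'spam' labels are made up for the example.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Assumed synthetic data with string labels instead of 0/1.
X, y_int = make_classification(n_samples=150, n_features=4, random_state=0)
y = np.where(y_int == 1, "spam", "ham")

calibrated_clf = CalibratedClassifierCV(LogisticRegression(), method="sigmoid", cv=3)
calibrated_clf.fit(X, y)

# Call predict_proba once, then slice one column per class.
proba = calibrated_clf.predict_proba(X)
calibrated_probs = {cls: proba[:, i] for i, cls in enumerate(calibrated_clf.classes_)}
```

Column i of predict_proba corresponds to calibrated_clf.classes_[i], which is why enumerate gives the right pairing. Calling predict_proba once outside the comprehension also avoids recomputing it for every class.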

5. Fill in the blank

hard

Fill all three blanks to compute Brier score loss for calibrated predictions.

ML Python

from sklearn.metrics import [1]
brier_score = [2](y_true, calibrated_clf.[3](X_test)[:, 1])

A. accuracy_score

B. brier_score_loss

C. predict_proba

D. predict

Common Mistakes

Using accuracy_score which measures classification accuracy, not probability calibration.

Using predict, which returns class labels, instead of predict_proba.

Explanation

The brier_score_loss function measures the accuracy of probabilistic predictions, which are obtained using predict_proba.
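
For binary labels, the Brier score is the mean squared difference between the predicted probability of the positive class and the actual 0/1 outcome. The hand-rolled function below (a hypothetical helper, not a scikit-learn API) makes that concrete on made-up numbers.

```python
def brier_score(y_true, p_pos):
    """Mean squared gap between predicted P(class 1) and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pos)) / len(y_true)

y_true = [0, 1, 1, 0]

# Confident and correct predictions give a score close to 0.
confident = brier_score(y_true, [0.1, 0.9, 0.8, 0.2])  # about 0.025

# Always predicting 0.5 gives exactly 0.25, whatever the labels are.
hedged = brier_score(y_true, [0.5, 0.5, 0.5, 0.5])
```

Lower is better: 0 means every probability matched its outcome exactly, and 1 means every prediction was fully confident and wrong.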

Practice

1. What is the main goal of probability calibration in machine learning?

easy

A. To adjust predicted probabilities to better reflect true likelihoods

B. To increase the accuracy of class labels

C. To reduce the size of the training dataset

D. To speed up the training process

Solution

Step 1: Understand the purpose of probability calibration
Probability calibration aims to make predicted probabilities match the actual chance of an event happening.
Step 2: Differentiate from accuracy and training speed
Accuracy relates to correct labels, not probability quality. Calibration focuses on probability quality, not dataset size or speed.
Final Answer:
To adjust predicted probabilities to better reflect true likelihoods -> Option A
Quick Check:
Calibration = Adjust probabilities [OK]

Hint: Calibration fixes probability quality, not accuracy or speed [OK]

Common Mistakes:

Confusing calibration with accuracy improvement
Thinking calibration changes dataset size
Assuming calibration speeds training
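
One way to see what "matching the actual chance" means is to group predictions by their predicted probability and compare each group's average prediction with the fraction of positives it actually contains. The helper below is a hypothetical pure-Python sketch on made-up data; scikit-learn offers a similar computation as sklearn.calibration.calibration_curve.

```python
def reliability_table(y_true, p_pos, n_bins=4):
    """Return (mean predicted prob, observed positive rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, p_pos):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[idx].append((p, y))
    rows = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            frac_pos = sum(y for _, y in members) / len(members)
            rows.append((mean_p, frac_pos, len(members)))
    return rows

# Made-up, perfectly calibrated predictions:
# 1 of the 4 samples predicted at 0.25 is positive, 3 of the 4 predicted at 0.75 are.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
p_pos = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]

rows = reliability_table(y_true, p_pos)
print(rows)  # [(0.25, 0.25, 4), (0.75, 0.75, 4)]
```

A well-calibrated model has the first two values of each row close together. Note that this toy model is only 75% accurate yet perfectly calibrated, which illustrates why calibration and accuracy are separate properties.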

2. Which of the following is a common method used for probability calibration?

easy

A. K-means clustering

B. Gradient boosting

C. Platt scaling

D. Principal component analysis

Solution

Step 1: Identify calibration methods
Platt scaling is a sigmoid-based method commonly used to calibrate probabilities.
Step 2: Exclude unrelated methods
Gradient boosting is a model training technique, K-means is a clustering algorithm, and PCA is a dimensionality reduction method; none of them is a calibration method.
Final Answer:
Platt scaling -> Option C
Quick Check:
Calibration method = Platt scaling [OK]

Hint: Remember Platt scaling for calibration, others are different tasks [OK]

Common Mistakes:

Confusing boosting with calibration
Mixing clustering or PCA with calibration
Choosing any popular ML method as calibration
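
To make "sigmoid-based" concrete: Platt scaling maps a raw classifier score s to a probability 1 / (1 + exp(-(a*s + b))) and learns a and b from labelled data. The sketch below fits them with plain gradient descent on the log loss. This is a simplification for illustration (Platt's original procedure uses a different optimizer and smoothed targets), and the scores and labels are made up.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_platt(scores, labels, lr=0.1, n_iter=2000):
    """Fit a, b in p = sigmoid(a * score + b) by gradient descent on log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. (a*s + b)
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

# Made-up raw scores (e.g. SVM margins) with noisy labels.
scores = [-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

a, b = fit_platt(scores, labels)
calibrated = [sigmoid(a * s + b) for s in scores]
```

Because a comes out positive here, the mapping is monotonic: higher scores still receive higher probabilities, only the scale changes.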

3. Given the following Python code using scikit-learn, what will be the output of calibrated_clf.predict_proba([[0.5, 1.5]])?

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV

X, y = make_classification(n_samples=100, n_features=2, n_informative=2, n_redundant=0, random_state=42)
clf = LogisticRegression().fit(X, y)
calibrated_clf = CalibratedClassifierCV(clf, method='sigmoid', cv='prefit')
calibrated_clf.fit(X, y)

probs = calibrated_clf.predict_proba([[0.5, 1.5]])
print(probs)

medium

A. A 2D array with calibrated probabilities for each class, e.g. [[0.3, 0.7]]

B. A single float value representing probability

C. An error because cv='prefit' requires a different fit method

D. A list of predicted class labels, e.g. [1]

Solution

Step 1: Understand CalibratedClassifierCV output
Using method='sigmoid' with cv='prefit' fits calibration on the existing model and outputs probabilities as a 2D array for each class.
Step 2: Check predict_proba output format
predict_proba returns probabilities for each class in a 2D array, not a single float or labels.
Final Answer:
A 2D array with calibrated probabilities for each class, e.g. [[0.3, 0.7]] -> Option A
Quick Check:
predict_proba output = 2D array [OK]

Hint: predict_proba returns a 2D array of shape (n_samples, n_classes) [OK]

Common Mistakes:

Expecting a single float instead of array
Confusing predict_proba with predict
Misunderstanding cv='prefit' usage
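
The difference between predict_proba and predict is easy to check by looking at array shapes. This sketch uses cv=3 rather than cv='prefit' so that it does not depend on how a given scikit-learn version handles prefit models; that choice and the dataset parameters are assumptions.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# With only 2 features, make_classification needs n_redundant=0 to be valid.
X, y = make_classification(
    n_samples=100, n_features=2, n_informative=2, n_redundant=0, random_state=42
)

calibrated_clf = CalibratedClassifierCV(LogisticRegression(), method="sigmoid", cv=3)
calibrated_clf.fit(X, y)

sample = [[0.5, 1.5]]
probs = calibrated_clf.predict_proba(sample)  # one row, one column per class
labels = calibrated_clf.predict(sample)       # one label per sample

print(probs.shape, labels.shape)  # (1, 2) (1,)
```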

4. You tried to calibrate a classifier using CalibratedClassifierCV with cv=5, but got an error: "ValueError: Expected cv split to be a cross-validation generator or an iterable, got int instead." What is the likely cause?

medium

A. You passed cv=5 but the dataset has fewer than 5 samples

B. You passed an integer instead of a cross-validation splitter object

C. You used an unsupported calibration method

D. You forgot to fit the base classifier before calibration

Solution

Step 1: Analyze the error message
The error "Expected cv split to be a cross-validation generator or an iterable, got int instead." directly points to the cv parameter receiving an integer (5) where a splitter was expected.
Step 2: Check CalibratedClassifierCV cv usage
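scikit-learn documents the cv parameter of CalibratedClassifierCV as accepting an integer, a cross-validation generator, an iterable, or 'prefit', so cv=5 on its own is normally valid. A message like the one quoted indicates that the code path in use expected an explicit cross-validation object such as StratifiedKFold(5).
Step 3: Rule out unrelated causes
An unfitted base classifier (D) matters only with cv='prefit'; dataset size (A) and calibration method (C) do not trigger this error.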
Final Answer:
You passed an integer instead of a cross-validation splitter object -> Option B
Quick Check:
Error 'got int instead' = cv type mismatch [OK]

Hint: "got int instead" means the code expected a splitter object or an iterable for cv, not a plain int [OK]

Common Mistakes:

Passing an integer to cv instead of a splitter object
Confusing cv parameter usage
Assuming calibration method causes error
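
For comparison, in the scikit-learn versions I am aware of, CalibratedClassifierCV accepts both an integer and an explicit splitter for cv, and for a classifier an integer is interpreted as stratified k-fold. The sketch below, on an assumed synthetic dataset, fits the same model both ways.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# cv as a plain integer.
by_int = CalibratedClassifierCV(LogisticRegression(), cv=5).fit(X, y)

# cv as an explicit splitter object with the same number of folds.
splitter = StratifiedKFold(n_splits=5)
by_splitter = CalibratedClassifierCV(LogisticRegression(), cv=splitter).fit(X, y)

# One (model, calibrator) pair is kept per fold in both cases.
n_int = len(by_int.calibrated_classifiers_)
n_splitter = len(by_splitter.calibrated_classifiers_)
same_probs = np.allclose(by_int.predict_proba(X), by_splitter.predict_proba(X))
```

An explicit splitter is mainly useful when the folds need shuffling, a fixed random_state, or grouping, none of which an integer can express.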

5. You have a binary classifier that outputs probabilities but they are poorly calibrated. You want to improve calibration on a small dataset without losing model accuracy. Which approach is best?

hard

A. Discard probabilities and use only predicted labels

B. Retrain the model with more epochs to improve accuracy

C. Use isotonic regression calibration on a separate validation set

D. Apply Platt scaling calibration using cross-validation

Solution

Step 1: Consider calibration methods for small datasets
Platt scaling (sigmoid) is preferred for small datasets because it fits only two parameters and is therefore less prone to overfitting than isotonic regression.
Step 2: Use cross-validation to avoid losing accuracy
With cross-validation, every sample is used for both training and calibration across the folds, so no data has to be set aside as a separate validation set. Calibration rescales the predicted probabilities rather than changing what the model has learned, so accuracy is largely preserved.
Step 3: Evaluate other options
Isotonic regression may overfit small data; retraining may not fix calibration; discarding probabilities throws away useful information.
Final Answer:
Apply Platt scaling calibration using cross-validation -> Option D
Quick Check:
Small data calibration = Platt scaling + CV [OK]

Hint: For small data, prefer Platt scaling with CV for calibration [OK]

Common Mistakes:

Using isotonic regression on small data causing overfit
Retraining model instead of calibrating
Ignoring probability calibration importance
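
Which calibrator works better on a given small dataset is an empirical question, and a held-out Brier score is one way to check. The sketch below compares both methods under cross-validation. The synthetic data, the GaussianNB base model, and the 100-sample training set are all assumptions, and the result on other data may differ.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Keep only 100 samples for training to mimic a small dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=100, random_state=0, stratify=y
)

scores = {}
for method in ("sigmoid", "isotonic"):
    model = CalibratedClassifierCV(GaussianNB(), method=method, cv=5)
    model.fit(X_train, y_train)
    # Lower Brier score means better probabilistic predictions.
    scores[method] = brier_score_loss(y_test, model.predict_proba(X_test)[:, 1])

print(scores)
```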

Probability calibration in ML Python - Interactive Code Practice
