Threshold tuning in ML Python - ML Experiment: Train & Evaluate

Experiment - Threshold tuning

Problem: You have a binary classification model that predicts whether an email is spam. The model outputs probabilities, and it currently uses the default threshold of 0.5 to decide between spam and not spam.

Current Metrics: Accuracy: 85%, Precision: 70%, Recall: 60%, F1-score: 64%

Issue: The model has low recall, meaning it misses many spam emails. The fixed threshold of 0.5 might not be the best choice to balance precision and recall.
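As a reminder of what these metrics measure, here is a minimal sketch that computes precision, recall, and F1-score from confusion-matrix counts. The counts `tp`, `fp`, and `fn` are hypothetical values chosen for illustration; they are not taken from the exercise.

```python
# Hypothetical confusion-matrix counts for a spam classifier (illustration only).
tp = 30  # spam emails correctly flagged as spam
fp = 10  # legitimate emails wrongly flagged as spam
fn = 20  # spam emails the model missed

# Precision: of the emails flagged as spam, how many really were spam?
precision = tp / (tp + fp)
# Recall: of the real spam emails, how many did the model catch?
recall = tp / (tp + fn)
# F1-score: harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"Precision: {precision:.2f}")
print(f"Recall:    {recall:.2f}")
print(f"F1-score:  {f1:.2f}")
```

Low recall corresponds to a large `fn` relative to `tp`: many spam emails are left unflagged.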

Your Task

Tune the classification threshold to raise recall to at least 75% while keeping precision at 65% or higher.

Do not retrain the model or change its architecture.

Only adjust the threshold used to convert probabilities to class labels.
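To make the constraint concrete, the sketch below shows how a threshold turns probabilities into class labels. The probabilities in `probs` are made-up values for illustration, and 0.4 is just one example of a lower threshold.

```python
import numpy as np

# Hypothetical spam probabilities for five emails (illustration only).
probs = np.array([0.15, 0.42, 0.48, 0.55, 0.91])

# Default threshold: label as spam (1) when the probability is at least 0.5.
labels_default = (probs >= 0.5).astype(int)

# Lower threshold: the same probabilities, but more emails are labelled spam.
labels_lower = (probs >= 0.4).astype(int)

print("threshold 0.5:", labels_default)
print("threshold 0.4:", labels_lower)
```

The model and its probabilities stay the same in both cases; only the cutoff changes, and with it the number of emails flagged as spam.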

Solution

import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

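# Simulated predicted probabilities and true labels.
# This is a stand-in for a real model's outputs, so it will not
# reproduce the metrics quoted in the problem statement.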
np.random.seed(42)
true_labels = np.random.randint(0, 2, size=1000)
predicted_probs = np.clip(true_labels * 0.7 + np.random.normal(0, 0.3, 1000), 0, 1)

# Function to evaluate metrics at different thresholds
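def evaluate_threshold(threshold, probs, labels):
    # Label as positive (1) when the probability reaches the threshold
    preds = (probs >= threshold).astype(int)
    # zero_division=0 avoids warnings when a threshold yields no positive predictions
    precision = precision_score(labels, preds, zero_division=0)
    recall = recall_score(labels, preds, zero_division=0)
    f1 = f1_score(labels, preds, zero_division=0)
    return precision, recall, f1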

# Evaluate 101 evenly spaced thresholds from 0.00 to 1.00
# (linspace avoids the floating-point endpoint drift of arange)
thresholds = np.linspace(0.0, 1.0, 101)
results = []
for t in thresholds:
    p, r, f = evaluate_threshold(t, predicted_probs, true_labels)
    results.append((t, p, r, f))

# Among thresholds with recall >= 0.75 and precision >= 0.65,
# keep the one with the highest F1-score
best = None
for t, p, r, f in results:
    if r >= 0.75 and p >= 0.65:
        if best is None or f > best[3]:
            best = (t, p, r, f)

# Output best threshold and metrics
if best is not None:
    best_threshold, best_precision, best_recall, best_f1 = best
    print(f"Best threshold: {best_threshold:.2f}")
    print(f"Precision: {best_precision:.2f}")
    print(f"Recall: {best_recall:.2f}")
    print(f"F1-score: {best_f1:.2f}")
else:
    print("No threshold found that meets the criteria.")

Added code to test multiple thresholds from 0 to 1.

Selected the threshold with the highest F1-score among those that reach at least 75% recall and at least 65% precision.

Did not change the model or retrain, only adjusted the decision threshold.

Added check to handle case when no threshold meets criteria.
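As an alternative to looping over a fixed grid, scikit-learn's `precision_recall_curve` returns precision and recall at every distinct score in the data. The sketch below applies the same recall and precision targets to that curve. It is not the exercise's solution; the simulated data, the seed, and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Simulated labels and scores (assumption: stands in for real model outputs).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)
y_score = np.clip(y_true * 0.4 + rng.normal(0.3, 0.25, size=500), 0, 1)

precision, recall, thresholds = precision_recall_curve(y_true, y_score)

# precision and recall have one more entry than thresholds; the final
# point (precision=1, recall=0) has no threshold, so drop it.
precision, recall = precision[:-1], recall[:-1]

# Indices of thresholds that meet both targets.
candidates = np.flatnonzero((recall >= 0.75) & (precision >= 0.65))

best_threshold = best_precision = best_recall = None
if candidates.size > 0:
    p, r = precision[candidates], recall[candidates]
    f1 = 2 * p * r / (p + r)  # safe: recall >= 0.75 here, so p + r > 0
    best = candidates[np.argmax(f1)]
    best_threshold = float(thresholds[best])
    best_precision = float(precision[best])
    best_recall = float(recall[best])
    print(f"Best threshold: {best_threshold:.2f}")
    print(f"Precision: {best_precision:.2f}, Recall: {best_recall:.2f}")
else:
    print("No threshold found that meets the criteria.")
```

Because the candidate thresholds come from the scores themselves, this approach does not skip cutoffs that fall between grid points.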

Results Interpretation

Before threshold tuning: Accuracy 85%, Precision 70%, Recall 60%, F1-score 64%

After threshold tuning: Accuracy ~80%, Precision 68%, Recall 76%, F1-score 72%

Adjusting the classification threshold can help balance precision and recall without retraining the model. This is useful when you want to catch more positive cases (higher recall) while still keeping false alarms reasonably low (precision reasonably high).
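The trade-off can be seen by computing precision and recall at a handful of thresholds. This is a rough sketch on simulated data; the data, the seed, and the helper `precision_recall_at` are hypothetical and only serve the illustration.

```python
import numpy as np

# Simulated labels and probabilities (assumption: illustration only).
rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=400)
probs = np.clip(labels * 0.4 + rng.normal(0.3, 0.25, size=400), 0, 1)

def precision_recall_at(threshold, probs, labels):
    """Return (precision, recall) for one threshold; 0.0 when undefined."""
    preds = probs >= threshold
    tp = np.sum(preds & (labels == 1))
    fp = np.sum(preds & (labels == 0))
    fn = np.sum(~preds & (labels == 1))
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return float(precision), float(recall)

rows = [(t, *precision_recall_at(t, probs, labels)) for t in (0.3, 0.4, 0.5, 0.6, 0.7)]
for t, p, r in rows:
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
```

Recall can only stay the same or fall as the threshold rises, because fewer emails are labelled spam. Precision usually moves the other way, although it is not guaranteed to rise at every step.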

Bonus Experiment

Try tuning the threshold to maximize the F1-score instead of setting fixed precision and recall targets.

💡 Hint

Compute the F1-score at each threshold and pick the threshold with the highest value.
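One possible way to approach the bonus experiment is sketched below. It is not an official solution; the simulated data mirrors the shape of the data in the main solution but uses its own random generator, so the numbers will differ.

```python
import numpy as np
from sklearn.metrics import f1_score

# Simulated labels and probabilities (assumption: stands in for real model outputs).
rng = np.random.default_rng(42)
true_labels = rng.integers(0, 2, size=1000)
predicted_probs = np.clip(true_labels * 0.7 + rng.normal(0, 0.3, size=1000), 0, 1)

# F1-score at each of 101 evenly spaced thresholds from 0.00 to 1.00.
thresholds = np.linspace(0.0, 1.0, 101)
f1_scores = np.array([
    f1_score(true_labels, (predicted_probs >= t).astype(int), zero_division=0)
    for t in thresholds
])

# Pick the threshold with the highest F1-score.
best_idx = int(np.argmax(f1_scores))
best_threshold = float(thresholds[best_idx])
best_f1 = float(f1_scores[best_idx])

print(f"Best threshold: {best_threshold:.2f}")
print(f"Best F1-score: {best_f1:.2f}")
```

In practice it is safer to choose the threshold on a validation set and report metrics on separate test data, since a threshold picked and scored on the same data tends to look better than it is.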

Practice

(1/5)

1. What is the main purpose of threshold tuning in machine learning classification?

easy

A. To find the best cutoff probability to decide between classes

B. To increase the size of the training dataset

C. To reduce the number of features used in the model

D. To speed up the training process

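Solution

Step 1: Understand threshold tuning concept

A classifier outputs a probability, and the threshold is the cutoff that converts that probability into a class label.

Step 2: Identify the main goal

Tuning looks for the cutoff that gives the desired balance between precision and recall. It does not change the training data, the features, or the training speed.

Final Answer: A. To find the best cutoff probability to decide between classes

Quick Check: Threshold tuning changes only the cutoff, not the data, features, or training.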
