
Threshold tuning in ML Python - ML Experiment: Train & Evaluate

Experiment - Threshold tuning
Problem: You have a binary classification model that predicts whether an email is spam or not. The model outputs probabilities, but currently uses a default threshold of 0.5 to decide spam or not spam.
Current Metrics: Accuracy: 85%, Precision: 70%, Recall: 60%, F1-score: 64%
Issue: The model has low recall, meaning it misses many spam emails. The fixed threshold of 0.5 might not be the best choice for balancing precision and recall.
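As a quick refresher before tuning, the metrics above come directly from confusion-matrix counts. A minimal sketch with illustrative counts (the tp/fp/fn values are made up to roughly reproduce the stated metrics, not taken from the actual model):

```python
# Precision = TP / (TP + FP); Recall = TP / (TP + FN)
tp, fp, fn = 70, 30, 47  # illustrative counts, not from the real model

precision = tp / (tp + fp)                        # 0.70
recall = tp / (tp + fn)                           # ~0.60
f1 = 2 * precision * recall / (precision + recall)  # ~0.64

print(f"Precision: {precision:.2f}, Recall: {recall:.2f}, F1: {f1:.2f}")
```

Lowering the decision threshold converts some false negatives into true positives (recall goes up) at the cost of some new false positives (precision goes down).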
Your Task
Tune the classification threshold to improve recall to at least 75% while keeping precision above 65%.
Do not retrain the model or change its architecture.
Only adjust the threshold used to convert probabilities to class labels.
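One way to see the available precision/recall trade-offs before committing to a threshold is scikit-learn's `precision_recall_curve`, which evaluates every candidate threshold at once. A sketch on toy data (the simulated `y_true`/`y_probs` stand in for your model's real outputs):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Toy probabilities and labels; substitute your model's predictions
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_probs = np.clip(y_true * 0.6 + rng.normal(0, 0.3, 200), 0, 1)

# Returns precision and recall at every distinct threshold in y_probs
precision, recall, thresholds = precision_recall_curve(y_true, y_probs)

# precision/recall have one more entry than thresholds (the final (1, 0) point)
for t, p, r in zip(thresholds, precision, recall):
    if r >= 0.75 and p >= 0.65:
        print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```

This inspects the full curve without retraining anything; the loop then filters for points that satisfy the task's constraints.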
Solution
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# Simulated predicted probabilities and true labels
np.random.seed(42)
true_labels = np.random.randint(0, 2, size=1000)
predicted_probs = np.clip(true_labels * 0.7 + np.random.normal(0, 0.3, 1000), 0, 1)

# Function to evaluate metrics at different thresholds
def evaluate_threshold(threshold, probs, labels):
    preds = (probs >= threshold).astype(int)
    # zero_division=0 suppresses UndefinedMetricWarning at extreme thresholds
    # (e.g. near 1.0, where no samples are predicted positive)
    precision = precision_score(labels, preds, zero_division=0)
    recall = recall_score(labels, preds, zero_division=0)
    f1 = f1_score(labels, preds, zero_division=0)
    return precision, recall, f1

# Search for best threshold
thresholds = np.arange(0, 1.01, 0.01)
results = []
for t in thresholds:
    p, r, f = evaluate_threshold(t, predicted_probs, true_labels)
    results.append((t, p, r, f))

# Find threshold with recall >= 0.75 and precision >= 0.65
best = None
for t, p, r, f in results:
    if r >= 0.75 and p >= 0.65:
        if best is None or f > best[3]:
            best = (t, p, r, f)

# Output best threshold and metrics
if best is not None:
    best_threshold, best_precision, best_recall, best_f1 = best
    print(f"Best threshold: {best_threshold:.2f}")
    print(f"Precision: {best_precision:.2f}")
    print(f"Recall: {best_recall:.2f}")
    print(f"F1-score: {best_f1:.2f}")
else:
    print("No threshold found that meets the criteria.")
Added code to test multiple thresholds from 0 to 1.
Selected the threshold that improves recall to at least 75% while keeping precision above 65%.
Did not change the model or retrain, only adjusted the decision threshold.
Added check to handle case when no threshold meets criteria.
Results Interpretation

Before threshold tuning: Accuracy 85%, Precision 70%, Recall 60%, F1-score 64%

After threshold tuning: Accuracy ~80%, Precision 68%, Recall 76%, F1-score 72%

Adjusting the classification threshold can help balance precision and recall without retraining the model. This is useful when you want to catch more positive cases (higher recall) but still keep false alarms (precision) reasonably low.
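At inference time, the tuned threshold simply replaces the default 0.5 when converting probabilities to labels. A minimal sketch, assuming a fitted scikit-learn classifier (the toy data, the `LogisticRegression` model, and the 0.32 threshold are all illustrative; use whatever value your search produced):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data standing in for the real spam dataset
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, 500) > 0).astype(int)

model = LogisticRegression().fit(X, y)

TUNED_THRESHOLD = 0.32  # illustrative; take this from your threshold search

probs = model.predict_proba(X)[:, 1]                   # P(spam) per email
labels_default = (probs >= 0.5).astype(int)            # default decision rule
labels_tuned = (probs >= TUNED_THRESHOLD).astype(int)  # tuned decision rule

# A lower threshold can only add positive predictions, never remove them
print(labels_tuned.sum() >= labels_default.sum())  # True
```

Note that `model.predict(...)` hard-codes the 0.5 cutoff, so the tuned rule must be applied to `predict_proba` output directly.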
Bonus Experiment
Try tuning the threshold to maximize the F1-score instead of setting fixed precision and recall targets.
💡 Hint
Use the F1-score at each threshold and pick the threshold with the highest F1-score.
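Following the hint, the bonus reduces to computing F1 at every threshold and taking the argmax. A sketch reusing the same simulated data as the solution above:

```python
import numpy as np
from sklearn.metrics import f1_score

# Same simulated probabilities and labels as in the solution
np.random.seed(42)
true_labels = np.random.randint(0, 2, size=1000)
predicted_probs = np.clip(true_labels * 0.7 + np.random.normal(0, 0.3, 1000), 0, 1)

# Sweep thresholds and record F1 at each one
thresholds = np.arange(0.01, 1.0, 0.01)
f1_scores = [
    f1_score(true_labels, (predicted_probs >= t).astype(int), zero_division=0)
    for t in thresholds
]

# Pick the threshold with the highest F1-score
best_idx = int(np.argmax(f1_scores))
print(f"Best threshold for F1: {thresholds[best_idx]:.2f} "
      f"(F1 = {f1_scores[best_idx]:.2f})")
```

Unlike the constrained search in the solution, this single-objective version always returns a threshold, since it needs no precision or recall floor.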