Recall in Machine Learning with Python: Definition and Example
Recall measures how well a model finds the positive cases in a dataset. It is the ratio of correctly predicted positives to all actual positives. In Python, you can calculate it with sklearn.metrics.recall_score.
How It Works
Recall tells us how many of the actual positive cases our model correctly identified. Imagine you are a doctor testing for a disease. Recall answers the question: out of all the sick patients, how many did the test catch?
It is calculated as the number of true positives divided by the sum of true positives and false negatives. True positives are cases correctly found positive, and false negatives are positive cases the model missed.
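To make the formula concrete, here is a minimal sketch that counts true positives and false negatives by hand in plain Python. The helper name manual_recall and the sample labels are hypothetical, chosen only for illustration, and label 1 is assumed to be the positive class.

```python
def manual_recall(y_true, y_pred, positive=1):
    """Compute recall = TP / (TP + FN) by direct counting."""
    pairs = list(zip(y_true, y_pred))
    # True positives: actually positive and predicted positive
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False negatives: actually positive but predicted as something else
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        # No actual positives, so recall is undefined.
        # Returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return tp / (tp + fn)


# Hypothetical labels: 4 actual positives, the model finds 3 of them
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
print(manual_recall(y_true, y_pred))  # 0.75
```

Note that the false positive at the last position does not change the result, because recall only looks at the actual positives.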
High recall means the model misses few positive cases, which matters when a missed positive is costly, as in medical diagnosis or fraud detection.
Example
This example shows how to calculate recall in Python using scikit-learn (sklearn). We create true labels and predicted labels, then compute the recall score.
    from sklearn.metrics import recall_score

    # True labels (actual values)
    y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]

    # Predicted labels from model
    y_pred = [1, 0, 0, 1, 0, 1, 0, 1, 1, 0]

    # Calculate recall: 4 of the 5 actual positives are predicted correctly
    recall = recall_score(y_true, y_pred)
    print(f"Recall: {recall:.2f}")  # Recall: 0.80
When to Use
Use recall when it is very important to catch all positive cases, even if it means some false alarms. For example:
- Medical tests where missing a disease can be dangerous
- Fraud detection to catch as many fraud cases as possible
- Spam filters where you want to catch all spam emails
Recall does not penalize false positives, so it is usually read alongside precision. How much weight to give it depends on how costly a missed positive is compared with a false alarm.
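The trade-off between missed positives and false alarms can be seen by varying the decision threshold of a classifier that outputs scores. The sketch below is illustrative only: the scores, the labels, and the helper recall_and_false_alarms are made up for this example and do not come from a real model.

```python
def recall_and_false_alarms(y_true, scores, threshold):
    """Return (recall, false positive count) when predicting 1 for score >= threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return recall, fp


# Hypothetical data: four positives and four negatives with model scores
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.45, 0.3, 0.6, 0.35, 0.2, 0.1]

for threshold in (0.5, 0.25):
    recall, fp = recall_and_false_alarms(y_true, scores, threshold)
    print(f"threshold={threshold}: recall={recall:.2f}, false positives={fp}")
# threshold=0.5: recall=0.50, false positives=1
# threshold=0.25: recall=1.00, false positives=2
```

With this toy data, lowering the threshold catches every positive but also raises the number of false alarms, which is the cost accepted when recall is the priority.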
Key Points
- Recall measures how many actual positives the model correctly finds.
- It is important when missing positives is costly.
- Recall = True Positives / (True Positives + False Negatives).
- Use sklearn.metrics.recall_score in Python to calculate it.