
Recursive feature elimination in ML Python - Model Metrics & Evaluation

Which metric matters for Recursive Feature Elimination and WHY

Recursive Feature Elimination (RFE) helps pick the best features for a model. The key metric to watch is the model's performance score, such as accuracy, F1 score, or mean squared error, measured after each feature-removal step. Tracking this shows whether removing a feature helps or hurts the model. The goal is to keep the features that improve performance or hold it stable.
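The elimination loop can be sketched in plain Python. This is a minimal illustration, not a library implementation: real RFE (e.g. scikit-learn's `RFE`) ranks features by model coefficients or importances, whereas this hypothetical `rfe` function ranks them by absolute correlation with the target.

```python
# Minimal sketch of the RFE loop (illustrative only): repeatedly drop
# the weakest-ranked feature until n_keep remain. Real RFE ranks features
# by model coefficients/importances; here we rank by |correlation| with y.

def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0

def rfe(X, y, n_keep):
    """X: list of rows; returns indices of the kept features."""
    kept = list(range(len(X[0])))
    while len(kept) > n_keep:
        scores = {j: abs(corr([row[j] for row in X], y)) for j in kept}
        kept.remove(min(kept, key=scores.get))  # drop the weakest feature
    return kept

# Feature 0 tracks y, feature 2 is -y; feature 1 is noise and gets dropped.
X = [[1, 3, -1], [2, 1, -2], [3, 4, -3], [4, 1, -4], [5, 5, -5]]
y = [1, 2, 3, 4, 5]
```

In practice the score recomputed after each drop is the performance metric discussed above, evaluated on held-out data.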

Confusion Matrix Example

Imagine a classification model using RFE. After selecting features, we check the confusion matrix:

      |                 | Predicted Positive     | Predicted Negative      |
      |-----------------|------------------------|-------------------------|
      | Actual Positive | True Positive (TP): 50 | False Negative (FN): 10 |
      | Actual Negative | False Positive (FP): 5 | True Negative (TN): 35  |
    

Total samples = 50 + 10 + 5 + 35 = 100

Precision = 50 / (50 + 5) = 0.91

Recall = 50 / (50 + 10) = 0.83

F1 Score = 2 * (0.91 * 0.83) / (0.91 + 0.83) ≈ 0.87

These metrics tell us how well the model performs with the chosen features.
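The arithmetic above is easy to check directly from the four counts:

```python
# Recomputing the metrics above from the confusion-matrix counts.
tp, fn, fp, tn = 50, 10, 5, 35

precision = tp / (tp + fp)                                 # 50 / 55 ≈ 0.91
recall    = tp / (tp + fn)                                 # 50 / 60 ≈ 0.83
f1        = 2 * precision * recall / (precision + recall)  # ≈ 0.87
```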

Precision vs Recall Tradeoff in Feature Selection

When RFE removes features, it can affect precision and recall differently.

  • High Precision Needed: For spam detection, we want to avoid marking good emails as spam. So, RFE should keep features that help precision.
  • High Recall Needed: For disease detection, missing a sick patient is bad. RFE should keep features that help recall.

RFE helps find the smallest feature set that balances these metrics well.
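The tradeoff can be made concrete with hypothetical confusion counts for two candidate feature subsets; which subset "wins" depends on the metric the application cares about:

```python
# Hypothetical counts for two candidate feature subsets (illustrative),
# showing that the best subset depends on the metric you optimize.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

subset_a = dict(tp=45, fp=2, fn=15)   # conservative: few false positives
subset_b = dict(tp=55, fp=12, fn=5)   # aggressive: few false negatives

# Spam filtering (precision matters): subset_a is the better choice.
# Disease screening (recall matters): subset_b is the better choice.
```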

Good vs Bad Metric Values for RFE

Good: After RFE, model accuracy or F1 score stays high or improves. For example, accuracy above 90% with fewer features means success.

Bad: Metrics drop a lot after removing features, like accuracy falling from 90% to 70%. This means important features were removed.
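One way to make this judgment mechanical is a simple acceptance rule; the `rfe_ok` helper and its 2% tolerance below are hypothetical choices, not a standard threshold:

```python
# Hypothetical acceptance rule for an RFE run: keep the reduced feature
# set only if its score stays within a small tolerance of the baseline.
def rfe_ok(baseline_score, score_after_rfe, tolerance=0.02):
    return score_after_rfe >= baseline_score - tolerance

rfe_ok(0.90, 0.91)  # good: fewer features, score improved
rfe_ok(0.90, 0.70)  # bad: important features were removed
```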

Common Pitfalls in Metrics with RFE
  • Overfitting: If RFE is done on the whole dataset before splitting, it leaks information and inflates metrics.
  • Ignoring Validation: Only checking training accuracy can mislead. Always check metrics on unseen data.
  • Accuracy Paradox: High accuracy can hide poor recall or precision if classes are imbalanced.
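A common fix for the leakage pitfall, assuming scikit-learn is available, is to put RFE inside a `Pipeline` so that every cross-validation fold refits the selector on its own training split only:

```python
# Leakage-safe pattern: RFE lives inside the Pipeline, so feature
# selection is re-run on each fold's training split, never on test data.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

pipe = Pipeline([
    ("rfe", RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Each of the 5 folds selects features from its own training data only.
scores = cross_val_score(pipe, X, y, cv=5)
```

Running RFE on the full dataset first and then cross-validating would let test-fold information influence which features survive, inflating the reported metrics.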
Self Check

Your model after RFE has 98% accuracy but only 12% recall on fraud cases. Is it good?

Answer: No. Even with high accuracy, the model misses most fraud cases (low recall). For fraud detection, recall is critical to catch fraud. So, this model is not good for production.
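Counts consistent with the self-check numbers (a hypothetical fraud dataset of 10,000 transactions with 100 fraud cases) show how both figures can coexist:

```python
# Hypothetical fraud dataset: 100 fraud cases among 10,000 transactions.
tp, fn = 12, 88      # only 12 of 100 fraud cases caught
tn, fp = 9788, 112   # 9,900 legitimate transactions

accuracy = (tp + tn) / (tp + fn + tn + fp)  # 0.98 — looks great
recall   = tp / (tp + fn)                   # 0.12 — misses most fraud
```

With 99% of samples legitimate, a model can score 98% accuracy while catching almost no fraud, which is exactly the accuracy paradox from the pitfalls above.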

Key Result
RFE success is measured by stable or improved model performance metrics (accuracy, F1) after feature removal, ensuring important features remain.