Feature union in ML Python - Model Metrics & Evaluation

Feature union combines several sets of features into a single, larger feature set. The goal is to improve model performance by giving the model more information, so the metrics to watch are the ones that measure how well the model predicts using the combined features. For classification these are accuracy, precision, recall, and F1 score; for regression, mean squared error or R-squared. These metrics tell us whether adding features actually helps the model learn.
Imagine a binary classification model using features combined by feature union. Here is a confusion matrix from test data:
|                 | Predicted Positive | Predicted Negative |
|-----------------|--------------------|--------------------|
| Actual Positive | True Positive (TP): 50 | False Negative (FN): 10 |
| Actual Negative | False Positive (FP): 5 | True Negative (TN): 35 |
Total samples = 50 + 10 + 5 + 35 = 100
From this matrix, we calculate:
- Precision = TP / (TP + FP) = 50 / (50 + 5) ≈ 0.91
- Recall = TP / (TP + FN) = 50 / (50 + 10) ≈ 0.83
- Accuracy = (TP + TN) / Total = (50 + 35) / 100 = 0.85
- F1 Score = 2 * (Precision * Recall) / (Precision + Recall) ≈ 0.87
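The calculations above can be checked in plain Python, using the counts from the confusion matrix:

```python
# Computing the four metrics from the confusion matrix above
# (TP=50, FP=5, FN=10, TN=35).
tp, fp, fn, tn = 50, 5, 10, 35

precision = tp / (tp + fp)                  # 50 / 55
recall = tp / (tp + fn)                     # 50 / 60
accuracy = (tp + tn) / (tp + fp + fn + tn)  # 85 / 100
f1 = 2 * precision * recall / (precision + recall)

print(f"Precision: {precision:.2f}")  # 0.91
print(f"Recall:    {recall:.2f}")     # 0.83
print(f"Accuracy:  {accuracy:.2f}")   # 0.85
print(f"F1 score:  {f1:.2f}")         # 0.87
```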
When features are combined, the model sometimes gets better at finding positives (higher recall) but also raises more false alarms (lower precision). Which side of the tradeoff matters depends on the application:
- High precision means most predicted positives are correct. Useful when false alarms are costly, like spam filters.
- High recall means most real positives are found. Important when missing positives is bad, like disease detection.
Feature union can help balance this by adding features that improve recall without hurting precision too much. But adding too many features can also confuse the model, lowering both.
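A minimal sketch of feature union in scikit-learn, which provides a `FeatureUnion` class for exactly this purpose. The dataset and the specific transformers (PCA components plus univariate feature selection) are illustrative choices, not a recommendation:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Combine two feature "views": low-dimensional PCA components and the
# top univariate-scored columns, concatenated side by side.
union = FeatureUnion([
    ("pca", PCA(n_components=5)),
    ("kbest", SelectKBest(f_classif, k=10)),
])

model = Pipeline([
    ("scale", StandardScaler()),
    ("union", union),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(f"Precision: {precision_score(y_test, y_pred):.2f}")
print(f"Recall:    {recall_score(y_test, y_pred):.2f}")
print(f"F1:        {f1_score(y_test, y_pred):.2f}")
```

Comparing these scores against a pipeline without the union (or with only one branch) shows whether the combined features actually help.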
Good values:
- Accuracy above baseline (better than simple model)
- Precision and recall both above 0.8, showing balanced performance
- F1 score close to or above 0.85, indicating good overall prediction
Bad values:
- Accuracy close to random guess (e.g., 50% for balanced classes)
- Precision very low (e.g., below 0.5), meaning many false positives
- Recall very low (e.g., below 0.5), meaning many missed positives
- F1 score low, showing poor balance between precision and recall
Common pitfalls:
- Accuracy paradox: High accuracy can be misleading if classes are imbalanced. Feature union might add features that help the majority class only.
- Data leakage: Combining features from future or test data can inflate metrics falsely.
- Overfitting: Adding too many features can make the model memorize training data, causing poor test performance.
- Ignoring metric tradeoffs: Focusing only on accuracy without checking precision and recall can hide problems.
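One way to guard against the data-leakage pitfall is to fit every transformer inside the pipeline that gets cross-validated, so scaling, PCA, and feature selection are refit on each training fold and never see the held-out fold. A sketch, reusing the same illustrative transformers as above:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Because FeatureUnion sits inside the Pipeline, cross_val_score refits
# it on each training fold; no statistics from the validation fold leak
# into scaling, PCA, or feature selection.
model = Pipeline([
    ("scale", StandardScaler()),
    ("union", FeatureUnion([
        ("pca", PCA(n_components=5)),
        ("kbest", SelectKBest(f_classif, k=10)),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(model, X, y, cv=5, scoring="recall")
print(f"Cross-validated recall: {scores.mean():.2f}")
```

The anti-pattern to avoid is calling `union.fit_transform(X)` on the full dataset before splitting: that lets test-set statistics influence the transformers and inflates the metrics.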
Question: Your model using feature union has 98% accuracy but only 12% recall on the positive class (e.g., fraud). Is it good for production? Why or why not?
Answer: No, it is not good. The low recall means the model misses most positive cases (fraud). Even though accuracy is high, it likely predicts most samples as negative. For fraud detection, missing fraud is very bad, so recall is more important than accuracy here.
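The accuracy paradox behind this answer is easy to demonstrate with toy numbers (2 fraud cases out of 100 samples, chosen for illustration): a model that always predicts the majority class scores 98% accuracy while catching zero fraud.

```python
# Accuracy paradox on imbalanced data: a "model" that predicts
# "not fraud" (0) for every sample. 2 fraud cases out of 100.
y_true = [1] * 2 + [0] * 98
y_pred = [0] * 100  # always predict the majority class

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

accuracy = correct / len(y_true)
recall = tp / (tp + fn) if (tp + fn) else 0.0
print(f"Accuracy: {accuracy:.2f}, Recall: {recall:.2f}")
# Accuracy: 0.98, Recall: 0.00
```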