Label encoding converts words or categories into numbers so a model can work with them. The main thing to check after label encoding is the model's performance (accuracy, precision, recall) on the task using the encoded data. This is because label encoding does not make predictions itself; it affects how well the model learns. If the encoding is wrong, the model may learn poorly.
Label encoding in ML Python - Model Metrics & Evaluation
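As a minimal sketch of what label encoding does (this mirrors scikit-learn's `LabelEncoder`, which maps each sorted unique category to an integer; the fruit names are just example data):

```python
# Minimal label encoding in plain Python: each sorted unique
# category gets an integer code, then every item is replaced by its code.
fruits = ["Apple", "Banana", "Cherry", "Apple", "Cherry"]

classes = sorted(set(fruits))                    # ['Apple', 'Banana', 'Cherry']
mapping = {c: i for i, c in enumerate(classes)}  # {'Apple': 0, 'Banana': 1, 'Cherry': 2}
codes = [mapping[f] for f in fruits]

print(codes)  # [0, 1, 2, 0, 2]
```

Note that the numbers are assigned by alphabetical position here, not by any property of the fruits, which is exactly why a model can misread them as ordered.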
Imagine a model classifying fruits after label encoding:
Actual \ Predicted | Apple (0) | Banana (1) | Cherry (2)
-------------------|-----------|------------|-----------
Apple (0)          |        50 |          2 |          3
Banana (1)         |         1 |         45 |          4
Cherry (2)         |         0 |          3 |         47
This matrix shows how well the model predicts each encoded label.
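The metrics discussed below can be recomputed directly from this matrix (rows are actual classes, columns are predicted classes; the counts are taken from the example table):

```python
# Recompute accuracy, precision, and recall from the confusion matrix above.
labels = ["Apple", "Banana", "Cherry"]
cm = [[50, 2, 3],   # actual Apple, predicted as Apple/Banana/Cherry
      [1, 45, 4],   # actual Banana
      [0, 3, 47]]   # actual Cherry

total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(3))  # diagonal = correct predictions
print(f"accuracy = {correct}/{total} = {correct/total:.3f}")  # accuracy = 142/155 = 0.916

results = {}
for i, label in enumerate(labels):
    tp = cm[i][i]
    precision = tp / sum(cm[r][i] for r in range(3))  # column sum: all predicted as this class
    recall = tp / sum(cm[i])                          # row sum: all actually this class
    results[label] = (precision, recall)
    print(f"{label}: precision={precision:.3f}, recall={recall:.3f}")
```

In practice the same numbers would come from `sklearn.metrics.confusion_matrix` and `classification_report`; the hand computation just makes the column-vs-row distinction explicit.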
Label encoding itself does not directly affect precision or recall, but it impacts the model's ability to learn categories correctly.
For example, if label encoding assigns numbers arbitrarily, the model may treat categories with numerically close codes as more similar than they really are, causing confusion.
Choosing the right encoding method helps the model balance precision (correct positive predictions) and recall (finding all positives).
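One common alternative is one-hot encoding, where each category gets its own indicator column so no artificial ordering is implied. A minimal sketch (scikit-learn's `OneHotEncoder` does this at scale):

```python
# One-hot encoding sketch: each category becomes a separate 0/1 column,
# so "Banana" is not numerically "between" Apple and Cherry.
fruits = ["Apple", "Banana", "Cherry"]
classes = sorted(set(fruits))

def one_hot(item):
    # Return an indicator vector with a 1 in this category's position.
    return [1 if c == item else 0 for c in classes]

print(one_hot("Banana"))  # [0, 1, 0]
```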
Good: High accuracy, precision, and recall on the model's task mean label encoding helped the model learn well.
Bad: Low accuracy or strange errors may mean label encoding caused confusion, like treating categories as numbers with order when they are not.
- Misleading order: label encoding assigns numbers, but the numbers imply no real order. Models may wrongly assume one exists.
- Data leakage: building the encoding from test data before training can leak information into the model.
- Overfitting: if encoding is inconsistent between runs or splits, the model may memorize wrong patterns.
- Accuracy paradox: High accuracy can hide poor performance on rare categories.
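The data-leakage point above can be sketched like this: build the mapping from training data only, and handle categories unseen at training time explicitly (the sentinel value `-1` is an illustrative choice, not a fixed convention):

```python
# Leakage-safe encoding sketch: the mapping is fit on training data only.
# Categories that never appeared in training map to a sentinel (-1)
# instead of silently receiving new codes from the test set.
train = ["Apple", "Banana", "Apple"]
test = ["Cherry", "Banana"]  # "Cherry" never appeared in training

mapping = {c: i for i, c in enumerate(sorted(set(train)))}

def encode(items):
    return [mapping.get(x, -1) for x in items]

print(encode(train))  # [0, 1, 0]
print(encode(test))   # [-1, 1]  -> Cherry is unknown at training time
```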
Your model has 98% accuracy but only 12% recall on a rare category after label encoding. Is it good?
No. The model misses most cases of that category. Label encoding might have caused confusion, or the model may simply struggle to learn that rare category. You should check the encoding and consider other methods, like one-hot encoding.
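The accuracy paradox in this question can be reproduced with made-up counts (1000 samples, 25 of them in the rare category, and a model that almost always predicts the majority class):

```python
# Accuracy paradox demo with illustrative counts: high overall accuracy
# can coexist with very poor recall on a rare category.
actual    = [0] * 975 + [1] * 25           # 1 = rare category
predicted = [0] * 975 + [1] * 3 + [0] * 22  # only 3 of 25 rare cases caught

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
rare_recall = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1) / 25

print(f"accuracy = {accuracy:.1%}, rare-class recall = {rare_recall:.0%}")
# accuracy = 97.8%, rare-class recall = 12%
```

This is why per-class recall (or a full classification report) matters more than overall accuracy whenever the categories are imbalanced.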