In natural language processing (NLP), we often want to know how well our model predicts the right answers. Accuracy tells us the overall percentage of correct predictions. But accuracy alone can be misleading when the classes are imbalanced, because a model can score high simply by always predicting the majority class.
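A minimal sketch of this pitfall, using a hypothetical dataset with 95 negatives and 5 positives: a trivial model that always predicts the majority class still reaches 95% accuracy while never finding a single positive.

```python
# Hypothetical imbalanced labels: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A trivial "model" that always predicts the majority class.
y_pred = [0] * 100

# Accuracy: fraction of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.95 -- looks strong, yet no positive case was found
```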
The F1 score balances two important ideas: precision (how many predicted positives are actually correct) and recall (how many actual positives the model found). It is the harmonic mean of the two, so it is high only when both are high. This makes it useful when we care about both missing important cases and avoiding false alarms.
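These definitions can be computed by hand from the prediction counts. A short sketch with made-up labels:

```python
# Toy labels and predictions (hypothetical).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives

precision = tp / (tp + fp)  # of predicted positives, how many were correct
recall = tp / (tp + fn)     # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
print(precision, recall, f1)  # 0.75 0.75 0.75
```

In practice you would use a library routine such as `sklearn.metrics.f1_score`, but the arithmetic is exactly this.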
The confusion matrix shows the counts of true positives, false positives, true negatives, and false negatives. It helps us understand exactly where the model makes mistakes.
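The four counts can be tallied directly by walking over each (true, predicted) pair. A minimal sketch, reusing the same hypothetical labels as above:

```python
# Toy labels and predictions (hypothetical).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

# Tally each (true, predicted) pair into one of the four cells.
counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
for t, p in zip(y_true, y_pred):
    if t == 1 and p == 1:
        counts["TP"] += 1   # true positive
    elif t == 0 and p == 1:
        counts["FP"] += 1   # false positive (false alarm)
    elif t == 0 and p == 0:
        counts["TN"] += 1   # true negative
    else:
        counts["FN"] += 1   # false negative (missed case)

print(counts)  # {'TP': 3, 'FP': 1, 'TN': 3, 'FN': 1}
```

Reading the matrix tells us the error mode: here one false alarm (FP) and one missed positive (FN), which is the kind of breakdown accuracy alone hides.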