
FastText embeddings in NLP - Model Metrics & Evaluation

Which metric matters for FastText embeddings and WHY

FastText embeddings create word vectors that capture meaning; because they are built from character n-grams, even unseen words get a vector. To check how good these vectors are, we use cosine similarity, which measures how closely two word vectors point in the same direction: a higher cosine similarity means the words are more related in meaning. For downstream tasks that use FastText, such as text classification, we also check accuracy or the F1 score to see how well the model performs with these embeddings.
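A minimal sketch of the cosine similarity check, using plain NumPy and made-up toy vectors in place of real FastText embeddings (real FastText vectors are typically 100-300 dimensional):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two word vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional vectors standing in for FastText embeddings.
v_king = np.array([0.9, 0.1, 0.4, 0.2])
v_queen = np.array([0.8, 0.2, 0.5, 0.1])
v_apple = np.array([0.1, 0.9, 0.0, 0.7])

print(cosine_similarity(v_king, v_queen))  # related words -> high similarity
print(cosine_similarity(v_king, v_apple))  # unrelated words -> low similarity
```

With real embeddings you would load trained vectors (e.g. via gensim's FastText loader) and compare actual words the same way.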

Confusion matrix example for text classification using FastText embeddings:

|                 | Predicted Positive        | Predicted Negative        |
|-----------------|---------------------------|---------------------------|
| Actual Positive | True Positive (TP) = 80   | False Negative (FN) = 20  |
| Actual Negative | False Positive (FP) = 10  | True Negative (TN) = 90   |

Total samples = 80 + 20 + 10 + 90 = 200

From this matrix, we calculate:

  • Precision = TP / (TP + FP) = 80 / (80 + 10) ≈ 0.89
  • Recall = TP / (TP + FN) = 80 / (80 + 20) = 0.80
  • F1 Score = 2 * (Precision * Recall) / (Precision + Recall) ≈ 0.84
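The calculations above can be checked in a few lines of Python, using the counts from the confusion matrix:

```python
# Metrics computed from the confusion matrix above.
TP, FN, FP, TN = 80, 20, 10, 90

precision = TP / (TP + FP)                           # 80 / 90  ≈ 0.89
recall = TP / (TP + FN)                              # 80 / 100 = 0.80
f1 = 2 * precision * recall / (precision + recall)   # ≈ 0.84
accuracy = (TP + TN) / (TP + TN + FP + FN)           # 170 / 200 = 0.85

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"f1={f1:.2f} accuracy={accuracy:.2f}")
```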

Precision vs Recall tradeoff with FastText embeddings

Imagine a spam detector using FastText embeddings:

  • High Precision: Few good emails are wrongly marked as spam. Users don't miss important emails.
  • High Recall: Most spam emails are caught. Less spam reaches the inbox.

Depending on what matters more, we adjust the model. For spam, high precision is often preferred to avoid losing good emails. For medical text classification, high recall is critical to catch all important cases.
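One common way to "adjust the model" is to move the decision threshold on the classifier's score. The sketch below uses made-up scores and labels for a hypothetical spam classifier; raising the threshold trades recall for precision:

```python
# Hypothetical classifier scores (e.g. spam probability) and true labels.
scores = [0.95, 0.90, 0.85, 0.60, 0.55, 0.40, 0.30, 0.20]
labels = [1,    1,    0,    1,    0,    1,    0,    0]  # 1 = spam

def precision_recall(threshold):
    """Precision and recall when predicting spam for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for t in (0.25, 0.50, 0.80):
    p, r = precision_recall(t)
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```

A spam filter would pick a high threshold (high precision, fewer good emails lost); a medical triage classifier would pick a low one (high recall, fewer missed cases).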

What "good" vs "bad" metric values look like for FastText embeddings

Good metrics mean the embeddings help the model understand text well:

  • Good: Accuracy > 85%, F1 score > 0.8, cosine similarity between related words > 0.7
  • Bad: Accuracy < 60%, F1 score < 0.5, cosine similarity between related words < 0.3

Bad values suggest embeddings do not capture meaning well or the model is not learning properly.

Common pitfalls in evaluating FastText embeddings

  • Accuracy paradox: High accuracy can be misleading if classes are imbalanced.
  • Data leakage: Using test data during training inflates metrics falsely.
  • Overfitting: Very high training accuracy but low test accuracy means the model memorizes instead of generalizing.
  • Ignoring semantic similarity: Only checking classification metrics misses how well embeddings capture word meaning.
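The accuracy paradox from the first bullet is easy to demonstrate with a made-up imbalanced dataset: a degenerate model that predicts "negative" for everything still scores 95% accuracy while catching zero positives.

```python
# 100 samples, only 5% positive class.
labels = [1] * 5 + [0] * 95
predictions = [0] * 100  # degenerate "always negative" model

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
tp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
fn = sum(1 for p, y in zip(predictions, labels) if p == 0 and y == 1)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")  # 0.95 vs 0.00
```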

Self-check question

Your text classification model using FastText embeddings has 98% accuracy but only 12% recall on the positive class. Is it good for production? Why or why not?

Answer: No, it is not good. The low recall means the model misses most positive cases, which can be critical depending on the task. High accuracy alone is misleading if the positive class is rare.
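Concrete (hypothetical) counts showing how 98% accuracy can coexist with 12% recall, on a 10,000-sample test set where only 2% of samples are positive:

```python
# Hypothetical test set: 200 positives, 9,800 negatives.
TP, FN, FP, TN = 24, 176, 24, 9776

accuracy = (TP + TN) / (TP + TN + FP + FN)  # 9800 / 10000 = 0.98
recall = TP / (TP + FN)                     # 24 / 200 = 0.12

print(accuracy, recall)
```

The model misses 176 of 200 positives, yet because negatives dominate, accuracy stays at 98%.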

Key Result
Cosine similarity measures embedding quality; classification metrics like precision, recall, and F1 score evaluate model performance using FastText embeddings.