
Visualizing embeddings (t-SNE) in NLP - Model Metrics & Evaluation

Which metric matters for Visualizing embeddings (t-SNE) and WHY

When we use t-SNE to visualize embeddings, we want to see if similar items group together clearly. The main "metric" is how well the visualization shows clusters or patterns that match what we expect. This is not a number like accuracy but a visual check of neighborhood preservation. We look for tight groups of similar points and clear separation between different groups.
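One way to make "neighborhood preservation" concrete is a small k-nearest-neighbor overlap check: for each point, how many of its high-dimensional neighbors are still neighbors in the 2D projection? This is a minimal sketch; `knn_overlap` and the toy data are illustrative inventions, not a standard API.

```python
import numpy as np

def knn_overlap(high_dim, low_dim, k=5):
    """Fraction of each point's k nearest neighbors that survive the
    projection (1.0 = perfect local neighborhood preservation)."""
    def knn(points):
        # pairwise squared Euclidean distances
        d = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
        np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
        return np.argsort(d, axis=1)[:, :k]
    hi, lo = knn(high_dim), knn(low_dim)
    return float(np.mean([len(set(a) & set(b)) / k for a, b in zip(hi, lo)]))

rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 2))         # intrinsic 2-D coordinates
X = Z @ rng.normal(size=(2, 10))     # high-dim embeddings built from them
random_2d = rng.normal(size=(50, 2)) # a projection that ignores structure

print(knn_overlap(X, Z))          # high: local neighborhoods preserved
print(knn_overlap(X, random_2d))  # near chance level (about k / n)
```

A projection that respects the data's structure scores clearly above one that does not, which is the numeric analogue of "tight groups of similar points."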

Confusion matrix or equivalent visualization

t-SNE does not produce a confusion matrix because it is for visualization, not classification. Instead, we look at a 2D or 3D scatter plot of points representing embeddings. Points close together mean similar data. For example:

    Class A: ● ● ●       Class B: ○ ○ ○
             ●                 ○
    Class A points cluster tightly, separate from Class B points.
    

This visual grouping helps us understand if embeddings capture meaningful differences.
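The scatter-plot check above can be sketched in a few lines, assuming scikit-learn (and optionally matplotlib) are installed; the two synthetic "classes" here are made up for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# two well-separated synthetic "classes" of 20-dimensional embeddings
class_a = rng.normal(loc=0.0, scale=0.5, size=(40, 20))
class_b = rng.normal(loc=3.0, scale=0.5, size=(40, 20))
X = np.vstack([class_a, class_b])

# perplexity must be smaller than the number of samples
emb_2d = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(X)
print(emb_2d.shape)  # (80, 2)

# plotting sketch (commented out so the example runs headless):
# import matplotlib.pyplot as plt
# plt.scatter(emb_2d[:40, 0], emb_2d[:40, 1], label="Class A")
# plt.scatter(emb_2d[40:, 0], emb_2d[40:, 1], label="Class B")
# plt.legend(); plt.show()
```

If the embeddings capture the class difference, the two blobs land far apart in the 2D plot, just like the Class A / Class B diagram above.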

Precision vs Recall tradeoff with concrete examples

t-SNE has no precision/recall tradeoff of its own, but there is an analogous tension in how it balances preserving local versus global structure:

  • Local structure: t-SNE tries to keep similar points close. This helps see small clusters clearly.
  • Global structure: t-SNE may distort distances between big groups to keep local neighborhoods intact.

For example, if you want to see small groups of similar words, focus on local structure. If you want to see overall group relations, t-SNE might not show that well.
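The knob that controls this balance is perplexity. A hedged sketch of sweeping it, assuming scikit-learn is installed (the three synthetic groups are illustrative):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(1)
# three synthetic groups of 10-dimensional embeddings
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(30, 10))
               for c in (0.0, 2.0, 4.0)])

for perplexity in (5, 30):
    emb = TSNE(n_components=2, perplexity=perplexity,
               random_state=0).fit_transform(X)
    # low perplexity emphasizes tight local clumps; higher values use
    # larger neighborhoods and keep more of the between-group layout
    print(perplexity, emb.shape)
```

Comparing the two plots side by side (rather than trusting either alone) is the practical way to navigate the local/global tradeoff.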

What "good" vs "bad" metric values look like for this use case

Since t-SNE is visual, "good" means:

  • Clear clusters of points that match known categories or labels.
  • Minimal overlap between different groups.
  • Consistent grouping when running t-SNE multiple times (with same parameters and random seed).

"Bad" means:

  • Points from different groups mixed randomly.
  • No visible clusters or patterns.
  • Very different results each time you run t-SNE.
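One hedged way to put a number on "clear clusters vs. random mixing" is the silhouette score of the 2D points against known labels (assumes scikit-learn; the `good`/`bad` point sets below are synthetic stand-ins for two t-SNE outputs):

```python
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
labels = np.array([0] * 50 + [1] * 50)

# "good" plot: tight, well-separated groups matching the labels
good = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)),
                  rng.normal(5.0, 0.3, size=(50, 2))])
# "bad" plot: both label groups drawn from one mixed blob
bad = rng.normal(0.0, 1.0, size=(100, 2))

print(silhouette_score(good, labels))  # close to 1: clear separation
print(silhouette_score(bad, labels))   # near 0: groups fully mixed
```

A score near 1 matches the "good" description above; a score near 0 (or negative) matches the "bad" one.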

Metrics pitfalls
  • Over-interpretation: t-SNE plots look nice but do not prove model quality. They are just a tool to explore data.
  • Randomness: t-SNE uses randomness. Different runs can look different unless you fix the random seed.
  • Parameter sensitivity: Perplexity and learning rate affect results a lot. Wrong settings can hide true structure.
  • Global structure distortion: t-SNE focuses on local neighborhoods, so distances between clusters may not be meaningful.
  • Training-data-only views: Visualizing embeddings from the training set alone can hide problems. Check embeddings on held-out data too.
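The randomness pitfall has a direct fix: pin `random_state` so repeated runs give identical coordinates. A minimal sketch, assuming scikit-learn and synthetic data:

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.default_rng(0).normal(size=(60, 8))  # toy embeddings

run1 = TSNE(n_components=2, perplexity=10, random_state=42).fit_transform(X)
run2 = TSNE(n_components=2, perplexity=10, random_state=42).fit_transform(X)
print(np.allclose(run1, run2))  # identical coordinates with the same seed
```

Fixing the seed makes plots comparable across runs; stability across *different* seeds and perplexities is what suggests the clusters are real.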

Self-check question

Your t-SNE plot shows three clear clusters matching your known categories, but when you run it again with a different random seed, the clusters look different. Is your visualization reliable? What should you do?

Answer: The visualization is not fully reliable because t-SNE randomness changes results. You should fix the random seed to get consistent plots. Also, try different parameters and check if clusters remain stable. This helps confirm the embeddings truly capture meaningful groups.

Key Result
t-SNE visualization quality is judged by clear, stable clusters that reflect true data similarity, not numeric metrics.