
Visualizing embeddings (t-SNE) in NLP - Model Metrics & Evaluation

Metrics & Evaluation - Visualizing embeddings (t-SNE)
Which metric matters for visualizing embeddings (t-SNE), and why

When we use t-SNE to visualize embeddings, we want to see if similar items group together clearly. The main "metric" is how well the visualization shows clusters or patterns that match what we expect. This is not a number like accuracy but a visual check of neighborhood preservation. We look for tight groups of similar points and clear separation between different groups.
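
Although the check is mainly visual, neighborhood preservation can also be approximated with a number. The sketch below uses scikit-learn's trustworthiness score on synthetic data; the two Gaussian groups, the perplexity of 10, and n_neighbors=5 are illustrative assumptions, not values from this lesson.

```python
import numpy as np
from sklearn.manifold import TSNE, trustworthiness

rng = np.random.default_rng(0)
# Synthetic stand-in for real embeddings: two groups of 30 points in 50 dimensions.
group_a = rng.normal(loc=0.0, scale=1.0, size=(30, 50))
group_b = rng.normal(loc=5.0, scale=1.0, size=(30, 50))
embeddings = np.vstack([group_a, group_b])

embeddings_2d = TSNE(n_components=2, perplexity=10, random_state=42).fit_transform(embeddings)

# Score lies in [0, 1]; higher means the 2D neighbors of each point
# were also neighbors in the original 50-dimensional space.
score = trustworthiness(embeddings, embeddings_2d, n_neighbors=5)
print(f"trustworthiness: {score:.3f}")
```

A high score supports what the eye sees in the plot, but it is a complement to the visual check, not a replacement for it.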

Confusion matrix or equivalent visualization

t-SNE does not produce a confusion matrix because it is for visualization, not classification. Instead, we look at a 2D or 3D scatter plot of points representing embeddings. Points close together mean similar data. For example:

    Class A: ● ● ●       Class B: ○ ○ ○
             ●                 ○
    Class A points cluster tightly, separate from Class B points.
    

This visual grouping helps us understand if embeddings capture meaningful differences.
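
A scatter plot like the one sketched above could be produced roughly as follows. The data here is synthetic (two Gaussian groups standing in for Class A and Class B), and the perplexity and seed are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Synthetic embeddings: 30 points per class, 50 dimensions each.
class_a = rng.normal(0.0, 1.0, size=(30, 50))
class_b = rng.normal(5.0, 1.0, size=(30, 50))
embeddings = np.vstack([class_a, class_b])
labels = np.array(["A"] * 30 + ["B"] * 30)

points = TSNE(n_components=2, perplexity=10, random_state=42).fit_transform(embeddings)

# One scatter call per class so each class gets its own color and legend entry.
fig, ax = plt.subplots()
for name in ("A", "B"):
    mask = labels == name
    ax.scatter(points[mask, 0], points[mask, 1], label=f"Class {name}")
ax.legend()
ax.set_title("t-SNE of synthetic embeddings")
```

In an interactive session, drop the matplotlib.use("Agg") line and call plt.show(), or save the figure with fig.savefig(...).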

Precision vs Recall tradeoff with concrete examples

t-SNE visualization is not about precision or recall. But there is a tradeoff in how t-SNE balances preserving local vs global structure:

  • Local structure: t-SNE tries to keep similar points close. This helps see small clusters clearly.
  • Global structure: t-SNE may distort distances between big groups to keep local neighborhoods intact.

For example, if you want to see small groups of similar words, focus on local structure. If you want to see overall group relations, t-SNE might not show that well.
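
One way to see the global distortion for yourself is to compare distances between group centroids before and after t-SNE. This is a minimal sketch on synthetic data; the three groups, their offsets, and the helper centroid_distance_ratio are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Group C is placed about ten times farther from group A than group B is.
offsets = [0.0, 3.0, 30.0]
groups = [rng.normal(loc=o, scale=1.0, size=(30, 50)) for o in offsets]
embeddings = np.vstack(groups)
labels = np.repeat([0, 1, 2], 30)

points = TSNE(n_components=2, perplexity=10, random_state=42).fit_transform(embeddings)

def centroid_distance_ratio(data, labels):
    """Return dist(A, C) / dist(A, B) measured between group centroids."""
    centroids = np.array([data[labels == k].mean(axis=0) for k in range(3)])
    d_ab = np.linalg.norm(centroids[0] - centroids[1])
    d_ac = np.linalg.norm(centroids[0] - centroids[2])
    return float(d_ac / d_ab)

ratio_original = centroid_distance_ratio(embeddings, labels)
ratio_2d = centroid_distance_ratio(points, labels)
print(f"original space: {ratio_original:.2f}, t-SNE plot: {ratio_2d:.2f}")
```

If the two ratios differ noticeably, the spacing between clusters in the plot should not be read as a measure of how far apart the groups really are.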

What "good" vs "bad" metric values look like for this use case

Since t-SNE is visual, "good" means:

  • Clear clusters of points that match known categories or labels.
  • Minimal overlap between different groups.
  • Consistent grouping across several runs with different random seeds and a few reasonable perplexity values.

"Bad" means:

  • Points from different groups mixed randomly.
  • No visible clusters or patterns.
  • Groupings that change substantially between runs with different random seeds.
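
When labels are known, a rough numeric companion to the "good" and "bad" descriptions above is the silhouette score of the 2D points. This is only a sketch on synthetic data, and it is an assumption that this score suits your case: it is computed on t-SNE output, so it inherits t-SNE's distortions.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Synthetic embeddings with known labels: two groups of 30 points, 50 dimensions.
embeddings = np.vstack([
    rng.normal(0.0, 1.0, size=(30, 50)),
    rng.normal(5.0, 1.0, size=(30, 50)),
])
labels = np.repeat([0, 1], 30)

points = TSNE(n_components=2, perplexity=10, random_state=42).fit_transform(embeddings)

# Ranges from -1 to 1: values near 1 suggest tight, well-separated groups,
# values near 0 suggest overlapping groups.
separation = float(silhouette_score(points, labels))
print(f"silhouette on t-SNE output: {separation:.3f}")
```
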
Metrics pitfalls
  • Over-interpretation: t-SNE plots look nice but do not prove model quality. They are just a tool to explore data.
  • Randomness: t-SNE uses randomness. Different runs can look different unless you fix the random seed.
  • Parameter sensitivity: Perplexity and learning rate affect results a lot. Wrong settings can hide true structure.
  • Global structure distortion: t-SNE focuses on local neighborhoods, so distances between clusters may not be meaningful.
  • Training-only view: Visualizing embeddings from training data only can hide problems such as poor generalization. Always check embeddings on new data too.
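
For the last pitfall, note that scikit-learn's TSNE has no separate transform method, so new points cannot be projected into an existing plot. A common workaround, sketched here with synthetic arrays standing in for training and held-out embeddings, is to embed both sets together and split the result afterwards.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Hypothetical stand-ins: 80 training embeddings and 20 held-out embeddings.
train = rng.normal(size=(80, 50))
held_out = rng.normal(size=(20, 50))

# Embed both sets in one run, then split the 2D points back apart.
combined = np.vstack([train, held_out])
points = TSNE(n_components=2, perplexity=15, random_state=42).fit_transform(combined)
train_2d = points[: len(train)]
held_out_2d = points[len(train):]

print(train_2d.shape, held_out_2d.shape)
```

Plotting train_2d and held_out_2d with different markers shows whether new data falls into the same groups as the training data.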
Self-check question

Your t-SNE plot shows three clear clusters matching your known categories, but when you run it again with a different random seed, the clusters look different. Is your visualization reliable? What should you do?

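Answer: Not yet. Fixing the random seed makes a plot reproducible, but it does not make it reliable. t-SNE layouts normally differ between seeds in rotation and in where clusters sit, so first check whether the same points still group together. Run several seeds and a few perplexity values. If the three clusters keep the same members each time, the embeddings likely capture meaningful groups. If membership changes, treat the clusters as artifacts of a single run.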
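
A stability check along these lines can be sketched by comparing each point's nearest neighbors across two runs. The helper names knn_indices and neighbor_overlap are hypothetical, the data is synthetic, and k=5 is an arbitrary choice.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors

def knn_indices(points, k):
    """Indices of each point's k nearest neighbors (the point itself is excluded)."""
    nn = NearestNeighbors(n_neighbors=k).fit(points)
    return nn.kneighbors(return_distance=False)

def neighbor_overlap(points_a, points_b, k=5):
    """Mean fraction of k nearest neighbors shared between two layouts (0 to 1)."""
    idx_a = knn_indices(points_a, k)
    idx_b = knn_indices(points_b, k)
    shared = [len(set(a) & set(b)) / k for a, b in zip(idx_a, idx_b)]
    return float(np.mean(shared))

rng = np.random.default_rng(0)
# Synthetic embeddings: three groups of 30 points in 50 dimensions.
embeddings = np.vstack([rng.normal(loc=c, size=(30, 50)) for c in (0.0, 5.0, 10.0)])

# Same data and parameters, different random seeds.
run_a = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(embeddings)
run_b = TSNE(n_components=2, perplexity=10, random_state=1).fit_transform(embeddings)

overlap = neighbor_overlap(run_a, run_b)
print(f"shared neighbors across seeds: {overlap:.2f}")
```

Because the comparison uses neighbor sets rather than coordinates, it ignores harmless differences such as rotation or mirrored layouts.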

Key Result
t-SNE visualization quality is judged by clear, stable clusters that reflect true data similarity, not numeric metrics.

Practice

1. What is the main purpose of using t-SNE in visualizing word embeddings?
easy
A. To train word embeddings from raw text data
B. To increase the size of word embeddings for better accuracy
C. To reduce high-dimensional word vectors into 2D or 3D for easy visualization
D. To cluster words based on their frequency in the text

Solution

  1. Step 1: Understand t-SNE's role in dimensionality reduction

    t-SNE reduces complex, high-dimensional data like word embeddings into 2D or 3D space for visualization.
  2. Step 2: Differentiate from other tasks

    It does not train embeddings or cluster by frequency but helps visualize similarity by reducing dimensions.
  3. Final Answer:

    To reduce high-dimensional word vectors into 2D or 3D for easy visualization -> Option C
  4. Quick Check:

    t-SNE = dimensionality reduction for visualization [OK]
Hint: t-SNE = reduce dimensions to visualize complex data
Common Mistakes:
  • Confusing t-SNE with training embeddings
  • Thinking t-SNE increases data size
  • Assuming t-SNE clusters by word frequency
2. Which of the following is the correct way to import t-SNE from scikit-learn in Python?
easy
A. from sklearn.manifold import TSNE
B. import sklearn.tsne as TSNE
C. from sklearn.embedding import tSNE
D. import tsne from sklearn

Solution

  1. Step 1: Recall correct module for t-SNE in scikit-learn

    t-SNE is in the sklearn.manifold module and is imported as TSNE.
  2. Step 2: Check syntax correctness

    Option A uses valid syntax and the correct module and class name: from sklearn.manifold import TSNE. The other options reference modules that do not exist or use invalid import syntax.
  3. Final Answer:

    from sklearn.manifold import TSNE -> Option A
  4. Quick Check:

    Correct import = from sklearn.manifold import TSNE [OK]
Hint: t-SNE is in sklearn.manifold, import as TSNE
Common Mistakes:
  • Using wrong module like sklearn.embedding
  • Incorrect import syntax
  • Confusing lowercase and uppercase in import
3. Given this Python code snippet using t-SNE, what will be the shape of embeddings_2d?
    from sklearn.manifold import TSNE
    import numpy as np

    embeddings = np.random.rand(100, 50)  # 100 words, 50 dimensions
    model = TSNE(n_components=2, random_state=42)
    embeddings_2d = model.fit_transform(embeddings)
medium
A. (100, 2)
B. (2, 100)
C. (50, 2)
D. (100, 50)

Solution

  1. Step 1: Understand input shape and t-SNE output

    Input embeddings have shape (100, 50) meaning 100 samples with 50 features each.
  2. Step 2: Check t-SNE output shape with n_components=2

    t-SNE reduces features to 2 dimensions, so output shape is (100, 2) -- 100 samples, 2 features.
  3. Final Answer:

    (100, 2) -> Option A
  4. Quick Check:

    Output shape = (samples, n_components) = (100, 2) [OK]
Hint: Output shape = (samples, n_components) in t-SNE
Common Mistakes:
  • Confusing rows and columns in output shape
  • Assuming output shape equals input shape
  • Mixing up n_components with sample count
4. You run t-SNE on word embeddings but get a ValueError: "perplexity must be less than n_samples". What is the likely cause and fix?
medium
A. Input embeddings have wrong shape; reshape to (features, samples)
B. Perplexity is set too high; reduce it below the number of samples
C. Random state is missing; add random_state parameter
D. t-SNE requires normalized data; normalize embeddings first

Solution

  1. Step 1: Understand perplexity parameter in t-SNE

    Perplexity controls neighborhood size and must be less than the number of samples.
  2. Step 2: Identify cause of ValueError

    The error means perplexity is set equal to or larger than the sample count, which is invalid.
  3. Step 3: Fix by lowering perplexity

    Reduce perplexity to a value smaller than the number of samples to fix the error.
  4. Final Answer:

    Perplexity is set too high; reduce it below the number of samples -> Option B
  5. Quick Check:

    Perplexity < n_samples to avoid error [OK]
Hint: Keep perplexity less than sample count in t-SNE
Common Mistakes:
  • Changing input shape instead of perplexity
  • Ignoring the perplexity limit
  • Assuming normalization fixes this error
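
For small datasets, the fix from question 4 can be applied automatically by capping perplexity before calling t-SNE. The helper safe_perplexity below is hypothetical, and the cap of (n_samples - 1) / 3 is a conservative heuristic assumed here, not a scikit-learn requirement; the only rule from the error message is that perplexity stays below n_samples.

```python
import numpy as np
from sklearn.manifold import TSNE

def safe_perplexity(n_samples, requested=30.0):
    """Hypothetical helper: cap perplexity well below the number of samples."""
    if n_samples < 2:
        raise ValueError("t-SNE needs at least 2 samples")
    # Heuristic cap (an assumption), floored at 1.0 so tiny inputs still get a valid value.
    return min(requested, max(1.0, (n_samples - 1) / 3))

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(10, 50))  # only 10 words, so a perplexity of 30 is too high

perplexity = safe_perplexity(len(embeddings))
embeddings_2d = TSNE(n_components=2, perplexity=perplexity, random_state=42).fit_transform(embeddings)
print(perplexity, embeddings_2d.shape)
```
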
5. You want to visualize embeddings of 5000 words using t-SNE but notice the plot is very crowded and unclear. Which approach best improves visualization clarity?
hard
A. Apply t-SNE with n_components=50 to keep more dimensions
B. Increase perplexity to a very high value like 1000 to spread points out
C. Use raw high-dimensional embeddings without dimensionality reduction
D. Reduce the number of words by selecting a smaller subset before applying t-SNE

Solution

  1. Step 1: Understand t-SNE limitations with large datasets

    t-SNE works best with small to medium data; large sets cause crowded plots and slow computation.
  2. Step 2: Choose practical solution for clarity

    Reducing the dataset size by selecting fewer words improves plot clarity and speed.
  3. Step 3: Evaluate other options

    A very high perplexity does not spread points out and tends to blur local neighborhoods; 50 output dimensions cannot be plotted, which defeats t-SNE's purpose; raw high-dimensional embeddings cannot be drawn at all.
  4. Final Answer:

    Reduce the number of words by selecting a smaller subset before applying t-SNE -> Option D
  5. Quick Check:

    Smaller data = clearer t-SNE plots [OK]
Hint: Use smaller data subsets for clearer t-SNE plots
Common Mistakes:
  • Setting perplexity too high
  • Using too many dimensions in t-SNE
  • Trying to visualize raw embeddings directly
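
The subset approach from question 5 could look like the sketch below. The vocabulary, the random embeddings, and the subset size of 300 are placeholders; in practice you might pick the most frequent words or words from categories of interest instead of a random sample.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Placeholder data: 5000 words with 50-dimensional embeddings.
vocab = [f"word_{i}" for i in range(5000)]
embeddings = rng.normal(size=(5000, 50))

# Pick 300 distinct words at random, then run t-SNE on that subset only.
subset_idx = rng.choice(len(vocab), size=300, replace=False)
subset_words = [vocab[i] for i in subset_idx]
subset_2d = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(embeddings[subset_idx])

print(len(subset_words), subset_2d.shape)
```
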