Bird
Raised Fist0
NLPml~20 mins

Visualizing embeddings (t-SNE) in NLP - Practice Problems & Coding Challenges

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Challenge - 5 Problems
🎖️
Embedding Visualization Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
Predict Output
intermediate
1:30remaining
Output of t-SNE embedding shape
Given the following code snippet that applies t-SNE on a set of 100 word embeddings each of dimension 50, what is the shape of the output embedding array?
NLP
from sklearn.manifold import TSNE
import numpy as np

embeddings = np.random.rand(100, 50)
tsne = TSNE(n_components=2, random_state=42)
reduced_embeddings = tsne.fit_transform(embeddings)
print(reduced_embeddings.shape)
A(50, 100)
B(2, 50)
C(100, 2)
D(100, 50)
Attempts:
2 left
💡 Hint
t-SNE reduces the dimensionality to the number of components specified.
🧠 Conceptual
intermediate
1:00remaining
Purpose of perplexity in t-SNE
What does the 'perplexity' parameter control in the t-SNE algorithm when visualizing embeddings?
AThe number of nearest neighbors considered during the embedding
BThe learning rate for gradient descent optimization
CThe final number of dimensions after reduction
DThe random seed for reproducibility
Attempts:
2 left
💡 Hint
Think about how t-SNE balances local and global aspects of data.
Metrics
advanced
1:30remaining
Evaluating t-SNE visualization quality
Which metric is commonly used to evaluate how well a t-SNE visualization preserves the local structure of high-dimensional data?
AKullback-Leibler divergence
BMean Squared Error
CAccuracy
DSilhouette Score
Attempts:
2 left
💡 Hint
t-SNE minimizes a specific divergence during training.
🔧 Debug
advanced
2:00remaining
Identifying error in t-SNE usage
What error will the following code raise when trying to visualize embeddings with t-SNE?
NLP
from sklearn.manifold import TSNE
import numpy as np

embeddings = np.random.rand(50, 100)
tsne = TSNE(n_components=3)
reduced = tsne.fit_transform(embeddings)
print(reduced.shape)
ATypeError: fit_transform() missing 1 required positional argument
BAttributeError: 'TSNE' object has no attribute 'fit_transform'
CValueError: n_components must be less than original dimension
DNo error, output shape is (50, 3)
Attempts:
2 left
💡 Hint
Check the relationship between n_components and input dimension.
Model Choice
expert
2:30remaining
Choosing dimensionality reduction for large NLP embeddings
You have 1 million word embeddings of dimension 300 and want to visualize them in 2D. Which dimensionality reduction technique is most suitable considering both speed and quality?
At-SNE with default parameters
BUMAP on the full dataset
CPCA followed by t-SNE on a sample
DRandom projection to 2D
Attempts:
2 left
💡 Hint
Consider scalability and preservation of local/global structure.

Practice

(1/5)
1. What is the main purpose of using t-SNE in visualizing word embeddings?
easy
A. To train word embeddings from raw text data
B. To increase the size of word embeddings for better accuracy
C. To reduce high-dimensional word vectors into 2D or 3D for easy visualization
D. To cluster words based on their frequency in the text

Solution

  1. Step 1: Understand t-SNE's role in dimensionality reduction

    t-SNE reduces complex, high-dimensional data like word embeddings into 2D or 3D space for visualization.
  2. Step 2: Differentiate from other tasks

    It does not train embeddings or cluster by frequency but helps visualize similarity by reducing dimensions.
  3. Final Answer:

    To reduce high-dimensional word vectors into 2D or 3D for easy visualization -> Option C
  4. Quick Check:

    t-SNE = dimensionality reduction for visualization [OK]
Hint: t-SNE = reduce dimensions to visualize complex data [OK]
Common Mistakes:
  • Confusing t-SNE with training embeddings
  • Thinking t-SNE increases data size
  • Assuming t-SNE clusters by word frequency
2. Which of the following is the correct way to import t-SNE from scikit-learn in Python?
easy
A. from sklearn.manifold import TSNE
B. import sklearn.tsne as TSNE
C. from sklearn.embedding import tSNE
D. import tsne from sklearn

Solution

  1. Step 1: Recall correct module for t-SNE in scikit-learn

    t-SNE is in the sklearn.manifold module and is imported as TSNE.
  2. Step 2: Check syntax correctness

    from sklearn.manifold import TSNE uses correct syntax: from sklearn.manifold import TSNE. Others are invalid imports.
  3. Final Answer:

    from sklearn.manifold import TSNE -> Option A
  4. Quick Check:

    Correct import = from sklearn.manifold import TSNE [OK]
Hint: t-SNE is in sklearn.manifold, import as TSNE [OK]
Common Mistakes:
  • Using wrong module like sklearn.embedding
  • Incorrect import syntax
  • Confusing lowercase and uppercase in import
3. Given this Python code snippet using t-SNE, what will be the shape of embeddings_2d?
from sklearn.manifold import TSNE
import numpy as np

embeddings = np.random.rand(100, 50)  # 100 words, 50 dimensions
model = TSNE(n_components=2, random_state=42)
embeddings_2d = model.fit_transform(embeddings)
medium
A. (100, 2)
B. (2, 100)
C. (50, 2)
D. (100, 50)

Solution

  1. Step 1: Understand input shape and t-SNE output

    Input embeddings have shape (100, 50) meaning 100 samples with 50 features each.
  2. Step 2: Check t-SNE output shape with n_components=2

    t-SNE reduces features to 2 dimensions, so output shape is (100, 2) -- 100 samples, 2 features.
  3. Final Answer:

    (100, 2) -> Option A
  4. Quick Check:

    Output shape = (samples, n_components) = (100, 2) [OK]
Hint: Output shape = (samples, n_components) in t-SNE [OK]
Common Mistakes:
  • Confusing rows and columns in output shape
  • Assuming output shape equals input shape
  • Mixing up n_components with sample count
4. You run t-SNE on word embeddings but get a ValueError: "perplexity must be less than n_samples". What is the likely cause and fix?
medium
A. Input embeddings have wrong shape; reshape to (features, samples)
B. Perplexity is set too high; reduce it below the number of samples
C. Random state is missing; add random_state parameter
D. t-SNE requires normalized data; normalize embeddings first

Solution

  1. Step 1: Understand perplexity parameter in t-SNE

    Perplexity controls neighborhood size and must be less than the number of samples.
  2. Step 2: Identify cause of ValueError

    Error means perplexity is set equal or larger than sample count, which is invalid.
  3. Step 3: Fix by lowering perplexity

    Reduce perplexity to a value smaller than the number of samples to fix the error.
  4. Final Answer:

    Perplexity is set too high; reduce it below the number of samples -> Option B
  5. Quick Check:

    Perplexity < n_samples to avoid error [OK]
Hint: Keep perplexity less than sample count in t-SNE [OK]
Common Mistakes:
  • Changing input shape instead of perplexity
  • Ignoring the perplexity limit
  • Assuming normalization fixes this error
5. You want to visualize embeddings of 5000 words using t-SNE but notice the plot is very crowded and unclear. Which approach best improves visualization clarity?
hard
A. Apply t-SNE with n_components=50 to keep more dimensions
B. Increase perplexity to a very high value like 1000 to spread points out
C. Use raw high-dimensional embeddings without dimensionality reduction
D. Reduce the number of words by selecting a smaller subset before applying t-SNE

Solution

  1. Step 1: Understand t-SNE limitations with large datasets

    t-SNE works best with small to medium data; large sets cause crowded plots and slow computation.
  2. Step 2: Choose practical solution for clarity

    Reducing the dataset size by selecting fewer words improves plot clarity and speed.
  3. Step 3: Evaluate other options

    Increasing perplexity too high or keeping many dimensions defeats t-SNE's purpose; raw embeddings are hard to visualize.
  4. Final Answer:

    Reduce the number of words by selecting a smaller subset before applying t-SNE -> Option D
  5. Quick Check:

    Smaller data = clearer t-SNE plots [OK]
Hint: Use smaller data subsets for clearer t-SNE plots [OK]
Common Mistakes:
  • Setting perplexity too high
  • Using too many dimensions in t-SNE
  • Trying to visualize raw embeddings directly