Recall & Review
beginner
What is the main purpose of t-SNE in visualizing embeddings?
t-SNE helps to reduce high-dimensional data (like word embeddings) into 2 or 3 dimensions so we can see patterns and clusters easily on a simple plot.
beginner
Why can't we just plot embeddings directly without t-SNE?
Embeddings usually have many dimensions (like 100 or 300), which we can't visualize directly. t-SNE reduces these dimensions while keeping similar points close together.
beginner
What does it mean when points are close together in a t-SNE plot of embeddings?
Points close together mean their original embeddings are similar, so the words or items they represent are related or have similar meanings.
intermediate
What is a common challenge when using t-SNE for embedding visualization?
t-SNE can be slow on large datasets and sometimes shows different results each time because it uses randomness in its calculations.
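The effect of randomness can be seen in a minimal sketch like the one below. It assumes scikit-learn's TSNE with a random initial layout; the random vectors stand in for real word embeddings, and the helper name project is hypothetical. Fixing random_state is one way to get the same layout on repeated runs in the same environment.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical stand-in data: 60 "words", 20 dimensions each.
rng = np.random.default_rng(0)
embeddings = rng.random((60, 20))

def project(seed):
    # init="random" makes the starting layout depend on the seed.
    # perplexity=10 is an arbitrary choice that stays below the 60 samples.
    model = TSNE(n_components=2, perplexity=10, init="random", random_state=seed)
    return model.fit_transform(embeddings)

run_a = project(seed=0)
run_b = project(seed=0)  # same seed as run_a
run_c = project(seed=1)  # different seed

print("same seed, same layout:", np.allclose(run_a, run_b))
print("different seed, same layout:", np.allclose(run_a, run_c))
```

When comparing plots across experiments, keeping the seed fixed avoids mistaking a change in layout for a change in the data.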
intermediate
Name one alternative to t-SNE for visualizing embeddings.
UMAP is a popular alternative that is faster and often preserves more of the global structure in the data.
What does t-SNE primarily do with high-dimensional embeddings?
A. Reduce dimensions to 2 or 3 for visualization
B. Increase dimensions for better accuracy
C. Convert embeddings into text
D. Remove noise from embeddings
t-SNE reduces high-dimensional data to 2 or 3 dimensions so we can visualize it easily.
In a t-SNE plot, what does it mean if two points are far apart?
A. They represent the same word
B. Their embeddings are very different
C. They have identical meanings
D. They are errors in the data
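Points far apart in a t-SNE plot usually have quite different original embeddings. Because t-SNE preserves local neighborhoods better than large distances, the exact size of the gap between distant points should be read loosely.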
Which of these is a limitation of t-SNE?
A. It always produces the same output
B. It increases data dimensions
C. It can be slow on large datasets
D. It removes important data features
t-SNE can be slow and sometimes produces different results due to randomness.
Which alternative method is known for faster embedding visualization than t-SNE?
A. Linear Regression
B. PCA
C. K-Means
D. UMAP
UMAP is a faster alternative to t-SNE for visualizing embeddings.
Why do we use 2D or 3D plots for embeddings?
A. Because humans can easily understand 2D or 3D visuals
B. Because embeddings only have 2 or 3 dimensions
C. Because 2D plots increase embedding accuracy
D. Because 3D plots remove noise
We reduce embeddings to 2D or 3D so humans can see and understand the data patterns.
Explain how t-SNE helps in understanding word embeddings.
Think about how we can see relationships between words visually.
Describe one limitation of t-SNE and a possible alternative method.
Consider speed and consistency issues.
Practice
1. What is the main purpose of using t-SNE in visualizing word embeddings?
easy
A. To train word embeddings from raw text data
B. To increase the size of word embeddings for better accuracy
C. To reduce high-dimensional word vectors into 2D or 3D for easy visualization
D. To cluster words based on their frequency in the text
Solution
Step 1: Understand t-SNE's role in dimensionality reduction
t-SNE reduces complex, high-dimensional data like word embeddings into 2D or 3D space for visualization.
Step 2: Differentiate from other tasks
It does not train embeddings or cluster by frequency but helps visualize similarity by reducing dimensions.
Final Answer:
To reduce high-dimensional word vectors into 2D or 3D for easy visualization -> Option C
Quick Check:
t-SNE = dimensionality reduction for visualization [OK]
Hint: t-SNE = reduce dimensions to visualize complex data [OK]
Common Mistakes:
Confusing t-SNE with training embeddings
Thinking t-SNE increases data size
Assuming t-SNE clusters by word frequency
2. Which of the following is the correct way to import t-SNE from scikit-learn in Python?
easy
A. from sklearn.manifold import TSNE
B. import sklearn.tsne as TSNE
C. from sklearn.embedding import tSNE
D. import tsne from sklearn
Solution
Step 1: Recall correct module for t-SNE in scikit-learn
t-SNE is in the sklearn.manifold module and is imported as TSNE.
Step 2: Check syntax correctness
Option A uses the valid form "from <module> import <name>" with the correct module (sklearn.manifold) and class name (TSNE). The other options name modules that do not exist or use invalid Python syntax.
Final Answer:
from sklearn.manifold import TSNE -> Option A
Quick Check:
Correct import = from sklearn.manifold import TSNE [OK]
Hint: t-SNE is in sklearn.manifold, import as TSNE [OK]
Common Mistakes:
Using wrong module like sklearn.embedding
Incorrect import syntax
Confusing lowercase and uppercase in import
3. Given this Python code snippet using t-SNE, what will be the shape of embeddings_2d?
from sklearn.manifold import TSNE
import numpy as np
embeddings = np.random.rand(100, 50) # 100 words, 50 dimensions
model = TSNE(n_components=2, random_state=42)
embeddings_2d = model.fit_transform(embeddings)
medium
A. (100, 2)
B. (2, 100)
C. (50, 2)
D. (100, 50)
Solution
Step 1: Understand input shape and t-SNE output
Input embeddings have shape (100, 50) meaning 100 samples with 50 features each.
Step 2: Check t-SNE output shape with n_components=2
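t-SNE reduces the features to 2 dimensions, so the output shape is (100, 2): 100 samples, 2 features.
Final Answer:
(100, 2) -> Option A
Quick Check:
Output shape = (n_samples, n_components) = (100, 2) [OK]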
Hint: Output shape = (samples, n_components) in t-SNE [OK]
Common Mistakes:
Confusing rows and columns in output shape
Assuming output shape equals input shape
Mixing up n_components with sample count
4. You run t-SNE on word embeddings but get a ValueError: "perplexity must be less than n_samples". What is the likely cause and fix?
medium
A. Input embeddings have wrong shape; reshape to (features, samples)
B. Perplexity is set too high; reduce it below the number of samples
C. Random state is missing; add random_state parameter
D. t-SNE requires normalized data; normalize embeddings first
Solution
Step 1: Understand perplexity parameter in t-SNE
Perplexity controls neighborhood size and must be less than the number of samples.
Step 2: Identify cause of ValueError
The error means perplexity is equal to or larger than the number of samples, which is invalid.
Step 3: Fix by lowering perplexity
Reduce perplexity to a value smaller than the number of samples to fix the error.
Final Answer:
Perplexity is set too high; reduce it below the number of samples -> Option B
Quick Check:
Perplexity < n_samples to avoid error [OK]
Hint: Keep perplexity less than sample count in t-SNE [OK]
Common Mistakes:
Changing input shape instead of perplexity
Ignoring the perplexity limit
Assuming normalization fixes this error
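The cause and fix from question 4 can be sketched as follows. This assumes a recent scikit-learn release that validates perplexity against the sample count; the random data and the rule used to pick a smaller perplexity (dividing by 3) are arbitrary choices for illustration, not a recommended setting.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical stand-in data: only 20 "words", 50 dimensions each.
rng = np.random.default_rng(0)
embeddings = rng.random((20, 50))
n_samples = embeddings.shape[0]

try:
    # perplexity=30 is not less than the 20 samples, so validation should fail.
    TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(embeddings)
except ValueError as err:
    print("t-SNE failed:", err)

# Fix: choose a perplexity safely below the number of samples.
perplexity = min(30, (n_samples - 1) // 3)
embeddings_2d = TSNE(
    n_components=2, perplexity=perplexity, random_state=42
).fit_transform(embeddings)
print(perplexity, embeddings_2d.shape)
```

Reshaping or normalizing the input would not change the sample count, which is why only the perplexity change resolves the error.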
5. You want to visualize embeddings of 5000 words using t-SNE but notice the plot is very crowded and unclear. Which approach best improves visualization clarity?
hard
A. Apply t-SNE with n_components=50 to keep more dimensions
B. Increase perplexity to a very high value like 1000 to spread points out
C. Use raw high-dimensional embeddings without dimensionality reduction
D. Reduce the number of words by selecting a smaller subset before applying t-SNE
Solution
Step 1: Understand t-SNE limitations with large datasets
t-SNE works best with small to medium data; large sets cause crowded plots and slow computation.
Step 2: Choose practical solution for clarity
Reducing the dataset size by selecting fewer words improves plot clarity and speed.
Step 3: Evaluate other options
A very high perplexity does not spread points out, and n_components=50 leaves the data in too many dimensions to plot, which defeats the purpose of t-SNE. Raw high-dimensional embeddings cannot be plotted directly.
Final Answer:
Reduce the number of words by selecting a smaller subset before applying t-SNE -> Option D
Quick Check:
Smaller data = clearer t-SNE plots [OK]
Hint: Use smaller data subsets for clearer t-SNE plots [OK]
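A minimal sketch of the subset approach from question 5 is shown below. The word list, the random vectors, and the subset size of 300 are all hypothetical; in practice the subset might be the most frequent words or the words relevant to the question being studied rather than a random sample.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical stand-in data: 5000 "words", 50 dimensions each.
rng = np.random.default_rng(0)
words = [f"word_{i}" for i in range(5000)]
embeddings = rng.random((5000, 50))

# Pick a smaller subset of distinct words before running t-SNE.
subset_size = 300  # arbitrary choice for this sketch
idx = rng.choice(len(words), size=subset_size, replace=False)
subset_words = [words[i] for i in idx]

# Run t-SNE only on the selected rows.
subset_2d = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(
    embeddings[idx]
)
print(subset_2d.shape)
```

Each row of subset_2d lines up with the same position in subset_words, so the points can be labeled when they are plotted.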