Challenge - 5 Problems
Embedding Visualization Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
intermediate · 1:30 remaining
Output of t-SNE embedding shape
Given the following code snippet, which applies t-SNE to a set of 100 word embeddings, each of dimension 50, what is the shape of the output embedding array?
NLP
from sklearn.manifold import TSNE
import numpy as np

embeddings = np.random.rand(100, 50)
tsne = TSNE(n_components=2, random_state=42)
reduced_embeddings = tsne.fit_transform(embeddings)
print(reduced_embeddings.shape)
Attempts: 2 left
💡 Hint
t-SNE reduces the dimensionality to the number of components specified.
✗ Incorrect
t-SNE maps the embeddings from their original dimension (50) down to the specified number of components (2), keeping the number of samples (100) unchanged. So the output shape is (100, 2).
🧠 Conceptual
intermediate · 1:00 remaining
Purpose of perplexity in t-SNE
What does the 'perplexity' parameter control in the t-SNE algorithm when visualizing embeddings?
Attempts: 2 left
💡 Hint
Think about how t-SNE balances local and global aspects of data.
✗ Incorrect
Perplexity in t-SNE roughly corresponds to the effective number of nearest neighbors that influence the embedding of each point. It balances attention between the local and global structure of the data: low values emphasize tight local neighborhoods, higher values give more weight to broader structure.
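As a rough sketch (not part of the challenge), the snippet below fits t-SNE with a low and a high perplexity on illustrative random data; the values and data are arbitrary, and perplexity must stay below the number of samples.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
embeddings = rng.random((100, 50))  # 100 samples, 50 dimensions (illustrative)

# Low perplexity: each point attends to only a few nearest neighbors,
# emphasizing very local structure.
local = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(embeddings)

# Higher perplexity: each point attends to many more neighbors,
# giving more weight to global structure. Must be < n_samples (here 100).
broad = TSNE(n_components=2, perplexity=50, random_state=0).fit_transform(embeddings)

print(local.shape, broad.shape)  # both are (100, 2); only the layout differs
```

Comparing the two scatter plots side by side is the usual way to see the effect: the low-perplexity layout fragments into many small clumps, while the high-perplexity one spreads points more evenly.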
❓ Metrics
advanced · 1:30 remaining
Evaluating t-SNE visualization quality
Which metric is commonly used to evaluate how well a t-SNE visualization preserves the local structure of high-dimensional data?
Attempts: 2 left
💡 Hint
t-SNE minimizes a specific divergence during training.
✗ Incorrect
t-SNE minimizes the Kullback-Leibler divergence between the pairwise-similarity probability distributions in the high- and low-dimensional spaces, which is what drives it to preserve local structure.
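As a sketch of how this is inspected in practice: after fitting, scikit-learn exposes the final KL divergence as the `kl_divergence_` attribute, and `sklearn.manifold.trustworthiness` gives a complementary score of how well local neighborhoods are preserved. The data here is illustrative random noise.

```python
import numpy as np
from sklearn.manifold import TSNE, trustworthiness

rng = np.random.default_rng(42)
X = rng.random((100, 50))  # illustrative high-dimensional data

tsne = TSNE(n_components=2, random_state=42)
X_2d = tsne.fit_transform(X)

# Final value of the objective t-SNE minimized; lower means the low-dim
# similarity distribution matches the high-dim one more closely.
print(tsne.kl_divergence_)

# Fraction-like score in [0, 1]; 1.0 means each point's low-dim neighbors
# were also its neighbors in the original space.
print(trustworthiness(X, X_2d, n_neighbors=5))
```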
🔧 Debug
advanced · 2:00 remaining
Identifying error in t-SNE usage
What error, if any, will the following code raise when trying to visualize embeddings with t-SNE?
NLP
from sklearn.manifold import TSNE
import numpy as np

embeddings = np.random.rand(50, 100)
tsne = TSNE(n_components=3)
reduced = tsne.fit_transform(embeddings)
print(reduced.shape)
Attempts: 2 left
💡 Hint
Check the relationship between n_components and input dimension.
✗ Incorrect
The code runs without error. The embeddings have shape (50, 100), meaning 50 samples with 100 features each. n_components=3 is less than the input dimension (100), so the reduction is valid, and the output shape is (50, 3).
❓ Model Choice
expert · 2:30 remaining
Choosing dimensionality reduction for large NLP embeddings
You have 1 million word embeddings of dimension 300 and want to visualize them in 2D. Which dimensionality reduction technique is most suitable considering both speed and quality?
Attempts: 2 left
💡 Hint
Consider scalability and preservation of local/global structure.
✗ Incorrect
UMAP scales better than t-SNE on large datasets and preserves both local and global structure well, making it the more suitable choice for visualizing millions of NLP embeddings.