
Visualizing embeddings (t-SNE) in NLP - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to import the t-SNE class from sklearn.manifold.

from sklearn.manifold import [1]
Options:
A. TSNE
B. tSNE
C. t_SNE
D. Tsne
Common Mistakes
Python names are case-sensitive, so spellings like 'tSNE' or 'Tsne' raise an ImportError.
Adding underscores like 't_SNE' is incorrect.
2. Fill in the blank (medium)

Complete the code to create a t-SNE object with 2 components for visualization.

tsne = TSNE(n_components=[1], random_state=42)
Options:
A. 3
B. 5
C. 1
D. 2
Common Mistakes
Setting n_components to 3 or more makes visualization harder.
Using 1 component loses too much information.
3. Fill in the blank (hard)

Complete the code to fit t-SNE on the embeddings and return their 2D coordinates in one call.

embeddings_2d = tsne.[1](embeddings)
Options:
A. fit
B. fit_predict
C. fit_transform
D. transform
Common Mistakes
Calling only fit returns the fitted estimator, not the transformed data.
scikit-learn's TSNE has no transform or fit_predict method, so calling either raises an AttributeError.
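
The point about fit_transform can be checked with a minimal sketch. The random array below is a hypothetical stand-in for real word embeddings, and the perplexity value is an arbitrary choice that stays below the sample count.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical stand-in for real embeddings: 60 vectors with 20 dimensions each.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(60, 20))

# perplexity must be smaller than the number of samples (60 here).
tsne = TSNE(n_components=2, perplexity=10, random_state=42)

# fit_transform learns the layout and returns the 2D coordinates in one call.
embeddings_2d = tsne.fit_transform(embeddings)
print(embeddings_2d.shape)

# TSNE cannot project unseen points, so it exposes no separate transform method.
has_transform = hasattr(tsne, "transform")
print(has_transform)
```

After fitting, the same coordinates are also stored on the estimator as tsne.embedding_.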
4. Fill in the blank (hard)

Fill both blanks to create a scatter plot of the 2D embeddings, colored by label.

plt.scatter(embeddings_2d[:, [1]], embeddings_2d[:, [2]], c=labels, cmap='viridis')
plt.title('t-SNE visualization')
plt.show()
Options:
A. 0
B. 1
C. 2
D. 3
Common Mistakes
Using index 2 or 3 raises an IndexError, because the 2D array has only columns 0 and 1.
Swapping x and y axes can confuse interpretation.
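
A self-contained sketch of the scatter plot follows. The coordinates and labels are randomly generated placeholders for real t-SNE output, and the Agg backend is an assumption made so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical 2D coordinates and integer class labels standing in for t-SNE output.
rng = np.random.default_rng(0)
embeddings_2d = rng.normal(size=(30, 2))
labels = rng.integers(0, 3, size=30)

fig, ax = plt.subplots()
# Column 0 goes on the x-axis and column 1 on the y-axis; c colors points by label.
points = ax.scatter(embeddings_2d[:, 0], embeddings_2d[:, 1], c=labels, cmap="viridis")
ax.set_title("t-SNE visualization")
fig.colorbar(points, ax=ax, label="label")
```

In an interactive session, plt.show() displays the figure; fig.savefig with a file name writes it to disk instead.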
5. Fill in the blank (hard)

Fill the blanks to build a dictionary that keeps only short words (fewer than 5 characters) and reduce their embeddings to 2D with t-SNE.

filtered_embeddings = {word: embedding for word, embedding in all_embeddings.items() if len(word) [1] 5}
tsne = TSNE(n_components=[2], random_state=42)
embeddings_2d = tsne.fit_transform(list(filtered_embeddings.values()))
Options:
A. <
B. >
C. 2
D. 3
Common Mistakes
Using > 5 filters out short words instead of keeping them.
Setting n_components to 3 makes visualization 3D, which is harder to plot.
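
The filtering step can be tried end to end with a small sketch. The vocabulary and its random 16-dimensional vectors are invented for illustration, and the low perplexity is an assumption needed because so few words survive the filter.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical vocabulary with random 16-dimensional vectors.
rng = np.random.default_rng(0)
words = ["cat", "dog", "sun", "tree", "bird", "house",
         "planet", "river", "keyboard", "sky"]
all_embeddings = {word: rng.normal(size=16) for word in words}

# Keep only words shorter than 5 characters.
filtered_embeddings = {w: v for w, v in all_embeddings.items() if len(w) < 5}

# Stack the vectors into one (n_words, n_dims) array; keep word order for labeling.
kept_words = list(filtered_embeddings)
matrix = np.array([filtered_embeddings[w] for w in kept_words])

# The default perplexity (30) exceeds the 6 remaining samples, so lower it.
tsne = TSNE(n_components=2, perplexity=3, random_state=42)
embeddings_2d = tsne.fit_transform(matrix)
print(kept_words, embeddings_2d.shape)
```

Keeping kept_words in the same order as the rows of matrix makes it possible to annotate each plotted point with its word later.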

Practice

1. What is the main purpose of using t-SNE in visualizing word embeddings?
easy
A. To train word embeddings from raw text data
B. To increase the size of word embeddings for better accuracy
C. To reduce high-dimensional word vectors into 2D or 3D for easy visualization
D. To cluster words based on their frequency in the text

Solution

  1. Step 1: Understand t-SNE's role in dimensionality reduction

    t-SNE reduces complex, high-dimensional data like word embeddings into 2D or 3D space for visualization.
  2. Step 2: Differentiate from other tasks

    It does not train embeddings or cluster by frequency but helps visualize similarity by reducing dimensions.
  3. Final Answer:

    To reduce high-dimensional word vectors into 2D or 3D for easy visualization -> Option C
  4. Quick Check:

    t-SNE = dimensionality reduction for visualization [OK]
Hint: t-SNE = reduce dimensions to visualize complex data
Common Mistakes:
  • Confusing t-SNE with training embeddings
  • Thinking t-SNE increases data size
  • Assuming t-SNE clusters by word frequency
2. Which of the following is the correct way to import t-SNE from scikit-learn in Python?
easy
A. from sklearn.manifold import TSNE
B. import sklearn.tsne as TSNE
C. from sklearn.embedding import tSNE
D. import tsne from sklearn

Solution

  1. Step 1: Recall correct module for t-SNE in scikit-learn

    t-SNE is in the sklearn.manifold module and is imported as TSNE.
  2. Step 2: Check syntax correctness

    Option A uses valid syntax and the correct module. Options B and C refer to modules that do not exist (sklearn.tsne and sklearn.embedding), and option D is not valid Python import syntax.
  3. Final Answer:

    from sklearn.manifold import TSNE -> Option A
  4. Quick Check:

    Correct import = from sklearn.manifold import TSNE [OK]
Hint: t-SNE is in sklearn.manifold, import as TSNE
Common Mistakes:
  • Using wrong module like sklearn.embedding
  • Incorrect import syntax
  • Confusing lowercase and uppercase in import
3. Given this Python code snippet using t-SNE, what will be the shape of embeddings_2d?
from sklearn.manifold import TSNE
import numpy as np

embeddings = np.random.rand(100, 50)  # 100 words, 50 dimensions
model = TSNE(n_components=2, random_state=42)
embeddings_2d = model.fit_transform(embeddings)
medium
A. (100, 2)
B. (2, 100)
C. (50, 2)
D. (100, 50)

Solution

  1. Step 1: Understand input shape and t-SNE output

    Input embeddings have shape (100, 50) meaning 100 samples with 50 features each.
  2. Step 2: Check t-SNE output shape with n_components=2

    t-SNE reduces features to 2 dimensions, so output shape is (100, 2) -- 100 samples, 2 features.
  3. Final Answer:

    (100, 2) -> Option A
  4. Quick Check:

    Output shape = (samples, n_components) = (100, 2) [OK]
Hint: Output shape = (samples, n_components) in t-SNE
Common Mistakes:
  • Confusing rows and columns in output shape
  • Assuming output shape equals input shape
  • Mixing up n_components with sample count
4. You run t-SNE on word embeddings but get a ValueError: "perplexity must be less than n_samples". What is the likely cause and fix?
medium
A. Input embeddings have wrong shape; reshape to (features, samples)
B. Perplexity is set too high; reduce it below the number of samples
C. Random state is missing; add random_state parameter
D. t-SNE requires normalized data; normalize embeddings first

Solution

  1. Step 1: Understand perplexity parameter in t-SNE

    Perplexity is roughly the number of neighbors each point takes into account, so it must be less than the number of samples.
  2. Step 2: Identify cause of ValueError

    The error means perplexity is equal to or larger than the sample count, which is invalid.
  3. Step 3: Fix by lowering perplexity

    Reduce perplexity to a value smaller than the number of samples to fix the error.
  4. Final Answer:

    Perplexity is set too high; reduce it below the number of samples -> Option B
  5. Quick Check:

    Perplexity < n_samples to avoid error [OK]
Hint: Keep perplexity less than sample count in t-SNE
Common Mistakes:
  • Changing input shape instead of perplexity
  • Ignoring the perplexity limit
  • Assuming normalization fixes this error
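
The error and its fix can be reproduced with a short sketch. The data is random, and the rule used to pick a smaller perplexity is only an illustrative heuristic, not something scikit-learn prescribes. Older scikit-learn releases may not raise this exact error, so the code records the message instead of assuming it.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical small dataset: only 20 samples with 50 dimensions each.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(20, 50))
n_samples = embeddings.shape[0]

# The default perplexity is 30, which is not less than 20 samples.
try:
    TSNE(n_components=2, random_state=42).fit_transform(embeddings)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Illustrative heuristic (an assumption): stay well below the sample count.
safe_perplexity = max(1, min(30, (n_samples - 1) // 3))
tsne = TSNE(n_components=2, perplexity=safe_perplexity, random_state=42)
embeddings_2d = tsne.fit_transform(embeddings)
print(safe_perplexity, embeddings_2d.shape)
```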
5. You want to visualize embeddings of 5000 words using t-SNE but notice the plot is very crowded and unclear. Which approach best improves visualization clarity?
hard
A. Apply t-SNE with n_components=50 to keep more dimensions
B. Increase perplexity to a very high value like 1000 to spread points out
C. Use raw high-dimensional embeddings without dimensionality reduction
D. Reduce the number of words by selecting a smaller subset before applying t-SNE

Solution

  1. Step 1: Understand t-SNE limitations with large datasets

    t-SNE works best with small to medium data; large sets cause crowded plots and slow computation.
  2. Step 2: Choose practical solution for clarity

    Reducing the dataset size by selecting fewer words improves plot clarity and speed.
  3. Step 3: Evaluate other options

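    A very high perplexity makes t-SNE favor global structure over local neighborhoods rather than spreading points apart, keeping 50 components leaves nothing that can be plotted directly, and raw embeddings have too many dimensions to plot.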
  4. Final Answer:

    Reduce the number of words by selecting a smaller subset before applying t-SNE -> Option D
  5. Quick Check:

    Smaller data = clearer t-SNE plots [OK]
Hint: Use smaller data subsets for clearer t-SNE plots
Common Mistakes:
  • Setting perplexity too high
  • Using too many dimensions in t-SNE
  • Trying to visualize raw embeddings directly
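
Subsetting before t-SNE can look like the sketch below. The 5000-word vocabulary and its vectors are randomly generated placeholders, and the subset size of 300 is an arbitrary choice. Random sampling is used only for simplicity; picking the most frequent words or a hand-chosen list is another reasonable option.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical vocabulary of 5000 words with random 50-dimensional vectors.
rng = np.random.default_rng(0)
vocab = [f"word_{i}" for i in range(5000)]
embeddings = rng.normal(size=(5000, 50))

# Draw 300 distinct row indices and keep words and vectors aligned.
subset_size = 300
idx = rng.choice(len(vocab), size=subset_size, replace=False)
subset_words = [vocab[i] for i in idx]
subset_vectors = embeddings[idx]

# Run t-SNE on the subset only; this is faster and gives a less crowded plot.
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
subset_2d = tsne.fit_transform(subset_vectors)
print(len(subset_words), subset_2d.shape)
```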