NLP · ML · ~20 mins

Visualizing embeddings (t-SNE) in NLP - ML Experiment: Train & Evaluate

Experiment - Visualizing embeddings (t-SNE)
Problem: You have word embeddings from a small text dataset and want to visualize them to understand how similar words group together.
Current Metrics: No quantitative metrics; the current state is raw embeddings without any visualization.
Issue: Without visualization, it is hard to see relationships or clusters in the embeddings.
Your Task
Create a 2D visualization of word embeddings using t-SNE to reveal clusters of similar words.
Use t-SNE for dimensionality reduction.
Use matplotlib for plotting.
Use a small set of word embeddings (e.g., from pretrained GloVe or a small custom set).
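If you want to use real pretrained vectors instead of a simulated set, GloVe files use a simple plain-text format: one word per line followed by its vector components. The sketch below parses that format; the filename `glove.6B.50d.txt` is the standard download name, and the `StringIO` mock stands in for the real file so the snippet runs on its own.

```python
import numpy as np
from io import StringIO

def load_glove(file_obj):
    """Parse GloVe's plain-text format: one word per line, then its vector."""
    words, vectors = [], []
    for line in file_obj:
        parts = line.rstrip().split(' ')
        words.append(parts[0])
        vectors.append(np.asarray(parts[1:], dtype=np.float32))
    return words, np.vstack(vectors)

# Tiny mock of the file format; with a real download you would use
# open('glove.6B.50d.txt', encoding='utf-8') instead.
mock_file = StringIO("cat 0.1 0.2 0.3\ndog 0.2 0.1 0.4\n")
words, embeddings = load_glove(mock_file)
print(words, embeddings.shape)  # ['cat', 'dog'] (2, 3)
```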
Solution
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Sample word embeddings (5 words, 50-dimensional vectors simulated)
np.random.seed(42)  # fix the seed so the simulated embeddings are reproducible
words = ['cat', 'dog', 'apple', 'orange', 'car']
embeddings = np.array([
    np.random.normal(0, 1, 50) + 1,  # cat
    np.random.normal(0, 1, 50) + 1,  # dog
    np.random.normal(0, 1, 50) - 1,  # apple
    np.random.normal(0, 1, 50) - 1,  # orange
    np.random.normal(0, 1, 50) + 3   # car
])

# Normalize embeddings
embeddings_norm = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

# Apply t-SNE (perplexity must be smaller than the number of samples;
# the default of 30 would raise an error with only 5 words)
tsne = TSNE(n_components=2, perplexity=3, random_state=42)
embeddings_2d = tsne.fit_transform(embeddings_norm)

# Plot
plt.figure(figsize=(8, 6))
plt.scatter(embeddings_2d[:, 0], embeddings_2d[:, 1], color='blue')
for i, word in enumerate(words):
    plt.text(embeddings_2d[i, 0] + 0.01, embeddings_2d[i, 1] + 0.01, word, fontsize=12)
plt.title('t-SNE visualization of word embeddings')
plt.xlabel('Dimension 1')
plt.ylabel('Dimension 2')
plt.grid(True)
plt.show()
Created a small set of sample word embeddings with simulated vectors.
Normalized embeddings to unit length for better t-SNE performance.
Applied t-SNE to reduce 50D embeddings to 2D.
Plotted the 2D points with word labels using matplotlib.
Results Interpretation

Before: Raw 50-dimensional embeddings with no visual insight.
After: 2D plot showing word clusters, making relationships visible.

t-SNE helps us see how similar words group together by reducing high-dimensional embeddings to 2D, making patterns easier to understand.
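Before trusting the 2D picture, it can help to sanity-check the clusters in the original 50-dimensional space with cosine similarity. The snippet below rebuilds the same simulated embeddings (with a fixed seed, an assumption for reproducibility) and confirms that words built from the same shift really are more similar than words from opposite shifts.

```python
import numpy as np

np.random.seed(42)  # same simulated embeddings as in the solution
words = ['cat', 'dog', 'apple', 'orange', 'car']
embeddings = np.array([
    np.random.normal(0, 1, 50) + 1,  # cat
    np.random.normal(0, 1, 50) + 1,  # dog
    np.random.normal(0, 1, 50) - 1,  # apple
    np.random.normal(0, 1, 50) - 1,  # orange
    np.random.normal(0, 1, 50) + 3   # car
])

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# 'cat' and 'dog' share a +1 shift, so their similarity should be
# clearly higher than 'cat' vs. 'apple' (shifted -1).
print(f"cat-dog:   {cosine(embeddings[0], embeddings[1]):.2f}")
print(f"cat-apple: {cosine(embeddings[0], embeddings[2]):.2f}")
```

If a cluster in the t-SNE plot contradicts these raw similarities, that is a sign the perplexity setting, not the data, is driving the layout.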
Bonus Experiment
Try visualizing embeddings from a larger set of words and compare t-SNE with PCA for dimensionality reduction.
💡 Hint
Use sklearn's PCA and t-SNE on the same embeddings and plot side-by-side to see differences in cluster separation.
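One possible sketch of that comparison, using a simulated 30-word vocabulary with three built-in clusters (the word labels `w0`–`w29` and the cluster shifts are assumptions for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

np.random.seed(0)
words = [f"w{i}" for i in range(30)]  # hypothetical larger vocabulary
# Three simulated clusters of 10 words each, shifted apart in 50D
embeddings = np.vstack([
    np.random.normal(c, 1, (10, 50)) for c in (-3, 0, 3)
])

# Reduce to 2D with both methods
pca_2d = PCA(n_components=2).fit_transform(embeddings)
tsne_2d = TSNE(n_components=2, perplexity=10,
               random_state=0).fit_transform(embeddings)

# Side-by-side plots to compare cluster separation
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for ax, pts, title in [(axes[0], pca_2d, 'PCA'),
                       (axes[1], tsne_2d, 't-SNE')]:
    ax.scatter(pts[:, 0], pts[:, 1])
    for i, word in enumerate(words):
        ax.text(pts[i, 0], pts[i, 1], word, fontsize=8)
    ax.set_title(title)
plt.show()
```

PCA is linear and preserves global variance, so distant clusters stay roughly in place; t-SNE preserves local neighborhoods and tends to pull each cluster into a tighter blob, often at the cost of meaningful between-cluster distances.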