
How to Visualize Word Embeddings in NLP Easily

To visualize word embeddings in NLP, reduce their high-dimensional vectors to 2D or 3D with a technique such as t-SNE or PCA. Then plot the reduced vectors with a library such as matplotlib or seaborn to see which words land near each other.

Syntax

Here is the basic syntax to visualize word embeddings:

  • Use model.wv[word] to get the vector for a word from a trained embedding model.
  • Apply TSNE(n_components=2) or PCA(n_components=2) to reduce dimensions. For t-SNE, keep perplexity below the number of words being plotted.
  • Plot the 2D points with matplotlib.pyplot.scatter().
  • Add labels with plt.text() to identify words on the plot.
python
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

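# Assumes `model` is a trained gensim Word2Vec model and `words` is a list of
# more than five words from its vocabulary
vectors = [model.wv[word] for word in words]

# Reduce dimensions; perplexity must be smaller than the number of words
# (the default of 30 fails on small word lists in recent scikit-learn)
tsne = TSNE(n_components=2, perplexity=5, random_state=0)
reduced_vectors = tsne.fit_transform(vectors)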

# Plot
plt.scatter(reduced_vectors[:,0], reduced_vectors[:,1])
for i, word in enumerate(words):
    plt.text(reduced_vectors[i,0], reduced_vectors[i,1], word)
plt.show()
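
The syntax above names PCA as an alternative to t-SNE but only shows t-SNE. Below is a minimal sketch of the PCA route. The random vectors are a stand-in so the snippet runs on its own (they are not real embeddings), and the variable name reduced_pca is chosen for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data: 6 random 50-dimensional vectors (hypothetical, not real embeddings).
# With a trained model you would use: vectors = [model.wv[w] for w in words]
rng = np.random.default_rng(0)
words = ["king", "queen", "apple", "orange", "car", "bus"]
vectors = rng.normal(size=(len(words), 50))

# PCA is a linear projection and has no perplexity parameter to tune
pca = PCA(n_components=2)
reduced_pca = pca.fit_transform(vectors)

print(reduced_pca.shape)              # one (x, y) pair per word
print(pca.explained_variance_ratio_)  # share of variance kept by each axis
```

The reduced_pca array can be passed to plt.scatter() and plt.text() exactly like the t-SNE output. PCA tends to preserve global structure, while t-SNE emphasizes local neighborhoods, so the two plots can look quite different for the same vectors.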

Example

This example shows how to visualize word embeddings from a small set of words using gensim Word2Vec, t-SNE, and matplotlib. It trains a simple model, extracts vectors, reduces dimensions, and plots the words.

python
from gensim.models import Word2Vec
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Sample sentences
sentences = [
    ['king', 'queen', 'man', 'woman'],
    ['apple', 'orange', 'fruit', 'banana'],
    ['car', 'bus', 'train', 'bicycle']
]

# Train Word2Vec model
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, workers=1, seed=42)

# Words to visualize
words = ['king', 'queen', 'man', 'woman', 'apple', 'orange', 'fruit', 'banana', 'car', 'bus', 'train', 'bicycle']

# Get vectors
vectors = [model.wv[word] for word in words]

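# Reduce dimensions with t-SNE. perplexity must be less than the number of
# words (12 here), so the default of 30 would raise a ValueError in recent
# scikit-learn versions
tsne = TSNE(n_components=2, perplexity=5, random_state=42)
reduced = tsne.fit_transform(vectors)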

# Plot
plt.figure(figsize=(8,6))
plt.scatter(reduced[:,0], reduced[:,1])
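for i, word in enumerate(words):
    # Offset each label by 3 points so it does not sit on top of its marker
    plt.annotate(word, (reduced[i,0], reduced[i,1]), xytext=(3, 3), textcoords='offset points')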
plt.title('Word Embeddings Visualization with t-SNE')
plt.show()
Output
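A scatter plot window showing the 12 words positioned in 2D space. With a corpus this small, the vectors stay close to their random initialization, so do not expect clean semantic groups here. With a model trained on a large corpus, related words (e.g., king and queen, apple and orange) tend to appear close together.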

Common Pitfalls

Common mistakes when visualizing word embeddings:

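  • Trying to plot raw high-dimensional vectors directly. A 2D scatter plot needs exactly two coordinates per word, so reduce dimensions first.
  • Not setting a random seed in t-SNE, which gives a different layout on each run and makes comparisons hard.
  • Leaving t-SNE's perplexity at its default of 30 when plotting fewer words than that. Perplexity must be smaller than the number of points.
  • Plotting too many words at once, which clutters the visualization and makes labels overlap.
  • Reading too much into the axes and scale. t-SNE axes have no intrinsic meaning, and distances between far-apart clusters are less reliable than local neighborhoods.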
python
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

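# Assumes `model` is a trained Word2Vec model and `words` is a list of more
# than five words from its vocabulary

# Wrong: no dimension reduction
# vectors = [model.wv[word] for word in words]
# plt.scatter(vectors)  # TypeError: scatter() needs separate x and y arguments

# Right: reduce to 2D with t-SNE, fixing random_state for reproducibility
vectors = [model.wv[word] for word in words]
tsne = TSNE(n_components=2, perplexity=5, random_state=42)
reduced = tsne.fit_transform(vectors)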
plt.scatter(reduced[:,0], reduced[:,1])
plt.show()
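
One more pitfall worth a sketch: t-SNE's perplexity has to be smaller than the number of points, which is easy to violate when plotting only a handful of words. The helper below, pick_perplexity, is hypothetical (it is not part of scikit-learn), and the divide-by-three rule is only a rough heuristic assumed here for illustration. Random vectors stand in for real embeddings so the snippet runs on its own.

```python
import numpy as np
from sklearn.manifold import TSNE

def pick_perplexity(n_samples, preferred=30.0):
    """Hypothetical helper: return a perplexity safely below n_samples."""
    if n_samples < 2:
        raise ValueError("t-SNE needs at least two points")
    # Heuristic (an assumption, not a library rule): shrink perplexity for
    # small inputs so it always stays well under the number of points
    return max(1.0, min(float(preferred), (n_samples - 1) / 3))

# Stand-in data: 12 random 50-dimensional vectors (not real embeddings)
rng = np.random.default_rng(42)
vectors = rng.normal(size=(12, 50))

perplexity = pick_perplexity(len(vectors))
tsne = TSNE(n_components=2, perplexity=perplexity, random_state=42)
reduced = tsne.fit_transform(vectors)
print(perplexity, reduced.shape)
```

For larger vocabularies the helper simply returns the preferred value, so the same call works whether you plot a dozen words or a few thousand.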

Quick Reference

Tips for effective word embedding visualization:

  • Always reduce dimensions with t-SNE or PCA.
  • Use a fixed random_state for reproducible plots.
  • Limit the number of words to avoid clutter.
  • Label points clearly with plt.text().
  • Use colors or groups to highlight semantic clusters.
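
The last tip, coloring by group, could be sketched as follows. The groups dictionary is a hand-made, hypothetical grouping, and the 2D coordinates are random stand-ins for the output of t-SNE or PCA. The Agg backend and the output file name are also assumptions made so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical grouping chosen by hand for illustration
groups = {
    "royalty": ["king", "queen"],
    "fruit": ["apple", "orange"],
    "transport": ["car", "bus"],
}

# Stand-in 2D coordinates; in practice use the output of t-SNE or PCA
rng = np.random.default_rng(0)
words = [w for members in groups.values() for w in members]
reduced = rng.normal(size=(len(words), 2))
index = {w: i for i, w in enumerate(words)}

fig, ax = plt.subplots(figsize=(8, 6))
for name, members in groups.items():
    rows = [index[w] for w in members]
    # One scatter call per group: each call gets the next color in the cycle
    ax.scatter(reduced[rows, 0], reduced[rows, 1], label=name)
    for r in rows:
        ax.annotate(words[r], (reduced[r, 0], reduced[r, 1]),
                    xytext=(3, 3), textcoords="offset points")
ax.legend(title="Group")
fig.savefig("embedding_groups.png")
```

Calling scatter once per group keeps the legend and the colors in sync without building a color list by hand.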

Key Takeaways

Use t-SNE or PCA to reduce word embedding dimensions before plotting.
Plot embeddings with matplotlib and label words for clear visualization.
Set random_state in t-SNE for consistent, reproducible plots.
Avoid plotting too many words at once to keep the plot readable.
Visualizing embeddings helps understand word relationships and clusters.