How to Visualize Word Embeddings in NLP Easily
To visualize word embeddings in NLP, reduce their high-dimensional vectors to 2D or 3D with a technique such as t-SNE or PCA. Then plot the reduced points with a library such as matplotlib or seaborn to see whether semantically related words end up near each other.
Syntax
Here is the basic syntax to visualize word embeddings:
- Use `model.wv[word]` to get the vector for a word from a trained embedding model.
- Apply `TSNE(n_components=2)` or `PCA(n_components=2)` to reduce the vectors to two dimensions.
- Plot the 2D points with `matplotlib.pyplot.scatter()`.
- Add labels with `plt.text()` to identify the words on the plot.
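```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Assumes `model` is a trained gensim Word2Vec model and `words` is a list
# of words in its vocabulary.
vectors = np.array([model.wv[word] for word in words])

# Reduce dimensions; t-SNE's perplexity must be smaller than the number of words
tsne = TSNE(n_components=2, perplexity=min(30, len(words) - 1), random_state=0)
reduced_vectors = tsne.fit_transform(vectors)

# Plot each word as a point and label it
plt.scatter(reduced_vectors[:, 0], reduced_vectors[:, 1])
for i, word in enumerate(words):
    plt.text(reduced_vectors[i, 0], reduced_vectors[i, 1], word)
plt.show()
```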
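The introduction mentions PCA as an alternative to t-SNE. As a minimal sketch, both reducers can sit behind one function; `reduce_to_2d` is a hypothetical helper, not part of scikit-learn. It also caps t-SNE's `perplexity` below the number of words, because scikit-learn rejects a perplexity that is not smaller than the number of samples (the default is 30). Random vectors stand in for real embeddings here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def reduce_to_2d(vectors, method="pca", random_state=42):
    """Project an (n_words, dim) array of embeddings to shape (n_words, 2).

    `reduce_to_2d` is a hypothetical helper used only for illustration.
    """
    X = np.asarray(vectors, dtype=np.float32)
    if method == "pca":
        # Linear projection onto the two directions of largest variance.
        return PCA(n_components=2).fit_transform(X)
    if method == "tsne":
        # Non-linear embedding that preserves local neighborhoods;
        # perplexity must be strictly less than the number of samples.
        perplexity = min(30, len(X) - 1)
        tsne = TSNE(n_components=2, perplexity=perplexity, random_state=random_state)
        return tsne.fit_transform(X)
    raise ValueError(f"unknown method: {method!r}")


# Stand-in for 12 word vectors of dimension 50 (not real embeddings)
rng = np.random.default_rng(0)
vectors = rng.normal(size=(12, 50))
print(reduce_to_2d(vectors, "pca").shape)
print(reduce_to_2d(vectors, "tsne").shape)
```

PCA is fast and deterministic, which makes it a reasonable first look; t-SNE usually separates local clusters more clearly but distorts global distances.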
Example
This example shows how to visualize word embeddings from a small set of words using gensim Word2Vec, t-SNE, and matplotlib. It trains a simple model, extracts vectors, reduces dimensions, and plots the words.
```python
from gensim.models import Word2Vec
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
import numpy as np

# Sample sentences (a toy corpus, far too small to learn real semantics)
sentences = [
    ['king', 'queen', 'man', 'woman'],
    ['apple', 'orange', 'fruit', 'banana'],
    ['car', 'bus', 'train', 'bicycle'],
]

# Train Word2Vec model
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, workers=1, seed=42)

# Words to visualize
words = ['king', 'queen', 'man', 'woman', 'apple', 'orange', 'fruit', 'banana',
         'car', 'bus', 'train', 'bicycle']

# Get vectors as a (12, 50) array
vectors = np.array([model.wv[word] for word in words])

# Reduce dimensions with t-SNE; perplexity must be below the number of words (12)
tsne = TSNE(n_components=2, perplexity=5, random_state=42)
reduced = tsne.fit_transform(vectors)

# Plot
plt.figure(figsize=(8, 6))
plt.scatter(reduced[:, 0], reduced[:, 1])
for i, word in enumerate(words):
    plt.text(reduced[i, 0] + 0.01, reduced[i, 1] + 0.01, word)
plt.title('Word Embeddings Visualization with t-SNE')
plt.show()
```
Output
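A scatter plot window showing the 12 words positioned in 2D space. With a corpus this small, Word2Vec cannot learn much about meaning, so the layout is largely arbitrary. Clear semantic groupings (for example, king near queen and apple near orange) typically appear only with embeddings trained on a large corpus or with pretrained vectors.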
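Because a 2D projection can distort distances, it can help to check an apparent cluster against the full-dimensional vectors. The sketch below ranks words by cosine similarity with plain NumPy; `nearest_words` is a hypothetical helper, and the toy vectors are made up for illustration. With a trained gensim model, `model.wv.most_similar(word)` serves the same purpose.

```python
import numpy as np


def nearest_words(words, vectors, query, k=3):
    """Return the k (word, cosine similarity) pairs closest to `query`.

    `nearest_words` is a hypothetical helper used only for illustration.
    """
    X = np.asarray(vectors, dtype=float)
    # Normalize rows so that dot products equal cosine similarities
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = X @ X[words.index(query)]
    # Sort by descending similarity, skipping the query word itself
    order = [i for i in np.argsort(-sims) if words[i] != query]
    return [(words[i], float(sims[i])) for i in order[:k]]


# Toy 2-D vectors, not real embeddings
words = ["king", "queen", "apple"]
vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0]]
print(nearest_words(words, vectors, "king", k=2))
```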
Common Pitfalls
Common mistakes when visualizing word embeddings:
- Raw high-dimensional vectors cannot be plotted directly: a scatter plot needs two (or three) coordinates per point, so reduce the dimensions first.
- Not setting a random seed in t-SNE leads to a different layout on each run, which makes comparisons hard.
- Plotting too many words at once clutters the visualization and makes labels overlap.
- Ignoring the scale and axes labels can confuse interpretation.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Assumes `model` (a trained gensim Word2Vec model) and `words` already exist.

# Wrong: no dimension reduction
# vectors = [model.wv[word] for word in words]
# plt.scatter(vectors)  # TypeError: scatter() needs separate x and y arrays

# Right: use t-SNE with a fixed random_state
vectors = np.array([model.wv[word] for word in words])
tsne = TSNE(n_components=2, perplexity=min(30, len(words) - 1), random_state=42)
reduced = tsne.fit_transform(vectors)
plt.scatter(reduced[:, 0], reduced[:, 1])
plt.show()
```
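One way to avoid clutter is to plot only the most frequent words. A minimal sketch with `collections.Counter` over the tokenized sentences follows; `most_frequent_words` is a hypothetical helper. In gensim 4, `model.wv.index_to_key` is typically already sorted by descending frequency (assuming the default vocabulary sorting), so `model.wv.index_to_key[:n]` is a shortcut.

```python
from collections import Counter


def most_frequent_words(sentences, n=50):
    """Return up to n distinct tokens, most frequent first.

    Ties keep the order in which tokens were first seen.
    `most_frequent_words` is a hypothetical helper used only for illustration.
    """
    counts = Counter(token for sentence in sentences for token in sentence)
    return [word for word, _ in counts.most_common(n)]


sentences = [
    ["the", "king", "and", "the", "queen"],
    ["the", "apple", "and", "the", "orange"],
]
print(most_frequent_words(sentences, n=2))  # ['the', 'and']
```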
Quick Reference
Tips for effective word embedding visualization:
- Always reduce dimensions with t-SNE or PCA.
- Use a fixed `random_state` for reproducible plots.
- Limit the number of words to avoid clutter.
- Label points clearly with `plt.text()`.
- Use colors or groups to highlight semantic clusters.
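The last tip can be sketched by drawing one scatter call per group, so matplotlib assigns each group its own color and legend entry. This assumes the group labels are assigned by hand; `plot_groups` is a hypothetical helper, and random points stand in for the output of t-SNE or PCA. Labels use `annotate` with an offset in screen points, so they stay readable whatever the scale of the reduced coordinates.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_groups(points, words, groups):
    """Scatter (n, 2) points with one color per group and a label per word.

    `plot_groups` is a hypothetical helper used only for illustration;
    `groups` holds one hand-assigned group label per word.
    """
    points = np.asarray(points)
    fig, ax = plt.subplots(figsize=(8, 6))
    for group in dict.fromkeys(groups):  # unique labels, in first-seen order
        idx = [i for i, g in enumerate(groups) if g == group]
        ax.scatter(points[idx, 0], points[idx, 1], label=group)
    for (x, y), word in zip(points, words):
        # Offset each label 3 points up and right of its marker
        ax.annotate(word, (x, y), xytext=(3, 3), textcoords="offset points")
    ax.legend()
    return fig, ax


words = ["king", "queen", "apple", "orange", "car", "bus"]
groups = ["royalty", "royalty", "fruit", "fruit", "vehicle", "vehicle"]
points = np.random.default_rng(0).normal(size=(6, 2))  # stand-in for reduced vectors
fig, ax = plot_groups(points, words, groups)
plt.show()
```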
Key Takeaways
Use t-SNE or PCA to reduce word embedding dimensions before plotting.
Plot embeddings with matplotlib and label words for clear visualization.
Set random_state in t-SNE for consistent, reproducible plots.
Avoid plotting too many words at once to keep the plot readable.
Visualizing embeddings helps you explore word relationships and clusters.
