Word similarity and analogies let a program measure how words relate to each other, such as how 'king' relates to 'queen'. Expressing these relationships as numbers gives language tasks a way to compare words by meaning rather than by spelling.
Word similarity and analogies in NLP
from gensim.models import KeyedVectors

# Load pre-trained word vectors
model = KeyedVectors.load_word2vec_format('path/to/word2vec.bin', binary=True)

# Find similarity between two words
similarity = model.similarity('word1', 'word2')

# Find words similar to a given word
similar_words = model.most_similar('word', topn=5)

# Solve analogy: word_a is to word_b as word_c is to ?
result = model.most_similar(positive=['word_b', 'word_c'], negative=['word_a'], topn=1)
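These methods work on word vectors. Pre-trained sets such as Word2Vec or GloVe are the usual starting point, since training your own requires a large text corpus.
model.similarity returns the cosine similarity of the two word vectors: a score between -1 and 1, where a higher score means the words are closer.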
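To make the score less of a black box, here is a minimal sketch of cosine similarity computed by hand. The three vectors are made-up 3-dimensional values chosen for illustration; real models use 50 to 300 dimensions, and the function name cosine_similarity is our own, not part of gensim.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up vectors, for illustration only (not taken from any real model)
cat = [0.9, 0.8, 0.1]
dog = [0.8, 0.9, 0.2]
car = [0.1, 0.2, 0.9]

print(f"cat vs dog: {cosine_similarity(cat, dog):.2f}")
print(f"cat vs car: {cosine_similarity(cat, car):.2f}")
```

Because 'cat' and 'dog' point in nearly the same direction, their score is close to 1, while 'cat' and 'car' score much lower.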
similarity = model.similarity('cat', 'dog')
print(similarity)

similar_words = model.most_similar('king', topn=3)
print(similar_words)

result = model.most_similar(positive=['woman', 'king'], negative=['man'], topn=1)
print(result)
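The analogy query can be read as vector arithmetic: start at 'king', subtract 'man', add 'woman', then look for the nearest remaining word. The sketch below shows that idea on made-up 2-dimensional vectors. The helper solve_analogy and the toy values are assumptions for illustration, and the sketch skips the vector normalization that gensim applies internally, so it approximates the idea rather than reproducing most_similar exactly.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Made-up vectors: axis 0 loosely means "royalty", axis 1 loosely means "male"
toy_vectors = {
    "king":  [0.9, 0.9],
    "queen": [0.9, 0.1],
    "man":   [0.1, 0.9],
    "woman": [0.1, 0.1],
    "apple": [0.5, 0.5],
}

def solve_analogy(a, b, c, vectors):
    """Return the (word, score) that best completes 'a is to b as c is to ?'."""
    # Target point: b - a + c, computed per dimension
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Skip the three query words so they cannot be returned as the answer
    candidates = (w for w in vectors if w not in (a, b, c))
    best = max(candidates, key=lambda w: cosine_similarity(vectors[w], target))
    return best, cosine_similarity(vectors[best], target)

word, score = solve_analogy("man", "king", "woman", toy_vectors)
print(f"'man' is to 'king' as 'woman' is to '{word}' (score {score:.2f})")
```

Excluding the query words matters: without that step, the nearest word to the target is often one of the inputs.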
This program loads a small word vector model, calculates similarity between 'cat' and 'dog', finds words similar to 'king', and solves a simple analogy.
from gensim.models import KeyedVectors
import gensim.downloader as api

# Load a small pre-trained model for demonstration
# Here we use a small subset from gensim-data for quick testing
model = api.load('glove-wiki-gigaword-50')

# Calculate similarity between 'cat' and 'dog'
similarity = model.similarity('cat', 'dog')
print(f"Similarity between 'cat' and 'dog': {similarity:.2f}")

# Find top 3 words similar to 'king'
similar_words = model.most_similar('king', topn=3)
print("Top 3 words similar to 'king':")
for word, score in similar_words:
    print(f"{word}: {score:.2f}")

# Solve analogy: man is to king as woman is to ?
result = model.most_similar(positive=['woman', 'king'], negative=['man'], topn=1)
print(f"'man' is to 'king' as 'woman' is to '{result[0][0]}' with score {result[0][1]:.2f}")
Pre-trained models can be large; using smaller ones helps beginners experiment quickly.
Not every word is in the model's vocabulary, and looking up a missing word raises a KeyError; check with 'word in model' before using it.
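A small guard for the vocabulary check might look like the sketch below. The helper split_by_vocabulary is hypothetical, and a plain set stands in for the loaded model so the example runs without downloading anything; with gensim you would pass the model itself, since it supports the same 'in' test.

```python
def split_by_vocabulary(words, vocabulary):
    """Split words into those present in the vocabulary and those missing."""
    known = [w for w in words if w in vocabulary]
    missing = [w for w in words if w not in vocabulary]
    return known, missing

# Stand-in vocabulary for illustration; replace with the loaded model
toy_vocabulary = {"king", "queen", "man", "woman"}

known, missing = split_by_vocabulary(["king", "qeen", "woman"], toy_vocabulary)
print("Known:", known)
print("Missing:", missing)
```

Reporting the missing words, rather than silently dropping them, makes typos such as 'qeen' easy to spot.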
Scores close to 1 indicate very similar words; scores near 0 or below indicate little or no relation.
Word similarity measures how close two words are in meaning, as a score computed from their vectors.
Analogies let us find a word that fits a relationship between other words.
Pre-trained word vectors are needed to do these tasks easily.