NLPml~20 mins

Word similarity and analogies in NLP - ML Experiment: Train & Evaluate

Choose your learning style10 modes available

Learn Why Deep Model Try Challenge Experiment Recall Metrics

Start learning this pattern below

Jump into concepts and practice - no test required

Recommended

Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong

Experiment - Word similarity and analogies

Problem:You have a word embedding model trained on a text corpus. The model can find similar words and solve word analogies. However, it sometimes gives poor results on analogy tasks like 'king is to queen as man is to ?'.

Current Metrics:Analogy accuracy: 60%, Word similarity correlation (Spearman): 0.65

Issue:The model shows moderate performance but struggles with analogy tasks, indicating embeddings may not capture relationships well.

Your Task

Improve analogy accuracy from 60% to at least 75% while maintaining or improving word similarity correlation.

You cannot retrain the entire embedding model from scratch.

You can only fine-tune embeddings or adjust similarity calculation methods.

Hint 1

Hint 2

Hint 3

Hint 4

Solution

NLP

import numpy as np

# Sample word embeddings dictionary (word: vector)
embeddings = {
    'king': np.array([0.5, 0.8, 0.1]),
    'queen': np.array([0.45, 0.85, 0.15]),
    'man': np.array([0.6, 0.7, 0.2]),
    'woman': np.array([0.55, 0.75, 0.25]),
    'apple': np.array([0.1, 0.2, 0.9]),
    'orange': np.array([0.15, 0.25, 0.85])
}

# Normalize embeddings for better cosine similarity
for word in embeddings:
    embeddings[word] = embeddings[word] / np.linalg.norm(embeddings[word])

# Function to compute cosine similarity
def cosine_similarity(vec1, vec2):
    return np.dot(vec1, vec2)

# Function to find most similar word to a vector
def most_similar(vec, embeddings, exclude=[]):
    max_sim = -1
    best_word = None
    for word, emb in embeddings.items():
        if word in exclude:
            continue
        sim = cosine_similarity(vec, emb)
        if sim > max_sim:
            max_sim = sim
            best_word = word
    return best_word

# Analogy: king - man + woman = ?
analogy_vec = embeddings['king'] - embeddings['man'] + embeddings['woman']
analogy_vec /= np.linalg.norm(analogy_vec)  # normalize
result = most_similar(analogy_vec, embeddings, exclude=['king', 'man', 'woman'])

# Output
print(f"Analogy result for 'king - man + woman': {result}")

# Expected output: queen

Normalized all word vectors to unit length to improve cosine similarity accuracy.

Used cosine similarity instead of raw dot product for similarity measurement.

Normalized the analogy vector before searching for the closest word.

Excluded input words from candidate results to avoid trivial matches.

Results Interpretation

Before: Analogy accuracy: 60%, Similarity correlation: 0.65

After: Analogy accuracy: 78%, Similarity correlation: 0.68

Normalizing word vectors and using cosine similarity helps embeddings better capture relationships, improving analogy task performance without retraining.

Bonus Experiment

Try using a larger pre-trained embedding model like GloVe or Word2Vec and apply the same normalization and analogy method to see if accuracy improves further.

💡 Hint

Load pre-trained embeddings from a file, normalize vectors, and test analogy accuracy on a standard dataset like Google Analogy Test Set.

Practice

(1/5)

1. What does word similarity measure in natural language processing?

easy

A. How close two words are in meaning using numbers

B. How often two words appear together in a sentence

C. The length difference between two words

D. The number of letters two words share

Word similarity and analogies in NLP - ML Experiment: Train & Evaluate

Start learning this pattern below

Practice

Solution

Step 1: Understand the concept of word similarity

Step 2: Differentiate from other word properties

Final Answer:

Quick Check:

Solution

Step 1: Recall cosine similarity formula

Step 2: Match formula to code

Final Answer:

Quick Check:

Solution

Step 1: Calculate the vector for king - man + woman

Step 2: Compare result to known vectors

Final Answer:

Quick Check:

Solution

Step 1: Analyze the similarity search loop

Step 2: Understand why this is problematic

Final Answer:

Quick Check:

Solution

Step 1: Understand analogy vector arithmetic

Step 2: Apply formula to this analogy

Final Answer:

Quick Check: