For word similarity and analogies, we want to measure how closely the model's judgments match human ratings or known relationships. For word pairs, the model's score is usually the cosine similarity between the two word vectors, and that score is then compared against human similarity ratings. For analogy tasks, the usual metric is accuracy. Cosine similarity measures how similar two word vectors are by looking at the angle between them, which indicates whether the words are related in meaning. For analogies, accuracy shows how often the model correctly predicts the missing word in "A is to B as C is to ?" problems. These metrics matter because they directly reflect how well the model captures word meanings and relationships, which is the goal of these tasks.
Word similarity and analogies in NLP - Model Metrics & Evaluation
For analogy tasks, we can use a simple accuracy count, since each question has exactly one expected answer and each prediction is either right or wrong:
Total analogies tested: 1000
Correct predictions: 850
Incorrect predictions: 150
Accuracy = Correct / Total = 850 / 1000 = 0.85 (85%)
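The accuracy calculation above fits in a few lines of Python. This is a minimal sketch that assumes the model's predicted words and the gold answers are stored as parallel lists of strings; the function name `analogy_accuracy` and the placeholder words are hypothetical.

```python
def analogy_accuracy(predicted, expected):
    """Fraction of analogy questions where the predicted word equals the gold word."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("need at least one analogy question")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Hypothetical data matching the counts above: 850 correct answers out of 1000.
expected = ["queen"] * 1000
predicted = ["queen"] * 850 + ["prince"] * 150
print(analogy_accuracy(predicted, expected))  # 0.85
```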
For word similarity, we often compare model scores to human scores using correlation (like Pearson or Spearman), not a confusion matrix. For example:
Human similarity scores: [0.9, 0.7, 0.2, 0.4]
Model cosine similarities: [0.88, 0.65, 0.25, 0.45]
Pearson correlation coefficient ≈ 0.99 (high agreement); Spearman rank correlation = 1.0, because both lists rank the four pairs in the same order
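A minimal NumPy sketch of both correlations on the four example pairs. It uses `np.corrcoef` for Pearson and computes Spearman as the Pearson correlation of the ranks. The simple double-`argsort` ranking is an assumption that only holds when there are no tied scores; `scipy.stats.pearsonr` and `scipy.stats.spearmanr` are the usual library routines and handle ties.

```python
import numpy as np

human = np.array([0.9, 0.7, 0.2, 0.4])
model = np.array([0.88, 0.65, 0.25, 0.45])

# Pearson: linear correlation between the raw scores.
pearson = np.corrcoef(human, model)[0, 1]


def ranks(x):
    # Rank positions 0..n-1; this simple scheme assumes there are no ties.
    return np.argsort(np.argsort(x)).astype(float)


# Spearman: Pearson correlation computed on the ranks instead of the raw scores.
spearman = np.corrcoef(ranks(human), ranks(model))[0, 1]

print(f"Pearson:  {pearson:.3f}")   # 0.995
print(f"Spearman: {spearman:.3f}")  # 1.000
```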
In word similarity and analogies, precision and recall are less common metrics because these tasks are not typical binary classification. However, if the analogy model is allowed to abstain (answering only when it is confident enough), we can treat it as classification and think about the tradeoffs:
- High precision: When the model predicts an analogy, it is usually correct. This means fewer wrong answers but might miss some correct analogies (low recall).
- High recall: The model tries to predict many analogies, catching most correct ones but also making more mistakes (lower precision).
Example: A language learning app uses analogy tasks to test vocabulary. High precision means the app rarely gives wrong answers, so learners trust it. High recall means the app covers many analogy types but might confuse learners with some wrong answers. Balancing these depends on the app's goal.
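One way to make this tradeoff concrete, sketched under assumptions not stated in the text: the model answers an analogy only when its confidence (for example, the cosine similarity of its top candidate) reaches a threshold, and abstains otherwise. Precision is then the share of answered questions that are correct, and recall is the share of all questions answered correctly. The function name and the confidence values are hypothetical.

```python
def precision_recall_with_abstention(confidences, correct, threshold):
    """The model answers only when confidence >= threshold; otherwise it abstains."""
    answered = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    n_correct = sum(answered)
    # Convention for this sketch: precision is 0.0 if the model answers nothing.
    precision = n_correct / len(answered) if answered else 0.0
    recall = n_correct / len(correct)  # share of ALL questions answered correctly
    return precision, recall


# Hypothetical confidences and correctness flags for six analogy questions.
confidences = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50]
correct = [True, True, True, False, True, False]

for t in (0.85, 0.55):
    p, r = precision_recall_with_abstention(confidences, correct, t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```

With these made-up numbers, the strict threshold (0.85) gives precision 1.00 but recall 0.33, while the looser one (0.55) gives precision 0.80 and recall 0.67.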
Word similarity:
- Good: Correlation with human scores above 0.8 means the model's similarity matches human intuition well.
- Bad: Correlation below 0.5 means the model's similarity scores do not align well with human judgments.
Analogies:
- Good: Accuracy above 80% means the model correctly solves most analogy questions.
- Bad: Accuracy below 50% means the model fails on more than half of the analogy questions, a sign that its vectors do not capture word relationships well.
- Ignoring context: The similarity of two words can depend on context (for example, "bank" next to "river" versus "money"), but benchmarks that score isolated word pairs miss this, which can make the scores misleading.
- Overfitting to test sets: Models tuned too much on standard analogy datasets may perform well there but poorly in real use.
- Accuracy paradox: High accuracy on analogy tasks with many easy questions may hide poor performance on harder cases.
- Data leakage: If analogy test data overlaps with training data, metrics will be unrealistically high.
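To catch the accuracy paradox, it helps to report accuracy per question category as well as overall. A minimal sketch, assuming each result is stored as a (category, is_correct) pair; the category names and counts are invented for illustration.

```python
from collections import defaultdict


def accuracy_by_category(results):
    """results: iterable of (category, is_correct) pairs -> (overall, per-category accuracy)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for category, ok in results:
        totals[category] += 1
        hits[category] += int(ok)
    overall = sum(hits.values()) / sum(totals.values())
    per_category = {c: hits[c] / totals[c] for c in totals}
    return overall, per_category


# Hypothetical results: many easy questions and a few hard ones.
results = (
    [("capital-country", True)] * 90
    + [("rare-morphology", False)] * 8
    + [("rare-morphology", True)] * 2
)
overall, per_category = accuracy_by_category(results)
print(overall)       # 0.92
print(per_category)  # {'capital-country': 1.0, 'rare-morphology': 0.2}
```

Here overall accuracy looks strong (0.92) while the hard category sits at 0.2, which a single overall number would hide.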
Your word analogy model has 98% accuracy on a small, easy test set but only 60% on a larger, diverse set. Is it good for production? Why or why not?
Answer: No, it is not good for production. The high accuracy on the small set likely means the model learned those specific examples (overfitting). The lower accuracy on the diverse set shows it struggles with real-world cases. Production models need consistent performance on varied data.
Practice
Solution
Step 1: Understand the concept of word similarity
Word similarity measures how close two words are in meaning, often represented by a number such as cosine similarity.
Step 2: Differentiate from other word properties
Frequency or letter count does not capture closeness in meaning, so those options are incorrect.
Final Answer:
How close two words are in meaning using numbers -> Option A
Quick Check:
Word similarity = meaning closeness [OK]
- Confusing similarity with word frequency
- Thinking similarity is about word length
- Assuming similarity counts shared letters
Which expression computes the cosine similarity between two vectors vec1 and vec2 in Python using NumPy?
Solution
Step 1: Recall the cosine similarity formula
Cosine similarity = dot product of the vectors divided by the product of their norms.
Step 2: Match the formula to code
np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2)) matches the formula exactly, using np.dot and np.linalg.norm.
Final Answer:
np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2)) -> Option A
Quick Check:
Cosine similarity = dot / (norm1 * norm2) [OK]
- Adding norms instead of multiplying
- Subtracting norms from dot product
- Multiplying dot product by sum of norms
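The formula from this question can be wrapped in a small helper. A minimal sketch; the name `cosine_similarity` and the zero-vector check are additions for illustration, since the ratio is undefined when either norm is zero.

```python
import numpy as np


def cosine_similarity(vec1, vec2):
    """Dot product divided by the product of the two vector norms."""
    vec1 = np.asarray(vec1, dtype=float)
    vec2 = np.asarray(vec2, dtype=float)
    denom = np.linalg.norm(vec1) * np.linalg.norm(vec2)
    if denom == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(vec1, vec2) / denom)


print(cosine_similarity([1, 0], [2, 0]))   # same direction -> 1.0
print(cosine_similarity([1, 0], [0, 3]))   # orthogonal -> 0.0
print(cosine_similarity([1, 0], [-1, 0]))  # opposite direction -> -1.0
```

Note that vector length does not matter: [1, 0] and [2, 0] score 1.0 because only the angle between them counts.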
Given the word vectors:
king = [0.5, 0.8, 0.3]
queen = [0.45, 0.75, 0.35]
man = [0.6, 0.7, 0.2]
woman = [0.55, 0.65, 0.25]
What is the closest word to the vector king - man + woman?
Solution
Step 1: Calculate the vector for king - man + woman
Subtract man from king: [0.5-0.6, 0.8-0.7, 0.3-0.2] = [-0.1, 0.1, 0.1]. Add woman: [-0.1+0.55, 0.1+0.65, 0.1+0.25] = [0.45, 0.75, 0.35].
Step 2: Compare the result to the known vectors
The resulting vector matches queen exactly: [0.45, 0.75, 0.35].
Final Answer:
queen -> Option C
Quick Check:
king - man + woman = queen [OK]
- Not subtracting man vector before adding woman
- Mixing up vector addition order
- Choosing original words instead of analogy result
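The arithmetic in this question can be checked with NumPy. A minimal sketch using the toy vectors given in the question; `np.allclose` is used instead of `==` because floating-point subtraction and addition may not reproduce 0.45, 0.75, 0.35 bit-for-bit.

```python
import numpy as np

king = np.array([0.5, 0.8, 0.3])
queen = np.array([0.45, 0.75, 0.35])
man = np.array([0.6, 0.7, 0.2])
woman = np.array([0.55, 0.65, 0.25])

# Analogy target: king - man + woman
target = king - man + woman
print(target)                      # approximately [0.45 0.75 0.35]
print(np.allclose(target, queen))  # True
```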
The following code tries to find the word closest to king - man + woman but has a flaw:

```python
import numpy as np

words = {
    'king': np.array([0.5, 0.8, 0.3]),
    'queen': np.array([0.45, 0.75, 0.35]),
    'man': np.array([0.6, 0.7, 0.2]),
    'woman': np.array([0.55, 0.65, 0.25]),
}
# Analogy target vector: king - man + woman
result = words['king'] - words['man'] + words['woman']
max_word = None
max_sim = -1
for word, vec in words.items():
    # Cosine similarity between the target and each vocabulary word
    sim = np.dot(result, vec) / (np.linalg.norm(result) * np.linalg.norm(vec))
    if sim > max_sim:
        max_sim = sim
        max_word = word
print(max_word)
```

What is the main flaw?
Solution
Step 1: Analyze the similarity search loop
The loop compares the result vector to all words, including 'king', 'man', and 'woman', which are part of the calculation.
Step 2: Understand why this is problematic
Including the original words means the highest similarity can belong to one of the input words themselves, which is usually unwanted and can produce misleading results.
Final Answer:
The code does not exclude the original words from similarity search -> Option D
Quick Check:
Exclude input words to avoid bias [OK]
- Assuming zero division error without checking norms
- Thinking max_sim initialization causes error
- Ignoring normalization in dot product
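A corrected version of the search, sketched as a hypothetical helper `solve_analogy(a, b, c, vectors)` that answers "a is to b as c is to ?" by computing b - a + c and skipping the three query words. The toy vectors are the ones from the question.

```python
import numpy as np


def solve_analogy(a, b, c, vectors):
    """Return the word closest to vectors[b] - vectors[a] + vectors[c], skipping a, b and c."""
    target = vectors[b] - vectors[a] + vectors[c]
    best_word, best_sim = None, -np.inf
    for word, vec in vectors.items():
        if word in (a, b, c):
            continue  # exclude the query words themselves
        sim = np.dot(target, vec) / (np.linalg.norm(target) * np.linalg.norm(vec))
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word


words = {
    'king': np.array([0.5, 0.8, 0.3]),
    'queen': np.array([0.45, 0.75, 0.35]),
    'man': np.array([0.6, 0.7, 0.2]),
    'woman': np.array([0.55, 0.65, 0.25]),
}
# man is to king as woman is to ?  ->  king - man + woman
print(solve_analogy('man', 'king', 'woman', words))  # queen
```

With only four words, 'queen' is the sole candidate left after exclusion; in a real vocabulary the same filter keeps the query words from crowding out the true answer.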
Paris is to France as Tokyo is to ? Using pre-trained word vectors, which approach is best to find the answer?
Solution
Step 1: Understand analogy vector arithmetic
Analogies use the formula word2 - word1 + word3 to find the missing word. Here, Paris is word1, France is word2, and Tokyo is word3.
Step 2: Apply the formula to this analogy
Calculate Tokyo - Paris + France (equivalently, France - Paris + Tokyo) to get the vector representing the answer.
Final Answer:
Calculate vector: Tokyo - Paris + France, then find closest word vector -> Option B
Quick Check:
Analogy vector = word3 - word1 + word2 [OK]
- Swapping order of subtraction and addition
- Adding all vectors without subtraction
- Using wrong words in formula
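To see why it matters which word is subtracted, here is a sketch with invented 3-dimensional toy vectors. The dimensions loosely stand for France-ness, Japan-ness, and country-ness; these are not real pre-trained embeddings, and 'Osaka' is added only as a distractor. The helpers `cosine` and `nearest` are hypothetical.

```python
import numpy as np

# Invented toy vectors: [France-ness, Japan-ness, country-ness]
vectors = {
    'Paris': np.array([1.0, 0.0, 0.0]),
    'France': np.array([1.0, 0.0, 1.0]),
    'Tokyo': np.array([0.0, 1.0, 0.0]),
    'Japan': np.array([0.0, 1.0, 1.0]),
    'Osaka': np.array([0.0, 0.9, 0.0]),
}


def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def nearest(target, vectors, exclude):
    """Word with the highest cosine similarity to target, ignoring the excluded words."""
    candidates = [w for w in vectors if w not in exclude]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


query = {'Paris', 'France', 'Tokyo'}
# Correct: word3 - word1 + word2 = Tokyo - Paris + France
print(nearest(vectors['Tokyo'] - vectors['Paris'] + vectors['France'], vectors, query))  # Japan
# Mistake: subtracting the wrong word (Tokyo - France + Paris)
print(nearest(vectors['Tokyo'] - vectors['France'] + vectors['Paris'], vectors, query))  # Osaka
```

Subtracting France instead of Paris removes the country direction instead of adding it, so the city-like distractor wins.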
