When evaluating embeddings that capture semantic meaning, metrics like cosine similarity and Euclidean distance matter most. These metrics measure how close or similar two word or sentence vectors are in space. A smaller distance or higher cosine similarity means the embeddings represent similar meanings. This helps us check if the model understands relationships between words or sentences.
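As a minimal sketch of the two metrics, the snippet below computes cosine similarity and Euclidean distance in plain Python. The three-dimensional vectors are made-up toy values chosen for illustration, not output from a real model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # A zero vector has no direction; this sketch lets the division fail then.
    return dot / (norm_u * norm_v)

def euclidean_distance(u, v):
    """Straight-line distance between u and v (0.0 means identical)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Toy vectors (hypothetical values, for illustration only)
car = [0.8, 0.6, 0.1]
automobile = [0.7, 0.7, 0.2]
banana = [0.1, 0.2, 0.9]

print("cos(car, automobile):", round(cosine_similarity(car, automobile), 3))
print("cos(car, banana):    ", round(cosine_similarity(car, banana), 3))
print("dist(car, automobile):", round(euclidean_distance(car, automobile), 3))
print("dist(car, banana):    ", round(euclidean_distance(car, banana), 3))
```

With these toy values the related pair scores a higher cosine similarity and a smaller distance than the unrelated pair, which is the pattern the two metrics are meant to reveal.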
Why embeddings capture semantic meaning in NLP - Why Metrics Matter
Embedding similarity matrix example (cosine similarity):
        cat    dog    apple  car
cat     1.00   0.85   0.10   0.20
dog     0.85   1.00   0.05   0.15
apple   0.10   0.05   1.00   0.30
car     0.20   0.15   0.30   1.00
High values (close to 1) between 'cat' and 'dog' show semantic closeness.
Low values between 'cat' and 'apple' show semantic difference.
For embeddings, the tradeoff is between semantic precision and semantic recall.
- Semantic Precision: How often the closest embeddings truly mean the same or similar things. High precision means few false matches.
- Semantic Recall: How many true semantic matches the embeddings find among all possible matches. High recall means few misses.
Example: In a search engine, high semantic precision means the top results are very relevant. High semantic recall means the engine finds most relevant results, even if some are less precise.
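One common way to put numbers on this tradeoff in a search setting is precision and recall over the top k results. The sketch below assumes a ranked list of result IDs and a hand-labeled set of relevant IDs; the function name and all IDs are hypothetical.

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Return (precision@k, recall@k) for a single query."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall

# Hypothetical results, ordered by embedding similarity to the query
ranked = ["d3", "d7", "d1", "d9", "d4"]
# Hypothetical ground truth: documents a human judged relevant
relevant = {"d3", "d1", "d5", "d8"}

p, r = precision_recall_at_k(ranked, relevant, k=3)
print(f"precision@3 = {p:.3f}, recall@3 = {r:.3f}")
```

Here two of the top three results are relevant (precision 2/3), but only two of the four relevant documents were found (recall 1/2). Raising k usually lifts recall at the cost of precision.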
Good embedding metrics:
- Cosine similarity close to 1 for synonyms or related words (e.g., "car" and "automobile" > 0.8)
- Cosine similarity close to 0 or negative for unrelated words (e.g., "car" and "banana" < 0.2)
- Consistent distances that reflect known semantic relationships
Bad embedding metrics:
- High similarity between unrelated words (false positives)
- Low similarity between synonyms or related words (false negatives)
- Random or noisy similarity scores that do not reflect meaning
Common pitfalls:
- Accuracy paradox: Using simple accuracy on classification of embeddings can be misleading because semantic similarity is continuous, not binary.
- Data leakage: If embeddings are trained on test data, similarity scores will be unrealistically high.
- Overfitting: Embeddings that memorize training pairs may show perfect similarity on training but fail on new words.
- Ignoring context: Static embeddings may fail to capture meaning changes in different sentences.
This question is about fraud detection, not embeddings, but it teaches an important lesson.
Even with 98% accuracy, 12% recall means the model misses 88% of fraud cases. This is bad because catching fraud is critical. High recall is more important here.
Similarly, for embeddings, a metric must match the goal. High similarity scores alone don't guarantee good semantic understanding if many true matches are missed.
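The arithmetic behind the 98% accuracy and 12% recall figures can be reproduced with a small confusion matrix. The counts below are hypothetical, chosen only so that the two percentages come out as stated; the source does not give the underlying data.

```python
# Hypothetical counts: 10,000 transactions, 200 of them fraudulent
tp, fn = 24, 176    # fraud caught / fraud missed
fp, tn = 24, 9776   # legitimate flagged / legitimate passed

total = tp + fn + fp + tn
accuracy = (tp + tn) / total      # share of all predictions that are correct
recall = tp / (tp + fn)           # share of fraud cases that are caught
missed = fn / (tp + fn)           # share of fraud cases that slip through

print(f"accuracy={accuracy:.2%} recall={recall:.2%} missed={missed:.2%}")
```

Because fraud is rare in this example, the many correctly passed legitimate transactions dominate the accuracy figure and hide the missed fraud.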
Practice
Solution
Step 1: Understand what embeddings do
Embeddings convert words into numbers (vectors) that represent their meanings.
Step 2: Recognize the benefit for computers
These numbers help computers see which words are similar in meaning by their closeness in vector space.
Final Answer:
Because they turn words into numbers that show their meaning -> Option A
Quick Check:
Embeddings = numeric meaning representation [OK]
- Thinking embeddings translate languages
- Confusing embeddings with word frequency counts
- Believing embeddings remove words
Solution
Step 1: Identify the data type for embeddings
Embeddings are numeric vectors, usually lists or arrays of floats.
Step 2: Check each option's format
embedding = [0.1, 0.5, -0.3] shows a list of numbers, which is correct. Others are strings, integers, or dictionaries, which are incorrect.
Final Answer:
embedding = [0.1, 0.5, -0.3] -> Option D
Quick Check:
Embedding vector = list of numbers [OK]
- Using strings instead of numeric vectors
- Using single numbers instead of vectors
- Using dictionaries instead of lists
embedding_cat = [0.2, 0.4, 0.6]
embedding_dog = [0.21, 0.39, 0.58]
embedding_car = [0.9, 0.1, 0.2]
Which pair is most semantically similar based on cosine similarity?
Solution
Step 1: Understand cosine similarity
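Cosine similarity measures how closely two vectors point in the same direction; a higher value means more similar.
Step 2: Compare vectors
embedding_cat and embedding_dog are nearly identical component by component, so they point in almost the same direction and their cosine similarity is high. embedding_car puts most of its weight on the first component, so it points in a clearly different direction.
Final Answer:
cat and dog -> Option C
Quick Check: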
Closest vectors = most similar words [OK]
- Assuming car is similar to cat or dog
- Thinking all pairs have same similarity
- Ignoring vector closeness
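To check the answer numerically, the sketch below computes cosine similarity for every pair of the three vectors from the question. The helper function and the dictionary layout are illustrative choices, not part of the original exercise.

```python
import math
from itertools import combinations

def cosine_similarity(u, v):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Vectors taken from the question
embeddings = {
    "cat": [0.2, 0.4, 0.6],
    "dog": [0.21, 0.39, 0.58],
    "car": [0.9, 0.1, 0.2],
}

# Score every unordered pair of words
scores = {
    (a, b): cosine_similarity(embeddings[a], embeddings[b])
    for a, b in combinations(embeddings, 2)
}
best_pair = max(scores, key=scores.get)

for pair, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(pair, round(score, 3))
print("most similar:", best_pair)
```

The cat and dog pair scores above 0.99, while both pairs involving car come out at roughly 0.5, which matches Option C.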
def similarity(vec1, vec2):
    return sum(a*b for a, b in zip(vec1, vec2))
embedding1 = [0.3, 0.5, 0.2]
embedding2 = [0.3, 0.5]
print(similarity(embedding1, embedding2))
What is the main problem here?
Solution
Step 1: Check vector lengths
embedding1 has 3 elements and embedding2 has 2 elements, so zip stops early and ignores the last element of embedding1.
Step 2: Understand impact on similarity
This causes an incomplete calculation and an inaccurate similarity score.
Final Answer:
The vectors have different lengths causing incorrect similarity -> Option A
Quick Check:
Vector length mismatch = wrong similarity [OK]
- Ignoring vector length mismatch
- Thinking sum is wrong operation here
- Expecting list output instead of number
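One possible fix, sketched below, is to check the lengths up front so that a mismatch raises an error instead of silently returning a partial result (about 0.34 for the vectors in the question). The error message wording is an illustrative choice.

```python
def similarity(vec1, vec2):
    """Dot product that refuses vectors of different lengths."""
    if len(vec1) != len(vec2):
        raise ValueError(f"length mismatch: {len(vec1)} vs {len(vec2)}")
    return sum(a * b for a, b in zip(vec1, vec2))

embedding1 = [0.3, 0.5, 0.2]
embedding2 = [0.3, 0.5]

try:
    print(similarity(embedding1, embedding2))
except ValueError as err:
    # The mismatch is now reported instead of being hidden by zip
    print("rejected:", err)
```

On Python 3.10 or later, zip(vec1, vec2, strict=True) gives a similar guarantee without the explicit length check.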
Solution
Step 1: Understand sentence embedding from word embeddings
Averaging pretrained word embeddings creates a vector representing the whole sentence's meaning.
Step 2: Compare other options
One-hot encoding loses semantic info, random vectors have no meaning, and using only the first word misses context.
Final Answer:
Use pretrained word embeddings and average their vectors for the whole sentence -> Option B
Quick Check:
Average pretrained embeddings = better sentence meaning [OK]
- Using one-hot encoding which lacks meaning
- Using random vectors without training
- Ignoring all words except the first
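The averaging approach can be sketched in a few lines. The small dictionary below is a hypothetical stand-in for a real pretrained vector table, and the whitespace tokenizer is a simplifying assumption.

```python
# Hypothetical lookup table standing in for pretrained word vectors
word_vectors = {
    "the": [0.1, 0.0, 0.1],
    "cat": [0.2, 0.4, 0.6],
    "sleeps": [0.3, 0.5, 0.2],
}

def sentence_embedding(sentence, vectors):
    """Average the vectors of all known words in the sentence."""
    tokens = sentence.lower().split()   # naive whitespace tokenizer
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no known words in sentence")
    dim = len(known[0])
    # Component-wise mean across the word vectors
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

sentence_vec = sentence_embedding("The cat sleeps", word_vectors)
print([round(x, 3) for x in sentence_vec])
```

Averaging ignores word order, so "dog bites man" and "man bites dog" would get the same vector. That limitation is one reason contextual models are often preferred when order matters.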
