When we talk about embeddings capturing semantic meaning, the key metric is cosine similarity. This metric measures how close two vectors are in direction, regardless of their length. Since embeddings are vectors representing words or sentences, cosine similarity tells us how similar their meanings are. A higher cosine similarity means the embeddings share more semantic meaning.
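As a minimal sketch, cosine similarity can be computed directly from two vectors. The 3-dimensional vectors below are made-up stand-ins for real embeddings (which typically have hundreds of dimensions), chosen only to illustrate the direction-based comparison:

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of vector lengths:
    # the result depends only on direction, not magnitude.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (illustrative numbers, not from a real model).
cat = [0.9, 0.8, 0.1]
dog = [0.8, 0.9, 0.2]
car = [0.1, 0.1, 0.9]

print(cosine_similarity(cat, dog))  # high: vectors point in similar directions
print(cosine_similarity(cat, car))  # low: vectors point in different directions
```

The result is always in [-1, 1]: 1 for identical directions, 0 for orthogonal (unrelated) directions.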
Example: comparing embeddings of the words "cat", "dog", and "car" using cosine similarity:

|     | cat  | dog  | car  |
|-----|------|------|------|
| cat | 1.00 | 0.85 | 0.10 |
| dog | 0.85 | 1.00 | 0.12 |
| car | 0.10 | 0.12 | 1.00 |
Here, "cat" and "dog" have high similarity (0.85), showing semantic closeness.
"cat" and "car" have low similarity (0.10), showing different meanings.In semantic search or recommendation systems using embeddings, precision means how many of the retrieved items are truly relevant (semantically close). Recall means how many of all relevant items were found.
For example, if you search for "apple" meaning the fruit, high precision means most results are about fruit, not the company. High recall means you find most fruit-related items.
Sometimes increasing recall (finding more related items) lowers precision (some unrelated items appear). Balancing these depends on the application.
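The "apple" example above can be made concrete with a small precision/recall calculation. The item names below are hypothetical search results invented for illustration:

```python
def precision_recall(retrieved, relevant):
    # Precision: fraction of retrieved items that are relevant.
    # Recall: fraction of relevant items that were retrieved.
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = retrieved_set & relevant_set
    return len(hits) / len(retrieved_set), len(hits) / len(relevant_set)

# Hypothetical search for "apple" (the fruit):
relevant = {"apple pie", "apple orchard", "fruit salad"}      # all fruit-related items
retrieved = ["apple pie", "apple orchard", "Apple iPhone"]    # what the system returned

p, r = precision_recall(retrieved, relevant)
print(p, r)  # 2 of 3 retrieved are relevant; 2 of 3 relevant were found
```

Here one off-topic result ("Apple iPhone") lowers precision, and one missed item ("fruit salad") lowers recall, which is exactly the trade-off described above.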
A good embedding model will have:
- High cosine similarity (close to 1) for semantically similar words or sentences.
- Low cosine similarity (close to 0 or negative) for unrelated meanings.
A bad model might assign high similarity to unrelated words (confusing their meanings) or low similarity to synonyms.
Common pitfalls:
- Ignoring vector length: comparing embeddings by Euclidean distance instead of cosine similarity can be misleading, because vectors pointing in the same direction (same meaning) may differ greatly in length.
- Overfitting embeddings: Embeddings trained on small data may memorize instead of generalizing meaning.
- Data leakage: If test words appear in training, similarity scores may be artificially high.
- Ignoring context: Static embeddings ignore word meaning changes in sentences, lowering real semantic capture.
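The first pitfall above can be demonstrated with two toy 2-dimensional vectors (invented for illustration): one pair points the same way but differs in length, the other pair has equal length but orthogonal directions. Euclidean distance and cosine similarity rank them oppositely:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [1.0, 1.0]
b = [10.0, 10.0]   # same direction as a, but much longer
c = [1.0, -1.0]    # same length as a, but orthogonal direction

print(cosine(a, b), euclidean(a, b))  # cosine 1.0, yet a large Euclidean distance
print(cosine(a, c), euclidean(a, c))  # cosine 0.0, yet a small Euclidean distance
```

Euclidean distance calls `a` closer to the unrelated `c` than to the same-direction `b`; cosine similarity, which ignores length, gives the opposite (and here more meaningful) verdict.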
Your embedding model shows cosine similarity of 0.95 between "bank" (financial) and "river". Is this good? Why or why not?
Answer: No, this is not good. In this context "bank" (the financial institution) and "river" have different meanings, so a similarity of 0.95 indicates the model conflates the two senses rather than capturing the semantic difference.