Text embedding models turn words or sentences into vectors of numbers that represent their meaning, so that computers can compare texts numerically. To check how good these vectors are, we use cosine similarity or distance metrics. These tell us whether similar texts have close embeddings and different texts are far apart. For tasks like search or recommendation, precision@k and recall@k show how well the model finds relevant items among the top k results.
Text embedding models in Prompt Engineering / GenAI - Model Metrics & Evaluation
Text embedding models usually don't use confusion matrices directly because they output vectors, not class labels. Instead, we look at similarity scores. Here is a simple example of similarity scores for 3 pairs:
Pair            | Similarity Score
----------------|---------------------------
"cat" vs "dog"  | 0.85 (high, related)
"cat" vs "car"  | 0.30 (low, unrelated)
"dog" vs "wolf" | 0.90 (very high, related)
High scores mean the embeddings are close. A model captures meaning well when related pairs score high and unrelated pairs score low.
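As a minimal sketch of how such scores are computed, the snippet below implements cosine similarity in plain Python. The three-dimensional vectors in `toy_embeddings` are made up for illustration and do not come from any real model, so the printed scores will not match the table above; only the ordering (related pair above unrelated pair) is the point.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional vectors for illustration only; a real model
# would typically produce vectors with hundreds of dimensions.
toy_embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

for left, right in [("cat", "dog"), ("cat", "car")]:
    score = cosine_similarity(toy_embeddings[left], toy_embeddings[right])
    print(f'"{left}" vs "{right}": {score:.2f}')
```

With these toy vectors, "cat" vs "dog" scores much higher than "cat" vs "car", which is the pattern a useful embedding model should show.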
Imagine a search engine using embeddings. If it shows only very few results (high precision), it might miss some good answers (low recall). If it shows many results (high recall), some might be less relevant (low precision). For example:
- High precision, low recall: Only top 3 very close matches shown, but misses other good ones.
- High recall, low precision: Shows 20 results including many not related.
Balancing precision and recall depends on what users want: very accurate few results or more complete but less precise results.
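The trade-off can be made concrete with a small sketch of precision@k and recall@k. The document ids in `ranked` and the ground-truth set `relevant` are hypothetical; in practice they would come from a search system and a labeled evaluation set.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are relevant."""
    top_k = ranked_ids[:k]
    hits = sum(1 for item in top_k if item in relevant_ids)
    return hits / k

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant items that appear in the top-k results."""
    top_k = ranked_ids[:k]
    hits = sum(1 for item in top_k if item in relevant_ids)
    return hits / len(relevant_ids)

# Hypothetical search result: document ids in ranked order, best first.
ranked = ["d1", "d7", "d3", "d9", "d2", "d8", "d4", "d5"]
relevant = {"d1", "d3", "d4", "d5"}  # assumed ground-truth labels

for k in (3, 8):
    p = precision_at_k(ranked, relevant, k)
    r = recall_at_k(ranked, relevant, k)
    print(f"k={k}: precision={p:.2f}, recall={r:.2f}")
```

In this toy ranking, showing only 3 results gives higher precision but lower recall than showing all 8, which mirrors the two cases listed above.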
Good embedding models have:
- High cosine similarity (close to 1.0) for related texts.
- Low cosine similarity (close to 0 or negative) for unrelated texts.
- Precision@10 above 0.7, meaning most of the top 10 results are relevant.
- Recall@10 above 0.6, meaning most of the relevant items appear in the top 10.
Bad models show similar scores for unrelated texts or low precision and recall, meaning embeddings do not capture meaning well.
- Using accuracy: Accuracy does not apply directly because embeddings are vectors, not class labels.
- Ignoring data diversity: Testing only on similar texts can hide poor performance on different topics.
- Overfitting: Model may memorize training pairs, showing high similarity only on known data.
- Data leakage: If test texts appear in training, metrics look better but model is not truly generalizing.
- Ignoring metric choice: Euclidean distance is sensitive to vector length, so on unnormalized embeddings it can rank pairs differently from cosine similarity and give misleading results.
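The last point about metric choice can be illustrated with a short sketch. The two-dimensional vectors are invented for the example, and `math.dist` and multi-argument `math.hypot` assume Python 3.8 or newer.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def euclidean_distance(a, b):
    """Straight-line distance between a and b; depends on vector length."""
    return math.dist(a, b)

def normalize(v):
    """Scale v to unit length."""
    length = math.hypot(*v)
    return [x / length for x in v]

short_vec = [1.0, 2.0]
long_vec = [10.0, 20.0]  # same direction as short_vec, 10x the length
other_vec = [2.0, 1.0]   # same length as short_vec, different direction

print("cosine(short, long):    ", cosine_similarity(short_vec, long_vec))
print("euclidean(short, long): ", euclidean_distance(short_vec, long_vec))
print("euclidean(short, other):", euclidean_distance(short_vec, other_vec))
# After normalizing to unit length, the length effect disappears.
print("euclidean(normalized short, normalized long):",
      euclidean_distance(normalize(short_vec), normalize(long_vec)))
```

Here cosine similarity treats `short_vec` and `long_vec` as pointing the same way, while raw Euclidean distance places `other_vec` closer. Normalizing the vectors first removes the disagreement.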
Your text embedding model shows cosine similarity 0.95 for unrelated texts and 0.60 for related texts. Is it good? Why or why not?
Answer: No, it is not good. Related texts should have higher similarity than unrelated ones. Here, unrelated texts have higher similarity (0.95) than related (0.60), so the model fails to capture meaning properly.
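The same check can be written as a small sanity test. This is only a sketch: `separates_pairs` is a hypothetical helper name, and comparing averages is one simple choice among several possible criteria.

```python
def mean(values):
    return sum(values) / len(values)

def separates_pairs(related_scores, unrelated_scores):
    """True if related pairs score higher on average than unrelated pairs."""
    return mean(related_scores) > mean(unrelated_scores)

# Scores from the question above: related = 0.60, unrelated = 0.95.
print(separates_pairs([0.60], [0.95]))        # False: pairs are ranked backwards

# Scores from the earlier example table.
print(separates_pairs([0.85, 0.90], [0.30]))  # True
```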
Practice
text embedding model?
Solution
Step 1: Understand what text embedding models do
Text embedding models turn words or sentences into number arrays that represent their meaning.
Step 2: Compare options with this understanding
Only "To convert text into numbers that capture its meaning" describes converting text into meaningful numbers. The other options describe different tasks.
Final Answer:
To convert text into numbers that capture its meaning -> Option A
Quick Check:
Text embedding = convert text to meaningful numbers [OK]
- Confusing embeddings with translation
- Thinking embeddings generate images
- Assuming embeddings just count words
get_embedding(text)?
Solution
Step 1: Recall Python function call syntax
In Python, functions are called with parentheses and arguments inside, like func(arg).
Step 2: Match syntax with options
Only embedding = get_embedding(text) uses parentheses correctly. Options A, B, and C use invalid syntax for function calls.
Final Answer:
embedding = get_embedding(text) -> Option D
Quick Check:
Function call uses parentheses () [OK]
- Using square brackets [] instead of parentheses
- Using curly braces {} instead of parentheses
- Using arrow -> instead of parentheses
# Toy embedding: [text length, sum of character codes modulo 100]
def dummy_embedding(text):
    return [len(text), sum(ord(c) for c in text) % 100]

result = dummy_embedding('cat')
print(result)
Solution
Step 1: Calculate length of 'cat'
The word 'cat' has 3 characters, so the first element is 3.
Step 2: Calculate sum of ASCII codes modulo 100
ord('c')=99, ord('a')=97, ord('t')=116; sum=99+97+116=312; 312 % 100 = 12.
Step 3: Determine output
The function returns [3, 12], so the program prints [3, 12].
Final Answer:
[3, 12] -> Option A
Quick Check:
len('cat')=3, (99+97+116)%100=12 [OK]
- Wrong ASCII sum calculation
- Miscounting string length
- Mixing uppercase and lowercase ASCII codes
def get_embedding(text):
    return [len(text)]

texts = ['hello', 'world']
embeddings = []
for t in texts:
    embeddings.append(get_embedding)
print(embeddings)
Solution
Step 1: Check the loop appending embeddings
The code appends get_embedding without parentheses, so it adds the function object, not the result.
Step 2: Understand the problem
Appending the function itself causes the list to hold function references, not embedding lists like [5] and [5].
Final Answer:
The function is not called; it appends the function itself -> Option B
Quick Check:
Parentheses () call the function; without them, the function object is appended [OK]
- Forgetting parentheses to call function
- Assuming list is empty causes error
- Thinking variable is undefined
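For comparison, a corrected version of the exercise code might look like the sketch below. It keeps the exercise's toy `get_embedding` and only changes the loop so that the function is actually called; the list comprehension is a stylistic choice, not part of the original exercise.

```python
def get_embedding(text):
    # Toy stand-in from the exercise: a one-element "embedding" holding the length.
    return [len(text)]

texts = ['hello', 'world']

# Calling get_embedding(t) stores the returned list, not the function object.
embeddings = [get_embedding(t) for t in texts]
print(embeddings)  # [[5], [5]]
```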
Solution
Step 1: Understand similarity with embeddings
Embeddings turn sentences into number arrays capturing meaning, so comparing distances between embeddings finds similar sentences.
Step 2: Evaluate options for similarity search
"Compute embeddings for all sentences, then find the one with smallest distance to 'I love apples' embedding" uses embeddings and distance, which is the correct method. Options A, B, and D do not use embeddings or meaningful similarity measures.
Final Answer:
Compute embeddings for all sentences, then find the one with smallest distance to 'I love apples' embedding -> Option C
Quick Check:
Use embeddings + distance for similarity [OK]
- Using word count instead of embeddings
- Ignoring embeddings for similarity
- Random selection instead of comparison
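The approach in the answer can be sketched as follows. The candidate sentences, their hand-picked three-dimensional vectors, and `query_vector` are all hypothetical stand-ins for the output of a real embedding model. The sketch picks the highest cosine similarity, which is the same as picking the smallest cosine distance.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical sentences with hand-picked 3-d vectors standing in for
# the output of a real embedding model.
sentence_vectors = {
    "I enjoy eating fruit": [0.9, 0.1, 0.2],
    "The stock market fell today": [0.1, 0.9, 0.3],
    "My car needs new tires": [0.2, 0.3, 0.9],
}
query_vector = [0.8, 0.2, 0.1]  # assumed embedding of 'I love apples'

def most_similar(query, candidates):
    """Return the candidate sentence whose vector is closest to the query."""
    return max(candidates, key=lambda s: cosine_similarity(query, candidates[s]))

print(most_similar(query_vector, sentence_vectors))
```

With a real model, the dictionary would be built by embedding each sentence once and reusing the stored vectors for every query.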
