What if your computer could instantly find things that are 'just like' what you want, without you lifting a finger?
Why Vector similarity metrics in Prompt Engineering / GenAI? - Purpose & Use Cases
Imagine you have hundreds of photos and you want to find which ones look alike by comparing every detail manually.
Manually checking each photo against others is slow, tiring, and easy to make mistakes because our eyes can miss subtle differences or similarities.
Vector similarity metrics turn complex data like images or text into numbers, then quickly measure how close or alike they are, saving time and improving accuracy.
for img1 in photos:
    for img2 in photos:
        compare_pixels(img1, img2)
similarity = cosine_similarity(vector1, vector2)
It lets machines quickly find and rank items by how similar they are, unlocking smart search, recommendations, and more.
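As a rough illustration of "find and rank", the sketch below scores a handful of item vectors against a query and sorts them by cosine similarity. The helper name rank_by_cosine and the three-dimensional vectors are made up for this example; real systems would use embeddings produced by a model.

```python
import numpy as np

def rank_by_cosine(query, items):
    """Return item indices sorted from most to least similar to the query."""
    items = np.asarray(items, dtype=float)
    query = np.asarray(query, dtype=float)
    # Cosine similarity of the query against every row of `items` at once.
    sims = items @ query / (np.linalg.norm(items, axis=1) * np.linalg.norm(query))
    order = np.argsort(-sims)  # negate so the highest similarity comes first
    return order, sims[order]

# Hypothetical vectors, chosen by hand for illustration only.
items = [[1.0, 0.0, 0.0],
         [0.9, 0.1, 0.0],
         [0.0, 1.0, 0.0]]
query = [1.0, 0.1, 0.0]

order, scores = rank_by_cosine(query, items)
print(order)  # [1 0 2]
```

Item 1 ranks first because it points in almost exactly the same direction as the query, even though item 0 is also very close.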
When you search for a song by humming, vector similarity helps match your tune to the closest songs in the database.
Manual comparison is slow and error-prone.
Vector similarity metrics convert data into numbers for fast comparison.
This enables smart, accurate matching in many applications.
Practice
Solution
Step 1: Understand cosine similarity
Cosine similarity measures the cosine of the angle between two vectors, showing how aligned they are.
Step 2: Compare with other metrics
Euclidean and Manhattan distances measure gaps, not angles. Jaccard is for sets, not vectors.
Final Answer:
Cosine similarity -> Option C
Quick Check:
Angle-based similarity = Cosine similarity [OK]
- Confusing distance with angle measurement
- Thinking Euclidean measures angle
- Mixing set similarity with vector similarity
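To make the angle-versus-gap distinction concrete, here is a small sketch that computes all four metrics mentioned in the solution. The vectors and the example sets are arbitrary choices for illustration; b is deliberately a scaled copy of a so that the angle between them is zero while the gap is not.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # same direction as a, twice as long

# Angle-based: depends only on direction.
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Gap-based: depend on how far apart the points are.
euclidean = np.linalg.norm(a - b)
manhattan = np.sum(np.abs(a - b))

# Jaccard works on sets: size of intersection over size of union.
set_a = {"cat", "dog", "fish"}
set_b = {"dog", "fish", "bird"}
jaccard = len(set_a & set_b) / len(set_a | set_b)

print(round(cosine, 4))     # 1.0
print(round(euclidean, 4))  # 3.7417
print(manhattan)            # 6.0
print(jaccard)              # 0.5
```

Cosine similarity reports a perfect match because the vectors are aligned, while both distances report a nonzero gap.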
a and b using numpy?
Solution
Step 1: Recall cosine similarity formula
Cosine similarity = dot product of vectors divided by product of their lengths (norms).
Step 2: Match formula to code
np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)) matches this formula exactly. np.linalg.norm(a - b) is the Euclidean distance; the remaining options are the Manhattan distance and an incorrect formula.
Final Answer:
np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)) -> Option D
Quick Check:
Dot product over norms = cosine similarity [OK]
- Using subtraction instead of dot product
- Multiplying by the norms instead of dividing by their product
- Confusing Euclidean with cosine formula
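The formula can be checked piece by piece. The sketch below uses two sample vectors (picked here for illustration) and computes the dot product and each norm separately before combining them.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

dot = np.dot(a, b)            # 1*4 + 2*5 + 3*6 = 32
norm_a = np.linalg.norm(a)    # sqrt(1 + 4 + 9)    = sqrt(14)
norm_b = np.linalg.norm(b)    # sqrt(16 + 25 + 36) = sqrt(77)

cos_sim = dot / (norm_a * norm_b)
print(round(cos_sim, 4))  # 0.9746
```

The parentheses around the product of the norms matter: without them, Python would divide by norm_a and then multiply by norm_b.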
a = np.array([1, 2, 3]) and b = np.array([4, 5, 6]), what is the output of np.linalg.norm(a - b)?
Solution
Step 1: Calculate vector difference
a - b = [1-4, 2-5, 3-6] = [-3, -3, -3]
Step 2: Compute Euclidean norm
Norm = sqrt((-3)^2 + (-3)^2 + (-3)^2) = sqrt(9+9+9) = sqrt(27) ≈ 5.196
Final Answer:
5.196 -> Option B
Quick Check:
Euclidean distance = 5.196 [OK]
- Forgetting to square differences
- Calculating sum instead of sqrt of sum
- Mixing up vector subtraction order
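The two steps of the solution can be reproduced directly. This sketch computes the distance by hand and compares it with np.linalg.norm; the variable names are chosen here for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

diff = a - b                          # [-3, -3, -3]
manual = np.sqrt(np.sum(diff ** 2))   # sqrt(9 + 9 + 9) = sqrt(27)
builtin = np.linalg.norm(a - b)
swapped = np.linalg.norm(b - a)       # difference has opposite signs

print(round(manual, 3))   # 5.196
print(round(builtin, 3))  # 5.196
print(round(swapped, 3))  # 5.196
```

Swapping the subtraction order flips the sign of every component of the difference, but squaring removes the sign, so the distance itself is unchanged.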
import numpy as np
def cosine_sim(a, b):
    return np.dot(a, b) / np.linalg.norm(a) + np.linalg.norm(b)
print(cosine_sim(np.array([1, 0]), np.array([0, 1])))
Solution
Step 1: Analyze denominator in formula
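The code adds np.linalg.norm(b) where it should multiply. Because division binds tighter than addition, the expression evaluates as (np.dot(a, b) / np.linalg.norm(a)) + np.linalg.norm(b), whereas cosine similarity divides the dot product by the product of the norms.
Step 2: Understand correct formula
Cosine similarity = dot(a,b) / (norm(a) * norm(b)), so the addition is wrong here and the denominator needs parentheses.
Final Answer:
The denominator should multiply norms, not add them -> Option A
Quick Check: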
Denominator = product of norms [OK]
- Adding norms instead of multiplying
- Using cross product instead of dot product
- Forgetting to return value
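Running the buggy function next to a corrected one shows the effect. The names cosine_sim_buggy and cosine_sim_fixed are introduced here only to tell the two versions apart.

```python
import numpy as np

def cosine_sim_buggy(a, b):
    # Evaluates as (dot / norm(a)) + norm(b) because of operator precedence.
    return np.dot(a, b) / np.linalg.norm(a) + np.linalg.norm(b)

def cosine_sim_fixed(a, b):
    # Dot product divided by the product of the norms.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

x = np.array([1, 0])
y = np.array([0, 1])

print(cosine_sim_buggy(x, y))  # 1.0
print(cosine_sim_fixed(x, y))  # 0.0
```

The two vectors are perpendicular, so the correct similarity is 0.0. The buggy version returns 1.0, which would wrongly suggest they are identical in direction.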
doc1 = [1, 0, 2, 1] and doc2 = [0, 1, 1, 1]. Which similarity metric is best to find how similar their topics are, and why?
Solution
Step 1: Understand vector meaning in text
Vectors represent word counts or weights; length can vary by document size.
Step 2: Choose metric ignoring length but capturing direction
Cosine similarity measures angle, so it focuses on topic similarity ignoring document length differences.
Final Answer:
Cosine similarity, because it measures angle ignoring length differences -> Option A
Quick Check:
Topic similarity = cosine similarity [OK]
- Using Euclidean which is sensitive to length
- Confusing set similarity with vector similarity
- Ignoring document length effect
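The length effect can be seen with the two document vectors from the question. In this sketch, doc1_long is a hypothetical stand-in for the same document repeated three times, so its word counts triple while its topic mix stays the same.

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

doc1 = np.array([1, 0, 2, 1])
doc2 = np.array([0, 1, 1, 1])
doc1_long = 3 * doc1  # hypothetical: same content, three times the length

print(round(cosine_similarity(doc1, doc2), 4))       # 0.7071
print(round(cosine_similarity(doc1_long, doc2), 4))  # 0.7071
print(round(np.linalg.norm(doc1 - doc2), 4))         # 1.7321
print(round(np.linalg.norm(doc1_long - doc2), 4))    # 6.245
```

Cosine similarity gives the same score for the short and long versions, while the Euclidean distance more than triples even though the topic has not changed.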
