
Vector similarity metrics in Prompt Engineering / GenAI - Full Explanation

Introduction
Imagine you have two lists of numbers representing things like images, words, or sounds, and you want to know how alike they are. Vector similarity metrics help us measure how close or similar these lists, called vectors, are to each other.
Explanation
Cosine Similarity
Cosine similarity measures the angle between two vectors, showing how much they point in the same direction regardless of their length. It gives a value between -1 and 1, where 1 means exactly the same direction, 0 means the vectors are orthogonal (no alignment), and -1 means exactly opposite directions.
Cosine similarity tells us how aligned two vectors are by measuring the angle between them.
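As a minimal sketch of the idea above, cosine similarity can be computed as the dot product divided by the product of the vector lengths. The function name cosine_similarity and the sample vectors are illustrative choices, and the sketch assumes neither vector is all zeros.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (assumes neither is the zero vector)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Illustrative vectors covering the three reference values
same_direction = cosine_similarity([1, 2], [2, 4])    # parallel: close to 1
orthogonal = cosine_similarity([1, 0], [0, 1])        # perpendicular: 0
opposite = cosine_similarity([1, 2], [-1, -2])        # reversed: close to -1

print(round(same_direction, 3), round(orthogonal, 3), round(opposite, 3))
```

Note that [2, 4] is twice as long as [1, 2], yet the result is still about 1, because only direction matters.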
Euclidean Distance
Euclidean distance calculates the straight-line distance between two points in space. The smaller the distance, the more similar the vectors are. It works like measuring the shortest path between two points on a map.
Euclidean distance measures how far apart two vectors are in space.
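A short sketch of the straight-line calculation, using NumPy's norm of the difference vector. The function name euclidean_distance and the sample points are illustrative assumptions.

```python
import numpy as np

def euclidean_distance(a, b):
    """Straight-line distance: square root of the sum of squared differences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.linalg.norm(a - b))

# A 3-4-5 right triangle: the straight line from (0, 0) to (3, 4) has length 5
straight_line = euclidean_distance([0, 0], [3, 4])
print(straight_line)  # 5.0
```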
Manhattan Distance
Manhattan distance sums the absolute differences of each dimension between two vectors. It is like walking along a grid of city streets, moving only up/down and left/right, rather than straight across.
Manhattan distance measures similarity by adding up the absolute differences across all dimensions.
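The grid-walking idea can be sketched by summing absolute differences per dimension. The function name manhattan_distance and the sample points are illustrative assumptions.

```python
import numpy as np

def manhattan_distance(a, b):
    """Sum of absolute differences along each dimension."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b)))

# Walking from (0, 0) to (3, 4) on a grid: 3 blocks across plus 4 blocks up
grid_steps = manhattan_distance([0, 0], [3, 4])
print(grid_steps)  # 7.0
```

For the same two points the straight-line (Euclidean) distance is 5, so the grid walk is longer than the direct path.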
Jaccard Similarity
Jaccard similarity compares two sets by dividing the size of their overlap by the size of their combined elements. When vectors represent sets, this metric shows how much they share in common.
Jaccard similarity measures how much two sets overlap compared to their total size.
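A minimal sketch of the overlap calculation using Python sets. The function name jaccard_similarity, the playlist contents, and the choice to return 1.0 for two empty sets are all assumptions made for illustration.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty sets count as identical
        return 1.0
    return len(a & b) / len(union)

# Hypothetical playlists: 2 shared songs out of 4 distinct songs overall
playlist_a = {"song1", "song2", "song3"}
playlist_b = {"song2", "song3", "song4"}
overlap = jaccard_similarity(playlist_a, playlist_b)
print(overlap)  # 0.5
```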
Real World Analogy

Imagine comparing two playlists of songs to see how similar they are. Cosine similarity is like checking if both playlists have songs that fit the same mood, no matter how long each playlist is. Euclidean distance is like measuring the overall gap between the playlists in a single straight-line step. Manhattan distance is like walking through a city grid, adding up the differences one street (song) at a time. Jaccard similarity is like seeing how many songs both playlists share compared to all songs combined.

Cosine Similarity → Checking if two playlists have songs that fit the same mood regardless of playlist length
Euclidean Distance → Measuring the overall gap between two playlists in a single straight-line step
Manhattan Distance → Walking through a city grid, adding up the differences one street (song) at a time between playlists
Jaccard Similarity → Seeing how many songs both playlists share compared to all songs combined
Diagram
┌───────────────────────────────┐
│         Vector Space           │
│                               │
│   ● A                        ● B│
│    \                        / │
│     \  Angle (Cosine)       /  │
│      \                    /   │
│       ●------------------●    │
│       Euclidean Distance       │
│                               │
│  Manhattan Distance: sum of   │
│  grid steps between A and B   │
│                               │
│  Jaccard Similarity: overlap  │
│  of sets represented by A & B │
└───────────────────────────────┘
This diagram shows two vectors A and B in space, illustrating cosine similarity as the angle between them, Euclidean distance as the straight line, Manhattan distance as grid steps, and Jaccard similarity as overlap of sets.
Key Facts
Cosine Similarity: Measures the angle between two vectors to find how similar their directions are.
Euclidean Distance: Calculates the straight-line distance between two points in vector space.
Manhattan Distance: Sums the absolute differences of vector components, like walking city blocks.
Jaccard Similarity: Measures overlap between two sets divided by their combined size.
Common Confusions
Believing cosine similarity measures distance between vectors. Cosine similarity measures the angle between vectors, not the distance; vectors can be far apart but still have a small angle.
Thinking Euclidean and Manhattan distances always give the same similarity result. Euclidean distance measures straight-line distance, while Manhattan distance sums absolute differences along axes; they can give different similarity rankings.
Assuming Jaccard similarity works on any numeric vectors. Jaccard similarity applies to sets or binary vectors, not to general numeric vectors.
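The three confusions above can be checked with small numbers. This is only a sketch: all vectors below are made-up examples, and the variable names are illustrative.

```python
import numpy as np

# 1. Small angle, large distance: b points the same way as a but is far away
a = np.array([1.0, 1.0])
b = np.array([100.0, 100.0])
cosine_ab = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
euclidean_ab = float(np.linalg.norm(a - b))
print(round(cosine_ab, 3), round(euclidean_ab, 1))

# 2. Euclidean and Manhattan can rank neighbours differently
query = np.array([0.0, 0.0])
x = np.array([3.0, 3.0])
y = np.array([0.0, 5.0])
euclid_x = float(np.linalg.norm(query - x))    # about 4.24
euclid_y = float(np.linalg.norm(query - y))    # 5.0
manhattan_x = float(np.sum(np.abs(query - x))) # 6.0
manhattan_y = float(np.sum(np.abs(query - y))) # 5.0
print("Euclidean says x is closer:", euclid_x < euclid_y)
print("Manhattan says y is closer:", manhattan_y < manhattan_x)

# 3. Jaccard on binary vectors: 1 means "item present", 0 means "absent"
u = np.array([1, 1, 0, 1], dtype=bool)
v = np.array([1, 0, 0, 1], dtype=bool)
jaccard_uv = float(np.sum(u & v) / np.sum(u | v))  # 2 shared out of 3 present
print(round(jaccard_uv, 3))
```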
Summary
Vector similarity metrics help compare how alike two lists of numbers are by measuring angles, distances, or overlaps.
Cosine similarity focuses on direction, Euclidean and Manhattan distances focus on how far apart vectors are, and Jaccard similarity measures shared elements in sets.
Choosing the right metric depends on the type of data and what kind of similarity matters most.

Practice

(1/5)
1. Which vector similarity metric measures the angle between two vectors to determine how similar they are?
easy
A. Manhattan distance
B. Euclidean distance
C. Cosine similarity
D. Jaccard similarity

Solution

  1. Step 1: Understand cosine similarity

    Cosine similarity measures the cosine of the angle between two vectors, showing how aligned they are.
  2. Step 2: Compare with other metrics

    Euclidean and Manhattan distances measure gaps, not angles. Jaccard is for sets, not vectors.
  3. Final Answer:

    Cosine similarity -> Option C
  4. Quick Check:

    Angle-based similarity = Cosine similarity [OK]
Hint: Angle means cosine similarity, distance means Euclidean [OK]
Common Mistakes:
  • Confusing distance with angle measurement
  • Thinking Euclidean measures angle
  • Mixing set similarity with vector similarity
2. Which of the following is the correct Python expression to compute cosine similarity between two vectors a and b using numpy?
easy
A. np.linalg.norm(a - b)
B. np.dot(a, b) * (np.linalg.norm(a) + np.linalg.norm(b))
C. np.sum(np.abs(a - b))
D. np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

Solution

  1. Step 1: Recall cosine similarity formula

    Cosine similarity = dot product of vectors divided by product of their lengths (norms).
  2. Step 2: Match formula to code

    np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)) matches this formula exactly. Option A, np.linalg.norm(a - b), is Euclidean distance, option C is Manhattan distance, and option B is an incorrect formula.
  3. Final Answer:

    np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)) -> Option D
  4. Quick Check:

    Dot product over norms = cosine similarity [OK]
Hint: Cosine = dot product divided by norms product [OK]
Common Mistakes:
  • Using subtraction instead of dot product
  • Multiplying by the norms instead of dividing by their product
  • Confusing Euclidean with cosine formula
3. Given vectors a = np.array([1, 2, 3]) and b = np.array([4, 5, 6]), what is the output of np.linalg.norm(a - b)?
medium
A. 3.742
B. 5.196
C. 15.0
D. 32.0

Solution

  1. Step 1: Calculate vector difference

    a - b = [1-4, 2-5, 3-6] = [-3, -3, -3]
  2. Step 2: Compute Euclidean norm

    Norm = sqrt((-3)^2 + (-3)^2 + (-3)^2) = sqrt(9+9+9) = sqrt(27) ≈ 5.196
  3. Final Answer:

    5.196 -> Option B
  4. Quick Check:

    Euclidean distance = 5.196 [OK]
Hint: Euclidean norm = sqrt(sum of squared differences) [OK]
Common Mistakes:
  • Forgetting to square differences
  • Calculating sum instead of sqrt of sum
  • Mixing up vector subtraction order
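The arithmetic in this solution can be reproduced step by step and compared with NumPy's built-in norm. The intermediate variable names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

diff = a - b                          # element-wise difference: [-3, -3, -3]
squared_sum = int(np.sum(diff ** 2))  # 9 + 9 + 9 = 27
by_hand = float(np.sqrt(squared_sum)) # square root of 27
built_in = float(np.linalg.norm(a - b))

print(diff, squared_sum, round(by_hand, 3), round(built_in, 3))
```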
4. Identify the error in this Python code snippet for cosine similarity:
import numpy as np

def cosine_sim(a, b):
    return np.dot(a, b) / np.linalg.norm(a) + np.linalg.norm(b)

print(cosine_sim(np.array([1, 0]), np.array([0, 1])))
medium
A. The denominator should multiply norms, not add them
B. np.dot is used incorrectly; should be np.cross
C. Vectors must be normalized before dot product
D. Function is missing return statement

Solution

  1. Step 1: Analyze denominator in formula

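    Because division binds tighter than addition, the code computes (np.dot(a, b) / np.linalg.norm(a)) + np.linalg.norm(b). It divides by one norm and then adds the other, but cosine similarity divides by the product of both norms.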
  2. Step 2: Understand correct formula

    Cosine similarity = dot(a,b) / (norm(a) * norm(b)), so addition is wrong here.
  3. Final Answer:

    The denominator should multiply norms, not add them -> Option A
  4. Quick Check:

    Denominator = product of norms [OK]
Hint: Denominator in cosine similarity multiplies norms [OK]
Common Mistakes:
  • Adding norms instead of multiplying
  • Using cross product instead of dot product
  • Forgetting to return value
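To see the effect of the bug, the original expression can be run next to a corrected version. The names cosine_sim_buggy and cosine_sim_fixed are illustrative; the buggy body is copied from the question.

```python
import numpy as np

def cosine_sim_buggy(a, b):
    # Expression from the question: evaluated as (dot / norm(a)) + norm(b)
    return np.dot(a, b) / np.linalg.norm(a) + np.linalg.norm(b)

def cosine_sim_fixed(a, b):
    # Parentheses make the denominator the product of both norms
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

x = np.array([1, 0])
y = np.array([0, 1])

buggy = float(cosine_sim_buggy(x, y))  # 0 / 1 + 1
fixed = float(cosine_sim_fixed(x, y))  # 0 / (1 * 1)
print(buggy, fixed)  # 1.0 0.0
```

The two vectors are orthogonal, so the correct answer is 0. The buggy version reports 1.0, which would wrongly suggest the vectors point the same way.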
5. You have two text documents represented as vectors: doc1 = [1, 0, 2, 1] and doc2 = [0, 1, 1, 1]. Which similarity metric is best to find how similar their topics are, and why?
hard
A. Cosine similarity, because it measures angle ignoring length differences
B. Euclidean distance, because it measures exact gap between vectors
C. Manhattan distance, because it sums absolute differences
D. Jaccard similarity, because it compares set overlap

Solution

  1. Step 1: Understand vector meaning in text

    Vectors represent word counts or weights; length can vary by document size.
  2. Step 2: Choose metric ignoring length but capturing direction

    Cosine similarity measures angle, so it focuses on topic similarity ignoring document length differences.
  3. Final Answer:

    Cosine similarity, because it measures angle ignoring length differences -> Option A
  4. Quick Check:

    Topic similarity = cosine similarity [OK]
Hint: For text, angle-based similarity works best [OK]
Common Mistakes:
  • Using Euclidean which is sensitive to length
  • Confusing set similarity with vector similarity
  • Ignoring document length effect
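As a quick check of this reasoning, the sketch below computes cosine similarity for doc1 and doc2, then repeats it with a version of doc1 that is three times longer. The helper names cosine and doc1_longer are illustrative assumptions, as is the idea of tripling every count to imitate a longer document on the same topic.

```python
import numpy as np

def cosine(a, b):
    # Dot product divided by the product of the vector lengths
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

doc1 = np.array([1, 0, 2, 1])
doc2 = np.array([0, 1, 1, 1])
doc1_longer = 3 * doc1  # same word proportions, three times the counts

cos_original = cosine(doc1, doc2)
cos_longer = cosine(doc1_longer, doc2)
dist_original = float(np.linalg.norm(doc1 - doc2))
dist_longer = float(np.linalg.norm(doc1_longer - doc2))

print(round(cos_original, 3), round(cos_longer, 3))    # cosine is unchanged
print(round(dist_original, 3), round(dist_longer, 3))  # Euclidean distance grows
```

Cosine similarity stays at about 0.707 for both versions, while the Euclidean distance grows with document length even though the topic mix is the same.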