Prompt Engineering / GenAI (~20 mins)

Vector similarity metrics in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Experiment - Vector similarity metrics
Problem: You want to measure how similar two vectors are using different similarity metrics. Currently you use Euclidean distance, but it doesn't always reflect similarity well for your data.
Current Metric: Euclidean distance between the sample vectors is 5.0 (example value).
Issue: Euclidean distance can be misleading when vectors differ in magnitude or direction. You want to try other similarity metrics that better capture the angle between vectors or their overlap.
Your Task
Implement and compare cosine similarity and Jaccard similarity with Euclidean distance on example vectors. Show which metric better reflects similarity for given pairs.
Use only numpy and standard Python libraries.
Vectors are numeric and can be floats.
Jaccard similarity should be applied to binary vectors only.
Solution
import numpy as np

def euclidean_distance(vec1, vec2):
    # L2 norm of the difference: absolute distance between points
    return np.linalg.norm(vec1 - vec2)

def cosine_similarity(vec1, vec2):
    # Cosine of the angle between the vectors: dot product over norms
    dot_product = np.dot(vec1, vec2)
    norm1 = np.linalg.norm(vec1)
    norm2 = np.linalg.norm(vec2)
    if norm1 == 0 or norm2 == 0:
        # A zero vector has no direction; define similarity as 0
        return 0.0
    return dot_product / (norm1 * norm2)

def jaccard_similarity(vec1, vec2):
    # Convert numeric vectors to binary by thresholding at 0.5
    bin_vec1 = vec1 > 0.5
    bin_vec2 = vec2 > 0.5
    intersection = np.logical_and(bin_vec1, bin_vec2).sum()
    union = np.logical_or(bin_vec1, bin_vec2).sum()
    if union == 0:
        return 0.0
    return intersection / union

# Example vectors
vector_a = np.array([1.0, 2.0, 3.0, 4.0])
vector_b = np.array([2.0, 3.0, 4.0, 5.0])
vector_c = np.array([0.0, 0.0, 0.0, 0.0])

print(f"Euclidean distance between A and B: {euclidean_distance(vector_a, vector_b):.3f}")
print(f"Cosine similarity between A and B: {cosine_similarity(vector_a, vector_b):.3f}")
print(f"Jaccard similarity between A and B: {jaccard_similarity(vector_a, vector_b):.3f}")

print(f"Euclidean distance between A and C: {euclidean_distance(vector_a, vector_c):.3f}")
print(f"Cosine similarity between A and C: {cosine_similarity(vector_a, vector_c):.3f}")
print(f"Jaccard similarity between A and C: {jaccard_similarity(vector_a, vector_c):.3f}")
  • Added a cosine similarity function to measure angle-based similarity.
  • Added a Jaccard similarity function for binary vector overlap.
  • Provided example vectors and printed all three metrics for comparison.
Results Interpretation

Before, only Euclidean distance was used, which gave a value of 5.0 (example). After adding cosine and Jaccard similarity, we see:

  • Euclidean distance between A and B: 2.0 (smaller means closer)
  • Cosine similarity between A and B: 0.994 (close to 1 means very similar direction)
  • Jaccard similarity between A and B: 1.0 (full overlap in binary thresholded vectors)

For vectors A and C (the zero vector), cosine and Jaccard similarity are both 0, indicating no similarity, while Euclidean distance is about 5.477 (the norm of A).

Different similarity metrics capture different aspects of vector similarity. Cosine similarity is good for direction similarity, Jaccard for binary overlap, and Euclidean for absolute distance. Choosing the right metric depends on your data and what similarity means in your context.
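A quick way to see where the metrics disagree is to compare a vector with a scaled copy of itself. This short sketch (reusing the two functions defined in the solution above) shows that Euclidean distance grows with the magnitude difference while cosine similarity stays at 1.0 because the direction is unchanged:

```python
import numpy as np

def euclidean_distance(vec1, vec2):
    # L2 norm of the difference
    return np.linalg.norm(vec1 - vec2)

def cosine_similarity(vec1, vec2):
    # Angle-based similarity; 0.0 for zero vectors
    norm1, norm2 = np.linalg.norm(vec1), np.linalg.norm(vec2)
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return np.dot(vec1, vec2) / (norm1 * norm2)

a = np.array([1.0, 2.0, 3.0, 4.0])
scaled = 10.0 * a  # same direction, 10x the magnitude

# Euclidean distance is large: 9 * ||a|| ≈ 49.295
print(f"Euclidean: {euclidean_distance(a, scaled):.3f}")
# Cosine similarity is 1.0: identical direction
print(f"Cosine:    {cosine_similarity(a, scaled):.3f}")
```

This is exactly the failure mode from the problem statement: if "similar" means "pointing the same way", Euclidean distance penalizes a harmless difference in scale.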
Bonus Experiment
Try implementing Manhattan distance and compare it with Euclidean distance on the same vectors.
💡 Hint
Manhattan distance sums absolute differences of vector components and can be more robust to outliers.
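If you want to check your attempt, here is one minimal sketch of the bonus (an assumed implementation, using the same example vectors as the solution):

```python
import numpy as np

def manhattan_distance(vec1, vec2):
    # L1 norm: sum of absolute component-wise differences
    return np.sum(np.abs(vec1 - vec2))

vector_a = np.array([1.0, 2.0, 3.0, 4.0])
vector_b = np.array([2.0, 3.0, 4.0, 5.0])

# Each component differs by 1, so Manhattan distance is 4.0,
# while Euclidean distance for the same pair is 2.0
print(f"Manhattan distance between A and B: {manhattan_distance(vector_a, vector_b):.3f}")
print(f"Euclidean distance between A and B: {np.linalg.norm(vector_a - vector_b):.3f}")
```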