
Semantic similarity with embeddings in NLP - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is semantic similarity in the context of embeddings?
Semantic similarity measures how close the meanings of two pieces of text are, using embeddings that represent their meanings as numbers.
beginner
How do embeddings help in measuring semantic similarity?
Embeddings convert words or sentences into vectors of numbers, so we can compare these vectors mathematically to find how similar their meanings are.
intermediate
Which common metric is used to calculate similarity between two embedding vectors?
Cosine similarity is commonly used; it is the cosine of the angle between two vectors, ranging from -1 (opposite directions) to 1 (same direction), so a higher value indicates more similar meanings.
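As a rough illustration of the metric, here is a minimal sketch that computes cosine similarity by hand using only the standard library. The helper name `cosine_similarity_1d` and the toy vectors are made up for this example; real embeddings usually have hundreds of dimensions.

```python
import math

def cosine_similarity_1d(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors (hypothetical, not produced by any model)
same_direction = cosine_similarity_1d([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # approximately 1.0
orthogonal = cosine_similarity_1d([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])      # 0.0
opposite = cosine_similarity_1d([1.0, 0.0], [-1.0, 0.0])                 # -1.0

print(same_direction, orthogonal, opposite)
```

Note that the second vector in the first pair is twice as long as the first, yet the score is still about 1: cosine similarity depends on direction only, not on vector length.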
beginner
What is a real-life example of semantic similarity using embeddings?
Finding similar questions in a FAQ by comparing their embeddings to a user's question, so the system can suggest the closest matching answers.
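The FAQ idea can be sketched as follows. The questions and the 3-dimensional vectors are hypothetical values chosen for illustration; in practice the embeddings would come from a pre-trained model, which is not shown here.

```python
import numpy as np

faq_questions = [
    "How do I reset my password?",
    "Where is my order?",
    "How do I cancel my subscription?",
]

# Hypothetical embeddings, one row per FAQ question (made up for this sketch)
faq_embeddings = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.0, 0.2, 0.9],
])

# Hypothetical embedding of a user query such as "I forgot my login password"
user_embedding = np.array([0.8, 0.2, 0.1])

def normalize(vectors):
    """Scale each vector to unit length so a dot product equals cosine similarity."""
    return vectors / np.linalg.norm(vectors, axis=-1, keepdims=True)

# One cosine score per FAQ question
scores = normalize(faq_embeddings) @ normalize(user_embedding)

best = int(np.argmax(scores))
best_question = faq_questions[best]
print(best_question)
```

With these toy values the password question scores highest, so the system would suggest its answer first.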
intermediate
Why might two sentences with different words have high semantic similarity?
Because embeddings capture meaning beyond exact words, sentences with different words but similar meanings can have vectors close together.
What does an embedding represent in NLP?
A. A numerical vector representing the meaning of text
B. A list of keywords from the text
C. The length of the text in characters
D. The frequency of each word in the text
Which similarity metric is most commonly used with embeddings?
A. Hamming distance
B. Euclidean distance
C. Jaccard index
D. Cosine similarity
If two sentences have very different words but similar meanings, their embeddings will likely be:
A. Very different vectors
B. Zero vectors
C. Close vectors
D. Random vectors
Semantic similarity helps machines to:
A. Understand how close meanings are between texts
B. Translate text into another language
C. Count words in a sentence
D. Detect spelling errors
Which of these is NOT a use case of semantic similarity with embeddings?
A. Recommending similar articles
B. Sorting numbers in ascending order
C. Finding duplicate questions
D. Matching customer queries to answers
Explain how embeddings are used to measure semantic similarity between two sentences.
Think about how numbers can represent meaning and how we compare those numbers.
Describe a simple real-world example where semantic similarity with embeddings can improve user experience.
Consider how a FAQ or search engine might use this.

      Practice

      1. What does semantic similarity with embeddings help us do in natural language processing?
      easy
      A. Translate text from one language to another
      B. Count the number of words in a sentence
      C. Measure how similar the meanings of two texts are
      D. Generate random sentences

      Solution

      1. Step 1: Understand semantic similarity

        Semantic similarity means checking how close the meanings of two texts are, not just the words.
      2. Step 2: Role of embeddings

        Embeddings convert text into numbers that capture meaning, allowing comparison of texts by meaning.
      3. Final Answer:

        Measure how similar the meanings of two texts are -> Option C
      4. Quick Check:

        Semantic similarity = meaning comparison [OK]
      Hint: Semantic similarity compares meanings, not word counts [OK]
      Common Mistakes:
      • Confusing similarity with word count
      • Thinking embeddings translate text
      • Assuming semantic similarity generates text
      2. Which Python library is commonly used to compute cosine similarity between embeddings?
      easy
      A. matplotlib
      B. scikit-learn
      C. pandas
      D. flask

      Solution

      1. Step 1: Identify cosine similarity function

        Cosine similarity is often computed with scikit-learn's cosine_similarity function from the sklearn.metrics.pairwise module.
      2. Step 2: Check other libraries

        matplotlib is for plotting, pandas for data frames, flask for web apps, so they don't compute cosine similarity.
      3. Final Answer:

        scikit-learn -> Option B
      4. Quick Check:

        Cosine similarity = scikit-learn [OK]
      Hint: Use scikit-learn for cosine similarity calculations [OK]
      Common Mistakes:
      • Using matplotlib for similarity
      • Confusing pandas with similarity tools
      • Thinking flask handles embeddings
      3. What is the output of this Python code snippet?
      from sklearn.metrics.pairwise import cosine_similarity
      import numpy as np
      
      emb1 = np.array([[1, 0, 0]])
      emb2 = np.array([[0, 1, 0]])
      sim = cosine_similarity(emb1, emb2)
      print(sim[0][0])
      medium
      A. Error
      B. 1.0
      C. -1.0
      D. 0.0

      Solution

      1. Step 1: Understand cosine similarity formula

        Cosine similarity measures the cosine of the angle between two vectors. Orthogonal vectors have similarity 0.
      2. Step 2: Analyze given vectors

        emb1 is [1,0,0], emb2 is [0,1,0]. They are perpendicular, so similarity is 0.
      3. Final Answer:

        0.0 -> Option D
      4. Quick Check:

        Orthogonal vectors similarity = 0.0 [OK]
      Hint: Orthogonal vectors have cosine similarity zero [OK]
      Common Mistakes:
      • Assuming similarity is 1 for any vectors
      • Confusing dot product with cosine similarity
      • Expecting error due to shape
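The `sim[0][0]` indexing in question 3 works because `cosine_similarity` returns a matrix of pairwise scores, one row per vector in the first argument and one column per vector in the second. A small sketch with made-up vectors (the names `queries` and `docs` are illustrative only):

```python
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Two "query" vectors and three "document" vectors (toy values)
queries = np.array([[1, 0, 0],
                    [1, 1, 0]])
docs = np.array([[0, 1, 0],
                 [1, 0, 0],
                 [2, 2, 0]])

# Result has shape (2, 3): sim[i, j] compares queries[i] with docs[j]
sim = cosine_similarity(queries, docs)

print(sim.shape)
print(sim[0, 0])  # [1, 0, 0] vs [0, 1, 0]: orthogonal, as in question 3
print(sim[1, 2])  # [1, 1, 0] vs [2, 2, 0]: same direction
```

With a single vector on each side, as in the question, the result is a 1x1 matrix, which is why the score is read from `sim[0][0]`.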
      4. Identify the error in this code that tries to compute semantic similarity:
      from sklearn.metrics.pairwise import cosine_similarity
      
      emb1 = [0.1, 0.2, 0.3]
      emb2 = [0.1, 0.2, 0.3]
      sim = cosine_similarity(emb1, emb2)
      print(sim)
      medium
      A. emb1 and emb2 should be 2D arrays, not 1D lists
      B. cosine_similarity function does not exist in sklearn
      C. embeddings must be strings, not numbers
      D. print statement syntax is incorrect

      Solution

      1. Step 1: Check input format for cosine_similarity

        cosine_similarity expects 2D inputs of shape (n_samples, n_features), such as [[0.1, 0.2, 0.3]]; emb1 and emb2 are 1D lists, so the call raises a ValueError.
      2. Step 2: Confirm other options

        cosine_similarity exists, embeddings are numeric vectors, and print syntax is correct in Python 3.
      3. Final Answer:

        emb1 and emb2 should be 2D arrays, not 1D lists -> Option A
      4. Quick Check:

        Input shape must be 2D arrays [OK]
      Hint: cosine_similarity needs 2D arrays, not 1D lists [OK]
      Common Mistakes:
      • Passing 1D lists instead of 2D arrays
      • Thinking embeddings must be text
      • Misunderstanding print syntax
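One possible fix for the code in question 4, sketched here under the assumption that each list is a single embedding, is to reshape each vector into a 2D array with one row before calling `cosine_similarity`:

```python
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# reshape(1, -1) turns a 1D vector into a 2D array with a single row
emb1 = np.array([0.1, 0.2, 0.3]).reshape(1, -1)
emb2 = np.array([0.1, 0.2, 0.3]).reshape(1, -1)

sim = cosine_similarity(emb1, emb2)  # 1x1 matrix of scores
score = float(sim[0, 0])             # identical vectors, so approximately 1.0
print(score)
```

Writing the vectors directly as nested lists, for example `[[0.1, 0.2, 0.3]]`, would work equally well.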
      5. You have two sentences: "I love apples" and "I adore oranges". Using a pre-trained embedding model, you get a vector for each. Which approach best helps you determine whether these sentences have similar meanings?
      hard
      A. Calculate cosine similarity between their embeddings
      B. Count common words between the sentences
      C. Check if sentence lengths are equal
      D. Compare the first letters of each word

      Solution

      1. Step 1: Understand semantic similarity goal

        We want to compare meanings, not just words or sentence length.
      2. Step 2: Use embeddings and cosine similarity

        Pre-trained embeddings capture meaning; cosine similarity measures closeness of meanings numerically.
      3. Final Answer:

        Calculate cosine similarity between their embeddings -> Option A
      4. Quick Check:

        Meaning comparison = cosine similarity on embeddings [OK]
      Hint: Use cosine similarity on embeddings for meaning comparison [OK]
      Common Mistakes:
      • Relying on word overlap only
      • Using sentence length as similarity
      • Comparing letters instead of meaning
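To see why word overlap alone is a weak signal for the sentences in question 5, here is a small sketch that measures overlap with the Jaccard index (shared words divided by total distinct words). The helper `word_overlap` is a name made up for this example, and no embedding model is run here, so no embedding score is shown.

```python
def word_overlap(sentence_a, sentence_b):
    """Jaccard index over lowercase word sets: |A & B| / |A | B|."""
    words_a = set(sentence_a.lower().split())
    words_b = set(sentence_b.lower().split())
    return len(words_a & words_b) / len(words_a | words_b)

overlap = word_overlap("I love apples", "I adore oranges")
print(overlap)  # only "i" is shared out of 5 distinct words
```

The two sentences share a single word, so the overlap score is low even though both express liking a fruit. Comparing embeddings from a pre-trained model with cosine similarity is intended to capture that shared meaning.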