
Pre-trained embedding usage in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of embedding vector shape
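What is the shape of the output when an embedding layer with a vocabulary size of 10 and an embedding dimension of 50 receives a batch of 3 samples, each containing 3 token indices? (The layer below is randomly initialized; loading pre-trained weights into it would not change the output shape.)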
import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=10, embedding_dim=50)
input_indices = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
output = embedding(input_indices)
print(output.shape)
A. torch.Size([3, 3])
B. torch.Size([10, 50])
C. torch.Size([3, 50])
D. torch.Size([3, 3, 50])
💡 Hint
Think about the input shape and how embedding layers map indices to vectors.
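The lookup the hint describes can be sketched in plain Python, without any framework. The table sizes, index values, and variable names below are made up for illustration and deliberately differ from the question, so treat this as a toy model of the idea rather than how any library implements it.

```python
# Toy lookup table: 6 rows (vocabulary size), 4 values per row (embedding dimension).
# The numbers are placeholders, not real pre-trained values.
vocab_size, dim = 6, 4
table = [[float(row * 10 + col) for col in range(dim)] for row in range(vocab_size)]

# A batch of 2 sequences, each holding 5 token indices.
batch = [[0, 1, 2, 3, 4], [5, 4, 3, 2, 1]]

# Replace every index with its row from the table.
looked_up = [[table[idx] for idx in seq] for seq in batch]

# The result keeps the batch layout and gains one trailing dimension.
shape = (len(looked_up), len(looked_up[0]), len(looked_up[0][0]))
print(shape)  # (2, 5, 4)
```

Each index is swapped for a whole vector, so the input layout is preserved and the embedding dimension is appended at the end.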
Model Choice
intermediate
Choosing pre-trained embeddings for sentiment analysis
You want to build a sentiment analysis model on movie reviews. Which pre-trained embedding is best suited to capture semantic meaning of words in this context?
A. Pre-trained GloVe embeddings trained on Common Crawl
B. Pre-trained embeddings from a speech recognition model
C. One-hot encoded vectors
D. Randomly initialized embeddings trained from scratch
💡 Hint
Consider embeddings trained on large text corpora relevant to general language.
Hyperparameter
advanced
Effect of freezing pre-trained embeddings
What is the effect of freezing the weights of a pre-trained embedding layer during training?
A. The embedding weights are randomly re-initialized at each epoch
B. The embedding weights are updated during training to adapt to the new task
C. The embedding weights remain fixed and are not updated during training
D. The embedding weights are discarded and replaced with one-hot vectors
💡 Hint
Freezing means preventing changes to the weights.
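One way to observe what the hint describes is to compare a frozen and a trainable layer in PyTorch. This is a minimal sketch: the random tensor stands in for real pre-trained vectors, and the summed output is a dummy loss chosen only so that `backward()` has something to differentiate.

```python
import torch
import torch.nn as nn

# Stand-in for real pre-trained vectors: 5 words, 4 dimensions (random placeholder values).
pretrained = torch.randn(5, 4)

frozen = nn.Embedding.from_pretrained(pretrained, freeze=True)
trainable = nn.Embedding.from_pretrained(pretrained.clone(), freeze=False)

indices = torch.tensor([0, 2, 4])

# Dummy loss over both layers, used only to trigger gradient computation.
loss = trainable(indices).sum() + frozen(indices).sum()
loss.backward()

print(frozen.weight.requires_grad)         # False
print(frozen.weight.grad)                  # None
print(trainable.weight.grad is not None)   # True
```

Because the frozen weights receive no gradient, an optimizer step has nothing to apply to them.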
Metrics
advanced
Evaluating embedding quality with downstream task accuracy
You compare two pre-trained embeddings by training the same classifier on a text classification task. Embedding A yields 85% accuracy, embedding B yields 78%. What can you conclude?
A. Embedding B is better because lower accuracy means less overfitting
B. Embedding A is better for this task because it leads to higher accuracy
C. Both embeddings are equally good because accuracy differences are insignificant
D. Embedding B is better because it has fewer parameters
💡 Hint
With the classifier and data held fixed, higher accuracy usually means the embedding provides a better feature representation for this particular task.
🔧 Debug
expert
Identifying error when loading pre-trained embeddings
You try to load pre-trained embeddings into your model but get a size mismatch error. What is the most likely cause?
import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=1000, embedding_dim=300)
pretrained_weights = torch.randn(500, 300)
embedding.weight.data.copy_(pretrained_weights)
A. The pretrained weights have fewer embeddings (500) than the model expects (1000), causing size mismatch
B. The pretrained weights tensor is not a float tensor
C. The embedding dimension 300 does not match pretrained weights dimension 300
D. The model's embedding layer is not initialized before copying weights
💡 Hint
Check the shape of pretrained weights vs model embedding weights.
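A shape check like the one the hint suggests might look as follows. Two possible workarounds are sketched afterwards; the first assumes that vocabulary indices 0 to 499 correspond row-for-row to the pre-trained vectors, which is an assumption of this example and not something the question states.

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=1000, embedding_dim=300)
pretrained_weights = torch.randn(500, 300)  # placeholder for loaded vectors

# Compare the two shapes before copying anything.
print(embedding.weight.shape)    # torch.Size([1000, 300])
print(pretrained_weights.shape)  # torch.Size([500, 300])

# Workaround 1 (assumes rows 0..499 of the layer match the pre-trained rows):
# copy into a slice of matching size and leave the remaining rows as initialized.
rows = pretrained_weights.shape[0]
with torch.no_grad():
    embedding.weight[:rows].copy_(pretrained_weights)

# Workaround 2: let the layer take its size from the weights themselves.
sized_from_weights = nn.Embedding.from_pretrained(pretrained_weights, freeze=False)
print(sized_from_weights.weight.shape)  # torch.Size([500, 300])
```

Which workaround fits depends on whether the model's vocabulary is really larger than the pre-trained one or the layer was simply declared with the wrong size.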

Practice

1. What is the main benefit of using pre-trained embeddings in NLP tasks?
easy
A. They only work for images, not text.
B. They generate random word vectors for each run.
C. They replace the need for any model training.
D. They provide ready-made word meanings, saving training time.

Solution

  1. Step 1: Understand what pre-trained embeddings are

    Pre-trained embeddings are word vectors that were learned from a large text corpus before you start your own task.
  2. Step 2: Identify their benefit

    They save training time because your model starts from vectors that already encode word meanings instead of learning them from scratch.
  3. Final Answer:

    They provide ready-made word meanings, saving training time. -> Option D
  4. Quick Check:

    Pre-trained embeddings = ready-made word meanings [OK]
Hint: Pre-trained means already learned word meanings [OK]
Common Mistakes:
  • Thinking embeddings generate random vectors each time
  • Believing embeddings remove all model training
  • Confusing embeddings with image features
2. Which Python code correctly loads a pre-trained embedding file named glove.txt into a dictionary called embeddings?
easy
A. embeddings = open('glove.txt').split()
B. embeddings = open('glove.txt').read()
C. embeddings = {line.split()[0]: list(map(float, line.split()[1:])) for line in open('glove.txt')}
D. embeddings = dict(open('glove.txt'))

Solution

  1. Step 1: Understand the file format

    Each line has a word followed by numbers (vector components).
  2. Step 2: Choose code that maps words to vectors

    Option C splits each line, uses the first token as the dictionary key, and converts the remaining tokens to a list of floats.
  3. Final Answer:

    embeddings = {line.split()[0]: list(map(float, line.split()[1:])) for line in open('glove.txt')} -> Option C
  4. Quick Check:

    Dictionary comprehension with split and float conversion = Option C [OK]
Hint: Use dict comprehension with split and float conversion [OK]
Common Mistakes:
  • Using read(), which returns a single string, not a dict
  • Calling split() directly on the file object
  • Passing the file object to dict() without processing
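The one-liner in Option C never closes the file and raises an IndexError on a blank line. A slightly more defensive loader could look like the sketch below. The function name `load_embeddings` and the sample file contents are invented for this example, and the temporary file only stands in for a real glove.txt.

```python
import os
import tempfile

def load_embeddings(path):
    """Read 'word v1 v2 ...' lines into a dict mapping word -> list of floats."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:  # the file is closed when the block exits
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) < 2:               # skip blank or malformed lines
                continue
            embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings

# Tiny stand-in file with made-up values, so the sketch runs without a real download.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("cat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n\n")
    sample_path = tmp.name

embeddings = load_embeddings(sample_path)
os.remove(sample_path)
print(embeddings["cat"])  # [0.1, 0.2, 0.3]
```

Splitting each line once and reusing the parts also avoids the double `line.split()` call in the comprehension.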
3. Given the code below, what will print(embeddings['cat']) output if glove.txt contains the line cat 0.1 0.2 0.3?
embeddings = {line.split()[0]: list(map(float, line.split()[1:])) for line in open('glove.txt')}
print(embeddings['cat'])
medium
A. [0.1, 0.2, 0.3]
B. 'cat 0.1 0.2 0.3'
C. ['cat', 0.1, 0.2, 0.3]
D. KeyError

Solution

  1. Step 1: Understand dictionary comprehension

    Each word maps to a list of floats from the line after splitting.
  2. Step 2: Check the key 'cat'

    It maps to [0.1, 0.2, 0.3] as floats in a list.
  3. Final Answer:

    [0.1, 0.2, 0.3] -> Option A
  4. Quick Check:

    embeddings['cat'] = float list [OK]
Hint: Split line, first word key, rest floats list [OK]
Common Mistakes:
  • Expecting string instead of float list
  • Confusing key with value
  • Assuming KeyError without checking file content
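A KeyError does become a real concern once the text contains words that are missing from the embedding file. A small sketch of one common way to handle that follows; the dictionary contents are made up, and using a zero vector as the fallback is an assumption of this example, not the only option.

```python
# Hypothetical in-memory embeddings with made-up 3-dimensional values.
embeddings = {"cat": [0.1, 0.2, 0.3], "dog": [0.4, 0.5, 0.6]}
dim = 3
unknown = [0.0] * dim  # fallback for out-of-vocabulary words (assumed convention)

def lookup(word):
    """Return the vector for `word`, trying the lowercase form before falling back."""
    for candidate in (word, word.lower()):
        if candidate in embeddings:
            return embeddings[candidate]
    return unknown

print(lookup("Cat"))      # [0.1, 0.2, 0.3]
print(lookup("axolotl"))  # [0.0, 0.0, 0.0]
```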
4. The code below tries to load embeddings but causes type issues. What is the likely cause?
embeddings = {}
with open('glove.txt') as f:
    for line in f:
        word, vector = line.split()[0], line.split()[1:]
        embeddings[word] = vector
print(type(embeddings['dog'][0]))
medium
A. The file path 'glove.txt' is incorrect.
B. The vector values are strings, not floats, causing type issues.
C. The dictionary keys are not unique.
D. The print statement syntax is wrong.

Solution

  1. Step 1: Analyze vector assignment

    Vector is assigned as list of strings from split, not converted to floats.
  2. Step 2: Check print type

    Printing type(embeddings['dog'][0]) shows <class 'str'>, not float, so any later arithmetic on the vectors would fail or misbehave.
  3. Final Answer:

    The vector values are strings, not floats, causing type issues. -> Option B
  4. Quick Check:

    Missing float conversion = vector values stay strings [OK]
Hint: Convert vector strings to floats before storing [OK]
Common Mistakes:
  • Ignoring need to convert strings to floats
  • Assuming file path error without checking
  • Assuming duplicate dictionary keys are the cause
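A corrected version of the loop might look like this. `io.StringIO` stands in for the opened glove.txt so the sketch runs on its own, and the values are made up.

```python
import io

# io.StringIO stands in for open('glove.txt'); the values are placeholders.
fake_file = io.StringIO("dog 0.4 0.5 0.6\ncat 0.1 0.2 0.3\n")

embeddings = {}
for line in fake_file:
    word, *values = line.split()                   # split once instead of twice
    embeddings[word] = [float(v) for v in values]  # convert each string to a float

print(type(embeddings["dog"][0]))  # <class 'float'>
```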
5. You want to use pre-trained embeddings in a text classification model. Which step is essential to correctly use these embeddings in your model's input layer?
hard
A. Map each word in your text to its embedding vector and create a matrix input.
B. Train embeddings from scratch ignoring pre-trained vectors.
C. Replace all words with their index positions only.
D. Use embeddings only for output layer predictions.

Solution

  1. Step 1: Understand embedding usage in models

    Pre-trained embeddings provide vector representations for words to input into models.
  2. Step 2: Identify correct input preparation

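    Each word must be mapped to its vector, and the vectors stacked into a matrix, before the model can use them. In practice an embedding layer often performs this lookup from word indices, but indices alone, without the lookup, carry no meaning.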
  3. Final Answer:

    Map each word in your text to its embedding vector and create a matrix input. -> Option A
  4. Quick Check:

    Embedding vectors as input = Option A [OK]
Hint: Convert words to a matrix of vectors before model input [OK]
Common Mistakes:
  • Ignoring pre-trained vectors and training from scratch
  • Using word indices without embeddings
  • Applying embeddings only at output layer
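The mapping step could be sketched as follows in plain Python. The vectors, the `<pad>` and `<unk>` tokens, and the choice of indices 0 and 1 for them are all assumptions made for this example.

```python
# Hypothetical pre-trained vectors (made-up values, 3 dimensions).
pretrained = {
    "good": [0.9, 0.1, 0.0],
    "bad": [-0.8, 0.2, 0.1],
    "movie": [0.0, 0.5, 0.5],
}
dim = 3

# Index 0 is reserved for padding and index 1 for unknown words (assumed convention).
word_to_index = {"<pad>": 0, "<unk>": 1}
matrix = [[0.0] * dim, [0.0] * dim]
for word, vector in pretrained.items():
    word_to_index[word] = len(matrix)
    matrix.append(vector)

def encode(text):
    """Turn a sentence into a list of row indices into `matrix`."""
    return [word_to_index.get(token, 1) for token in text.lower().split()]

indices = encode("Good movie tonight")
sentence_matrix = [matrix[i] for i in indices]  # one row per word
print(indices)  # [2, 4, 1]
```

A matrix built this way could also be converted to a tensor and handed to an embedding layer, which then performs the row lookup from the indices inside the model.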