PyTorch · ML · ~20 mins

Positional encoding in PyTorch - Practice Problems & Coding Challenges

Challenge - 5 Problems
🎖️
Positional Encoding Mastery
Get all challenges correct to earn this badge!
Test your skills under time pressure!
Predict Output
intermediate
Time limit: 1:30
Output of Positional Encoding Tensor Shape
What is the shape of the positional encoding tensor generated by the following PyTorch code snippet?
PyTorch
import torch
import math

def positional_encoding(seq_len, d_model):
    pe = torch.zeros(seq_len, d_model)
    position = torch.arange(0, seq_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe

pe_tensor = positional_encoding(50, 512)
print(pe_tensor.shape)
A) (50, 512)
B) (512, 50)
C) (50, 256)
D) (512, 50, 2)
💡 Hint
Think about how the positional encoding is created for each position and each dimension.
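To experiment with this hint after making your attempt, here is a small sketch (assuming a recent PyTorch install) that uses toy dimensions, not the quiz's numbers, so it shows how the shapes compose without giving the answer away:

```python
import torch

# Toy example (seq_len=4, d_model=6); work out the quiz's numbers yourself.
pe = torch.zeros(4, 6)
position = torch.arange(0, 4, dtype=torch.float).unsqueeze(1)
print(position.shape)     # unsqueeze(1) turns a (4,) vector into (4, 1)
print(pe[:, 0::2].shape)  # an every-other-column slice halves the width
```

Note how the step-2 slice `0::2` touches only half the columns, while `pe` itself keeps its full width.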
🧠 Conceptual
intermediate
Time limit: 1:30
Purpose of Positional Encoding in Transformers
Why do transformer models use positional encoding?
A) To normalize the input data before feeding it to the model.
B) To add information about the order of tokens, since transformers have no built-in sequence order awareness.
C) To randomly shuffle the input tokens to improve generalization.
D) To reduce the size of the input embeddings for faster computation.
💡 Hint
Transformers process all tokens simultaneously without sequence order. How do they know token positions?
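To see the hint in action, here is a toy demonstration (a sketch, not from the quiz, assuming PyTorch) of plain dot-product self-attention applied to a sequence and to the same sequence reversed:

```python
import torch

torch.manual_seed(0)
x = torch.randn(5, 8)                      # 5 tokens, 8-dim embeddings
attn = torch.softmax(x @ x.T, dim=-1) @ x  # self-attention, no positions

x_rev = x.flip(0)                          # same tokens, reversed order
attn_rev = torch.softmax(x_rev @ x_rev.T, dim=-1) @ x_rev

# Each token gets the same output either way: without positional
# information, attention cannot tell the orderings apart.
print(torch.allclose(attn.flip(0), attn_rev, atol=1e-5))  # True
```

Reversing the input merely reverses the output rows; per-token results are unchanged, which is exactly the blindness the question asks about.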
Hyperparameter
advanced
Time limit: 2:00
Effect of Changing Model Dimension in Positional Encoding
If you increase the model dimension (d_model) in the positional encoding function, what is the expected effect on the positional encoding vectors?
A) The positional encoding vectors will become sparse with many zeros.
B) The positional encoding vectors will become shorter, losing positional information.
C) The positional encoding vectors will have more dimensions, allowing finer granularity in encoding positions.
D) The positional encoding vectors will remain the same size but with different values.
💡 Hint
Consider what d_model controls in the positional encoding tensor shape.
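To explore the hint, you can rerun the sinusoidal function from the first problem with two different model dimensions (a sketch, assuming PyTorch; the dimensions below are arbitrary toy values):

```python
import math
import torch

def positional_encoding(seq_len, d_model):
    # Same sinusoidal scheme as in the first problem.
    pe = torch.zeros(seq_len, d_model)
    position = torch.arange(0, seq_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2).float()
                         * (-math.log(10000.0) / d_model))
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe

# Compare which part of the shape tracks d_model; note also that the
# entries are dense sinusoids rather than mostly zeros.
print(positional_encoding(10, 64).shape)
print(positional_encoding(10, 128).shape)
```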
🔧 Debug
advanced
Time limit: 2:00
Identify the Error in Positional Encoding Code
What error will the following PyTorch code raise when executed?
PyTorch
import torch
import math

def positional_encoding(seq_len, d_model):
    pe = torch.zeros(seq_len, d_model)
    position = torch.arange(0, seq_len).float().unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe

pe_tensor = positional_encoding(50, 512)
A) RuntimeError due to shape mismatch in assignment to pe[:, 0::2]
B) SyntaxError due to missing colon in function definition
C) TypeError because torch.arange returns a list, not a tensor
D) No error, code runs successfully
💡 Hint
Check the data types and shapes of tensors used in multiplication and assignment.
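To investigate the hint yourself, inspect the dtypes at the step where this code differs from the first problem (a sketch assuming a recent PyTorch, where the documented type-promotion rules apply; older releases behaved differently here):

```python
import math
import torch

# Without .float(), torch.arange with integer arguments yields an
# integer tensor:
idx = torch.arange(0, 512, 2)
print(idx.dtype)  # torch.int64

# Multiplying by a Python float triggers type promotion; check what
# dtype comes out before deciding whether later ops can fail:
scaled = idx * (-math.log(10000.0) / 512)
print(scaled.dtype)
```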
Model Choice
expert
Time limit: 2:30
Choosing Positional Encoding Type for a Transformer Model
You want to build a transformer model for a task with very long sequences (e.g., 10,000 tokens). Which positional encoding approach is best to handle this scenario?
A) Use learned positional embeddings with a fixed maximum length of 512 tokens.
B) Use random positional embeddings initialized at each training batch.
C) Use no positional encoding and rely on attention alone.
D) Use sinusoidal positional encoding, which can generalize to longer sequences.
💡 Hint
Consider which encoding can extrapolate beyond training sequence lengths.
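One way to probe the hint: a sinusoidal encoding is a fixed function of the position index, so you can evaluate it at any position, including ones far beyond anything seen in training. A sketch (assuming PyTorch; `sinusoid` is a hypothetical helper written for this illustration):

```python
import math
import torch

def sinusoid(pos, d_model=512):
    # Sinusoidal encoding for a single position; sin half then cos half.
    i = torch.arange(0, d_model, 2).float()
    angle = pos * torch.exp(i * (-math.log(10000.0) / d_model))
    return torch.cat([torch.sin(angle), torch.cos(angle)])

near = sinusoid(100)     # a position within a typical training range
far = sinusoid(10_000)   # far beyond it -- still well-defined
print(near.shape, far.shape)     # both are full d_model-sized vectors
print(float(far.abs().max()))    # values stay bounded in [-1, 1]
```

A learned embedding table, by contrast, simply has no row for a position index beyond its fixed maximum length.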