
Positional encoding in PyTorch - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is positional encoding in the context of machine learning models like Transformers?
Positional encoding is a way to add information about the order or position of elements in a sequence, so the model knows where each element is located since models like Transformers do not process data in order by default.
beginner
Why do Transformers need positional encoding?
Transformers process all input tokens simultaneously without any inherent order. Positional encoding helps the model understand the sequence order, which is important for tasks like language understanding.
intermediate
How is sinusoidal positional encoding calculated?
It uses sine and cosine functions of different frequencies to create unique position vectors. For position pos and dimension i: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). This helps the model learn relative positions.
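The two formulas above translate directly into a few lines of PyTorch. This is a sketch, not a library function; the helper name `sinusoidal_positional_encoding` is made up for illustration, and it assumes an even `d_model`:

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Build the (seq_len, d_model) sinusoidal positional encoding table.

    Assumes d_model is even, so sin/cos pairs fill the table exactly.
    """
    position = torch.arange(seq_len).unsqueeze(1)  # (seq_len, 1)
    # 1 / 10000^(2i/d_model), computed in log space for numerical stability
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions use sine
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions use cosine
    return pe

pe = sinusoidal_positional_encoding(seq_len=50, d_model=64)
print(pe.shape)  # torch.Size([50, 64])
```

Note that at position 0 every sine entry is 0 and every cosine entry is 1, which is a quick sanity check for the implementation.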
intermediate
What is the shape of the positional encoding tensor for a batch of sequences?
The positional encoding tensor usually has shape (sequence_length, embedding_dimension). When added to input embeddings, it matches the input shape (batch_size, sequence_length, embedding_dimension) by broadcasting over the batch.
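The broadcasting described above is easy to verify; in this sketch, zeros and random values stand in for a real encoding table and real embeddings:

```python
import torch

batch_size, seq_len, d_model = 2, 10, 16
embeddings = torch.randn(batch_size, seq_len, d_model)  # (batch, seq, dim)
pe = torch.zeros(seq_len, d_model)                      # (seq, dim) placeholder table

# The (seq_len, d_model) table broadcasts over the batch dimension on addition.
out = embeddings + pe
print(out.shape)  # torch.Size([2, 10, 16])
```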
beginner
How does adding positional encoding affect the input embeddings in a Transformer?
Positional encoding is added element-wise to the input embeddings. This combined input carries both the token meaning and its position, allowing the Transformer to use position information during training and prediction.
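A minimal sketch of this element-wise addition as a PyTorch module; the class name `PositionalEncoding` and the `max_len` default are illustrative choices, not a standard API:

```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Adds a fixed sinusoidal encoding table to token embeddings."""

    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        # register_buffer: saved with the model but not a learned parameter
        self.register_buffer("pe", pe)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); slice the table to the current length,
        # then add element-wise (broadcast over the batch dimension).
        return x + self.pe[: x.size(1)]
```

Because the table is a buffer rather than a parameter, it moves with the model across devices but receives no gradient updates.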
Why can't Transformers rely on sequence order without positional encoding?
A. Because they ignore input data
B. Because they only work with images
C. Because they use recurrent layers
D. Because they process all tokens simultaneously without order
Which functions are used in sinusoidal positional encoding?
A. Sine and cosine
B. Linear and quadratic
C. Tangent and cotangent
D. Exponential and logarithm
What does the positional encoding vector depend on?
A. The batch size
B. Only the token value
C. The position in the sequence and embedding dimension
D. The model's output
How is positional encoding combined with input embeddings?
A. By adding them element-wise
B. By multiplying them
C. By concatenating them
D. By ignoring positional encoding
What is the main benefit of sinusoidal positional encoding?
A. It speeds up training
B. It allows the model to learn relative positions
C. It reduces model size
D. It removes the need for embeddings
Explain in your own words why positional encoding is important for Transformer models.
Think about how Transformers see all words at once.
Describe how sinusoidal positional encoding is calculated and why sine and cosine functions are used.
Focus on the math functions and their role in encoding position.