PyTorch · ML · ~20 mins

Positional encoding in PyTorch - Practice Problems & Coding Challenges

Challenge - 5 Problems
🎖️
Positional Encoding Mastery
Get all challenges correct to earn this badge!
Test your skills under time pressure!
Predict Output
intermediate
Time limit: 1:30
Output of Positional Encoding Tensor Shape
What is the shape of the positional encoding tensor generated by the following PyTorch code snippet?
PyTorch
import torch
import math

def positional_encoding(seq_len, d_model):
    pe = torch.zeros(seq_len, d_model)
    position = torch.arange(0, seq_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe

pe_tensor = positional_encoding(50, 512)
print(pe_tensor.shape)
A) (50, 512)
B) (512, 50)
C) (50, 256)
D) (512, 50, 2)
💡 Hint
Think about how the positional encoding is created for each position and each dimension.
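To experiment with this hint after making your attempt, here is a small sketch (assuming a recent PyTorch install) that uses toy dimensions, not the quiz's numbers, so it shows how the shapes compose without giving the answer away:

```python
import torch

# Toy example (seq_len=4, d_model=6); work out the quiz's numbers yourself.
pe = torch.zeros(4, 6)
position = torch.arange(0, 4, dtype=torch.float).unsqueeze(1)
print(position.shape)     # unsqueeze(1) turns a (4,) vector into (4, 1)
print(pe[:, 0::2].shape)  # an every-other-column slice halves the width
```

Note how the step-2 slice `0::2` touches only half the columns, while `pe` itself keeps its full width.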
🧠 Conceptual
intermediate
Time limit: 1:30
Purpose of Positional Encoding in Transformers
Why do transformer models use positional encoding?
A) To normalize the input data before feeding it to the model.
B) To add information about the order of tokens, since transformers have no built-in sequence order awareness.
C) To randomly shuffle the input tokens to improve generalization.
D) To reduce the size of the input embeddings for faster computation.
💡 Hint
Transformers process all tokens simultaneously without sequence order. How do they know token positions?
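To see the hint in action, here is a toy demonstration (a sketch, not from the quiz, assuming PyTorch) of plain dot-product self-attention applied to a sequence and to the same sequence reversed:

```python
import torch

torch.manual_seed(0)
x = torch.randn(5, 8)                      # 5 tokens, 8-dim embeddings
attn = torch.softmax(x @ x.T, dim=-1) @ x  # self-attention, no positions

x_rev = x.flip(0)                          # same tokens, reversed order
attn_rev = torch.softmax(x_rev @ x_rev.T, dim=-1) @ x_rev

# Each token gets the same output either way: without positional
# information, attention cannot tell the orderings apart.
print(torch.allclose(attn.flip(0), attn_rev, atol=1e-5))  # True
```

Reversing the input merely reverses the output rows; per-token results are unchanged, which is exactly the blindness the question asks about.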
Hyperparameter
advanced
Time limit: 2:00
Effect of Changing Model Dimension in Positional Encoding
If you increase the model dimension (d_model) in the positional encoding function, what is the expected effect on the positional encoding vectors?
A) The positional encoding vectors will become sparse with many zeros.
B) The positional encoding vectors will become shorter, losing positional information.
C) The positional encoding vectors will have more dimensions, allowing finer granularity in encoding positions.
D) The positional encoding vectors will remain the same size but with different values.
💡 Hint
Consider what d_model controls in the positional encoding tensor shape.
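To explore the hint, you can rerun the sinusoidal function from the first problem with two different model dimensions (a sketch, assuming PyTorch; the dimensions below are arbitrary toy values):

```python
import math
import torch

def positional_encoding(seq_len, d_model):
    # Same sinusoidal scheme as in the first problem.
    pe = torch.zeros(seq_len, d_model)
    position = torch.arange(0, seq_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2).float()
                         * (-math.log(10000.0) / d_model))
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe

# Compare which part of the shape tracks d_model; note also that the
# entries are dense sinusoids rather than mostly zeros.
print(positional_encoding(10, 64).shape)
print(positional_encoding(10, 128).shape)
```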
🔧 Debug
advanced
Time limit: 2:00
Identify the Error in Positional Encoding Code
What error will the following PyTorch code raise when executed?
PyTorch
import torch
import math

def positional_encoding(seq_len, d_model):
    pe = torch.zeros(seq_len, d_model)
    position = torch.arange(0, seq_len).float().unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe

pe_tensor = positional_encoding(50, 512)
A) RuntimeError due to shape mismatch in assignment to pe[:, 0::2]
B) SyntaxError due to missing colon in function definition
C) TypeError because torch.arange returns a list, not a tensor
D) No error, code runs successfully
💡 Hint
Check the data types and shapes of tensors used in multiplication and assignment.
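To investigate the hint yourself, inspect the dtypes at the step where this code differs from the first problem (a sketch assuming a recent PyTorch, where the documented type-promotion rules apply; older releases behaved differently here):

```python
import math
import torch

# Without .float(), torch.arange with integer arguments yields an
# integer tensor:
idx = torch.arange(0, 512, 2)
print(idx.dtype)  # torch.int64

# Multiplying by a Python float triggers type promotion; check what
# dtype comes out before deciding whether later ops can fail:
scaled = idx * (-math.log(10000.0) / 512)
print(scaled.dtype)
```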
Model Choice
expert
Time limit: 2:30
Choosing Positional Encoding Type for a Transformer Model
You want to build a transformer model for a task with very long sequences (e.g., 10,000 tokens). Which positional encoding approach is best to handle this scenario?
A) Use learned positional embeddings with a fixed maximum length of 512 tokens.
B) Use random positional embeddings initialized at each training batch.
C) Use no positional encoding and rely on attention alone.
D) Use sinusoidal positional encoding, which can generalize to longer sequences.
💡 Hint
Consider which encoding can extrapolate beyond training sequence lengths.
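One way to probe the hint: a sinusoidal encoding is a fixed function of the position index, so you can evaluate it at any position, including ones far beyond anything seen in training. A sketch (assuming PyTorch; `sinusoid` is a hypothetical helper written for this illustration):

```python
import math
import torch

def sinusoid(pos, d_model=512):
    # Sinusoidal encoding for a single position; sin half then cos half.
    i = torch.arange(0, d_model, 2).float()
    angle = pos * torch.exp(i * (-math.log(10000.0) / d_model))
    return torch.cat([torch.sin(angle), torch.cos(angle)])

near = sinusoid(100)     # a position within a typical training range
far = sinusoid(10_000)   # far beyond it -- still well-defined
print(near.shape, far.shape)     # both are full d_model-sized vectors
print(float(far.abs().max()))    # values stay bounded in [-1, 1]
```

A learned embedding table, by contrast, simply has no row for a position index beyond its fixed maximum length.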