PyTorch · ~20 mins

Transformer decoder in PyTorch - Practice Problems & Coding Challenges

Challenge - 5 Problems
🎖️ Transformer Decoder Mastery
Get all five challenges correct, under time pressure, to earn this badge!
Predict Output · intermediate · 2:00
Output shape of Transformer decoder layer
Given the following PyTorch code snippet for a Transformer decoder layer, what is the shape of the output tensor?
PyTorch
import torch
import torch.nn as nn

decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8)
memory = torch.rand(10, 32, 512)  # (sequence_length, batch_size, d_model)
tgt = torch.rand(20, 32, 512)     # (target_sequence_length, batch_size, d_model)
output = decoder_layer(tgt, memory)
print(output.shape)
A. torch.Size([20, 32, 512])
B. torch.Size([10, 32, 512])
C. torch.Size([32, 20, 512])
D. torch.Size([20, 512, 32])
Attempts: 2 left
💡 Hint
Remember the output shape matches the target input shape for the decoder layer.
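To see this for yourself, here is a minimal sketch (same sizes as the snippet above) that checks the decoder layer's output shape directly:

```python
import torch
import torch.nn as nn

# The decoder layer's output follows the *target* tensor's shape, not memory's.
decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8)
memory = torch.rand(10, 32, 512)   # (src_len, batch, d_model)
tgt = torch.rand(20, 32, 512)      # (tgt_len, batch, d_model)

output = decoder_layer(tgt, memory)
assert output.shape == tgt.shape   # torch.Size([20, 32, 512])
```

Cross-attention reads from `memory`, but every query comes from a target position, so the sequence length of the result is always the target's.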
🧠 Conceptual · intermediate · 1:30
Purpose of the memory input in Transformer decoder
In a Transformer decoder, what is the role of the 'memory' input?
A. It stores the previous decoder outputs for autoregressive generation.
B. It provides the encoded information from the encoder to guide decoding.
C. It contains the positional encodings for the target sequence.
D. It is used to compute the loss during training.
Attempts: 2 left
💡 Hint
Think about how the decoder uses information from the encoder.
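A quick sketch (with assumed toy sizes) of where `memory` typically comes from in an encoder-decoder setup: it is the encoder's output, handed to the decoder so its cross-attention can attend over the encoded source sequence.

```python
import torch
import torch.nn as nn

# Toy sizes, chosen for illustration only.
encoder_layer = nn.TransformerEncoderLayer(d_model=64, nhead=4)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
decoder_layer = nn.TransformerDecoderLayer(d_model=64, nhead=4)

src = torch.rand(10, 2, 64)       # source sequence
memory = encoder(src)             # encoded source == the decoder's "memory"
tgt = torch.rand(5, 2, 64)        # target sequence so far
out = decoder_layer(tgt, memory)  # cross-attention reads from memory
assert out.shape == tgt.shape
```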
Hyperparameter · advanced · 1:30
Effect of increasing nhead in Transformer decoder
What is the effect of increasing the 'nhead' parameter in a Transformer decoder layer?
A. It increases the batch size during training.
B. It increases the size of the feedforward network inside the decoder layer.
C. It increases the number of decoder layers stacked.
D. It increases the number of attention heads, allowing the model to focus on more representation subspaces.
Attempts: 2 left
💡 Hint
Recall what 'nhead' controls in multi-head attention.
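One way to build intuition: `nhead` splits `d_model` into that many subspaces, each of size `d_model // nhead` (so `nhead` must divide `d_model` evenly). A small sketch:

```python
import torch
import torch.nn as nn

d_model = 512
for nhead in (4, 8, 16):          # each must divide d_model evenly
    layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=nhead)
    head_dim = d_model // nhead   # per-head subspace size
    # More heads -> more, smaller representation subspaces; the total
    # attention width stays d_model either way.
    assert nhead * head_dim == d_model
```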
🔧 Debug · advanced · 2:00
Identifying error in Transformer decoder usage
What error will this code raise when running the Transformer decoder layer?
PyTorch
import torch
import torch.nn as nn

decoder_layer = nn.TransformerDecoderLayer(d_model=256, nhead=4)
memory = torch.rand(15, 16, 256)
tgt = torch.rand(15, 16, 128)  # Incorrect d_model size
output = decoder_layer(tgt, memory)
A. No error, code runs successfully
B. SyntaxError because of missing parentheses
C. RuntimeError due to mismatched feature dimensions between tgt and memory
D. TypeError because tgt is not a tensor
Attempts: 2 left
💡 Hint
Check the last dimension sizes of tgt and memory tensors.
Metrics · expert · 2:30
Choosing the best metric for Transformer decoder language generation
Which metric is most appropriate to evaluate the quality of text generated by a Transformer decoder in a language translation task?
A. BLEU score measuring n-gram overlap with reference translations
B. Mean Squared Error between generated and reference token embeddings
C. Accuracy of predicting the next token in the sequence
D. F1 score computed on binary classification labels
Attempts: 2 left
💡 Hint
Think about how translation quality is commonly measured.
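To make the n-gram-overlap idea concrete, here is a deliberately simplified sketch of BLEU's core ingredient: modified unigram precision (real BLEU also uses higher-order n-grams and a brevity penalty; in practice you would use a library such as sacrebleu rather than this toy function):

```python
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Modified unigram precision: the fraction of candidate tokens
    that also appear in the reference, with clipped counts."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum(min(c, ref[w]) for w, c in cand.items())
    return overlap / max(1, sum(cand.values()))

score = unigram_precision("the cat sat on the mat",
                          "the cat is on the mat")
print(score)  # 5 of 6 candidate unigrams match -> ~0.833
```

The clipping (`min(c, ref[w])`) is what stops a candidate from scoring well by repeating one matching word many times.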