In the Transformer model, the self-attention mechanism helps the model to:
Think about how the model learns connections between words regardless of their position.
Self-attention allows the Transformer to weigh the importance of each word in the input sequence relative to others, capturing context effectively.
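The weighting described above can be sketched as scaled dot-product attention, the core operation inside Transformer self-attention. This is a minimal illustrative implementation (the function name and toy shapes are not from the original snippet):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # (batch, seq, seq)
    weights = F.softmax(scores, dim=-1)            # each row sums to 1
    return weights @ v                             # (batch, seq, d_k)

x = torch.rand(2, 5, 16)  # (batch, seq_len, embed_dim)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # torch.Size([2, 5, 16])
```

Each output position is a weighted mix of every value vector, which is how a word can attend to any other word regardless of distance.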
Given the following code snippet using PyTorch, what is the shape of the output tensor?
```python
import torch
import torch.nn as nn

batch_size = 2
seq_len = 5
embed_dim = 16
num_heads = 4

x = torch.rand(batch_size, seq_len, embed_dim)
mha = nn.MultiheadAttention(embed_dim=embed_dim, num_heads=num_heads)
# PyTorch MultiheadAttention expects input shape (seq_len, batch_size, embed_dim)
x_t = x.transpose(0, 1)
out, _ = mha(x_t, x_t, x_t)
output_shape = out.shape
```
Check the input and output shapes expected by PyTorch's MultiheadAttention.
By default (batch_first=False), PyTorch's MultiheadAttention expects input of shape (sequence length, batch size, embedding dimension) and returns output of the same shape, so here output_shape is torch.Size([5, 2, 16]).
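As a sanity check, PyTorch 1.9 and later also accept batch_first=True, which lets the module take batch-first input directly and skips the transpose. A minimal sketch with the same toy shapes as the snippet above:

```python
import torch
import torch.nn as nn

# With batch_first=True, inputs and outputs are (batch, seq_len, embed_dim).
mha = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)
x = torch.rand(2, 5, 16)  # (batch, seq_len, embed_dim), no transpose needed
out, attn_weights = mha(x, x, x)
print(out.shape)  # torch.Size([2, 5, 16])
```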
Which of the following is a valid reason to increase the number of attention heads in a Transformer model?
Think about how multiple heads help the model understand different aspects of the input.
Multiple attention heads allow the model to jointly attend to information from different representation subspaces, improving learning capacity.
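One way to see those subspaces: each head operates on an embed_dim // num_heads slice of the embedding, which is also why nn.MultiheadAttention requires embed_dim to be divisible by num_heads. A minimal sketch of the per-head split (the shapes are illustrative):

```python
import torch

batch, seq_len, embed_dim, num_heads = 2, 5, 16, 4
head_dim = embed_dim // num_heads  # 4: each head sees a smaller subspace

x = torch.rand(batch, seq_len, embed_dim)
# Reshape the embedding into per-head subspaces: (batch, num_heads, seq_len, head_dim)
heads = x.view(batch, seq_len, num_heads, head_dim).transpose(1, 2)
print(heads.shape)  # torch.Size([2, 4, 5, 4])
```

Each head then runs attention independently on its head_dim-sized slice, and the per-head outputs are concatenated back to embed_dim.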
During training a Transformer for language modeling, the loss decreases steadily but the validation loss starts increasing after some epochs. What does this indicate?
Consider what it means when training loss improves but validation loss worsens.
When validation loss increases while training loss decreases, the model is memorizing training data but failing to generalize, a sign of overfitting.
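A common response to this pattern is early stopping: halt training once validation loss has stopped improving for a few epochs and keep the earlier checkpoint. This is a minimal sketch with illustrative loss values (the helper name is hypothetical):

```python
# Stop when validation loss has not improved for `patience` consecutive epochs.
def early_stop_epoch(val_losses, patience=2):
    best, since_best = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                return epoch  # stop here; an earlier checkpoint generalized better
    return None  # patience never exhausted

print(early_stop_epoch([2.1, 1.7, 1.5, 1.6, 1.8, 2.0]))  # 4
```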
Consider this simplified code snippet for positional encoding in a Transformer. What error will this code raise when run?
```python
import torch
import math

def positional_encoding(seq_len, d_model):
    pe = torch.zeros(seq_len, d_model)
    position = torch.arange(0, seq_len).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe

pos_enc = positional_encoding(10, 7)
```
Check the shapes of slices pe[:, 0::2] and pe[:, 1::2] when d_model is odd.
When d_model is odd (here 7), pe[:, 1::2] has only d_model // 2 = 3 columns, while torch.cos(position * div_term) has 4, so the assignment raises a RuntimeError due to the shape mismatch. (The sine assignment is fine: pe[:, 0::2] has 4 columns, matching div_term.)
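One possible fix is to trim the cosine term so it matches the narrower odd-indexed slice; this slicing choice is just one of several valid repairs:

```python
import torch
import math

def positional_encoding(seq_len, d_model):
    pe = torch.zeros(seq_len, d_model)
    position = torch.arange(0, seq_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float)
                         * (-math.log(10000.0) / d_model))
    pe[:, 0::2] = torch.sin(position * div_term)
    # pe[:, 1::2] has d_model // 2 columns, one fewer than div_term when
    # d_model is odd, so trim div_term to match before assigning.
    pe[:, 1::2] = torch.cos(position * div_term[: d_model // 2])
    return pe

print(positional_encoding(10, 7).shape)  # torch.Size([10, 7])
```

For even d_model the slice `div_term[: d_model // 2]` is the whole tensor, so the function behaves identically to the standard formulation.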