In an encoder-decoder model for sequence-to-sequence tasks, what does the attention mechanism primarily help with?
Think about how the decoder decides which parts of the input to use when producing each output word.
The attention mechanism lets the decoder attend to different parts of the input sequence at each decoding step, instead of compressing the entire input into a single fixed-length context vector. This markedly improves translation quality, especially for long sentences, and benefits other sequence-to-sequence tasks.
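The per-step reweighting described above can be sketched in a few lines of PyTorch. This is a minimal dot-product attention example with made-up sizes (one decoder step, four encoder positions); the variable names are illustrative, not from any particular library:

```python
import torch

# Hypothetical sizes for illustration
batch, src_len, hidden = 1, 4, 8
encoder_outputs = torch.rand(batch, src_len, hidden)  # one vector per input token
decoder_state = torch.rand(batch, 1, hidden)          # current decoder step

# Score each encoder position against the decoder state
scores = torch.bmm(decoder_state, encoder_outputs.transpose(1, 2))  # (1, 1, 4)
weights = torch.softmax(scores, dim=-1)  # normalized over input positions
context = torch.bmm(weights, encoder_outputs)  # weighted sum: (1, 1, 8)

print(weights.shape, context.shape)
```

The `context` vector is recomputed at every decoding step, which is exactly what replaces the single fixed vector of a plain encoder-decoder model.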
Given the following PyTorch code snippet for scaled dot-product attention weights calculation, what is the shape of attention_weights?
```python
import torch

batch_size = 2
seq_len_enc = 5
seq_len_dec = 3
hidden_dim = 4

encoder_outputs = torch.rand(batch_size, seq_len_enc, hidden_dim)
decoder_hidden = torch.rand(batch_size, seq_len_dec, hidden_dim)

# Compute attention scores
scores = torch.bmm(decoder_hidden, encoder_outputs.transpose(1, 2)) / (hidden_dim ** 0.5)

# Apply softmax to get attention weights
attention_weights = torch.softmax(scores, dim=2)
print(attention_weights.shape)
```
Recall that torch.bmm batch-multiplies matrices of shape (batch, n, m) and (batch, m, p) resulting in (batch, n, p).
torch.bmm multiplies (2, 3, 4) by (2, 4, 5), so scores has shape (batch_size, seq_len_dec, seq_len_enc) = (2, 3, 5). Softmax along dim=2 normalizes over the encoder length without changing the shape, so attention_weights is (2, 3, 5).
You want to build an encoder-decoder model for translating very long sentences. Which attention mechanism is best to handle long input sequences efficiently?
Consider the computational cost and relevance of distant input tokens for very long sequences.
Local attention reduces computation by focusing on a small relevant window, making it more efficient for long inputs while still capturing important context.
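The windowing idea can be sketched by masking scores outside a small neighborhood before the softmax. This is a simplified illustration, not a full local-attention implementation: the window center is hard-coded here, whereas real local attention predicts or aligns it per decoder step, and all sizes are made up:

```python
import torch

batch, src_len, hidden, window = 1, 10, 8, 2  # attend only +/- 2 positions
encoder_outputs = torch.rand(batch, src_len, hidden)
decoder_state = torch.rand(batch, 1, hidden)
center = 5  # assumed aligned source position for this decoder step

scores = torch.bmm(decoder_state, encoder_outputs.transpose(1, 2))  # (1, 1, 10)

# Mask out every position farther than `window` from the center
positions = torch.arange(src_len)
outside = (positions - center).abs() > window  # broadcasts over (1, 1, 10)
scores = scores.masked_fill(outside, float('-inf'))

weights = torch.softmax(scores, dim=-1)  # zero outside the window
```

Because only 2 * window + 1 positions receive nonzero weight, the cost per decoder step no longer grows with the full input length.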
In a transformer encoder-decoder model, what is the effect of increasing the number of attention heads in multi-head attention?
Think about why multiple attention heads might help the model understand different aspects of the input.
Multiple heads let the model focus on different parts or features of the input simultaneously, improving learning and representation.
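One point worth making concrete: in standard multi-head attention, adding heads splits the embedding into smaller per-head subspaces rather than adding parameters. A quick check with PyTorch's nn.MultiheadAttention (dimensions chosen arbitrarily for illustration):

```python
import torch
import torch.nn as nn

embed_dim = 64
one_head = nn.MultiheadAttention(embed_dim, num_heads=1, batch_first=True)
four_heads = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)

# Each of the 4 heads attends over a 64/4 = 16-dim subspace, so the
# total parameter count is identical to the single-head module.
p1 = sum(p.numel() for p in one_head.parameters())
p4 = sum(p.numel() for p in four_heads.parameters())
print(p1 == p4)  # True

x = torch.rand(2, 5, embed_dim)
out, attn = four_heads(x, x, x)  # self-attention over a toy batch
print(out.shape)  # torch.Size([2, 5, 64])
```

So the benefit of more heads is representational (several independent attention patterns per layer), not extra capacity in the raw parameter count, though embed_dim must be divisible by num_heads.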
During training of an encoder-decoder model with attention, the loss suddenly becomes NaN after a few epochs. Which of the following is the most likely cause?
Consider what can cause softmax to produce invalid values and how attention scores are computed.
If attention scores grow very large (for example, when dot products are not scaled), the softmax inputs can overflow and gradients can explode, eventually driving the loss to NaN. Scaling the scores by the square root of the hidden size, as in scaled dot-product attention, keeps their variance near 1 and prevents this.
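The effect of scaling is easy to verify numerically: the dot product of two independent d-dimensional standard-normal vectors has variance d, so dividing by sqrt(d) brings the score variance back to about 1. A small sketch (sample count and dimension chosen arbitrarily):

```python
import torch

torch.manual_seed(0)
d = 512
q = torch.randn(10000, d)
k = torch.randn(10000, d)

raw = (q * k).sum(dim=1)  # unscaled dot products, variance ~ d
scaled = raw / d ** 0.5   # scaled dot products, variance ~ 1

print(raw.std().item())     # roughly sqrt(512) ~ 22.6
print(scaled.std().item())  # roughly 1.0
```

Without the scaling, scores with magnitude in the tens push softmax into near-one-hot saturation, and in deeper models the resulting large activations and gradients are a common route to NaN losses.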