The attention mechanism helps the model weigh different parts of the input differently for each output, allowing it to focus on relevant information dynamically.
```python
import torch

batch_size = 2
seq_len_q = 3   # number of query positions
seq_len_k = 4   # number of key positions
d_k = 5         # dimensionality of queries and keys

Q = torch.randn(batch_size, seq_len_q, d_k)
K = torch.randn(batch_size, seq_len_k, d_k)

# Batched matrix multiply: (B, Lq, d_k) x (B, d_k, Lk) -> (B, Lq, Lk)
attention_scores = torch.bmm(Q, K.transpose(1, 2))
print(attention_scores.shape)  # torch.Size([2, 3, 4])
```
The batch matrix multiplication of Q (batch_size, seq_len_q, d_k) and Kᵀ (batch_size, d_k, seq_len_k) results in (batch_size, seq_len_q, seq_len_k).
The Transformer model introduced scaled dot-product attention to efficiently compute attention scores and improve training stability.
Scaling by 1/√d_k keeps the dot products at a scale that prevents softmax from saturating, which helps maintain meaningful gradients during training.
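Extending the earlier example, the full scaled dot-product attention can be sketched as follows: scale the raw scores by 1/√d_k, normalize with softmax, then take a weighted sum of value vectors V (the shapes here are illustrative, not from any particular model).

```python
import math
import torch
import torch.nn.functional as F

# Illustrative shapes (chosen arbitrarily for the sketch).
batch_size, seq_len_q, seq_len_k, d_k = 2, 3, 4, 5
Q = torch.randn(batch_size, seq_len_q, d_k)
K = torch.randn(batch_size, seq_len_k, d_k)
V = torch.randn(batch_size, seq_len_k, d_k)

# Raw scores scaled by 1/sqrt(d_k) before the softmax, as described above.
scores = torch.bmm(Q, K.transpose(1, 2)) / math.sqrt(d_k)
weights = F.softmax(scores, dim=-1)  # each row sums to 1 over the keys
output = torch.bmm(weights, V)       # weighted sum of value vectors

print(weights.shape)  # torch.Size([2, 3, 4])
print(output.shape)   # torch.Size([2, 3, 5])
```

Because the softmax is applied over the key dimension, each query position receives a convex combination of the value vectors, so the output has the same sequence length as the queries.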
While attention weights highlight important input parts, they do not always align perfectly with the model's true decision process, so interpretability is limited.