Recall & Review

beginner

What is the main purpose of the attention mechanism in neural networks?

The attention mechanism helps the model focus on the most important parts of the input data when making predictions, similar to how humans pay attention to relevant information.

intermediate

Explain the difference between 'soft' and 'hard' attention.

Soft attention assigns a weight to every part of the input and computes a weighted sum, so it stays differentiable and can be trained with ordinary backpropagation. Hard attention selects a single part of the input; that choice is discrete and non-differentiable, so it often requires special training methods such as reinforcement learning.
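The contrast can be sketched with a few lines of NumPy. This is a minimal illustration, not a reference implementation: the `values` and `scores` arrays are made-up toy data, and the hard variant simply takes the argmax (stochastic versions sample an index instead).

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical toy data: 3 input parts, each a 2-dimensional vector.
values = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])
scores = np.array([2.0, 1.0, 0.1])  # made-up relevance scores

# Soft attention: every part contributes, weighted by softmax(scores).
weights = softmax(scores)
soft_output = weights @ values

# Hard attention: exactly one part is selected. Argmax is used here for
# simplicity; sampling-based variants draw the index from `weights`.
hard_index = int(np.argmax(scores))
hard_output = values[hard_index]

print("soft weights:", weights)
print("soft output: ", soft_output)
print("hard output: ", hard_output)
```

The soft output blends all three vectors, while the hard output is a copy of a single one, which is why no gradient flows to the parts that were not selected.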

beginner

What are the three main components of the scaled dot-product attention?

The three components are Query (Q), Key (K), and Value (V). Attention scores are computed by taking the dot product of Q with K; the scores are scaled, normalized with softmax, and then used to weight V to produce the output.
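As a rough sketch of how Q, K, and V fit together, the function below computes scaled dot-product attention in NumPy. The function name, the shapes, and the random inputs are illustrative assumptions, not part of the original material.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Return (output, weights) for Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v)."""
    d_k = K.shape[-1]
    # Compare every query with every key, then scale by sqrt(d_k).
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the keys (subtracting the row max for numerical stability).
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted sum of the value vectors.
    return weights @ V, weights

# Hypothetical sizes: 2 queries, 3 key/value pairs.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 5))

output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape)          # one output vector per query
print(weights.sum(axis=-1))  # each row of weights sums to 1
```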

intermediate

Why do we scale the dot product by the square root of the key dimension in scaled dot-product attention?

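The magnitude of a dot product tends to grow with the key dimension. Large scores push the softmax into a saturated region where one weight is close to 1 and the rest are close to 0, which produces very small gradients and slows learning. Dividing by the square root of the key dimension keeps the scores in a range where the softmax stays responsive.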
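The effect can be checked numerically. The sketch below assumes the components of the query and keys are independent with zero mean and unit variance, in which case the dot product has variance equal to the key dimension; the sizes and the random seed are arbitrary choices.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d_k = 512      # key dimension (arbitrary choice for the demo)
n_keys = 10

q = rng.normal(size=d_k)
K = rng.normal(size=(n_keys, d_k))

raw_scores = K @ q                         # spread grows like sqrt(d_k)
scaled_scores = raw_scores / np.sqrt(d_k)  # spread stays near 1

raw_weights = softmax(raw_scores)
scaled_weights = softmax(scaled_scores)

print("score std, unscaled: ", raw_scores.std())
print("score std, scaled:   ", scaled_scores.std())
print("max weight, unscaled:", raw_weights.max())
print("max weight, scaled:  ", scaled_weights.max())
```

Without scaling, the largest weight sits much closer to 1, meaning the softmax is nearly one-hot and passes little gradient to the other positions.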

intermediate

How does multi-head attention improve the model's ability to focus on different parts of the input?

Multi-head attention runs several attention mechanisms in parallel, each focusing on different parts or aspects of the input, allowing the model to capture diverse information.
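A minimal NumPy sketch of this idea follows. It assumes self-attention over a single sequence, random projection matrices in place of learned ones, and a model dimension that divides evenly by the number of heads; all names (`multi_head_attention`, `W_q`, `W_o`, and so on) are illustrative.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads):
    """Self-attention over X: (seq_len, d_model) using num_heads parallel heads."""
    seq_len, d_model = X.shape
    d_head = d_model // num_heads

    def split_heads(M):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return M.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split_heads(X @ W_q), split_heads(X @ W_k), split_heads(X @ W_v)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores)                            # one attention map per head
    heads = weights @ V                                  # (heads, seq, d_head)
    # Concatenate the heads and mix them with the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o, weights

# Hypothetical sizes; random matrices stand in for learned parameters.
rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 5, 8, 2
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))

output, weights = multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads)
print(output.shape)   # same shape as the input
print(weights.shape)  # a separate attention map for each head
```

Each head works on its own slice of the projected vectors, so the heads can produce different attention maps over the same sequence.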

What does the 'Query' represent in the attention mechanism?

A. The information used to compare with keys

B. The part of the input we want to focus on

C. The output of the attention layer

D. The weights assigned to input tokens

Why is softmax used in attention mechanisms?

A. To select the maximum value only

B. To increase the size of the input

C. To reduce the number of parameters

D. To normalize attention scores into probabilities

Which of these is NOT a benefit of multi-head attention?

A. Captures information from different representation subspaces

B. Allows the model to attend to multiple positions simultaneously

C. Reduces the total number of parameters drastically

D. Improves the model's ability to understand complex relationships

What problem does the attention mechanism help solve in sequence models?

A. Vanishing gradients in deep networks

B. Difficulty in remembering long-range dependencies

C. Overfitting on small datasets

D. Reducing training time by skipping layers

In scaled dot-product attention, what happens after computing the dot product between Query and Key?

A. The result is scaled and passed through softmax to get weights

B. The result is multiplied by the Value directly

C. The result is ignored and only Value is used

D. The result is passed through a ReLU activation

Describe how the attention mechanism works step-by-step in a neural network.

Explain why multi-head attention is more powerful than single-head attention.

Practice

1. What is the main purpose of the attention mechanism in NLP models?

easy

A. To increase the size of the input data

B. To reduce the number of layers in the model

C. To help the model focus on important parts of the input data

D. To randomly shuffle the input tokens

Solution

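Step 1: Understand attention's role

Attention lets the model give more weight to the parts of the input that matter most for the current prediction.

Step 2: Compare options

Options A, B, and D describe changing the input size, the number of layers, or the token order, none of which is what attention does. Option C matches the role described in Step 1.

Final Answer:

C. To help the model focus on important parts of the input data

Quick Check:

Attention means focusing on the important parts of the input, which is option C.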

Solution

Step 1: Recall attention weight calculation

Step 2: Evaluate options

Final Answer:

Quick Check:

Solution

Step 1: Calculate dot products Q x K^T

Step 2: Apply softmax to scores

Step 3: Compute weighted sum of values

Step 4: Match option

Final Answer:

Quick Check:

Solution

Step 1: Check dot product operation

Step 2: Analyze code

Final Answer:

Quick Check:

Solution

Step 1: Understand dot product scaling

Step 2: Role of scaling by sqrt of key dimension

Final Answer:

Quick Check: