Recall & Review
beginner
What is the main purpose of the self-attention mechanism in neural networks?
Self-attention lets the model weigh every position in the input sequence when encoding each element, so it can capture relationships and context between tokens regardless of their distance, improving tasks like language understanding.
intermediate
In self-attention, what are the Query, Key, and Value vectors?
They are vectors derived from the input that help compute attention scores: Query asks what to focus on, Key represents the content to compare against, and Value holds the actual information to be combined.
intermediate
How is the attention score calculated in self-attention?
Attention scores are calculated by taking the dot product of each Query with every Key, dividing by the square root of the key dimension, and applying a softmax so the resulting weights sum to 1.
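The calculation above can be sketched in a few lines of NumPy; the sequence length, key dimension, and random inputs here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))  # 3 tokens, key dimension d_k = 4
K = rng.standard_normal((3, 4))

d_k = K.shape[-1]
scores = Q @ K.T / np.sqrt(d_k)  # scaled dot products, shape (3, 3)

# Softmax over each row so the attention weights for every token sum to 1.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
```

Row `i` of `weights` says how much token `i` attends to each of the other tokens.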
advanced
Why do we scale the dot product by the square root of the key dimension in self-attention?
Scaling prevents the dot product values from becoming too large, which helps keep the softmax function stable and gradients well-behaved during training.
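A quick numerical check of this, with an arbitrarily chosen `d_k`: the variance of a dot product of random vectors grows with the dimension, so the unscaled softmax collapses toward a one-hot distribution:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

d_k = 512
rng = np.random.default_rng(0)
q = rng.standard_normal(d_k)      # one query
K = rng.standard_normal((4, d_k)) # four keys

raw = K @ q                  # dot products; variance grows with d_k
scaled = raw / np.sqrt(d_k)  # variance brought back near 1

# softmax(raw) puts almost all its mass on one key, which is the
# regime where softmax gradients start to vanish.
```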
beginner
What is the output of the self-attention mechanism?
The output is a weighted sum of the Value vectors, where weights come from the attention scores, representing a context-aware combination of input elements.
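Putting the pieces together, a minimal single-head self-attention layer looks like the sketch below. This omits masking and multi-head logic, and the projection matrices are random stand-ins for learned weights:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention: output is a weighted sum of Values."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)  # attention weights per token
    return w @ V                        # context-aware combination

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 8))  # 5 tokens, model dimension 8
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)  # one context-aware vector per token
```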
What does the Query vector represent in self-attention?
The Query vector represents what the model is trying to focus on in the input.
Why is softmax applied to the dot product of Query and Key vectors?
Softmax converts raw scores into probabilities that sum to 1, making them easier to interpret as attention weights.
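A tiny standard-library example of this normalization; the score values are made-up numbers:

```python
import math

scores = [2.0, 1.0, 0.1]  # hypothetical raw Query·Key scores
exps = [math.exp(s) for s in scores]
weights = [e / sum(exps) for e in exps]
# Weights are positive, sum to 1, and preserve the ordering of the scores.
```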
What is the role of the Value vector in self-attention?
Value vectors contain the information that will be weighted and combined to form the output.
What problem does scaling the dot product by sqrt(d_k) solve?
Scaling keeps the dot products in a range where softmax does not saturate into a near one-hot distribution, which would otherwise make its gradients vanishingly small.
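The gradient effect can be made concrete: the softmax Jacobian is diag(p) - p pᵀ, and it shrinks toward zero as p approaches a one-hot vector. The score values below are made up to contrast the two regimes:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def softmax_jacobian(p):
    # d softmax_i / d x_j = p_i * (delta_ij - p_j)
    return np.diag(p) - np.outer(p, p)

moderate = softmax(np.array([1.0, 0.0, -1.0]))     # scaled-score regime
saturated = softmax(np.array([30.0, 0.0, -30.0]))  # unscaled, large d_k

# The saturated Jacobian is almost the zero matrix: gradients vanish.
```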
Which of these best describes self-attention?
Self-attention relates different parts of the input to capture context and dependencies.
Explain how the self-attention mechanism computes its output from input vectors.
Think about how the model decides what parts of the input to focus on.
Why is self-attention important in models like Transformers for language tasks?
Consider how words in a sentence relate to each other.