What is Self Attention in NLP: Simple Explanation and Example
self-attention is a technique that helps a model focus on different parts of a sentence when understanding each word. It allows the model to weigh the importance of all words relative to each other, improving context understanding.How It Works
Imagine reading a sentence and trying to understand the meaning of a word by looking at the other words around it. Self-attention works similarly: it lets the model look at every word in the sentence and decide how much each word should influence the understanding of the current word.
It does this by creating scores that measure the connection between words. For example, in the sentence "The cat sat on the mat," when focusing on "sat," the model might pay more attention to "cat" and "mat" because they relate closely to the action. This helps the model understand context better than just reading words one by one.
By doing this for every word, self-attention builds a rich understanding of the sentence, which is very useful for tasks like translation, summarization, or answering questions.
Example
This example shows a simple self-attention calculation using Python and NumPy. It computes attention scores for a small set of word vectors and outputs the weighted sum representing the attended information.
import numpy as np def softmax(x): e_x = np.exp(x - np.max(x, axis=-1, keepdims=True)) return e_x / e_x.sum(axis=-1, keepdims=True) # Example word vectors (3 words, 4 features each) words = np.array([ [1, 0, 1, 0], # word 1 [0, 2, 0, 2], # word 2 [1, 1, 1, 1] # word 3 ]) # Compute attention scores (dot product of words with each other) scores = words @ words.T # Apply softmax to get attention weights attention_weights = softmax(scores) # Compute weighted sum of words for each word self_attention_output = attention_weights @ words print("Attention Weights:\n", attention_weights) print("\nSelf-Attention Output:\n", self_attention_output)
When to Use
Use self-attention when you want a model to understand the relationships between words in a sentence or document, especially when context matters a lot. It is key in modern NLP tasks like machine translation, text summarization, and question answering.
For example, in translating a sentence, self-attention helps the model know which words to focus on to produce accurate translations. In chatbots, it helps understand user queries better by considering the whole sentence context.
Key Points
- Self-attention lets models weigh the importance of all words relative to each other.
- It improves understanding of context in sentences.
- It is a core part of Transformer models used in NLP today.
- Self-attention computes scores and uses them to create weighted word representations.
