Recall & Review
beginner
What is the main purpose of a Transformer model in machine learning?
A Transformer model is designed to process sequences of data, like sentences, by paying attention to all parts of the sequence at once. This helps it understand context better than older models.
beginner
What does 'self-attention' mean in Transformer models?
Self-attention lets the model look at every word in a sentence and decide which words are important to understand each word better. It helps the model focus on relevant parts of the input.
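To make "deciding which words are important" concrete, here is a minimal NumPy sketch of scaled dot-product self-attention. It omits the learned query/key/value projection matrices a real Transformer uses, so it is an illustration of the mechanism, not a full implementation.

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over a sequence.

    x: (seq_len, d_model) array. For simplicity, queries, keys, and
    values are all taken to be x itself (no learned projections).
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)  # how strongly each word attends to every other word
    # Softmax over each row turns scores into attention weights that sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x             # each output is a weighted mix of all words

# 3 "words", each represented by a 4-dimensional vector
x = np.random.randn(3, 4)
out = self_attention(x)
print(out.shape)  # (3, 4): one context-aware vector per word
```

Because every output row is a weighted average over all input rows, each word's new representation can draw on any other word in the sequence, no matter how far away it is.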
intermediate
In Simulink, how can you represent the multi-head attention mechanism of a Transformer?
You can model each attention head as its own parallel subsystem, each processing the same input with different weights, then use a concatenation block to merge the heads' outputs so the model captures diverse information.
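Simulink modeling is graphical, but the same parallel-heads-then-concatenate structure can be sketched in NumPy. This simplified version splits the feature dimension into per-head slices and again omits the learned projections, so treat it as a structural sketch of "parallel blocks feeding a combiner", not a production layer.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, num_heads):
    """Split d_model into num_heads slices, attend within each slice,
    then concatenate the heads' outputs (the 'combine' step)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        xh = x[:, h * d_head:(h + 1) * d_head]   # this head's slice of the features
        scores = xh @ xh.T / np.sqrt(d_head)
        heads.append(softmax(scores) @ xh)       # each head attends independently
    return np.concatenate(heads, axis=-1)        # merge the parallel heads

x = np.random.randn(5, 8)
print(multi_head_attention(x, num_heads=2).shape)  # (5, 8)
```

Each loop iteration corresponds to one parallel subsystem in the Simulink picture, and the final `np.concatenate` plays the role of the block that combines the heads' outputs.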
beginner
Why do Transformer models use positional encoding?
Because Transformers look at all words at once, they need a way to know the order of words. Positional encoding adds information about word positions so the model understands sequence order.
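The standard way to add this position information is the sinusoidal encoding from the original Transformer paper ("Attention Is All You Need"): each position gets a unique pattern of sine and cosine values that is added to the word embeddings.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: sin on even feature indices,
    cos on odd ones, at wavelengths that grow with the index."""
    pos = np.arange(seq_len)[:, None]              # word positions 0..seq_len-1
    i = np.arange(0, d_model, 2)[None, :]          # even feature indices
    angles = pos / np.power(10000, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = positional_encoding(seq_len=10, d_model=16)
print(pe.shape)  # (10, 16): one encoding vector per position
# Added to the embeddings so the same word at different positions
# gets a different input vector:  x = embeddings + pe
```

Because every position produces a distinct vector, the model can tell "dog bites man" from "man bites dog" even though attention itself treats the words as an unordered set.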
intermediate
What is the role of the feed-forward network in a Transformer block?
After attention, the feed-forward network applies the same two-layer transformation to each position independently, adding a non-linearity that helps the model learn richer features than attention alone can provide.
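A minimal NumPy sketch of this position-wise feed-forward network, with randomly initialized weights standing in for learned parameters. The key point is that the same weights are applied to every position, and each position is processed independently of the others.

```python
import numpy as np

def feed_forward(x, w1, b1, w2, b2):
    """Position-wise feed-forward block: expand to a wider hidden
    layer with ReLU, then project back to the model dimension."""
    hidden = np.maximum(0, x @ w1 + b1)  # ReLU expansion to d_ff
    return hidden @ w2 + b2              # project back to d_model

d_model, d_ff, seq_len = 8, 32, 5
rng = np.random.default_rng(0)
w1, b1 = rng.standard_normal((d_model, d_ff)), np.zeros(d_ff)
w2, b2 = rng.standard_normal((d_ff, d_model)), np.zeros(d_model)

x = rng.standard_normal((seq_len, d_model))
print(feed_forward(x, w1, b1, w2, b2).shape)  # (5, 8): same shape in and out
```

Because the output shape matches the input shape, feed-forward blocks can be stacked after attention layers repeatedly, which is exactly how Transformer blocks are chained.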
What does the 'attention' mechanism in Transformers help the model do?
Attention helps the model focus on relevant parts of the input to understand context better.
Why is positional encoding necessary in Transformer models?
Transformers process all words at once, so positional encoding tells the model the order of words.
In Simulink, how can multi-head attention be modeled?
Parallel blocks represent multiple attention heads working simultaneously.
What is the output of the feed-forward network in a Transformer block?
The feed-forward network applies transformations to each position's data separately.
Which of these is NOT a component of a Transformer block?
Transformers do not use convolutional layers; they rely on attention and feed-forward networks.
Explain how self-attention works in a Transformer model and why it is important.
Think about how the model looks at all words to decide which ones matter.
Describe how you would model a Transformer block in Simulink, including key components.
Consider how to represent attention heads and data flow in Simulink.