Complete the code to define the input layer for a transformer model in MATLAB.
inputLayer = sequenceInputLayer(128);
The input layer size is typically set to the embedding dimension, which is 128 in this example.
Complete the code to add a multi-head attention layer with 8 heads in MATLAB.
multiHeadAttn = multiHeadAttentionLayer(8, 'Name', 'multiHeadAttn');
Transformers commonly use 8 attention heads for multi-head attention layers.
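The arithmetic behind that choice can be illustrated outside MATLAB. This is a minimal pure-Python sketch (not part of the exercise answers) showing how a 128-dimensional embedding partitions evenly across 8 heads; `split_heads` is a hypothetical helper, not a toolbox function.

```python
# Hypothetical illustration: 8 heads partition a 128-dim embedding
# into equal 16-channel slices (128 must divide evenly by the head count).
EMBED_DIM = 128
NUM_HEADS = 8
HEAD_DIM = EMBED_DIM // NUM_HEADS  # 16 channels per head

def split_heads(vector, num_heads=NUM_HEADS):
    """Split a flat embedding into equal per-head chunks."""
    if len(vector) % num_heads != 0:
        raise ValueError("embedding size must be divisible by head count")
    size = len(vector) // num_heads
    return [vector[i * size:(i + 1) * size] for i in range(num_heads)]

heads = split_heads(list(range(EMBED_DIM)))
print(len(heads), len(heads[0]))  # 8 heads, 16 channels each
```

Each head then attends over its own 16-channel slice, which is why the embedding dimension is normally chosen as a multiple of the head count.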
Fix the error in the code to correctly add a position-wise feedforward layer in MATLAB.
feedForward = fullyConnectedLayer(2048, 'Name', 'feedForward');
The feedforward layer in transformers typically has a larger size, often 2048, to increase model capacity.
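To see what that expansion costs, here is a hedged Python sketch (illustrative only; `ffn_param_count` is a hypothetical helper) counting the parameters of a position-wise feedforward block that projects 128 up to 2048 and back down to 128:

```python
# Hypothetical parameter count for a position-wise feedforward block
# that expands 128 -> 2048 -> 128 (weights plus biases for two dense layers).
def ffn_param_count(d_model, d_ff):
    up = d_model * d_ff + d_ff       # first projection: W1 (d_model x d_ff) + b1
    down = d_ff * d_model + d_model  # second projection: W2 (d_ff x d_model) + b2
    return up + down

print(ffn_param_count(128, 2048))  # 526464
```

Most of a transformer block's parameters typically live in this expansion, which is what "increasing model capacity" refers to.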
Fill both blanks to correctly create a transformer encoder block with normalization and dropout layers.
encoderBlock = [
    layerNormalizationLayer('Name', 'norm1')
    dropoutLayer(0.1, 'Name', 'dropout1')
    layerNormalizationLayer('Name', 'norm2')
    dropoutLayer(0.1, 'Name', 'dropout2')];
Dropout rates of 0.1 are commonly used in transformer encoder blocks to prevent overfitting.
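What a 0.1 dropout rate does at training time can be sketched in a few lines of pure Python (an illustrative "inverted dropout" sketch, not the toolbox implementation; the `dropout` helper and fixed seed are assumptions for reproducibility):

```python
import random

# Minimal inverted-dropout sketch with the example's rate of 0.1:
# each activation is zeroed with probability `rate`, and survivors are
# rescaled by 1/(1 - rate) so the expected activation is unchanged.
def dropout(activations, rate=0.1, rng=random.Random(0)):
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]

out = dropout([1.0] * 1000)
print(sum(1 for a in out if a == 0.0))  # roughly 100 of 1000 units dropped
```

At inference time dropout is disabled, and the 1/(1 - rate) rescaling during training is what keeps the two regimes consistent.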
Fill all three blanks to define the transformer model layers including input, encoder, and output layers.
layers = [
    sequenceInputLayer(128, 'Name', 'input')
    transformerEncoderLayer(8, 2048, 'Name', 'encoder')
    fullyConnectedLayer(10, 'Name', 'output')];
The input size is 128, the encoder uses 8 attention heads, and the feedforward dimension is 2048, matching common transformer settings.
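The settings from all the exercises above can be collected and sanity-checked together. This is a framework-agnostic Python sketch (the `config` dict and `validate` helper are hypothetical, and the values are the ones quoted in this example, not universal constants):

```python
# Sanity checks tying together the settings used throughout the exercises.
config = {
    "embedding_dim": 128,    # sequenceInputLayer size
    "num_heads": 8,          # multi-head attention heads
    "feedforward_dim": 2048, # position-wise feedforward size
    "num_classes": 10,       # final fullyConnectedLayer size
    "dropout": 0.1,          # encoder-block dropout rate
}

def validate(cfg):
    """Return the per-head width if the configuration is internally consistent."""
    assert cfg["embedding_dim"] % cfg["num_heads"] == 0, \
        "embedding must split evenly across heads"
    assert cfg["feedforward_dim"] > cfg["embedding_dim"], \
        "position-wise FFN usually expands the hidden size"
    assert 0.0 <= cfg["dropout"] < 1.0, "dropout rate must be in [0, 1)"
    return cfg["embedding_dim"] // cfg["num_heads"]

print(validate(config))  # per-head width: 16
```

Checks like these catch the most common transformer misconfiguration, an embedding dimension that is not divisible by the head count, before any training starts.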