
Transformer modeling in Simulink - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual (intermediate)
Understanding the Attention Mechanism in Transformers

Which statement best describes the role of the attention mechanism in a Transformer model?

A. It reduces the size of the input data by compressing it into a fixed-length vector.
B. It applies convolutional filters to extract local features from the input sequence.
C. It normalizes the input data to have zero mean and unit variance before processing.
D. It allows the model to focus on different parts of the input sequence when producing each output element.
💡 Hint

Think about how the model decides which words to pay attention to when translating a sentence.
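As a worked sketch of the idea in the hint, here is minimal scaled dot-product self-attention in NumPy. The shapes, helper name, and random inputs are illustrative assumptions, not part of the problem:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each output row is a weighted mix of V's rows; the weights say
    which input positions the model 'attends to' for that output."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq_len, seq_len) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over input positions
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
x = rng.normal(size=(seq_len, d_model))
out, w = scaled_dot_product_attention(x, x, x)       # self-attention: Q = K = V = x
# Each row of w is a probability distribution over the 5 input positions,
# i.e. "how much to focus on" each position when producing that output row.
```

Note that each row of `w` sums to 1, which is exactly the "focus on different parts of the input" behavior the question describes.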

Predict Output (intermediate)
Output Shape of Transformer Encoder Layer

Given an input tensor of shape (batch_size=4, sequence_length=10, embedding_dim=64) passed through a Transformer encoder layer with the same embedding dimension, what will be the shape of the output tensor?

Python
input_shape = (4, 10, 64)   # (batch_size, sequence_length, embedding_dim)
# Passed through a Transformer encoder layer with embedding_dim=64
output_shape = ?            # predict this shape
A. (10, 4, 64)
B. (4, 64, 10)
C. (4, 10, 64)
D. (4, 10)
💡 Hint

The Transformer encoder preserves the sequence length and embedding dimension in its output.
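To see why both dimensions are preserved, here is a hedged NumPy sketch of an encoder layer's two sub-layers (random placeholder weights; layer norm and residual connections omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, d_model = 4, 10, 64
x = rng.normal(size=(batch, seq_len, d_model))

# Self-attention sub-layer: mixes information across positions, keeps d_model.
scores = x @ x.transpose(0, 2, 1) / np.sqrt(d_model)   # (4, 10, 10)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)
attn_out = weights @ x                                  # (4, 10, 64)

# Position-wise feed-forward: expands to a hidden width, projects back to d_model.
W1 = rng.normal(size=(d_model, 256))
W2 = rng.normal(size=(256, d_model))
ffn_out = np.maximum(attn_out @ W1, 0) @ W2             # ReLU, then project back

print(ffn_out.shape)  # (4, 10, 64) — same shape as the input
```

Both sub-layers map (batch, seq_len, d_model) back to (batch, seq_len, d_model), so encoder layers can be stacked without any reshaping.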

Hyperparameter (advanced)
Choosing the Number of Attention Heads

In a Transformer model, if the embedding dimension is 128, which choice for the number of attention heads is valid, and why?

A. 5, because it is the default number in most libraries.
B. 8, because 128 divided by 8 equals 16, which is an integer dimension per head.
C. 10, because more heads always improve performance regardless of embedding size.
D. 7, because prime numbers improve model generalization.
💡 Hint

Each attention head processes a slice of the embedding dimension equally.
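The divisibility constraint in the hint can be checked directly; this small sketch tries each answer choice against an embedding dimension of 128:

```python
embedding_dim = 128

# Multi-head attention splits the embedding evenly across heads,
# so embedding_dim must be divisible by the number of heads.
for num_heads in (5, 7, 8, 10):
    if embedding_dim % num_heads == 0:
        print(f"{num_heads} heads: valid, {embedding_dim // num_heads} dims per head")
    else:
        print(f"{num_heads} heads: invalid, {embedding_dim} not divisible by {num_heads}")
```

Only 8 divides 128 evenly (16 dimensions per head); 5, 7, and 10 all leave a remainder.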

Metrics (advanced)
Evaluating Transformer Model Performance

Which metric is most appropriate to evaluate a Transformer model trained for a multi-class text classification task?

A. Accuracy, because it measures the proportion of correctly predicted classes.
B. Mean Squared Error, because it measures the average squared difference between predicted and true values.
C. BLEU score, because it evaluates the quality of generated text sequences.
D. Perplexity, because it measures the uncertainty of a language model.
💡 Hint

Think about a task where the model picks one class label from many possible classes.
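For a task where each example gets exactly one label out of many, accuracy is simply the fraction of examples whose predicted label matches the true one. A tiny sketch with made-up labels for a 4-class task:

```python
# Toy true vs. predicted labels for a 4-class classification task (made-up data).
y_true = [0, 2, 1, 3, 2, 0, 1, 1]
y_pred = [0, 2, 2, 3, 2, 0, 0, 1]

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print(f"accuracy = {accuracy:.3f}")  # 6 of 8 predictions match → 0.750
```

MSE, BLEU, and perplexity target regression, text generation, and language modeling respectively, so none of them fits a single-label classification task as directly.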

🔧 Debug (expert)
Identifying the Cause of Training Instability in a Transformer

A Transformer's training suddenly diverges, with the loss becoming NaN after a few epochs. Which of the following is the most likely cause?

A. The learning rate is too high, causing gradient explosion.
B. The batch size is too small, causing underfitting.
C. The embedding dimension is too low, causing poor representation.
D. The number of attention heads is too large, causing overfitting.
💡 Hint

Consider what causes gradients to become unstable during training.
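A common remedy for the exploding gradients behind this kind of NaN divergence, alongside lowering the learning rate, is gradient norm clipping. A minimal NumPy sketch (the function name and the toy gradient values are illustrative assumptions):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale a list of gradient arrays so their combined L2 norm
    is at most max_norm, preventing a single huge update step."""
    total = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads], total

# A simulated exploding gradient: values this large would destabilize training.
grads = [np.array([300.0, -400.0])]                 # global L2 norm = 500
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
print(norm_before)                                  # 500.0
print(np.linalg.norm(clipped[0]))                   # ~1.0 after clipping
```

Clipping caps the update magnitude without changing the gradient's direction, which is why it (together with a lower learning rate and learning-rate warmup) is the standard fix for NaN losses in Transformer training.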