NLP · ~20 mins

Why different transformers serve different tasks in NLP - Challenge Your Understanding

Challenge - 5 Problems
🎖️ Transformer Mastery: get all challenges correct to earn this badge!
🧠 Conceptual
intermediate
Why do transformers have different architectures for different tasks?

Transformers are used in many tasks like translation, text classification, and question answering. Why do we need different transformer models for these tasks?

A. Because transformers only work for translation tasks and cannot be used for other tasks without changing the language.
B. Because transformers use different programming languages for different tasks.
C. Because transformers are slow and need to be replaced by a different model for each task.
D. Because each task requires a different way to process input and output, so transformer models are designed with specific layers or heads to handle those needs.
💡 Hint

Think about how the output of a model changes depending on the task.
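To make the hint concrete, here is a minimal torch sketch (illustrative names only, not any real library's API): one shared encoder feeds two task-specific heads, and it is the head that determines the output shape each task needs.

```python
import torch
import torch.nn as nn

hidden_size, num_labels, vocab_size = 16, 2, 100

# Stand-in for a shared transformer encoder stack (an embedding layer here,
# purely to keep the sketch small).
encoder = nn.Embedding(vocab_size, hidden_size)

# Task-specific heads reuse the same hidden states but differ in output shape.
classification_head = nn.Linear(hidden_size, num_labels)  # one label per sequence
token_head = nn.Linear(hidden_size, vocab_size)           # one score vector per token

ids = torch.randint(0, vocab_size, (2, 10))               # batch of 2, 10 tokens each
hidden = encoder(ids)                                     # (2, 10, 16)

seq_logits = classification_head(hidden[:, 0, :])         # (2, 2): classify whole sequence
token_logits = token_head(hidden)                         # (2, 10, 100): e.g. language modeling
print(seq_logits.shape, token_logits.shape)
```

Same encoder, different heads, different output shapes: that is the design question this problem is probing.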

Model Choice
intermediate
Choosing the right transformer for sentiment analysis

You want to build a model to classify movie reviews as positive or negative. Which transformer model is best suited for this task?

A. GPT-3 used as a language generator without fine-tuning
B. A convolutional neural network without transformers
C. BERT with a classification head on top
D. A transformer model designed only for machine translation
💡 Hint

Think about which model is designed to understand sentence meaning and output labels.
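Whatever model produces them, a sentiment classifier ultimately turns per-review logits into labels. A small sketch of that last step (the logit values here are made up for illustration):

```python
import torch

# Hypothetical classification-head logits for 2 movie reviews:
# column 0 = "negative" score, column 1 = "positive" score.
logits = torch.tensor([[ 2.1, -1.3],   # review 1
                       [-0.4,  3.0]])  # review 2

probs = torch.softmax(logits, dim=-1)  # normalize scores to probabilities
labels = ["negative", "positive"]
preds = [labels[i] for i in logits.argmax(dim=-1)]
print(preds)  # ['negative', 'positive']
```

The hint is asking which architecture naturally produces this kind of per-sequence label output.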

Predict Output
advanced
Output shape of transformer model for question answering

Consider a transformer model fine-tuned for question answering. The input is a batch of 2 sequences, each with 10 tokens. The model outputs start and end logits for answer spans. What is the shape of the output logits?

import torch

batch_size = 2   # 2 sequences in the batch
seq_len = 10     # 10 tokens per sequence
start_logits = torch.randn(batch_size, seq_len)  # one start score per token
end_logits = torch.randn(batch_size, seq_len)    # one end score per token
print(start_logits.shape, end_logits.shape)
A. (2, 10) (2, 10)
B. (10, 2) (10, 2)
C. (2, 1) (2, 1)
D. (20,) (20,)
💡 Hint

Think about how logits correspond to tokens in each sequence for each batch item.
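A common pattern in extractive QA heads (used, for example, in Hugging Face's QA models) is a single linear layer that emits 2 scores per token, which are then split into start and end logits. A minimal sketch of that pattern, with illustrative sizes:

```python
import torch
import torch.nn as nn

batch_size, seq_len, hidden_size = 2, 10, 16
hidden = torch.randn(batch_size, seq_len, hidden_size)  # stand-in encoder output

# Span-prediction head: 2 scores per token (start score, end score).
qa_head = nn.Linear(hidden_size, 2)
logits = qa_head(hidden)                      # (2, 10, 2)

start_logits, end_logits = logits.split(1, dim=-1)
start_logits = start_logits.squeeze(-1)       # (2, 10)
end_logits = end_logits.squeeze(-1)           # (2, 10)
print(start_logits.shape, end_logits.shape)
```

Each of the 10 tokens in each of the 2 sequences gets one start score and one end score, which fixes the shape the question asks about.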

Hyperparameter
advanced
Effect of number of attention heads in transformer models

What is the main effect of increasing the number of attention heads in a transformer model?

A. It allows the model to focus on different parts of the input simultaneously, improving its ability to capture diverse relationships.
B. It reduces the model size and speeds up training by using fewer parameters.
C. It disables the self-attention mechanism and uses only feed-forward layers.
D. It changes the output format from text to images.
💡 Hint

Think about what multiple attention heads do in the transformer.
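One way to see what changing the head count does (and does not) change: with `torch.nn.MultiheadAttention`, the embedding dimension is split across heads, so the output shape stays the same no matter how many heads you use. A quick sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

embed_dim, seq_len, batch = 16, 10, 2
x = torch.randn(seq_len, batch, embed_dim)  # default (seq, batch, embed) layout

# num_heads must divide embed_dim; each head attends in a smaller subspace.
for num_heads in (1, 2, 4):
    attn = nn.MultiheadAttention(embed_dim, num_heads)
    out, attn_weights = attn(x, x, x)       # self-attention: query = key = value
    print(num_heads, "heads ->", out.shape) # output shape is (10, 2, 16) every time
```

More heads means the same representation is carved into more parallel attention subspaces, not a bigger or smaller output.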

🔧 Debug
expert
Why does this transformer model output all zeros?

You fine-tuned a transformer for text classification, but the model always outputs zeros for predictions. What is the most likely cause?

import torch
from transformers import BertForSequenceClassification, BertTokenizer

model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

inputs = tokenizer('Hello world', return_tensors='pt')
outputs = model(**inputs)
print(outputs.logits)
A. The tokenizer is incorrect and returns empty input tensors.
B. The model is in evaluation mode but was never trained, so outputs are near-zero logits.
C. The model architecture is wrong and does not produce logits.
D. The input text is too short, so the model outputs zeros.
💡 Hint

Think about what happens if you use a pretrained model without fine-tuning for classification.
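The mechanism behind the hint can be sketched without downloading any checkpoint: when a classification head is freshly initialized (as happens when you load a base checkpoint that ships no fine-tuned head), its weights are small random values, so the logits it emits are small and carry no learned signal. A torch-only stand-in, with illustrative sizes:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_size, num_labels = 768, 2

# Freshly initialized classification head: small random weights, never trained.
head = nn.Linear(hidden_size, num_labels)

pooled = torch.randn(1, hidden_size)  # stand-in for the encoder's pooled output
logits = head(pooled)
print(logits)  # small values near zero, essentially random guesses
```

Fine-tuning is what moves these head weights away from their random initialization toward values that produce meaningful class logits.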