NLP · ~10 mins

Model optimization (distillation, quantization) in NLP - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
Task 1: Fill in the blank (easy)

Complete the code to load a pre-trained model for distillation.

from transformers import DistilBertForSequenceClassification
model = DistilBertForSequenceClassification.from_pretrained([1])
A. "distilbert-base-uncased"
B. "bert-base-uncased"
C. "gpt2"
D. "roberta-base"
Common Mistakes
Using the full BERT model name instead of the distilled version.
Choosing a model from a different architecture like GPT-2.
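A hedged sketch of this loading step: to keep it runnable without a network download, it builds a tiny randomly initialized DistilBERT from a config (the config sizes here are arbitrary illustration values). In the exercise itself you would call `from_pretrained("distilbert-base-uncased")`.

```python
import torch
from transformers import DistilBertConfig, DistilBertForSequenceClassification

# Exercise answer (downloads pretrained weights):
#   model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")
# Offline sketch: a tiny randomly initialized model built from a config.
config = DistilBertConfig(
    vocab_size=100, dim=32, n_layers=1, n_heads=2, hidden_dim=64, num_labels=2
)
model = DistilBertForSequenceClassification(config)

input_ids = torch.randint(0, 100, (2, 8))  # batch of 2 sequences, length 8
logits = model(input_ids).logits
print(logits.shape)  # torch.Size([2, 2])
```

The same `from_pretrained` call, with the distilled checkpoint name, is the answer the task expects.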
Task 2: Fill in the blank (medium)

Complete the code to apply dynamic quantization to a PyTorch model.

import torch
model = torch.quantization.quantize_dynamic(model, [1], dtype=torch.qint8)
A. [torch.nn.LSTM]
B. [torch.nn.Linear]
C. [torch.nn.Conv2d]
D. [torch.nn.ReLU]
Common Mistakes
Trying to quantize activation functions like ReLU.
Using convolution layers, which are uncommon in NLP models.
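The quantization call from this task can be exercised end to end on a toy model; the two-layer network below is an illustrative assumption, not part of the exercise.

```python
import torch
import torch.nn as nn

# Toy model standing in for an NLP classifier head (illustrative only).
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 4),
).eval()

# Dynamic quantization: replace nn.Linear modules with int8-weight versions;
# activations are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(2, 16)
with torch.no_grad():
    out = quantized(x)
print(out.shape)  # torch.Size([2, 4])
```

Only the `nn.Linear` layers are swapped out; the `nn.ReLU` passes through untouched, which is why quantizing activation functions is listed as a mistake.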
Task 3: Fill in the blank (hard)

Fix the error in the code to correctly perform knowledge distillation training.

teacher_outputs = teacher_model(input_ids)
student_outputs = student_model(input_ids)
loss = distillation_loss(student_outputs, teacher_outputs[1])
A.attentions
B.hidden_states
C.logits
D.labels
Common Mistakes
Using hidden states or attention outputs instead of logits.
Trying to access labels from model outputs.
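The snippet calls a `distillation_loss` helper that the exercise never defines; one plausible (assumed) implementation is the soft-target KL loss, run here on toy logits standing in for `model(input_ids).logits`.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Soft-target KD loss: KL divergence between temperature-softened
    distributions, scaled by T^2 (one common convention)."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

# Toy logits standing in for teacher_outputs.logits / student_outputs.logits.
torch.manual_seed(0)
student = torch.randn(4, 3)
teacher = torch.randn(4, 3)
loss = distillation_loss(student, teacher)
print(float(loss) > 0)  # True: the two distributions differ
```

The key point of the task survives here: the loss consumes `.logits`, not hidden states or attention maps.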
Task 4: Fill in the blanks (hard)

Fill both blanks to create a quantized model and prepare it for inference.

import torch.quantization
quantized_model = torch.quantization.quantize_dynamic(model, [1], dtype=[2])
A. [torch.nn.Linear]
B. torch.qint8
C. torch.float16
D. [torch.nn.Conv2d]
Common Mistakes
Using float16, which is not the standard dtype for dynamic quantization.
Trying to quantize convolution layers in NLP models.
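One way to see what `torch.qint8` buys at inference time is to serialize the model before and after quantization and compare sizes; the layer dimensions below are illustrative assumptions.

```python
import io
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(256, 256), nn.Linear(256, 10)).eval()
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def num_bytes(m):
    # Serialize the state dict in memory and count the bytes.
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes

print(num_bytes(model) > num_bytes(quantized))  # True: int8 weights are smaller
```

Calling `.eval()` before quantizing matches the task's framing of preparing the model for inference.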
Task 5: Fill in the blanks (hard)

Fill all three blanks to define a distillation loss combining student and teacher outputs.

import torch.nn.functional as F
alpha = 0.5
T = 2.0
loss = alpha * F.kl_div(
    F.log_softmax(student_outputs[1] / T, dim=1),
    F.softmax(teacher_outputs[2] / T, dim=1),
    reduction='batchmean',
) * (T * T) + (1 - alpha) * F.cross_entropy(student_outputs[3], labels)
A. .logits
D. .hidden_states
Common Mistakes
Using hidden states instead of logits.
Mixing up student and teacher outputs.
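The full loss from this task, executed on toy tensors; the batch size, class count, and labels are illustrative assumptions standing in for real model outputs.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
student_logits = torch.randn(8, 5)   # stand-in for student_outputs.logits
teacher_logits = torch.randn(8, 5)   # stand-in for teacher_outputs.logits
labels = torch.randint(0, 5, (8,))

alpha, T = 0.5, 2.0
soft = F.kl_div(
    F.log_softmax(student_logits / T, dim=1),   # student: log-probabilities
    F.softmax(teacher_logits / T, dim=1),       # teacher: probabilities
    reduction="batchmean",
) * (T * T)                                     # temperature-scaled soft term
hard = F.cross_entropy(student_logits, labels)  # ordinary supervised term
loss = alpha * soft + (1 - alpha) * hard
print(float(loss) > 0)  # True
```

Note the asymmetry `F.kl_div` expects: log-probabilities for the student input, plain probabilities for the teacher target; swapping them is exactly the "mixing up student and teacher outputs" mistake above.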