NLP · ~20 mins

Model optimization (distillation, quantization) in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual · intermediate
Understanding Model Distillation Purpose

What is the main goal of model distillation in NLP?

A. To train a smaller model to mimic a larger model's behavior
B. To increase the number of parameters in the model for better accuracy
C. To add noise to the training data to improve robustness
D. To convert a model into a rule-based system
💡 Hint

Think about how a big model can help a smaller one learn.
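The idea the hint points at can be sketched numerically. In knowledge distillation, the teacher's logits are softened with a temperature T and the student is trained to match the resulting distribution. This is a minimal sketch of the distillation loss; the logits below are made-up illustration values, not outputs of a real model.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax: higher T spreads probability mass."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in Hinton et al.'s formulation."""
    p = softmax(teacher_logits, T)  # soft targets from the large model
    q = softmax(student_logits, T)  # student's softened predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return T * T * kl

# Hypothetical logits for a 3-class task:
teacher = [4.0, 1.0, -1.0]
good_student = [3.5, 0.8, -0.9]  # mimics the teacher closely -> small loss
bad_student = [0.0, 3.0, 1.0]    # disagrees with the teacher -> large loss
print(kd_loss(teacher, good_student), kd_loss(teacher, bad_student))
```

Minimizing this loss pushes the student toward the teacher's behavior, which is exactly the goal the question asks about.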

Predict Output · intermediate
Output of Quantization Code Snippet

What is the output shape of the quantized model's embedding layer weights after applying 8-bit quantization?

import torch
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(1000, 64)

model = SimpleModel()
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Embedding}, dtype=torch.qint8
)
weight_shape = quantized_model.embedding.weight.shape
print(weight_shape)
A. AttributeError
B. (64, 1000)
C. (1000, 32)
D. (1000, 64)
💡 Hint

Quantization changes data type but not tensor shape.
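To see why the shape survives quantization, here is a from-scratch 8-bit affine quantization of a small weight matrix. This is an illustrative sketch of the arithmetic, not PyTorch's internal implementation, and the weight values are made up.

```python
def quantize_8bit(weights):
    """Affine-quantize a 2-D float matrix to int8, preserving its shape."""
    flat = [w for row in weights for w in row]
    w_min, w_max = min(flat), max(flat)
    scale = (w_max - w_min) / 255 or 1.0            # one step per int8 level
    zero_point = round(-w_min / scale) - 128        # maps w_min near -128
    q = [[max(-128, min(127, round(w / scale) + zero_point)) for w in row]
         for row in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from the int8 representation."""
    return [[(v - zero_point) * scale for v in row] for row in q]

weights = [[0.5, -1.2, 0.0], [2.0, 0.3, -0.7]]  # 2x3 float matrix
q, scale, zp = quantize_8bit(weights)

# The shape is unchanged; only the representation (float -> int8) changed.
recovered = dequantize(q, scale, zp)
max_err = max(abs(a - b) for ra, rb in zip(weights, recovered)
              for a, b in zip(ra, rb))
print(len(q), len(q[0]), max_err)
```

The quantized matrix still has 2 rows and 3 columns, and the reconstruction error stays within one quantization step.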

Model Choice · advanced
Choosing a Model for Distillation

You want to distill a large BERT model into a smaller one for mobile deployment. Which student model architecture is best suited?

A. A recurrent neural network with LSTM cells
B. A convolutional neural network designed for images
C. A smaller BERT model with fewer layers and parameters
D. A large transformer model with more layers than the teacher
💡 Hint

The student model should share the teacher's architecture, just at a smaller scale.
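A quick back-of-envelope count shows how halving the layer count shrinks a transformer. The formula below counts only the main weight matrices of one encoder layer (Q, K, V, and output projections plus the two feed-forward matrices); biases, layer norms, and embeddings are ignored, and the BERT-base-like dimensions are standard but used here purely as an estimate.

```python
def encoder_layer_params(d_model, d_ff):
    """Rough weight count for one transformer encoder layer:
    4 attention projection matrices plus 2 feed-forward matrices."""
    attention = 4 * d_model * d_model
    ffn = 2 * d_model * d_ff
    return attention + ffn

# BERT-base-like teacher (12 layers) vs a DistilBERT-like student (6 layers):
teacher = 12 * encoder_layer_params(768, 3072)
student = 6 * encoder_layer_params(768, 3072)
print(f"teacher ~{teacher / 1e6:.0f}M, student ~{student / 1e6:.0f}M encoder weights")
```

Keeping the same architecture but dropping layers cuts the encoder roughly in half while preserving the inductive biases the teacher was trained with.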

Hyperparameter · advanced
Key Hyperparameter in Quantization

Which hyperparameter is critical to set correctly when applying post-training quantization to an NLP model?

A. The learning rate of the optimizer
B. The number of bits used to represent weights and activations
C. The dropout rate in the model layers
D. The batch size during training
💡 Hint

Quantization precision depends on this setting.
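The effect of the bit width can be made concrete: a uniform quantizer spreads 2^b - 1 levels across the value range, so the step size (and hence the worst-case rounding error) grows quickly as bits are removed. The weight range below is a hypothetical example.

```python
def quant_step(value_range, bits):
    """Step size of a uniform quantizer with 2**bits - 1 levels
    spread across the given value range."""
    return value_range / (2 ** bits - 1)

rng = 2.0  # hypothetical weight range, e.g. [-1, 1]
for bits in (8, 4, 2):
    step = quant_step(rng, bits)
    print(f"{bits}-bit: step {step:.4f}, worst-case rounding error {step / 2:.4f}")
```

Going from 8 bits to 4 bits makes each step 17x coarser for the same range, which is why the bit width dominates the accuracy/size trade-off in post-training quantization.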

Metrics · expert
Evaluating Distilled Model Performance

After distilling a large NLP model, which metric best shows if the smaller model retained the teacher's knowledge effectively?

A. Accuracy on a held-out validation set compared to the teacher model
B. Training loss of the student model on the training data
C. Number of parameters in the student model
D. Inference time on a GPU
💡 Hint

Think about measuring how well the student predicts compared to the teacher.
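Two simple held-out-set measurements capture this: accuracy retention (student accuracy relative to teacher accuracy) and teacher-student agreement. The predictions and labels below are made-up illustration values for a 10-example, 3-class validation set.

```python
def accuracy(preds, labels):
    """Fraction of positions where two label sequences agree."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Hypothetical held-out predictions for 10 examples:
labels  = [0, 1, 1, 0, 2, 1, 0, 2, 1, 0]
teacher = [0, 1, 1, 0, 2, 1, 0, 1, 1, 0]  # 90% accurate
student = [0, 1, 1, 0, 2, 1, 0, 1, 2, 0]  # 80% accurate

retention = accuracy(student, labels) / accuracy(teacher, labels)
agreement = accuracy(student, teacher)    # how often the student mimics the teacher
print(f"retention {retention:.0%}, teacher-student agreement {agreement:.0%}")
```

A retention close to 100% on held-out data, not a low training loss or a small parameter count, is the direct evidence that the student kept the teacher's knowledge.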