Practice

1. What is the main goal of model distillation in NLP?

easy

A. To increase the number of layers in a neural network

B. To add more training data for better accuracy

C. To convert text data into numerical vectors

D. To train a smaller model to mimic a larger model's behavior

Solution

Step 1: Understand the model distillation concept
Model distillation trains a smaller (student) model to reproduce the outputs of a larger, well-trained (teacher) model.
Step 2: Identify the goal of distillation
The goal is to retain most of the larger model's performance while reducing model size and complexity.
Final Answer:
To train a smaller model to mimic a larger model's behavior -> Option D
Quick Check:
Distillation = smaller model copies bigger model [OK]

Hint: Distillation means small model learns from big model [OK]

Common Mistakes:

Confusing distillation with adding layers
Thinking distillation increases data size
Mixing distillation with data preprocessing
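
As a rough illustration of "mimicking" a larger model, the sketch below compares a student's softened predictions with a teacher's using KL divergence. It is a minimal plain-Python sketch, not a reference implementation: the function names, the logit values, and the temperature of 2.0 are assumptions chosen for the example, and the temperature-squared scaling that some formulations add is left out.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; a higher temperature softens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the teacher's and student's softened distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student))

# Hypothetical scores for a 3-class problem
teacher_logits = [2.0, 0.5, -1.0]
close_student = [1.8, 0.6, -0.9]   # nearly agrees with the teacher
far_student = [-1.0, 0.5, 2.0]     # ranks the classes in reverse

print(distillation_loss(close_student, teacher_logits))  # small loss
print(distillation_loss(far_student, teacher_logits))    # much larger loss
```

The closer the student's distribution is to the teacher's, the smaller the loss, which is the signal a distillation training loop would minimize.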

2. Which of the following is the correct way to apply quantization to a model's weights in Python using PyTorch?

easy

A. model.quantize(weights=True)

B. torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)

C. torch.quantize(model, dtype=torch.float32)

D. torch.quantization(model, dtype=torch.int32)

Solution

Step 1: Recall PyTorch quantization syntax
PyTorch provides torch.quantization.quantize_dynamic for dynamic quantization of layers such as Linear.
Step 2: Check correct function and parameters
Option B calls quantize_dynamic with the model, the set of layer types to quantize ({torch.nn.Linear}), and the target dtype torch.qint8.
Final Answer:
torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8) -> Option B
Quick Check:
Dynamic quantization of Linear layers to qint8 = the quantize_dynamic call in Option B [OK]

Hint: Use torch.quantization.quantize_dynamic for dynamic quantization [OK]

Common Mistakes:

Using the non-existent torch.quantize function
Passing the wrong dtype, such as float32 instead of qint8
Calling quantization as a method on the model
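
To give a feel for what storing weights as 8-bit integers means, here is a simplified plain-Python sketch of symmetric int8 quantization. It is an assumption-laden illustration, not PyTorch's implementation: the helper names, the single shared scale, and the example weights are all made up for this example.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]

weights = [0.42, -1.27, 0.003, 0.9]   # hypothetical float weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

print(q)         # [42, -127, 0, 90]
print(restored)  # close to the originals, but not identical
```

Each weight now fits in one byte instead of four, at the cost of a small rounding error (the 0.003 weight, for instance, collapses to 0).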

3. Given the following code snippet for distillation, what will be the output loss value if the student model perfectly mimics the teacher model's outputs?

import torch

teacher_outputs = torch.tensor([0.1, 0.9])
student_outputs = torch.tensor([0.1, 0.9])
loss_fn = torch.nn.MSELoss()  # mean of the squared element-wise differences
loss = loss_fn(student_outputs, teacher_outputs)
print(loss.item())

medium

A. 0.0

B. 0.5

C. 1.0

D. Cannot compute due to shape mismatch

Solution

Step 1: Understand the MSELoss calculation
MSELoss computes the mean of the squared differences between the student and teacher outputs.
Step 2: Calculate loss for identical outputs
Since student_outputs equals teacher_outputs, every difference is zero, so the loss is 0.0.
Final Answer:
0.0 -> Option A
Quick Check:
Identical outputs give zero MSE loss [OK]

Hint: Same outputs mean zero loss in MSE [OK]

Common Mistakes:

Assuming loss is 1.0 by default
Confusing loss with accuracy
Thinking shape mismatch error occurs
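
The same calculation can be traced by hand in plain Python. This is a minimal sketch for illustration only: the helper name mse and the second, mismatched student output are assumptions added here, and it does not reflect how torch.nn.MSELoss is implemented internally.

```python
def mse(predictions, targets):
    """Mean of the squared element-wise differences."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared) / len(squared)

teacher_outputs = [0.1, 0.9]

perfect = mse([0.1, 0.9], teacher_outputs)    # student matches the teacher
imperfect = mse([0.3, 0.7], teacher_outputs)  # hypothetical student that is off by 0.2

print(perfect)    # 0.0
print(imperfect)  # roughly 0.04
```

Any deviation from the teacher makes the loss positive, so 0.0 is the lowest value the loss can take.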

4. You tried to quantize a model but got an error: AttributeError: 'MyModel' object has no attribute 'quantize'. What is the likely cause?

medium

A. The model class does not have a built-in quantize method

B. You forgot to import torch

C. Quantization only works on CPU, not GPU

D. The model is already quantized

Solution

Step 1: Analyze the error message
The error says the MyModel object has no 'quantize' attribute, meaning the class never defines such a method.
Step 2: Understand quantization usage
In PyTorch, quantization is applied through library functions such as torch.quantization.quantize_dynamic, not through a method on the model, so calling model.quantize() raises an AttributeError.
Final Answer:
The model class does not have a built-in quantize method -> Option A
Quick Check:
Quantize is a function, not a model method [OK]

Hint: Quantize via torch functions, not model methods [OK]

Common Mistakes:

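Trying to call quantize as model.quantize()
Blaming a missing import, which would raise a NameError rather than an AttributeError on the model
Blaming the device (CPU vs. GPU), which is unrelated to this AttributeError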
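
The error can be reproduced without any framework. In the sketch below, MyModel is a hypothetical stand-in class with no quantize method, and quantize_weights is a made-up free function that plays the role of a library-level quantization call; neither is part of PyTorch.

```python
class MyModel:
    """Hypothetical model class that defines no quantize method."""
    def __init__(self):
        self.weights = [1.27, -0.64, 0.3]

def quantize_weights(model, levels=127):
    """Hypothetical free function: takes the model as an argument."""
    scale = max(abs(w) for w in model.weights) / levels
    return [round(w / scale) for w in model.weights], scale

model = MyModel()

try:
    model.quantize()  # the method was never defined on the class
except AttributeError as err:
    message = str(err)
    print(message)  # 'MyModel' object has no attribute 'quantize'

# Passing the model to a function works where the method call failed
quantized, scale = quantize_weights(model)
print(quantized)  # [127, -64, 30]
```

The fix in both cases is the same: call a function that receives the model, instead of expecting the model to carry the method itself.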

5. You want to deploy a chatbot on a mobile device with limited memory and CPU. Which combination of model optimization techniques is best to reduce size and speed up inference without losing much accuracy?

hard

A. Use quantization first, then retrain the large model from scratch

B. Only increase the training data size to improve accuracy

C. Use distillation to train a smaller model, then apply quantization to reduce precision

D. Add more layers to the model and use float64 precision

Solution

Step 1: Identify constraints and goals
Mobile devices need small, fast models with good accuracy.
Step 2: Choose suitable optimization techniques
Distillation creates a smaller model; quantization lowers numeric precision (for example, from 32-bit floats to 8-bit integers) to save space and speed up inference.
Step 3: Combine techniques for best effect
Using distillation first then quantization is a common, effective approach.
Final Answer:
Use distillation to train a smaller model, then apply quantization to reduce precision -> Option C
Quick Check:
Distillation + quantization = small, fast, accurate model [OK]

Hint: Distill first, then quantize for mobile deployment [OK]

Common Mistakes:

Skipping quantization for mobile deployment
Adding layers, which increases size and slows inference
Retraining the large model from scratch after quantization, which wastes effort and leaves the model large
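
A back-of-the-envelope calculation shows why the two techniques stack. The parameter counts below are hypothetical round numbers, not measurements of any real model, and the estimate counts weight storage only (it ignores quantization metadata and any layers left in float32).

```python
BYTES_PER_PARAM = {"float32": 4, "int8": 1}

def model_size_mb(num_params, dtype):
    """Approximate weight storage in megabytes (1 MB = 1,000,000 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1_000_000

teacher_params = 100_000_000  # hypothetical large model
student_params = 40_000_000   # hypothetical distilled model

teacher_fp32 = model_size_mb(teacher_params, "float32")
student_fp32 = model_size_mb(student_params, "float32")
student_int8 = model_size_mb(student_params, "int8")

print(teacher_fp32)  # 400.0
print(student_fp32)  # 160.0
print(student_int8)  # 40.0
```

Under these assumed numbers, distillation alone shrinks storage by 2.5x and quantization by a further 4x, for roughly a 10x reduction overall.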
