Model optimization for serving (quantization, pruning) in MLOps - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Understanding Quantization Impact

Which of the following best describes the main benefit of quantization in model serving?

A. It increases model accuracy by adding more layers.
B. It removes unnecessary neurons to simplify the model architecture.
C. It converts the model to a different programming language for compatibility.
D. It reduces model size and speeds up inference by using lower precision numbers.
💡 Hint

Think about how using smaller numbers affects memory and speed.

💻 Command Output
intermediate
Output of Pruning Command

What is the expected result of wrapping a TensorFlow (Keras) model with this pruning call and then training it?

tfmot.sparsity.keras.prune_low_magnitude(
    model,
    pruning_schedule=tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=0.5, begin_step=0, end_step=1000
    ),
)
A. A pruned model with approximately 50% of weights set to zero after 1000 training steps.
B. A model converted to 8-bit integers for faster inference.
C. An error because pruning requires a different API call.
D. A model with all weights doubled in magnitude.
💡 Hint

Pruning gradually removes weights by setting them to zero based on magnitude.
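
To see how the target sparsity ramps up between `begin_step` and `end_step`, here is a minimal pure-Python sketch of a polynomial decay schedule. It follows the general form documented for `PolynomialDecay`; the function name `polynomial_decay_sparsity` is hypothetical, the exponent `power=3` is assumed to match the library default, and the sketch ignores the update frequency that the real schedule also applies.

```python
def polynomial_decay_sparsity(step, initial_sparsity=0.0, final_sparsity=0.5,
                              begin_step=0, end_step=1000, power=3):
    """Target sparsity at a training step under a polynomial decay schedule."""
    # Assumption: outside the [begin_step, end_step] window the sparsity is
    # held at its initial or final value.
    step = min(max(step, begin_step), end_step)
    progress = (step - begin_step) / (end_step - begin_step)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** power


# Sparsity rises quickly at first and flattens out near the final value.
for step in (0, 250, 500, 1000):
    print(step, round(polynomial_decay_sparsity(step), 4))
```

With these settings the schedule starts at 0% sparsity and only reaches the 50% target at step 1000, which is why the weights are pruned gradually rather than all at once.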

🔀 Workflow
advanced
Correct Sequence for Model Quantization Workflow

Arrange the steps in the correct order to perform post-training quantization for a TensorFlow model.

A. 2, 1, 3, 4
B. 1, 2, 3, 4
C. 1, 3, 2, 4
D. 1, 2, 4, 3
💡 Hint

Think about loading first, then converting, saving, and finally testing.

Troubleshoot
advanced
Troubleshooting Accuracy Drop After Pruning

After pruning a model, you notice a significant drop in accuracy. Which option is the most likely cause?

A. The pruning API was not called, so the model was unchanged.
B. The model was quantized instead of pruned, causing precision loss.
C. The pruning sparsity was set too high too quickly, removing important weights.
D. The model was trained with too many epochs before pruning.
💡 Hint

Consider how pruning speed and amount affect model quality.
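
The effect of pruning too aggressively can be illustrated with a small pure-Python sketch of magnitude pruning. The helper `prune_by_magnitude` and the sample weights are hypothetical, and the share of total weight magnitude that survives is used only as a crude stand-in for model quality, not as a real accuracy measure.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02, 0.3, -0.6]
total = sum(abs(w) for w in weights)

# Low sparsity removes only near-zero weights; high sparsity starts
# cutting into weights that carry most of the magnitude.
for sparsity in (0.25, 0.5, 0.75):
    kept = prune_by_magnitude(weights, sparsity)
    retained = sum(abs(w) for w in kept) / total
    print(f"sparsity={sparsity:.2f} retained magnitude={retained:.2f}")
```

In practice this is why a gradual schedule with fine-tuning between pruning steps tends to be safer than jumping straight to a high sparsity.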

Best Practice
expert
Best Practice for Combining Quantization and Pruning

Which practice is recommended when combining pruning and quantization to optimize a model for serving?

A. First prune the model to reduce weights, then fine-tune it, and finally apply quantization.
B. Apply quantization first, then prune the quantized model without retraining.
C. Prune and quantize simultaneously without any fine-tuning steps.
D. Only prune the model; quantization is not compatible with pruning.
💡 Hint

Think about the order that preserves accuracy and model size reduction.

Practice

1. What is the main goal of quantization in model optimization for serving?
easy
A. Increase the size of the model for better performance
B. Reduce the precision of numbers to make the model smaller and faster
C. Add more neurons to improve accuracy
D. Remove entire layers from the model to simplify it

Solution

  1. Step 1: Understand quantization purpose

    Quantization reduces numeric precision (for example, from 32-bit floats to 8-bit integers) to save memory and speed up computation.
  2. Step 2: Compare options

    Removing layers is a structural simplification closer to pruning, adding neurons increases model size, and increasing size is the opposite of optimization.
  3. Final Answer:

    Reduce the precision of numbers to make the model smaller and faster -> Option B
  4. Quick Check:

    Quantization = Reduce precision [OK]
Hint: Quantization means lowering number precision to save space [OK]
Common Mistakes:
  • Confusing pruning with quantization
  • Thinking quantization adds complexity
  • Believing quantization increases model size
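
To make the precision reduction concrete, here is a minimal pure-Python sketch of affine (scale and zero-point) quantization of float weights to 8-bit integers. The names `quantize_int8` and `dequantize` and the sample weights are illustrative assumptions, not part of any library; real frameworks apply the same idea per tensor or per channel with optimized kernels.

```python
import struct


def quantize_int8(values):
    """Map floats to int8 codes using an affine scale and zero point."""
    lo, hi = min(values), max(values)
    # Include 0.0 in the range so the mapping always covers zero.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(-128 - lo / scale)
    codes = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    """Recover approximate float values from int8 codes."""
    return [(c - zero_point) * scale for c in codes]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, scale, zero_point = quantize_int8(weights)
restored = dequantize(codes, scale, zero_point)

# Storage cost: 4 bytes per float32 value versus 1 byte per int8 code.
float32_bytes = len(struct.pack(f"{len(weights)}f", *weights))
int8_bytes = len(struct.pack(f"{len(codes)}b", *codes))
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(float32_bytes, int8_bytes, max_error <= scale)
```

The stored size drops by a factor of four, and the round-trip error stays within one quantization step, which is the trade-off the question describes.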
2. Which of the following is the correct syntax to apply pruning using TensorFlow Model Optimization API in Python?
easy
A. pruned_model = tfmot.sparsity.keras.prune_low_magnitude(model, pruning_schedule=pruning_schedule)
B. pruned_model = tf.prune_low_magnitude(model, schedule=pruning_schedule)
C. pruned_model = tfmot.prune_low_magnitude(model, pruning_schedule=pruning_schedule)
D. pruned_model = tfmot.sparsity.prune_low_magnitude(model, pruning_schedule)

Solution

  1. Step 1: Recall TensorFlow pruning API structure

    The pruning function is under tfmot.sparsity.keras and requires the pruning_schedule argument.
  2. Step 2: Check syntax correctness

    pruned_model = tfmot.sparsity.keras.prune_low_magnitude(model, pruning_schedule=pruning_schedule) matches the correct full path and argument names. Others miss parts or have wrong argument names.
  3. Final Answer:

    pruned_model = tfmot.sparsity.keras.prune_low_magnitude(model, pruning_schedule=pruning_schedule) -> Option A
  4. Quick Check:

    Correct pruning syntax = pruned_model = tfmot.sparsity.keras.prune_low_magnitude(model, pruning_schedule=pruning_schedule) [OK]
Hint: TensorFlow pruning is under tfmot.sparsity.keras with pruning_schedule [OK]
Common Mistakes:
  • Omitting 'keras' in the API path
  • Using wrong argument names
  • Calling pruning directly from tf module
3. Given the following PyTorch code snippet for quantization, what will be printed as the type of the quantized layer's weight after applying dynamic quantization?
import torch
import torch.nn as nn

# Wrap the layer in a container: quantize_dynamic swaps child modules,
# so a bare top-level nn.Linear would be returned unchanged.
model = nn.Sequential(nn.Linear(10, 5))
quantized_model = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(type(quantized_model[0].weight()))
medium
A. TypeError: 'weight' is not callable
B.
C. AttributeError: 'Linear' object has no attribute 'weight'
D. <class 'torch.Tensor'>

Solution

  1. Step 1: Analyze dynamic quantization effect

    torch.quantization.quantize_dynamic replaces each nn.Linear child with a dynamically quantized Linear module, where weight is a method that returns the quantized (qint8) weight as a torch.Tensor.
  2. Step 2: Trace the print statement

    quantized_model[0].weight() succeeds and returns a quantized torch.Tensor, so print(type(...)) outputs <class 'torch.Tensor'>. Calling .dequantize() on the result gives the fp32 values.
  3. Final Answer:

    <class 'torch.Tensor'> -> Option D
  4. Quick Check:

    Dynamic quant: weight() returns Tensor [OK]
Hint: Dynamic quantization makes weight() callable returning Tensor [OK]
Common Mistakes:
  • Thinking weight remains non-callable attribute like original Linear
  • Confusing quantized_model type with weight type
  • Expecting error on quantized model weight access
4. You tried pruning a TensorFlow model but got an error: AttributeError: module 'tensorflow_model_optimization' has no attribute 'sparsity'. What is the most likely cause?
medium
A. The tensorflow_model_optimization package is not installed
B. You used the wrong pruning schedule argument
C. You forgot to import tensorflow_model_optimization as tfmot
D. Pruning is not supported in TensorFlow

Solution

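  1. Step 1: Understand the error message

    The error says the imported module has no attribute 'sparsity'. The import itself succeeded (a package that is absent altogether raises ModuleNotFoundError instead), so Python loaded something named tensorflow_model_optimization that lacks the sparsity submodule: typically a broken, incomplete, or outdated installation, or a local file or folder shadowing the real package.
  2. Step 2: Check common causes

    Among the options, only a missing or broken installation explains this. A wrong pruning schedule argument would raise a different error such as a TypeError, forgetting the tfmot alias would raise a NameError, and pruning is supported in TensorFlow.
  3. Final Answer:

    The tensorflow_model_optimization package is not installed -> Option A
  4. Quick Check:

    Missing or broken install = no 'sparsity' attribute [OK]
Hint: A missing or broken install means submodules cannot be found [OK]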
Common Mistakes:
  • Assuming import alias causes error
  • Blaming pruning schedule argument
  • Thinking pruning unsupported in TensorFlow
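
When this kind of error appears, it can help to check what Python actually imported. The following sketch uses only the standard library; the helper name `diagnose` and its return strings are hypothetical and only meant to separate "package absent" from "package found but missing the attribute".

```python
import importlib
import importlib.util


def diagnose(module_name, attribute):
    """Return a short diagnosis for a missing top-level module attribute."""
    # find_spec returns None when no top-level module of that name is found.
    if importlib.util.find_spec(module_name) is None:
        return "not installed"
    module = importlib.import_module(module_name)
    if hasattr(module, attribute):
        return "ok"
    # The module loaded, so the file path shows which copy was picked up
    # (for example a stale install or a local file shadowing the package).
    location = getattr(module, "__file__", None)
    return f"imported from {location!r} but lacks {attribute!r}"


# Demonstrated on a standard-library module so the sketch runs anywhere.
print(diagnose("json", "dumps"))
print(diagnose("json", "sparsity"))
```

Calling `diagnose("tensorflow_model_optimization", "sparsity")` in the affected environment would show whether the package is absent or whether an unexpected copy is being imported.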
5. You want to optimize a large deep learning model for mobile deployment by combining pruning and quantization. Which sequence of steps is best to minimize model size and maintain accuracy?
hard
A. Apply quantization first, then prune the model to remove weights
B. Train the model with quantization-aware training, then prune after deployment
C. First prune the model to remove unimportant weights, then apply quantization to reduce number precision
D. Only prune the model; quantization is not compatible with pruning

Solution

  1. Step 1: Understand pruning and quantization order

    Pruning removes unimportant weights first, reducing model size and complexity while the model is still in full precision and can be fine-tuned to recover accuracy.
  2. Step 2: Apply quantization after pruning

    Quantization then reduces number precision on the pruned (and ideally fine-tuned) model, further shrinking size and speeding up inference.
  3. Final Answer:

    First prune the model to remove unimportant weights, then apply quantization to reduce number precision -> Option C
  4. Quick Check:

    Prune first, then quantize = First prune the model to remove unimportant weights, then apply quantization to reduce number precision [OK]
Hint: Prune first to shrink, then quantize to compress numbers [OK]
Common Mistakes:
  • Quantizing before pruning, which reduces pruning effectiveness
  • Thinking pruning and quantization cannot be combined
  • Pruning only after deployment, when it is too late to fine-tune
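
The prune-then-quantize order can be sketched end to end in pure Python. Both helpers below (`prune_by_magnitude`, `quantize_symmetric_int8`) and the sample weights are hypothetical, and the fine-tuning step that would normally sit between pruning and quantization is only marked with a comment.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_symmetric_int8(weights):
    """Symmetric int8 quantization: 0.0 always maps to the code 0."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    return [round(w / scale) for w in weights], scale


weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02, 0.3, -0.6]

# Step 1: prune in full precision.
pruned = prune_by_magnitude(weights, sparsity=0.5)

# Step 2 (not shown): fine-tune the pruned model to recover accuracy.

# Step 3: quantize the pruned weights.
codes, scale = quantize_symmetric_int8(pruned)

print(pruned.count(0.0), codes.count(0))
```

Because the symmetric scheme maps zero exactly to the code 0, the sparsity created by pruning survives quantization, so both size reductions stack.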