Model optimization (pruning, quantization) in Computer Vision
Model optimization makes AI models smaller and faster without losing much accuracy. This is useful for running models on devices like phones or cameras.
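import tensorflow as tf
import tensorflow_model_optimization as tfmot

# `model` is assumed to be an existing (trained) tf.keras model

# Pruning example: wraps layers so low-magnitude weights can be zeroed during training
prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude

# Quantization example: convert to TensorFlow Lite with default (8-bit weight) optimization
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()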
Pruning removes less important parts of the model, typically the weights with the smallest magnitudes, to make it smaller.
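A minimal NumPy sketch of magnitude-based pruning, the idea behind `prune_low_magnitude`: zero out the fraction of weights with the smallest absolute values. The function name `magnitude_prune` and the random weight matrix are illustrative assumptions, not part of any library; the real toolkit applies such masks gradually during training rather than in one shot.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a copy of `weights` with the smallest-magnitude entries set to zero.

    `sparsity` is the fraction of entries to remove (0.0 to 1.0).
    Hypothetical helper for illustration only.
    """
    k = int(round(sparsity * weights.size))  # number of weights to drop
    if k == 0:
        return weights.copy()
    # Flattened indices of the k entries with the smallest absolute value
    flat_idx = np.argsort(np.abs(weights), axis=None)[:k]
    mask = np.ones(weights.size, dtype=bool)
    mask[flat_idx] = False
    return weights * mask.reshape(weights.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 5))        # a toy 4x5 weight matrix
pruned = magnitude_prune(w, 0.5)   # remove half of the weights
print(np.count_nonzero(pruned), "of", w.size, "weights kept")  # prints: 10 of 20 weights kept
```

The tensor keeps its shape; pruning only sets entries to zero, so any size savings come later from compression or sparse-aware storage.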
Quantization reduces the precision of numbers, for example from 32-bit floats to 8-bit integers, to make the model smaller and faster.
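To make the float-to-integer idea concrete, here is a simplified symmetric, per-tensor int8 scheme in NumPy: one float `scale` maps the largest absolute weight to 127, and each weight is rounded to the nearest integer step. The helper names are hypothetical, and the TFLite converter's actual scheme (for example per-channel scales, or which tensors get quantized) may differ in detail.

```python
import numpy as np

def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization: float32 -> int8 plus one float scale."""
    scale = float(np.max(np.abs(x))) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any scale works
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map int8 values back to approximate float32 values."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(w.nbytes, "bytes as float32 ->", q.nbytes, "bytes as int8")  # prints: 4000 bytes as float32 -> 1000 bytes as int8
print("max abs error:", float(np.max(np.abs(w - w_hat))))
```

Each weight shrinks from 4 bytes to 1, and the rounding error per weight is at most half a quantization step (`scale / 2`), which is why accuracy usually drops only slightly.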
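import tensorflow_model_optimization as tfmot

# `model` is assumed to be an existing tf.keras model (Sequential or functional)
prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude
pruning_params = {
    # Ramp sparsity from 0% to 50% between training steps 0 and 1000
    'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=0.5,
        begin_step=0, end_step=1000)
}
model_for_pruning = prune_low_magnitude(model, **pruning_params)
# Train model_for_pruning with callbacks=[tfmot.sparsity.keras.UpdatePruningStep()]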
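To see how the schedule above raises sparsity over training, here is a plain-Python re-implementation of the polynomial decay formula, assuming the toolkit's default exponent `power=3`. It is a sketch for intuition, not the library's own code; the library also only updates its masks every `frequency` steps (100 by default).

```python
def polynomial_sparsity(step: int, initial_sparsity: float, final_sparsity: float,
                        begin_step: int, end_step: int, power: float = 3.0) -> float:
    """Target sparsity at `step` under a polynomial decay schedule (sketch)."""
    # Fraction of the schedule completed, clipped to [0, 1]
    progress = (step - begin_step) / (end_step - begin_step)
    progress = min(1.0, max(0.0, progress))
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** power

for step in (0, 250, 500, 750, 1000, 1500):
    s = polynomial_sparsity(step, 0.0, 0.5, begin_step=0, end_step=1000)
    print(f"step {step:4d}: target sparsity {s:.4f}")
```

Most of the pruning happens early: at step 500 the target is already 0.4375, or 87.5% of the final 0.5. After `end_step` the target stays at `final_sparsity`, so `end_step` should fit inside the number of training steps you actually run; otherwise the final sparsity is never reached.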
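import tensorflow as tf

# `model` is assumed to be an existing tf.keras model
converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Optimize.DEFAULT without a representative dataset gives dynamic-range
# quantization: weights are stored as 8-bit integers
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()  # the converted model as bytes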
This program trains a simple CNN on MNIST digits, then fine-tunes it with pruning that gradually zeroes out 50% of the weights in the Conv2D and Dense kernels, and finally converts the pruned model to a quantized TensorFlow Lite model. It prints the test accuracy before and after pruning and the size of the quantized model.
import tensorflow as tf
import tensorflow_model_optimization as tfmot
from tensorflow.keras import layers, models

# Note: tensorflow_model_optimization targets Keras 2 (tf.keras); TensorFlow
# releases that default to Keras 3 may need the legacy tf_keras package.

# Create a simple CNN model for computer vision
model = models.Sequential([
    layers.InputLayer(input_shape=(28, 28, 1)),
    layers.Conv2D(16, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Load MNIST data, add a channel axis, and scale pixels to [0, 1]
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

# Train the model briefly
batch_size = 128
model.fit(x_train, y_train, epochs=1, batch_size=batch_size, verbose=0)

# Evaluate the original model now, before pruning: the pruning wrappers
# may reuse the original layers, so fine-tuning can change `model` too
loss, acc = model.evaluate(x_test, y_test, verbose=0)
print(f'Original model accuracy: {acc:.4f}')

# Apply pruning: ramp sparsity from 0% to 50% over the fine-tuning steps,
# so the schedule finishes within the training run
pruning_epochs = 2
steps_per_epoch = -(-len(x_train) // batch_size)  # ceiling division
prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude
pruning_params = {
    'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=0.5,
        begin_step=0, end_step=steps_per_epoch * pruning_epochs)
}
model_for_pruning = prune_low_magnitude(model, **pruning_params)
model_for_pruning.compile(optimizer='adam',
                          loss='sparse_categorical_crossentropy',
                          metrics=['accuracy'])

# Fine-tune the pruned model; the UpdatePruningStep callback is required
model_for_pruning.fit(x_train, y_train, epochs=pruning_epochs,
                      batch_size=batch_size, verbose=0,
                      callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Strip pruning wrappers to get the final pruned model
model_for_export = tfmot.sparsity.keras.strip_pruning(model_for_pruning)

# strip_pruning returns an uncompiled model, so compile it before evaluating
model_for_export.compile(optimizer='adam',
                         loss='sparse_categorical_crossentropy',
                         metrics=['accuracy'])
loss_p, acc_p = model_for_export.evaluate(x_test, y_test, verbose=0)
print(f'Pruned model accuracy: {acc_p:.4f}')

# Convert the pruned model to a quantized TFLite model
converter = tf.lite.TFLiteConverter.from_keras_model(model_for_export)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()
print(f'Quantized TFLite model size: {len(quantized_tflite_model) / 1024:.2f} KB')
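After stripping the wrappers, it is worth confirming that the weights really are sparse. A minimal sketch, assuming the weights are available as a list of NumPy arrays (for a Keras model, `model.get_weights()` returns such a list). The helper name `report_sparsity` is hypothetical; because biases are usually left unpruned, it only counts arrays with two or more dimensions (the kernels).

```python
import numpy as np

def report_sparsity(weights: list[np.ndarray], min_ndim: int = 2) -> float:
    """Print and return the fraction of zeros across arrays with >= min_ndim dims.

    min_ndim=2 skips 1-D bias vectors, which are typically left unpruned.
    """
    total = zeros = 0
    for i, w in enumerate(weights):
        if w.ndim < min_ndim:
            continue
        n_zero = int(np.sum(w == 0))
        print(f"array {i} shape {w.shape}: {n_zero / w.size:.1%} zeros")
        total += w.size
        zeros += n_zero
    overall = zeros / total if total else 0.0
    print(f"overall: {overall:.1%} zeros")
    return overall

# Toy stand-in for model_for_export.get_weights(): a half-zero kernel and a bias
kernel = np.array([[0.0, 1.2], [0.0, -0.7]], dtype=np.float32)
bias = np.zeros(2, dtype=np.float32)
report_sparsity([kernel, bias])  # overall: 50.0% zeros
```

Called as `report_sparsity(model_for_export.get_weights())` after the MNIST program, you would expect kernel figures close to the 50% target.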
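Pruning works best when you fine-tune the model during or after pruning so it can recover accuracy.
Quantization can slightly reduce accuracy, but storing weights as 8-bit integers instead of 32-bit floats makes them about 4x smaller and often speeds up inference.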
Always test your optimized model to ensure it still meets your accuracy needs.
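As a rough check on the size claims, here is the parameter count for the CNN in the MNIST program, worked out by hand and turned into byte estimates. This is back-of-the-envelope arithmetic under simplifying assumptions: it ignores file-format overhead and treats every parameter as 8-bit after quantization, whereas in practice some tensors (for example biases) may stay in float.

```python
# Hand-computed shapes: 28x28x1 input, Conv2D(16, 3), MaxPooling2D(), Flatten(), Dense(10)
conv_params = 3 * 3 * 1 * 16 + 16   # 3x3 kernel, 1 input channel, 16 filters, plus bias
conv_out = 28 - 3 + 1               # 'valid' padding, stride 1: 26
pooled = conv_out // 2              # default 2x2 pooling: 13
flat = pooled * pooled * 16         # 2704 features after Flatten
dense_params = flat * 10 + 10       # kernel plus bias
total = conv_params + dense_params

print("parameters:", total)                                          # parameters: 27210
print(f"float32: {total * 4 / 1024:.1f} KB, int8: {total / 1024:.1f} KB")  # float32: 106.3 KB, int8: 26.6 KB
```

Note that pruning alone does not change this estimate, since the zeroed weights are still stored; its size benefit shows up when the file is compressed or when a runtime exploits the sparsity.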
Model optimization makes AI models smaller and faster for real devices.
Pruning removes less important parts of the model to reduce size.
Quantization reduces number precision to speed up the model and save space.
Practice
model pruning in computer vision?
Solution
Step 1: Understand pruning concept
Pruning means removing parts of the model that contribute less to its output.
Step 2: Identify pruning goal
The goal is to reduce model size and speed up inference by cutting unnecessary parts.
Final Answer:
To remove less important parts of the model to reduce size -> Option A
Quick Check:
Pruning = Remove less important parts [OK]
- Thinking pruning adds layers instead of removing parts of the model
- Confusing pruning with data augmentation
- Believing pruning changes programming language
Solution
Step 1: Identify quantization syntax
In TensorFlow Lite, quantization is enabled by setting converter.optimizations to [tf.lite.Optimize.DEFAULT].
Step 2: Check other options
model = tf.lite.TFLiteConverter.from_keras_model(model).convert() converts the model but does not enable quantization. The remaining options are training commands, not quantization settings.
Final Answer:
converter.optimizations = [tf.lite.Optimize.DEFAULT] -> Option B
Quick Check:
Quantization flag = converter.optimizations [OK]
- Confusing model conversion with quantization
- Using training commands instead of conversion flags
- Missing the optimization setting for quantization
import torch
import torch.nn.utils.prune as prune
model = torch.nn.Sequential(
    torch.nn.Linear(100, 50),
    torch.nn.ReLU()
)
prune.l1_unstructured(model[0], name='weight', amount=0.2)
pruned_weights = model[0].weight
print((pruned_weights != 0).sum().item())
Solution
Step 1: Calculate total weights
The first linear layer has 100 inputs and 50 outputs, so its weight matrix has 50 * 100 = 5000 entries.
Step 2: Calculate remaining weights after pruning
Pruning 20% removes 0.2 * 5000 = 1000 weights, so 80% of 5000 = 4000 weights remain.
Step 3: Understand pruning method
PyTorch's l1_unstructured does not delete weights; it zeroes the 20% with the smallest absolute values through a mask (model[0].weight is computed as weight_orig * weight_mask). The tensor still has 5000 entries, but only 4000 are non-zero.
Step 4: Check print output
The print statement counts non-zero weights, so the output is 4000.
Final Answer:
4000 -> Option D
Quick Check:
5000 * 0.8 = 4000 [OK]
- Calculating total weights incorrectly
- Using pruning amount as remaining instead of removed
- Confusing layer input/output dimensions
AttributeError: 'TFLiteConverter' object has no attribute 'optimizations'. What is the likely cause?
Solution
Step 1: Understand the error
The error says the converter object lacks the 'optimizations' attribute, which suggests an old TensorFlow version.
Step 2: Identify cause
Older TensorFlow versions do not support the 'optimizations' attribute used to enable quantization.
Final Answer:
Using an outdated TensorFlow version without quantization support -> Option C
Quick Check:
Missing attribute = outdated TensorFlow [OK]
- Assuming model size causes attribute error
- Thinking quantization needs retraining always
- Believing model without weights causes this error
Solution
Step 1: Understand device constraints
Mobile devices have limited memory and compute, so model size and speed matter.
Step 2: Choose optimization techniques
Pruning removes unnecessary weights, reducing size; quantization reduces numeric precision, speeding up inference.
Step 3: Combine pruning and quantization
Using both together makes the model smaller and faster with minimal accuracy loss.
Final Answer:
Apply pruning to remove unimportant weights, then quantize weights to 8-bit integers -> Option A
Quick Check:
Pruning + quantization = smaller, faster model [OK]
- Only increasing layers without optimization
- Ignoring quantization benefits
- Assuming full precision is always best for deployment
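One reason pruning and quantization combine well: with symmetric quantization (zero-point 0), a pruned weight of exactly 0.0 maps to the integer 0, so the sparsity survives quantization. Below is a small NumPy sketch of the two steps in sequence, using simple magnitude pruning and a simplified per-tensor int8 scheme; it illustrates the principle and is not what the TFLite converter does internally.

```python
import numpy as np

rng = np.random.default_rng(42)
w = rng.normal(size=(64, 32)).astype(np.float32)   # toy float32 weight matrix

# Step 1: magnitude pruning, zeroing the 50% of weights with the smallest |w|
threshold = np.quantile(np.abs(w), 0.5)
pruned = np.where(np.abs(w) > threshold, w, 0.0).astype(np.float32)

# Step 2: symmetric int8 quantization with one scale for the whole tensor
scale = float(np.max(np.abs(pruned))) / 127.0
q = np.clip(np.round(pruned / scale), -127, 127).astype(np.int8)

print("zeros after pruning:     ", np.count_nonzero(pruned == 0))
print("zeros after quantization:", np.count_nonzero(q == 0))
print("bytes: float32", w.nbytes, "-> int8", q.nbytes)
```

Every zero from pruning stays zero after quantization, and the stored tensor is 4x smaller, so the two techniques stack rather than interfere.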
