
Model optimization (pruning, quantization) in Computer Vision


Introduction

Model optimization makes AI models smaller and faster without losing much accuracy. This is useful for running models on devices like phones or cameras. Typical situations where it helps:

When you want your computer vision model to run faster on a smartphone.

When you need to save memory or storage space for your AI model.

When deploying models to devices with limited power or hardware.

When you want to reduce the cost of running AI models in the cloud.

When you want to improve the speed of real-time image or video processing.

Syntax


import tensorflow_model_optimization as tfmot
import tensorflow as tf

# Pruning example
prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude

# Quantization example
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()

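Pruning removes less important parts of the model, typically by setting the weights with the smallest magnitudes to zero, to make it smaller.

Quantization stores the model's numbers at lower precision (for example, 8-bit integers instead of 32-bit floats) to make the model faster and smaller.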
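To make the pruning idea concrete, here is a minimal sketch of magnitude-based pruning on a plain Python list. The function name `prune_by_magnitude` and the sample weights are made up for illustration; real libraries prune tensors layer by layer during training, so treat this only as a picture of the core idea.

```python
def prune_by_magnitude(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to 0.0.

    `sparsity` is the fraction of weights to zero out (0.5 means half of them).
    Hypothetical helper for illustration, not a library function.
    """
    num_to_prune = int(len(weights) * sparsity)
    # Indices ordered from the smallest to the largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:num_to_prune]:
        pruned[i] = 0.0
    return pruned


weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.2]
pruned = prune_by_magnitude(weights, sparsity=0.5)
print(pruned)  # [0.8, 0.0, 0.3, 0.0, -0.6, 0.0]
```

The three weights closest to zero are dropped, while the large weights that contribute most to the output are kept unchanged.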

Examples

This example shows how to apply pruning to an existing Keras model (`model`) with a schedule that raises sparsity gradually from 0% to 50% over the first 1,000 training steps.


import tensorflow_model_optimization as tfmot

prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude

pruning_params = {
  'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(
      initial_sparsity=0.0, final_sparsity=0.5, begin_step=0, end_step=1000)
}

model_for_pruning = prune_low_magnitude(model, **pruning_params)
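The schedule above can be pictured with a few lines of plain Python. This sketch assumes the polynomial form with a cubic exponent (`power=3`), which is believed to be the library default, and it ignores the fact that the library only updates the pruning masks every so many steps. The function name is hypothetical.

```python
def polynomial_decay_sparsity(step, initial_sparsity=0.0, final_sparsity=0.5,
                              begin_step=0, end_step=1000, power=3):
    """Target sparsity at a given training step (illustrative sketch).

    Assumes a polynomial ramp with exponent `power`; sparsity rises quickly
    at first and levels off as it approaches `final_sparsity`.
    """
    # Fraction of the schedule completed, clamped to [0, 1].
    progress = (step - begin_step) / (end_step - begin_step)
    progress = min(1.0, max(0.0, progress))
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** power


for step in (0, 250, 500, 1000):
    print(step, round(polynomial_decay_sparsity(step), 4))
# 0 0.0
# 250 0.2891
# 500 0.4375
# 1000 0.5
```

Under these assumptions, most of the pruning happens early: halfway through the schedule the target is already about 44% of the weights.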

This example converts a Keras model to a TensorFlow Lite model with default quantization for smaller size and faster inference.


import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()
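To see what reducing precision means for individual weights, here is a simplified sketch of symmetric 8-bit quantization in plain Python. It is not what the converter does internally (the real scheme has more detail, such as how scales are chosen per tensor or per channel); the helper names `quantize_int8` and `dequantize` are assumptions made for this illustration.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale (sketch)."""
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero input, where any scale works.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(v / scale) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

print(q)  # [50, -127, 0, 100]
for original, approx in zip(weights, restored):
    print(f'{original:+.3f} -> {approx:+.3f}')
```

Each weight now fits in one byte instead of four. The price is a small rounding error: here the tiny weight 0.003 is rounded to zero, which is one reason accuracy can drop slightly after quantization.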

Sample Model

This program trains a simple CNN on MNIST digits, fine-tunes it with a pruning schedule that targets 50% sparsity, then converts the pruned model to a quantized TensorFlow Lite model. It prints the accuracy before and after pruning and the size of the quantized model.


import tensorflow as tf
import tensorflow_model_optimization as tfmot
from tensorflow.keras import layers, models

# Create a simple CNN model for computer vision
model = models.Sequential([
    layers.InputLayer(input_shape=(28, 28, 1)),
    layers.Conv2D(16, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Load MNIST data
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

# Train the model briefly
model.fit(x_train, y_train, epochs=1, batch_size=128, verbose=0)

# Evaluate the baseline before pruning so the number reflects the unpruned weights
loss, acc = model.evaluate(x_test, y_test, verbose=0)
print(f'Original model accuracy: {acc:.4f}')

# Apply pruning
prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude
pruning_params = {
    'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=0.5, begin_step=0, end_step=1000)
}
model_for_pruning = prune_low_magnitude(model, **pruning_params)

model_for_pruning.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Fine-tune with pruning enabled. The UpdatePruningStep callback is required;
# training a pruning-wrapped model without it raises an error.
# One epoch is about 469 steps (60,000 / 128), so this run stops short of
# end_step=1000; train longer or lower end_step to reach the full 50% sparsity.
model_for_pruning.fit(x_train, y_train, epochs=1, batch_size=128, verbose=0,
                      callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Evaluate pruned model accuracy (on the compiled, wrapped model)
loss_p, acc_p = model_for_pruning.evaluate(x_test, y_test, verbose=0)
print(f'Pruned model accuracy: {acc_p:.4f}')

# Strip pruning wrappers to get final pruned model
model_for_export = tfmot.sparsity.keras.strip_pruning(model_for_pruning)

# Convert to quantized TFLite model
converter = tf.lite.TFLiteConverter.from_keras_model(model_for_export)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()

print(f'Quantized TFLite model size: {len(quantized_tflite_model) / 1024:.2f} KB')
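As a rough sanity check on the size the program prints, the weight count of this CNN can be worked out by hand. The sketch below only counts weights and biases; it ignores file metadata, and the converter may keep some small tensors in float, so the real file size will differ somewhat. The helper names are made up for this illustration.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit for a Dense layer."""
    return inputs * units + units


conv = conv2d_params(kernel=3, in_channels=1, filters=16)
# 28x28 input -> 26x26 after a 3x3 convolution without padding -> 13x13 after 2x2 pooling
flat = 13 * 13 * 16
dense = dense_params(flat, 10)
total = conv + dense

print(f'Parameters: {total}')                      # Parameters: 27210
print(f'float32 weights: {total * 4 / 1024:.1f} KB')  # float32 weights: 106.3 KB
print(f'int8 weights: {total / 1024:.1f} KB')         # int8 weights: 26.6 KB
```

If the printed TFLite size is far above the float32 estimate, quantization probably was not applied; a value in the tens of kilobytes is in line with 8-bit weights.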


Important Notes

Pruning works best when you retrain the model after pruning to recover accuracy.

Quantization can slightly reduce accuracy, but storing weights as 8-bit integers instead of 32-bit floats cuts their size by about 4x and often speeds up inference.

Always test your optimized model to ensure it still meets your accuracy needs.

Summary

Model optimization makes AI models smaller and faster for real devices.

Pruning removes less important parts of the model to reduce size.

Quantization reduces number precision to speed up the model and save space.

Practice


1. What is the main goal of model pruning in computer vision?

easy

A. To remove less important parts of the model to reduce size

B. To increase the number of layers in the model

C. To add more training data for better accuracy

D. To convert the model to a different programming language


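Solution

Step 1: Understand pruning concept

Pruning removes the less important parts of a model, such as the weights with the smallest magnitudes.

Step 2: Identify pruning goal

The goal is a smaller model. That rules out adding layers (B), adding training data (C), and changing the programming language (D).

Final Answer: A. To remove less important parts of the model to reduce size

Quick Check: Pruning = removing less important parts to shrink the model.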
