Complete the code to select the device for inference based on availability.
device = 'cuda' if torch.cuda.is_available() else [1]
When GPU is not available, the CPU is used for inference.
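As a minimal sketch of the completed pattern, the selection can be wrapped in a small helper. The name `pick_device` and the import guard are assumptions added for illustration, so the snippet also runs on a machine without PyTorch installed.

```python
def pick_device() -> str:
    """Return 'cuda' when a CUDA GPU is usable, otherwise 'cpu'."""
    try:
        import torch  # assumption: PyTorch is the framework in use
    except ImportError:
        return 'cpu'  # without PyTorch nothing can be placed on a GPU
    return 'cuda' if torch.cuda.is_available() else 'cpu'

device = pick_device()
print(device)
```

The returned string can be passed directly to `.to(device)` on a model or tensor.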
Complete the code to set batch size for CPU inference to avoid overload.
batch_size = [1] if device == 'cpu' else 64
Smaller batch sizes like 16 help prevent CPU overload during inference.
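The effect of the batch size can be illustrated with plain Python. This is a sketch only: `choose_batch_size` and `batches` are hypothetical helpers, and the 16/64 values are the ones used in this exercise, not tuned recommendations.

```python
def choose_batch_size(device: str) -> int:
    # 16 on CPU and 64 otherwise, the values used in the exercise
    return 16 if device == 'cpu' else 64

def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

samples = list(range(40))  # stand-in for 40 input examples
cpu_batches = list(batches(samples, choose_batch_size('cpu')))
print([len(b) for b in cpu_batches])  # [16, 16, 8]
```

With 40 inputs, the CPU setting produces three smaller batches instead of a single batch of 40, which keeps the memory and compute needed per step lower.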
Fix the error in the code that measures inference time on CPU.
start = time.time()
output = model(input.to([1]))
end = time.time()
The input tensor must be moved to the CPU device for CPU inference.
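A slightly more robust timing pattern is sketched below. It assumes any callable can stand in for the model (`toy_model` is hypothetical), uses `time.perf_counter()` instead of `time.time()` because it is intended for measuring short intervals, and averages over several calls after one warm-up call.

```python
import time

def time_inference(model, batch, repeats=5):
    """Return the average wall-clock time of model(batch) in milliseconds."""
    model(batch)  # warm-up call, excluded from the measurement
    start = time.perf_counter()
    for _ in range(repeats):
        model(batch)
    end = time.perf_counter()
    return (end - start) / repeats * 1000.0

# Hypothetical stand-in for a real model: any callable works here.
def toy_model(batch):
    return [x * 2 for x in batch]

elapsed_ms = time_inference(toy_model, list(range(1000)))
print(f"{elapsed_ms:.3f} ms per call")
```

When the same measurement is taken on a GPU with PyTorch, `torch.cuda.synchronize()` should be called before reading the clock, because CUDA kernels are launched asynchronously.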
Fill both blanks to create a dictionary showing inference speed tradeoffs.
inference_speed = {'CPU': [1], 'GPU': [2]}  # in milliseconds
CPU inference is slower (100 ms) than GPU inference (5 ms) for the same model.
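To make the tradeoff concrete, the latencies can be turned into a speedup factor and a throughput figure. The sketch uses the 100 ms and 5 ms values from the explanation; the `speedup` and `throughput` names are illustrative, and real numbers depend on the model and hardware.

```python
inference_speed = {'CPU': 100, 'GPU': 5}  # in milliseconds, values from the exercise

# How many times faster the GPU is for a single inference call.
speedup = inference_speed['CPU'] / inference_speed['GPU']

# Inference calls per second for each device (1000 ms in one second).
throughput = {dev: 1000 / ms for dev, ms in inference_speed.items()}

print(speedup)     # 20.0
print(throughput)  # {'CPU': 10.0, 'GPU': 200.0}
```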
Fill all three blanks to filter models suitable for CPU inference with low memory.
suitable_models = {m: mem for m, mem in models.items() if mem [1] 4 and 'light' [2] m and mem [3] 1}
We select models with memory less than or equal to 4 GB, a name containing 'light', and memory greater than or equal to 1 GB.
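The filter described above can be tried on a small sample dictionary. The model names and memory figures below are made up for illustration; only the three conditions come from the explanation.

```python
# Hypothetical model names and memory footprints in GB, for illustration only.
models = {
    'light-cnn': 2,
    'light-tiny': 0.5,
    'heavy-transformer': 12,
    'light-rnn': 4,
    'base-mlp': 3,
}

# Keep models that need between 1 GB and 4 GB and have 'light' in their name.
suitable_models = {
    m: mem for m, mem in models.items()
    if mem <= 4 and 'light' in m and mem >= 1
}
print(suitable_models)  # {'light-cnn': 2, 'light-rnn': 4}
```

Here `light-tiny` is dropped by the lower bound, `heavy-transformer` by the upper bound, and `base-mlp` by the name check.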
