Recall & Review
beginner
What is the main purpose of torchvision detection models?
Torchvision detection models find and locate objects within images. They report which objects are present and where each one is by predicting a bounding box around it.
beginner
Name two popular torchvision detection models.
Two popular torchvision detection models are Faster R-CNN and SSD (Single Shot MultiBox Detector). Faster R-CNN is a two-stage detector that generally favors accuracy, while SSD is a single-stage detector that generally favors speed.
beginner
How do you load a pretrained Faster R-CNN model from torchvision?
You can load it with:<br><code>import torchvision.models.detection as detection<br>model = detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")</code><br>This loads a model pretrained on the COCO dataset, ready to detect common objects. Torchvision releases before 0.13 use <code>pretrained=True</code> instead, which newer releases deprecate.
intermediate
What kind of output do torchvision detection models produce?
In evaluation mode, they return a list with one dictionary per input image. Each dictionary contains:<br>- boxes: coordinates of the detected objects, as (x_min, y_min, x_max, y_max)<br>- labels: class IDs of the objects<br>- scores: a confidence score for each detection
intermediate
Why is it important to set the model to evaluation mode during inference?
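Setting the model to evaluation mode (model.eval()) disables training-only behavior such as dropout and batch-normalization statistics updates, so predictions are consistent. For torchvision detection models it also changes what the forward pass returns: in training mode the model expects targets and returns losses, while in evaluation mode it returns predictions.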
Which torchvision detection model is known for balancing speed and accuracy well?
Faster R-CNN is a popular detection model that balances speed and accuracy well. ResNet-50, VGG16, and AlexNet are classification models.
What does the 'boxes' output from a detection model represent?
The 'boxes' output contains the corner coordinates (x_min, y_min, x_max, y_max), in pixels, of each detected object in the image.
How do you prepare a torchvision detection model for inference?
Calling model.eval() sets the model to evaluation mode, which is necessary for correct inference.
Which dataset is commonly used to pretrain torchvision detection models?
COCO (Common Objects in Context) is a large dataset used for object detection and is commonly used to pretrain detection models.
What does the 'scores' output from a detection model indicate?
The 'scores' represent the confidence level that the detected object belongs to the predicted class.
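In practice, scores are most often used to discard weak detections. The sketch below applies a confidence threshold to a made-up detection dictionary shaped like a torchvision output; `keep_confident` is a hypothetical helper and the 0.5 cutoff is an arbitrary choice for illustration.

```python
import torch

# Made-up detection for one image, shaped like a torchvision output dictionary.
detection = {
    "boxes": torch.tensor([[12.0, 30.0, 200.0, 310.0],
                           [50.0, 60.0, 90.0, 120.0],
                           [5.0, 5.0, 40.0, 45.0]]),
    "labels": torch.tensor([1, 18, 3]),
    "scores": torch.tensor([0.97, 0.42, 0.08]),
}


def keep_confident(det, threshold=0.5):
    """Hypothetical helper: drop detections scoring below `threshold`."""
    mask = det["scores"] >= threshold
    # The same boolean mask indexes boxes, labels, and scores consistently.
    return {key: value[mask] for key, value in det.items()}


confident = keep_confident(detection, threshold=0.5)
print(confident["labels"].tolist())  # [1]
```

A higher threshold gives fewer but more reliable detections; a lower one keeps more candidates at the cost of more false positives.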
Explain how to use a pretrained torchvision detection model to detect objects in a new image.
Think about the steps from loading the model to getting predictions.
Describe the structure of the output from torchvision detection models and what each part means.
Focus on what the model tells you about each detected object.