In Single Shot MultiBox Detector (SSD), why does the model use multiple feature maps at different scales for detection?
Think about how objects in images can be small or large and how the model might handle that.
SSD uses multiple feature maps at different scales to detect objects of various sizes. Lower-resolution feature maps from deeper layers have larger receptive fields and handle larger objects, while higher-resolution feature maps from earlier layers handle smaller objects, so the model can detect objects across scales in a single forward pass.
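As a rough illustration of how box size is tied to feature map depth, the sketch below spaces default-box scales linearly between a minimum and a maximum, one scale per feature map. The values s_min = 0.2, s_max = 0.9 and the six grid sizes are the SSD300 settings described in the SSD paper and are assumptions here, not something this quiz specifies; real implementations sometimes adjust them.

```python
def default_box_scales(num_maps, s_min=0.2, s_max=0.9):
    """Return linearly spaced default-box scales, one per feature map.

    Scales are fractions of the input image size. s_min and s_max are
    assumed values taken from the SSD paper.
    """
    if num_maps == 1:
        return [s_min]
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]


# Assumed SSD300 grid sizes, from highest to lowest resolution.
feature_map_sizes = [38, 19, 10, 5, 3, 1]
scales = default_box_scales(len(feature_map_sizes))

for size, scale in zip(feature_map_sizes, scales):
    print(f"{size:>2}x{size:<2} map -> default box scale {scale:.2f} of image size")
```

The highest-resolution map gets the smallest boxes and the 1x1 map gets the largest, which is the pairing the answer describes.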
Given an SSD prediction layer output tensor shape of (batch_size, 38, 38, 4 * (num_classes + 4)), what does the leading '4' represent in this context?
Think about how many boxes the model predicts at each location on the feature map.
The leading '4' is the number of default boxes (also called anchor boxes) assigned to each location on the feature map. For each default box, the layer predicts num_classes class scores and 4 bounding box offsets, which is where the (num_classes + 4) term comes from.
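To make the channel arithmetic concrete, here is a minimal sketch that splits the channel vector of one grid cell into per-box class scores and offsets. It assumes each box's values are stored contiguously (scores first, then 4 offsets) and uses num_classes = 21 as a hypothetical value; actual frameworks may order the channels differently or use separate heads for scores and offsets.

```python
def split_cell(channel_vector, boxes_per_cell, num_classes):
    """Split one grid cell's channels into (class_scores, offsets) per box.

    Assumes a contiguous per-box layout: num_classes scores followed by
    4 box offsets. This layout is an assumption for illustration.
    """
    per_box = num_classes + 4
    if len(channel_vector) != boxes_per_cell * per_box:
        raise ValueError("channel count does not match boxes_per_cell * (num_classes + 4)")
    boxes = []
    for b in range(boxes_per_cell):
        chunk = channel_vector[b * per_box:(b + 1) * per_box]
        boxes.append((chunk[:num_classes], chunk[num_classes:]))
    return boxes


NUM_CLASSES = 21      # hypothetical value
BOXES_PER_CELL = 4    # the leading 4 in the tensor shape

channels = BOXES_PER_CELL * (NUM_CLASSES + 4)   # size of the last tensor axis
total_boxes = 38 * 38 * BOXES_PER_CELL          # boxes predicted by this layer

cell = list(range(channels))                    # dummy values for one cell
boxes = split_cell(cell, BOXES_PER_CELL, NUM_CLASSES)
print(channels, total_boxes, len(boxes))
```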
Why is it important to select multiple aspect ratios for default boxes in SSD?
Consider how objects in real life come in different shapes, not just one fixed shape.
Using multiple aspect ratios for default boxes allows SSD to better cover the variety of object shapes in images, which improves detection accuracy by matching boxes more closely to object shapes.
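A small sketch of how aspect ratios turn one scale into several box shapes: following the convention in the SSD paper, width is scale * sqrt(ratio) and height is scale / sqrt(ratio), so every box at a given scale has the same area but a different shape. The scale 0.2 and the ratio set are assumed example values, and the extra ratio-1 box at an intermediate scale that the paper also uses is left out for brevity.

```python
import math


def default_box_shapes(scale, aspect_ratios):
    """Return (ratio, width, height) for each aspect ratio at one scale.

    Width and height are fractions of the image size. The area stays
    scale**2 for every ratio; only the shape changes.
    """
    shapes = []
    for ratio in aspect_ratios:
        width = scale * math.sqrt(ratio)
        height = scale / math.sqrt(ratio)
        shapes.append((ratio, width, height))
    return shapes


# Assumed example values: one scale, three aspect ratios (width / height).
shapes = default_box_shapes(0.2, [1.0, 2.0, 0.5])
for ratio, w, h in shapes:
    print(f"ratio {ratio:.1f}: width {w:.3f}, height {h:.3f}")
```

A wide object such as a car is matched more closely by the ratio-2 box, while a standing person is closer to the ratio-0.5 box.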
SSD uses a combined loss function with localization loss and confidence loss. What does the localization loss measure?
Think about what 'localization' means in object detection.
Localization loss measures how close the predicted bounding boxes are to the ground-truth object boxes. It is typically a smooth L1 loss on the box offsets (center, width, and height) relative to the matched default boxes.
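For reference, a minimal sketch of smooth L1 applied to one box: the loss is quadratic for small errors and linear for large ones, which makes it less sensitive to outliers than a plain squared error. The threshold of 1.0 is the common default, and the offset values below are made up for illustration.

```python
def smooth_l1(x):
    """Smooth L1: 0.5 * x^2 when |x| < 1, otherwise |x| - 0.5."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def localization_loss(pred_offsets, target_offsets):
    """Sum smooth L1 over the 4 offsets of one matched default box."""
    return sum(smooth_l1(p - t) for p, t in zip(pred_offsets, target_offsets))


# Made-up offsets (cx, cy, w, h) for a single matched default box.
predicted = [0.10, -0.20, 0.05, 1.50]
target = [0.00, 0.00, 0.00, 0.00]
print(localization_loss(predicted, target))
```

In a full SSD loss this sum would be taken only over default boxes matched to a ground-truth object and then combined with the confidence loss.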
A developer notices that their SSD model performs poorly on small objects. Which of the following is the most likely cause?
Consider how SSD detects small objects using feature maps.
Small objects require higher-resolution feature maps for detection. Using only the last (lowest-resolution) feature map reduces the model's ability to detect small objects effectively.
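One way to see why resolution matters is to compute how many input pixels each grid cell covers. The sketch below assumes a 300x300 input and the six SSD300 grid sizes from the SSD paper; both are assumptions for illustration and not part of the question.

```python
def cell_size_in_pixels(image_size, grid_size):
    """Approximate side length, in input pixels, covered by one grid cell."""
    return image_size / grid_size


IMAGE_SIZE = 300                      # assumed input resolution
GRID_SIZES = [38, 19, 10, 5, 3, 1]    # assumed SSD300 feature map sizes

for grid in GRID_SIZES:
    cell = cell_size_in_pixels(IMAGE_SIZE, grid)
    print(f"{grid:>2}x{grid:<2} map: each cell spans about {cell:.1f} px")
```

On the 38x38 map a cell spans under 8 pixels, so a small object can dominate a cell. On the 1x1 map the single cell spans the whole image, so a small object contributes very little to the features used for prediction.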