Object detection models output bounding boxes around objects. Why is this localization important?
Think about how knowing where an object is helps in real life, like finding your keys on a table.
Bounding boxes tell you where each object is located and how big it is. This localization supports downstream tasks like counting objects or tracking them across frames.
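As a minimal sketch of how boxes enable counting and sizing, assuming the common (x_min, y_min, x_max, y_max) pixel format (the box values here are made up for illustration):

```python
import torch

# Hypothetical detections in (x_min, y_min, x_max, y_max) pixel format.
boxes = torch.tensor([[10.0, 20.0, 50.0, 80.0],
                      [60.0, 30.0, 90.0, 70.0]])

num_objects = boxes.shape[0]            # counting objects
widths = boxes[:, 2] - boxes[:, 0]      # how big each object is
heights = boxes[:, 3] - boxes[:, 1]
areas = widths * heights

print(num_objects)  # 2
print(areas)        # tensor([2400., 1200.])
```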
Choose the model architecture that outputs both object classes and bounding box coordinates.
Look for a model known for fast detection and localization in images.
YOLO predicts bounding boxes and class probabilities in one pass, making it efficient for detection and localization.
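To make "one pass" concrete, here is a toy illustration of a YOLO-style output tensor: for an S x S grid, each cell predicts B boxes (4 coordinates plus 1 objectness score each) and C class probabilities, all produced by a single forward pass. The values S=7, B=2, C=20 follow the original YOLO paper; the tensor itself is random, not a trained model's output.

```python
import torch

# Toy YOLO-style head output: each of the S x S grid cells predicts
# B boxes (4 coords + 1 objectness each) plus C class probabilities.
S, B, C = 7, 2, 20
head_output = torch.randn(1, S, S, B * 5 + C)

print(head_output.shape)  # torch.Size([1, 7, 7, 30])
```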
Given a batch of 4 images, each with 3 predicted bounding boxes, and each box represented by 4 coordinates, what is the shape of the output tensor?
import torch

batch_size = 4
num_boxes = 3
coords_per_box = 4
output = torch.randn(batch_size, num_boxes, coords_per_box)
print(output.shape)  # torch.Size([4, 3, 4])
Think about batch size first, then number of boxes, then coordinates per box.
The output tensor shape is (batch_size, number_of_boxes, coordinates_per_box), so (4, 3, 4) here.
In object detection, which metric quantifies the overlap between predicted and true bounding boxes?
It compares the area of overlap to the area of union between two boxes.
IoU measures the ratio of the overlapping area to the combined area of predicted and true boxes, indicating localization quality.
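The ratio can be computed directly from box coordinates. This sketch assumes (x_min, y_min, x_max, y_max) format; the example boxes are arbitrary:

```python
import torch

def iou(box_a, box_b):
    """IoU for boxes in (x_min, y_min, x_max, y_max) format."""
    # Corners of the intersection rectangle.
    x1 = torch.max(box_a[0], box_b[0])
    y1 = torch.max(box_a[1], box_b[1])
    x2 = torch.min(box_a[2], box_b[2])
    y2 = torch.min(box_a[3], box_b[3])
    # clamp(min=0) handles non-overlapping boxes.
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union

pred = torch.tensor([0.0, 0.0, 2.0, 2.0])
gt = torch.tensor([1.0, 1.0, 3.0, 3.0])
print(iou(pred, gt).item())  # 1/7 ≈ 0.1429
```

For batched boxes, torchvision provides `torchvision.ops.box_iou`, which implements the same computation pairwise.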
Given this PyTorch snippet, why might the model fail to localize objects properly?
import torch
pred_boxes = torch.tensor([[0.1, 0.2, 0.3, 0.4],
[0.5, 0.6, 0.7, 0.8]])
gt_boxes = torch.tensor([[0.15, 0.25, 0.35, 0.45],
[0.55, 0.65, 0.75, 0.85]])
loss = torch.nn.functional.mse_loss(pred_boxes, gt_boxes)
print(loss.item())

What is a likely reason the model's localization is poor despite low loss?
Think about how loss functions treat bounding box coordinates and what they miss.
MSE loss treats each coordinate independently and does not consider box overlap or scale, which can lead to poor localization despite low loss values.
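This scale-blindness can be shown numerically: shifting a tiny box and a large box by the same amount yields identical MSE, yet very different overlap quality. The boxes below are made up for illustration; the `iou` helper is the standard intersection-over-union computation:

```python
import torch
import torch.nn.functional as F

def iou(a, b):
    """IoU for boxes in (x_min, y_min, x_max, y_max) format."""
    x1, y1 = torch.max(a[0], b[0]), torch.max(a[1], b[1])
    x2, y2 = torch.min(a[2], b[2]), torch.min(a[3], b[3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return (inter / union).item()

shift = 0.05  # same per-coordinate error for both boxes
small_gt = torch.tensor([0.0, 0.0, 0.1, 0.1])
large_gt = torch.tensor([0.0, 0.0, 0.9, 0.9])
small_pred = small_gt + shift
large_pred = large_gt + shift

# Identical MSE for both predictions...
print(F.mse_loss(small_pred, small_gt).item())  # 0.0025
print(F.mse_loss(large_pred, large_gt).item())  # 0.0025
# ...but very different localization quality.
print(iou(small_pred, small_gt))  # ≈ 0.14
print(iou(large_pred, large_gt))  # ≈ 0.81
```

This is one motivation for IoU-based losses such as GIoU, which optimize overlap directly rather than per-coordinate error.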