YOLO is a popular model in computer vision. What is its main goal?
Think about what YOLO does differently compared to traditional object detectors.
YOLO stands for 'You Only Look Once'. It predicts bounding boxes and class probabilities in a single forward pass, making it fast and efficient for object detection.
The YOLO architecture includes several key components. Which of the following is NOT part of it?
YOLO does not use a separate step to propose regions.
Unlike some detectors, YOLO does not use a region proposal network. It predicts bounding boxes directly from the image divided into a grid.
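A minimal sketch of how one grid-cell prediction can be decoded into absolute image coordinates, assuming the YOLOv1 convention that (x, y) are offsets within the cell and (w, h) are fractions of the full image; the 448x448 input resolution matches YOLOv1, but the sample cell and box values are illustrative:

```python
grid_size = 7
img_w, img_h = 448, 448                       # YOLOv1 input resolution
cell_w, cell_h = img_w / grid_size, img_h / grid_size

# Illustrative prediction: (x, y) are offsets within the cell in [0, 1],
# (w, h) are fractions of the whole image.
i, j = 3, 2                                   # cell row, cell column
x, y, w, h = 0.5, 0.5, 0.25, 0.4

center_x = (j + x) * cell_w                   # column offset -> absolute x
center_y = (i + y) * cell_h                   # row offset -> absolute y
box_w = w * img_w
box_h = h * img_h
print(center_x, center_y, box_w, box_h)       # 160.0 224.0 112.0 179.2
```

Because each cell only predicts offsets within its own patch, the network can output all boxes at once with no separate region-proposal stage.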
YOLO divides the image into a 7x7 grid. Each grid cell predicts 2 bounding boxes and class probabilities for 20 classes. What is the shape of the output tensor?
```python
grid_size = 7
boxes_per_cell = 2
num_classes = 20
output_shape = (grid_size, grid_size, boxes_per_cell * 5 + num_classes)
print(output_shape)  # (7, 7, 30)
```
Each box predicts 5 values: 4 for coordinates and 1 for confidence.
Each grid cell predicts 2 boxes, each with 5 values (x, y, w, h, confidence), plus 20 class probabilities. The total per cell is 2*5 + 20 = 30, so the output tensor has shape (7, 7, 30).
YOLO detects objects with bounding boxes and class labels. Which metric best measures its detection quality?
Think about a metric that considers both localization and classification.
Mean Average Precision (mAP) measures how well the model detects and classifies objects, considering both bounding box overlap and class correctness.
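mAP is built on Intersection over Union (IoU) between predicted and ground-truth boxes: a detection counts as correct only if its IoU with a ground-truth box exceeds a threshold and the class matches. A minimal IoU sketch, assuming corner-format (x1, y1, x2, y2) boxes for illustration:

```python
def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    xa = max(box_a[0], box_b[0])
    ya = max(box_a[1], box_b[1])
    xb = min(box_a[2], box_b[2])
    yb = min(box_a[3], box_b[3])
    # Clamp to zero so non-overlapping boxes give zero intersection.
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 unit of overlap / 7 units of union
```

With per-class precision-recall curves computed at a chosen IoU threshold, averaging precision over recall levels and then over classes yields mAP.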
Consider this Python code snippet that processes YOLO output tensor. What error will it raise?
```python
import numpy as np

output = np.zeros((7, 7, 30))
for i in range(7):
    for j in range(7):
        boxes = output[i, j, :10].reshape(2, 5)
        scores = output[i, j, 10:30]
        max_score_index = np.argmax(scores)
        print(f"Max score index: {max_score_index}")
```
Check the shapes and slicing carefully.
The slice output[i, j, :10] has 10 elements, so reshaping to (2, 5) is valid. The slice output[i, j, 10:30] has 20 elements, matching the 20 classes. The code runs without raising any error.
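In the full YOLOv1 pipeline, each box's confidence is multiplied by the cell's class probabilities to get class-specific scores before non-max suppression. A minimal sketch of that step on a toy random tensor, using the same layout as the snippet above (boxes in the first 10 channels, class probabilities in the last 20); the tensor values here are random stand-ins, not real predictions:

```python
import numpy as np

rng = np.random.default_rng(0)
output = rng.random((7, 7, 30))  # toy stand-in for a real YOLO prediction

best_score = 0.0
best_loc = None  # (cell_i, cell_j, box_index, class_index)
for i in range(7):
    for j in range(7):
        boxes = output[i, j, :10].reshape(2, 5)        # (x, y, w, h, conf) per box
        class_probs = output[i, j, 10:30]              # 20 class probabilities
        # Class-specific score: box confidence times each class probability.
        scores = boxes[:, 4:5] * class_probs[None, :]  # broadcast to shape (2, 20)
        b, c = np.unravel_index(np.argmax(scores), scores.shape)
        if scores[b, c] > best_score:
            best_score = float(scores[b, c])
            best_loc = (i, j, int(b), int(c))

print(f"Best score {best_score:.3f} at cell {best_loc[:2]}, "
      f"box {best_loc[2]}, class {best_loc[3]}")
```

The (2, 1) confidence column broadcasts against the (1, 20) class row to give a (2, 20) score matrix per cell, one score for each box/class pair.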