What is the key difference between the original R-CNN and Fast R-CNN in how they process images for object detection?
Think about how many times the CNN runs on the image in each method.
R-CNN runs the CNN separately on each region proposal (around 2,000 per image), which is slow. Fast R-CNN runs the CNN once on the whole image and then extracts per-region features from the shared feature map via RoI pooling, making it much faster.
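The difference can be sketched in a few lines of PyTorch. This is a minimal toy example, not either paper's actual pipeline: a single conv layer stands in for the feature extractor, the boxes are hypothetical, and Fast R-CNN's RoI pooling is approximated by simple slicing of the shared feature map.

```python
import torch
import torch.nn as nn

# Toy stand-in for a CNN feature extractor (hypothetical, stride 2).
cnn = nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1)

image = torch.randn(1, 3, 64, 64)
# Region proposals as (x1, y1, x2, y2) boxes in image coordinates.
proposals = [(0, 0, 32, 32), (16, 16, 48, 48), (32, 32, 64, 64)]

# R-CNN style: crop the image first, then run the CNN once PER proposal.
rcnn_feats = [cnn(image[:, :, y1:y2, x1:x2]) for x1, y1, x2, y2 in proposals]

# Fast R-CNN style: run the CNN ONCE on the whole image, then crop the
# shared feature map (stride 2, so box coordinates are divided by 2).
fmap = cnn(image)
fast_feats = [fmap[:, :, y1 // 2:y2 // 2, x1 // 2:x2 // 2]
              for x1, y1, x2, y2 in proposals]
```

With N proposals, the R-CNN path runs the network N times while the Fast R-CNN path runs it once; that is the entire source of the speedup.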
You want to build a real-time object detection system on a mobile device with limited computing power. Which R-CNN family model is the best choice?
Consider which model introduced a faster region proposal method.
Faster R-CNN replaced the slow external selective-search step with a learned Region Proposal Network (RPN) that shares features with the detection head, making it the fastest of the three and the most suitable for near-real-time use on constrained hardware.
Given an input image of size 224x224x3 passed through a backbone CNN that reduces spatial dimensions by a factor of 8, what is the shape of the output feature map?
import torch
import torch.nn as nn

input_tensor = torch.randn(1, 3, 224, 224)

class SimpleBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 256, kernel_size=3, stride=2, padding=1)
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        x = self.conv(x)  # stride-2 conv halves spatial size: 224 -> 112
        x = self.pool(x)  # halves again: 112 -> 56
        x = self.pool(x)  # halves again: 56 -> 28
        return x

model = SimpleBackbone()
output = model(input_tensor)
output.shape
Each stride-2 convolution or pooling layer halves the spatial size.
Starting from 224x224: the stride-2 conv halves to 112x112, the first pool halves to 56x56, and the second pool halves to 28x28, a total reduction factor of 8. With 256 output channels, the output shape is (1, 256, 28, 28).
Mask R-CNN outputs a mask for each detected object. Which metric best measures how well the predicted mask matches the true mask?
Think about overlap between predicted and true masks.
IoU (Intersection over Union) measures the overlap between predicted and true masks as the ratio of their intersection area to their union area, making it the standard metric for segmentation quality.
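Mask IoU is straightforward to compute on boolean masks. A minimal sketch (the `mask_iou` helper is hypothetical, not a library function):

```python
import torch

def mask_iou(pred, target):
    """IoU between two boolean masks: |intersection| / |union|."""
    pred, target = pred.bool(), target.bool()
    inter = (pred & target).sum().item()
    union = (pred | target).sum().item()
    return inter / union if union > 0 else 0.0

# Toy 4x4 masks: top half vs left half.
pred = torch.zeros(4, 4, dtype=torch.bool)
pred[:2, :] = True
target = torch.zeros(4, 4, dtype=torch.bool)
target[:, :2] = True

mask_iou(pred, target)  # intersection 4, union 12 -> 1/3
```

An IoU of 1.0 means a perfect mask; values near 0 mean almost no overlap.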
You notice that training Faster R-CNN on your dataset is very slow. You suspect the bottleneck is in the region proposal network (RPN). Which of the following changes will most likely speed up training without hurting accuracy?
Think about how input size affects computation.
Reducing the input image size reduces computation in both the backbone and the RPN, speeding up training, usually at only a modest cost in accuracy.
Increasing the number of anchors or training epochs increases computation, and removing the RPN discards the very component that makes Faster R-CNN fast.
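The effect of input size is easy to see in the size of the feature map the RPN has to scan. A minimal sketch, with a single hypothetical stride-2 conv standing in for the backbone:

```python
import torch
import torch.nn as nn

# Tiny stride-2 conv standing in for a backbone stage (hypothetical sizes).
stem = nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1)

sizes = {}
for side in (800, 400):  # e.g. resizing training images from 800px to 400px
    fmap = stem(torch.randn(1, 3, side, side))
    sizes[side] = fmap[0].numel()  # feature-map elements the RPN must process

sizes[800] / sizes[400]  # halving each side quarters the feature map -> 4.0
```

Since the RPN slides over every feature-map location (times the number of anchors per location), a 4x smaller feature map directly cuts both backbone and RPN computation.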