Object detection finds where things are in an image. Localization means drawing boxes around those things so we know their exact place.
0
0
Why detection localizes objects in PyTorch
Introduction
When you want to find and locate cars in a street photo.
When you need to detect and mark faces in a group picture.
When robots must identify and pick objects from a shelf.
When tracking animals in wildlife videos to know their positions.
Syntax
PyTorch
import torch.nn as nn class ObjectDetector(nn.Module): def __init__(self): super().__init__() # model layers here def forward(self, x): # returns bounding boxes and class scores return boxes, scores
The model outputs both boxes (coordinates) and class scores.
Localization means predicting the box coordinates around objects.
Examples
The model returns boxes and scores for each image in the batch.
PyTorch
boxes, scores = model(images) # boxes shape: [batch_size, num_boxes, 4] # scores shape: [batch_size, num_boxes, num_classes]
Filter boxes with confidence above 0.5 and print their coordinates and scores.
PyTorch
for box, score in zip(boxes[0], scores[0]): if score.max() > 0.5: print(f"Box: {box}, Score: {score}")
Sample Model
This simple model predicts one box and class scores for one image. It shows how detection localizes objects by outputting box coordinates.
PyTorch
import torch import torch.nn as nn class SimpleDetector(nn.Module): def __init__(self): super().__init__() self.conv = nn.Conv2d(3, 16, 3, padding=1) self.pool = nn.AdaptiveAvgPool2d(1) self.fc_boxes = nn.Linear(16, 4) # box coords self.fc_scores = nn.Linear(16, 2) # two classes def forward(self, x): x = self.pool(torch.relu(self.conv(x))).squeeze(-1).squeeze(-1) boxes = self.fc_boxes(x) scores = torch.softmax(self.fc_scores(x), dim=1) return boxes, scores # Create dummy image batch (1 image, 3 channels, 64x64) images = torch.randn(1, 3, 64, 64) model = SimpleDetector() boxes, scores = model(images) print(f"Predicted box coordinates: {boxes.detach().numpy()}") print(f"Predicted class scores: {scores.detach().numpy()}")
OutputSuccess
Important Notes
Localization is key to know not just what is in the image, but where it is.
Bounding boxes are usually four numbers: x_min, y_min, x_max, y_max or center_x, center_y, width, height.
Good detection models balance classifying objects and accurately localizing them.
Summary
Object detection localizes objects by predicting bounding boxes.
Localization helps find exact positions of objects in images.
Detection models output both class scores and box coordinates.