
Why detection localizes objects in PyTorch

Introduction

Object detection finds what is in an image and where it is. Localization is the "where" part: the model predicts a bounding box around each object so we know its position.

When you want to find and locate cars in a street photo.
When you need to detect and mark faces in a group picture.
When robots must identify and pick objects from a shelf.
When tracking animals in wildlife videos to know their positions.
Syntax
PyTorch
import torch.nn as nn

class ObjectDetector(nn.Module):
    def __init__(self):
        super().__init__()
        # model layers here

    def forward(self, x):
        # returns bounding boxes and class scores
        return boxes, scores

The model outputs both boxes (coordinates) and class scores.

Localization means predicting the box coordinates around objects.

Examples
The model returns boxes and scores for each image in the batch.
PyTorch
boxes, scores = model(images)
# boxes shape: [batch_size, num_boxes, 4]
# scores shape: [batch_size, num_boxes, num_classes]
Keep only the boxes of the first image whose highest class score is above 0.5, and print their coordinates and scores.
PyTorch
for box, score in zip(boxes[0], scores[0]):
    if score.max() > 0.5:
        print(f"Box: {box}, Score: {score}")
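The same filter can also be written without a Python loop. The sketch below is only an illustration: the tensor values are made up, and the names `best_scores`, `best_classes`, and `keep` are hypothetical, not part of any model API.

```python
import torch

# Made-up outputs for one image: 3 candidate boxes, 3 classes.
boxes = torch.tensor([[10., 20., 50., 80.],
                      [15., 25., 55., 85.],
                      [100., 120., 140., 160.]])
scores = torch.tensor([[0.90, 0.05, 0.05],
                       [0.40, 0.30, 0.30],
                       [0.20, 0.70, 0.10]])

# Highest class score and its class index for every box.
best_scores, best_classes = scores.max(dim=1)

# Boolean mask: True where the best score is above the threshold.
keep = best_scores > 0.5

kept_boxes = boxes[keep]           # coordinates of confident boxes
kept_classes = best_classes[keep]  # predicted class of each kept box

print(f"Kept boxes: {kept_boxes}")
print(f"Kept classes: {kept_classes}")
```

Here the second box is dropped because none of its class scores is above 0.5.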
Sample Model

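This simple model predicts one box and one set of class scores per image. It shows how detection localizes objects by outputting box coordinates. The model is untrained, so the printed numbers are random; the point is the structure of the output.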

PyTorch
import torch
import torch.nn as nn

class SimpleDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc_boxes = nn.Linear(16, 4)  # box coords
        self.fc_scores = nn.Linear(16, 2)  # two classes

    def forward(self, x):
        x = self.pool(torch.relu(self.conv(x))).squeeze(-1).squeeze(-1)
        boxes = self.fc_boxes(x)
        scores = torch.softmax(self.fc_scores(x), dim=1)
        return boxes, scores

# Create dummy image batch (1 image, 3 channels, 64x64)
images = torch.randn(1, 3, 64, 64)

model = SimpleDetector()
boxes, scores = model(images)

print(f"Predicted box coordinates: {boxes.detach().numpy()}")
print(f"Predicted class scores: {scores.detach().numpy()}")
Important Notes

Localization is key to knowing not just what is in the image, but where it is.

Bounding boxes are usually four numbers: x_min, y_min, x_max, y_max or center_x, center_y, width, height.
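The two box formats carry the same information, so one can be converted into the other. The helper names `xyxy_to_cxcywh` and `cxcywh_to_xyxy` below are hypothetical, written only for this illustration.

```python
import torch

def xyxy_to_cxcywh(boxes: torch.Tensor) -> torch.Tensor:
    """Convert [x_min, y_min, x_max, y_max] to [center_x, center_y, width, height]."""
    x_min, y_min, x_max, y_max = boxes.unbind(dim=-1)
    return torch.stack(
        [(x_min + x_max) / 2, (y_min + y_max) / 2, x_max - x_min, y_max - y_min],
        dim=-1,
    )

def cxcywh_to_xyxy(boxes: torch.Tensor) -> torch.Tensor:
    """Convert [center_x, center_y, width, height] to [x_min, y_min, x_max, y_max]."""
    cx, cy, w, h = boxes.unbind(dim=-1)
    return torch.stack(
        [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2],
        dim=-1,
    )

corner_box = torch.tensor([[10., 20., 50., 80.]])
center_box = xyxy_to_cxcywh(corner_box)   # center (30, 50), width 40, height 60
round_trip = cxcywh_to_xyxy(center_box)   # back to the corner format

print(center_box)
print(round_trip)
```

If torchvision is installed, `torchvision.ops.box_convert` performs the same kind of conversion.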

Good detection models balance classifying objects and accurately localizing them.
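One common way to express this balance during training is to add a classification loss and a box regression loss. The sketch below is an assumption, not how any particular detector is trained: the tensors are random stand-ins, `box_weight` is a hypothetical tuning knob, and the class head is assumed to output raw logits (no softmax), because `cross_entropy` expects logits.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Random stand-ins for model outputs: 4 images, one box each, 2 classes.
pred_boxes = torch.randn(4, 4, requires_grad=True)
class_logits = torch.randn(4, 2, requires_grad=True)

# Random stand-ins for ground truth.
target_boxes = torch.rand(4, 4)
target_classes = torch.tensor([0, 1, 1, 0])

# "What": how well the classes are predicted.
cls_loss = F.cross_entropy(class_logits, target_classes)

# "Where": how close the predicted boxes are to the true boxes.
box_loss = F.smooth_l1_loss(pred_boxes, target_boxes)

# Hypothetical weight that trades off the two goals.
box_weight = 1.0
total_loss = cls_loss + box_weight * box_loss

# One backward pass sends gradients to both outputs.
total_loss.backward()
print(f"cls: {cls_loss.item():.4f}, box: {box_loss.item():.4f}")
```

Raising `box_weight` pushes training toward tighter boxes; lowering it favors classification.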

Summary

Object detection localizes objects by predicting bounding boxes.

Localization helps find exact positions of objects in images.

Detection models output both class scores and box coordinates.