Overview - Non-maximum suppression

What is it?

Non-maximum suppression (NMS) is a technique used to select the best bounding boxes from many overlapping boxes in object detection. It keeps the box with the highest confidence score and removes others that overlap too much with it. This helps reduce duplicate detections of the same object. NMS is essential for making object detection results clear and accurate.

Why it matters

Without NMS, object detection models would output many overlapping boxes for the same object, making it hard to tell what the model actually detected. This would reduce the usefulness of detection systems in real-world tasks such as self-driving cars or face detection. NMS cleans up these results so that each object is reported once, at its most confident location.

Where it fits

Before learning NMS, you should understand how object detection models predict bounding boxes and confidence scores. After NMS, learners often study more advanced post-processing techniques like soft-NMS or learn how to integrate NMS efficiently in model pipelines.

Mental Model

Core Idea

Non-maximum suppression picks the strongest detection and removes nearby weaker ones to avoid duplicates.

Think of it like...

Imagine you are picking the tallest person in a crowded room and asking everyone too close to them to step aside, so you only see one tall person clearly.

Detections: [■■■■■(score 0.9), ■■■■(0.8), ■■■(0.7)]
Overlap check → Keep highest score box ■■■■■
Remove boxes overlapping too much with ■■■■■
Result: Only ■■■■■ remains

Build-Up - 7 Steps

1

Foundation: What are bounding boxes and scores

Concept: Understanding the basic outputs of object detection models: boxes and confidence scores.

Object detection models predict rectangles (bounding boxes) around objects and assign a confidence score to each box. The score shows how sure the model is that the box contains an object.

Result

You get many boxes with scores, some overlapping the same object.

Knowing what bounding boxes and scores represent is essential before learning how to filter them.

2

Foundation: Why overlapping boxes cause problems

3

Intermediate: How non-maximum suppression works

4

Intermediate: Intersection over Union (IoU) explained
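
IoU is the area where two boxes overlap divided by the area of their union, so it ranges from 0 (disjoint boxes) to 1 (identical boxes). The following is a minimal sketch, assuming axis-aligned boxes in (x1, y1, x2, y2) corner format; the function name `iou` is illustrative and not taken from any library.

```python
from typing import Sequence


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Width and height of the overlap rectangle, clamped to zero when disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0


identical = iou((0, 0, 10, 10), (0, 0, 10, 10))     # full overlap
shifted = iou((0, 0, 10, 10), (5, 0, 15, 10))       # shifted by half a width
disjoint = iou((0, 0, 10, 10), (20, 20, 30, 30))    # no overlap
print(identical, shifted, disjoint)
```

The half-shifted pair shares 50 units of area out of a union of 150, which is why a box that looks "half overlapping" has an IoU of only about 0.33.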

5

Intermediate: Implementing NMS in PyTorch

6

Advanced: Choosing the IoU threshold wisely

7

Expert: Limitations and alternatives to standard NMS

Under the Hood

NMS works by sorting detection boxes by confidence score, then iteratively selecting the highest-scoring remaining box and removing every other remaining box whose IoU with it exceeds a threshold. This repeats until no candidate boxes remain. Internally, this involves tensor operations for sorting, IoU calculation, and masking to filter boxes efficiently.
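
The loop described above can be sketched in plain Python. This is an illustrative reference version, not the optimized tensor implementation a library would use; it assumes boxes in (x1, y1, x2, y2) format, and the names `iou` and `nms` are chosen here for the example.

```python
from typing import List, Sequence

Box = Sequence[float]  # (x1, y1, x2, y2), an assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS. Returns indices of kept boxes, highest score first."""
    # Candidate indices sorted by descending confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)  # highest-scoring remaining candidate
        keep.append(best)
        # Drop every remaining candidate that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 9, 9)]
scores = [0.9, 0.8, 0.7, 0.6]

kept_default = nms(boxes, scores, iou_threshold=0.5)
kept_lenient = nms(boxes, scores, iou_threshold=0.9)
print(kept_default)  # boxes 1 and 3 overlap box 0 heavily and are suppressed
print(kept_lenient)  # a high threshold suppresses nothing in this example
```

Raising the threshold makes suppression more lenient (more boxes survive); lowering it makes suppression more aggressive.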

Why designed this way?

NMS was designed to solve the problem of multiple overlapping detections in a simple, fast way. Alternatives like clustering or learned suppression were more complex or slower. NMS balances speed and effectiveness, making it suitable for real-time systems.

┌───────────────┐
│ Input Boxes   │
│ + Scores      │
└──────┬────────┘
       │ Sort by score
       ▼
┌───────────────┐
│ Pick highest  │
│ scoring box   │
└──────┬────────┘
       │ Calculate IoU
       ▼
┌───────────────┐
│ Remove boxes  │
│ with IoU > T  │
└──────┬────────┘
       │ Repeat until no boxes
       ▼
┌───────────────┐
│ Output boxes  │
│ after NMS     │
└───────────────┘

Myth Busters - 3 Common Misconceptions

Quick: Does NMS always keep the box with the largest area? Commit to yes or no before reading on.

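Common Belief: NMS keeps the biggest box because it covers the object best.

Reality: NMS ranks boxes by confidence score, not by area. It keeps the highest-scoring box and suppresses lower-scoring boxes that overlap it beyond the IoU threshold, regardless of their size.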

Quick: Does NMS remove all overlapping boxes regardless of their scores? Commit to yes or no before reading on.

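Common Belief: NMS removes every box that overlaps with any other box.

Reality: A box is removed only if its IoU with a higher-scoring kept box exceeds the threshold. Boxes that overlap less than the threshold both survive.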

Quick: Can NMS perfectly separate objects that are very close together? Commit to yes or no before reading on.

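Common Belief: NMS can always distinguish closely packed objects perfectly.

Reality: When two real objects overlap heavily, the lower-scoring box can exceed the IoU threshold and be suppressed even though it is a valid detection. Standard NMS therefore struggles in crowded scenes.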

Expert Zone

1

NMS performance depends heavily on the IoU threshold, which often requires task-specific tuning.

2

The order of boxes after sorting affects which boxes are kept, so score calibration impacts results.

3

Batch processing NMS efficiently requires careful tensor operations to avoid slow loops.

When NOT to use

Standard NMS is not ideal when objects are densely packed or heavily overlapping. Alternatives like soft-NMS, which reduces scores instead of removing boxes, or learned NMS methods that use neural networks to decide suppression, are better choices.
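
To make the score-decay idea concrete, here is a minimal sketch of one common Soft-NMS variant, which uses a Gaussian penalty on overlapping boxes. The parameters `sigma` and `score_threshold`, their default values, and the function names are assumptions for illustration; boxes are assumed to be in (x1, y1, x2, y2) format.

```python
import math
from typing import List, Sequence, Tuple

Box = Sequence[float]  # (x1, y1, x2, y2), an assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(
    boxes: List[Box],
    scores: List[float],
    sigma: float = 0.5,
    score_threshold: float = 0.001,
) -> List[Tuple[int, float]]:
    """Gaussian Soft-NMS sketch. Returns (index, rescored confidence) pairs."""
    scores = list(scores)  # work on a copy so the caller's list is untouched
    remaining = list(range(len(boxes)))
    kept: List[Tuple[int, float]] = []
    while remaining:
        # Pick the best remaining box using the *current* (decayed) scores.
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        kept.append((best, scores[best]))
        survivors = []
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            # Decay instead of delete: the larger the overlap, the larger the penalty.
            scores[i] *= math.exp(-(overlap ** 2) / sigma)
            if scores[i] >= score_threshold:
                survivors.append(i)
        remaining = survivors
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
result = soft_nms(boxes, scores)
for index, score in result:
    print(index, round(score, 3))
```

In this example the heavily overlapping second box is not discarded; its score is reduced so that it ranks below the non-overlapping third box. A downstream confidence threshold then decides whether it is reported.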

Production Patterns

In production, NMS is often integrated as a final step in detection pipelines using optimized implementations such as torchvision.ops.nms. Systems tune IoU thresholds per class and may combine NMS with confidence thresholding and class-wise filtering for best results.

Connections

Clustering algorithms

Both group similar items and reduce redundancy.

Understanding clustering helps grasp how NMS groups overlapping boxes and selects representatives.

Signal processing peak detection

NMS is similar to picking peaks in noisy signals by suppressing nearby lower peaks.

Knowing peak detection clarifies why NMS picks the strongest box and suppresses neighbors.

Human visual attention

NMS loosely resembles how humans focus on the most prominent object and ignore close distractions.

This connection is an analogy: both reduce a cluttered scene to a few salient items by filtering out nearby competitors.
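
The peak-detection connection above can be made concrete with a one-dimensional analogue of NMS: keep the strongest sample, suppress its neighbors within a fixed radius, and repeat. This is a sketch for intuition only; the function name `peak_nms` and the parameters `radius` and `min_height` are hypothetical. Here `radius` plays the role of the IoU threshold and `min_height` the role of a confidence threshold.

```python
from typing import List, Sequence


def peak_nms(signal: Sequence[float], radius: int, min_height: float) -> List[int]:
    """Return indices of local peaks, suppressing weaker samples near each peak."""
    # Visit samples from strongest to weakest, as NMS does with scores.
    order = sorted(range(len(signal)), key=lambda i: signal[i], reverse=True)
    suppressed = [False] * len(signal)
    peaks: List[int] = []
    for i in order:
        if signal[i] < min_height:
            break  # everything after this point is weaker still
        if suppressed[i]:
            continue
        peaks.append(i)
        # Suppress the neighborhood around the accepted peak.
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            suppressed[j] = True
    return sorted(peaks)


signal = [0.1, 0.9, 0.8, 0.2, 0.1, 0.7, 0.6, 0.1]
peaks = peak_nms(signal, radius=1, min_height=0.5)
print(peaks)  # indices of the two surviving peaks
```

The samples at indices 2 and 6 are strong but sit next to stronger neighbors, so they are suppressed in the same way that near-duplicate boxes are.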

Common Pitfalls

#1 Using a very low IoU threshold, causing missed detections.

Wrong approach: indices = torchvision.ops.nms(boxes, scores, iou_threshold=0.1)

Correct approach: indices = torchvision.ops.nms(boxes, scores, iou_threshold=0.5)  # a common starting point; tune per task

Root cause: Not realizing that a threshold set too low suppresses boxes that belong to distinct, neighboring objects, because even a small overlap counts as a duplicate.

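#2 Processing boxes in their original order in a hand-written NMS loop.

Wrong approach: a custom greedy loop that visits boxes in input order and suppresses overlaps as it goes

Correct approach: indices = torchvision.ops.nms(boxes, scores, iou_threshold=0.5)  # sorts by score internally; no manual pre-sorting needed

Root cause: Greedy suppression only gives the intended result when candidates are visited from highest to lowest score. torchvision.ops.nms handles this ordering itself and returns the kept indices in decreasing score order, so the sorting requirement applies only to custom implementations.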

#3 Ignoring class labels and applying NMS across all classes together.

Wrong approach: indices = torchvision.ops.nms(boxes, scores, iou_threshold=0.5)  # mixed classes

Correct approach: indices = torchvision.ops.batched_nms(boxes, scores, labels, iou_threshold=0.5)  # NMS is applied per class

Root cause: Applying NMS across classes removes valid detections of different object types that happen to overlap. If you loop over classes manually instead, remember that the indices returned for boxes[cls_mask] refer to the masked subset and must be mapped back to the original positions before combining.

Key Takeaways

Non-maximum suppression cleans up overlapping detection boxes by keeping the highest scoring ones and removing others that overlap too much.

IoU is the key measure to decide how much overlap is too much, and tuning its threshold affects detection quality.

torchvision provides an efficient built-in NMS function (torchvision.ops.nms) that should generally be preferred over custom implementations.

Standard NMS struggles with crowded scenes, so alternatives like soft-NMS or learned NMS can improve results.

Applying NMS correctly requires processing boxes in descending score order (torchvision.ops.nms does this internally) and handling classes separately to avoid removing valid detections.