How to Use nn.MaxPool2d in PyTorch: Syntax and Example
Use nn.MaxPool2d in PyTorch to apply max pooling over 2D inputs such as images. Initialize it with parameters like kernel_size to define the pooling window, then apply it to your tensor to reduce spatial size by taking the maximum value in each window.
Syntax
The nn.MaxPool2d layer is created by specifying the kernel_size, which is the size of the window to take the maximum over. Optional parameters include stride (step size of the window), padding (implicit negative-infinity padding added to both sides of the input), and dilation (spacing between kernel elements).
Typical usage:
- kernel_size: int or tuple, size of the pooling window
- stride: int or tuple, how far the window moves each step (defaults to kernel_size)
- padding: int or tuple, implicit -inf padding added to the input edges
- dilation: int or tuple, spacing between kernel elements (usually 1)
```python
import torch.nn as nn

maxpool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1)
```
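Given these parameters, the output spatial size follows the formula from the PyTorch docs: H_out = floor((H_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1). A quick sketch that checks the formula against the layer itself:

```python
import math

import torch
import torch.nn as nn

def maxpool2d_out_size(size, kernel_size, stride, padding=0, dilation=1):
    # Output-size formula from the nn.MaxPool2d documentation (floor mode)
    return math.floor((size + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)

x = torch.randn(1, 1, 7, 7)
pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
out = pool(x)

print(out.shape[-1])                   # actual width from the layer: 4
print(maxpool2d_out_size(7, 3, 2, 1))  # formula gives the same value: 4
```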
Example
This example shows how to create a max pooling layer with a 2x2 window and stride 2, then apply it to a 4x4 input tensor with one channel and batch size 1. The output tensor has reduced spatial size because max pooling picks the largest value in each 2x2 block.
```python
import torch
import torch.nn as nn

# Create a tensor with shape (batch_size=1, channels=1, height=4, width=4)
input_tensor = torch.tensor([[[[ 1.0,  2.0,  3.0,  4.0],
                               [ 5.0,  6.0,  7.0,  8.0],
                               [ 9.0, 10.0, 11.0, 12.0],
                               [13.0, 14.0, 15.0, 16.0]]]])

# Define max pooling layer
maxpool = nn.MaxPool2d(kernel_size=2, stride=2)

# Apply max pooling
output = maxpool(input_tensor)

print("Input Tensor:\n", input_tensor)
print("\nOutput Tensor after MaxPool2d:\n", output)
```
Output
Input Tensor:
tensor([[[[ 1., 2., 3., 4.],
[ 5., 6., 7., 8.],
[ 9., 10., 11., 12.],
[13., 14., 15., 16.]]]])
Output Tensor after MaxPool2d:
tensor([[[[ 6., 8.],
[14., 16.]]]])
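Because stride defaults to kernel_size, the 2x2 windows above do not overlap. Setting a smaller stride makes them overlap and produces a larger output; a sketch on the same 4x4 input:

```python
import torch
import torch.nn as nn

x = torch.tensor([[[[ 1.,  2.,  3.,  4.],
                    [ 5.,  6.,  7.,  8.],
                    [ 9., 10., 11., 12.],
                    [13., 14., 15., 16.]]]])

# stride omitted -> stride = kernel_size = 2, non-overlapping windows
print(nn.MaxPool2d(kernel_size=2)(x))
# tensor([[[[ 6.,  8.],
#           [14., 16.]]]])

# stride=1 -> overlapping windows, larger 3x3 output
print(nn.MaxPool2d(kernel_size=2, stride=1)(x))
# tensor([[[[ 6.,  7.,  8.],
#           [10., 11., 12.],
#           [14., 15., 16.]]]])
```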
Common Pitfalls
Common mistakes when using nn.MaxPool2d include:
- Not setting stride, which defaults to kernel_size. If you want overlapping windows, set a smaller stride explicitly.
- Miscounting padding, which changes the output size. Note that nn.MaxPool2d pads with negative infinity rather than zeros, so padded positions never become the max.
- Feeding input tensors with the wrong shape. The input must be 4D, (batch_size, channels, height, width), or 3D, (channels, height, width), for a single unbatched image; a plain 2D (height, width) tensor raises an error.
Example of a wrong input shape and the fix:
```python
import torch
import torch.nn as nn

maxpool = nn.MaxPool2d(2)

# Wrong: input is 2D (missing batch and channel dimensions)
input_wrong = torch.randn(4, 4)  # shape (height, width)
try:
    output_wrong = maxpool(input_wrong)
except RuntimeError as e:
    print("Error:", type(e).__name__)  # exact message varies by PyTorch version

# Right: add batch and channel dimensions
input_right = input_wrong.unsqueeze(0).unsqueeze(0)  # shape (1, 1, 4, 4)
output_right = maxpool(input_right)
print("Output shape with correct input:", output_right.shape)
```
Output
Error: RuntimeError
Output shape with correct input: torch.Size([1, 1, 2, 2])
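One detail worth noting: nn.MaxPool2d pads with negative infinity, not zeros, so padded positions can never win the max. A small sketch with an all-negative input makes this visible; if the padding were zeros, the border outputs would be 0:

```python
import torch
import torch.nn as nn

x = torch.full((1, 1, 2, 2), -5.0)  # every value is -5
pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=1)

out = pool(x)
print(out)
# Every window touches padded positions, yet the output stays -5:
# the implicit padding value is -inf, so it never beats -5.
```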
Quick Reference
| Parameter | Description | Default |
|---|---|---|
| kernel_size | Size of the window to take max over | Required |
| stride | Step size of the window movement | kernel_size |
| padding | Implicit -inf padding added to input edges | 0 |
| dilation | Spacing between kernel elements | 1 |
| return_indices | If True, returns max indices for unpooling | False |
| ceil_mode | If True, use ceil instead of floor to compute output shape | False |
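The ceil_mode flag is easy to verify directly: it changes whether a window that runs off the edge still produces an output. A sketch on a 5x5 input, where a 2x2 window with stride 2 leaves one row and column over:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 5, 5)

# floor mode (default): the leftover row/column is dropped
print(nn.MaxPool2d(2)(x).shape)                  # torch.Size([1, 1, 2, 2])

# ceil mode: the partial window at the edge still produces an output
print(nn.MaxPool2d(2, ceil_mode=True)(x).shape)  # torch.Size([1, 1, 3, 3])
```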
Key Takeaways
nn.MaxPool2d reduces spatial size by taking max values in sliding windows over 2D inputs.
Set kernel_size to define the pooling window and stride to control step size; stride defaults to kernel_size.
Input to MaxPool2d must be 4D, (batch_size, channels, height, width), or 3D for a single unbatched image.
Padding changes the output size; MaxPool2d pads with -inf, so padded values never become the max.
Use return_indices=True if you plan to reverse pooling with MaxUnpool2d.
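As a closing sketch of the last takeaway, return_indices=True pairs with nn.MaxUnpool2d to place the pooled maxima back at their original positions, with zeros everywhere else:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.tensor([[[[1., 2.],
                    [3., 4.]]]])

pooled, indices = pool(x)           # pooled = tensor([[[[4.]]]])
restored = unpool(pooled, indices)  # 4 returns to its original position
print(restored)
# tensor([[[[0., 0.],
#           [0., 4.]]]])
```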