PyTorch · How-To · Beginner · 3 min read

How to Use nn.Conv2d in PyTorch: Syntax and Example

In PyTorch, nn.Conv2d creates a 2D convolutional layer that processes image-like data. You define it by specifying input channels, output channels, kernel size, and optional parameters like stride and padding. Then, pass your input tensor through this layer to get the convolved output.

Syntax

The nn.Conv2d layer is initialized with key parameters:

  • in_channels: Number of input channels (e.g., 3 for RGB images).
  • out_channels: Number of filters to apply, which determines output channels.
  • kernel_size: Size of the filter window (e.g., 3 for a 3x3 filter).
  • stride: Step size for moving the filter (default is 1).
  • padding: Number of pixels added around input edges (default is 0).

These control how the convolution scans the input and produces output.
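The spatial output size follows directly from these parameters. Per the PyTorch Conv2d documentation, each spatial dimension is computed as floor((size + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1. A minimal helper (the function name `conv2d_out_size` is our own, for illustration) makes this easy to check against a real layer:

```python
import torch
import torch.nn as nn

def conv2d_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    # Output-size formula from the PyTorch Conv2d documentation
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# A 3x3 kernel with padding=1 and stride=1 preserves the spatial size: 5 -> 5
print(conv2d_out_size(5, kernel_size=3, padding=1))  # 5

# Verify against an actual layer
layer = nn.Conv2d(3, 2, kernel_size=3, padding=1)
out = layer(torch.randn(1, 3, 5, 5))
print(out.shape)  # torch.Size([1, 2, 5, 5])
```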

python
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')

Example

This example creates a Conv2d layer and applies it to a random input tensor simulating a batch of one RGB image of size 5x5 pixels.

python
import torch
import torch.nn as nn

# Create a Conv2d layer: 3 input channels, 2 output channels, 3x3 kernel
conv_layer = nn.Conv2d(in_channels=3, out_channels=2, kernel_size=3, stride=1, padding=1)

# Create a random input tensor: batch size 1, 3 channels, 5x5 image
input_tensor = torch.randn(1, 3, 5, 5)

# Apply convolution
output_tensor = conv_layer(input_tensor)

print('Input shape:', input_tensor.shape)
print('Output shape:', output_tensor.shape)
Output
Input shape: torch.Size([1, 3, 5, 5])
Output shape: torch.Size([1, 2, 5, 5])
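Because padding=1 compensates for the 3x3 kernel, the spatial size stays at 5x5 here. To see how the parameters change that, here is a short sketch comparing no padding and a stride of 2 on the same 5x5 input:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 5, 5)

# No padding: each 3x3 kernel position must fit inside the input, so 5 -> 3
no_pad = nn.Conv2d(3, 2, kernel_size=3)
print(no_pad(x).shape)  # torch.Size([1, 2, 3, 3])

# Stride 2 with padding 1 roughly halves the spatial size: 5 -> 3
strided = nn.Conv2d(3, 2, kernel_size=3, stride=2, padding=1)
print(strided(x).shape)  # torch.Size([1, 2, 3, 3])
```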

Common Pitfalls

  • Wrong input shape: Conv2d expects input as (batch_size, channels, height, width). Passing (batch_size, height, width, channels) will cause errors.
  • Ignoring padding: Without padding, the output shrinks spatially. Use padding to keep the output size the same as the input.
  • Kernel size too large: Kernel size larger than input spatial dimensions causes errors.
  • Forgetting batch dimension: Input must include batch size, even if 1.
python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 1, 3)

# Wrong input shape (channels last) - will error
try:
    wrong_input = torch.randn(1, 5, 5, 3)
    conv(wrong_input)
except Exception as e:
    print('Error with wrong input shape:', e)

# Correct input shape
correct_input = torch.randn(1, 3, 5, 5)
output = conv(correct_input)
print('Output shape with correct input:', output.shape)
Output
Error with wrong input shape: Expected 4-dimensional input for 4-dimensional weight [1, 3, 3, 3], but got 4-dimensional input of size [1, 5, 5, 3] instead
Output shape with correct input: torch.Size([1, 1, 3, 3])
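For the missing-batch-dimension pitfall, the usual fix is unsqueeze(0), which adds a batch axis of size 1 at the front. (Recent PyTorch versions also accept unbatched 3D input to Conv2d, but adding the batch dimension explicitly is unambiguous.) A quick sketch:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 1, 3)

# A single image without a batch dimension: (channels, height, width)
image = torch.randn(3, 5, 5)

# unsqueeze(0) inserts a batch dimension of size 1 at the front
batched = image.unsqueeze(0)
print(batched.shape)        # torch.Size([1, 3, 5, 5])
print(conv(batched).shape)  # torch.Size([1, 1, 3, 3])
```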

Quick Reference

| Parameter | Description | Default |
| --- | --- | --- |
| in_channels | Number of input channels | Required |
| out_channels | Number of output channels (filters) | Required |
| kernel_size | Size of convolution kernel (int or tuple) | Required |
| stride | Stride of the convolution | 1 |
| padding | Zero-padding added to both sides | 0 |
| dilation | Spacing between kernel elements | 1 |
| groups | Number of blocked connections | 1 |
| bias | If True, adds a learnable bias | True |
| padding_mode | Type of padding: 'zeros', 'reflect', etc. | 'zeros' |
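As one illustration of the less common parameters above: setting groups equal to in_channels gives a depthwise convolution, where each input channel is filtered independently by its own kernel. A minimal sketch:

```python
import torch
import torch.nn as nn

# Depthwise convolution: groups=3 with in_channels=3 gives each channel its own filter
depthwise = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, padding=1, groups=3)

x = torch.randn(1, 3, 5, 5)
print(depthwise(x).shape)      # torch.Size([1, 3, 5, 5])

# Each filter sees in_channels/groups = 1 channel, so the weight is (3, 1, 3, 3)
print(depthwise.weight.shape)  # torch.Size([3, 1, 3, 3])
```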

Key Takeaways

  • Use nn.Conv2d to create 2D convolution layers for image data in PyTorch.
  • Input tensor must have shape (batch_size, channels, height, width).
  • Adjust kernel_size, stride, and padding to control output size and features.
  • Common errors come from a wrong input shape or a forgotten batch dimension.
  • Padding keeps the output size the same as the input; without it, the output shrinks.