PyTorch · How-To · Beginner · 3 min read

How to Use nn.Conv2d in PyTorch: Syntax and Example

In PyTorch, nn.Conv2d creates a 2D convolutional layer that processes image-like data. You define it by specifying input channels, output channels, kernel size, and optional parameters like stride and padding. Then, pass your input tensor through this layer to get the convolved output.

Syntax

The nn.Conv2d layer is initialized with key parameters:

  • in_channels: Number of input channels (e.g., 3 for RGB images).
  • out_channels: Number of filters to apply, which determines output channels.
  • kernel_size: Size of the filter window (e.g., 3 for a 3x3 filter).
  • stride: Step size for moving the filter (default is 1).
  • padding: Number of pixels added around input edges (default is 0).

These control how the convolution scans the input and produces output.
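The spatial output size follows directly from these parameters. Per the PyTorch Conv2d documentation, each spatial dimension is computed as floor((size + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1. A minimal helper (the function name `conv2d_out_size` is our own, for illustration) makes this easy to check against a real layer:

```python
import torch
import torch.nn as nn

def conv2d_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    # Output-size formula from the PyTorch Conv2d documentation
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# A 3x3 kernel with padding=1 and stride=1 preserves the spatial size: 5 -> 5
print(conv2d_out_size(5, kernel_size=3, padding=1))  # 5

# Verify against an actual layer
layer = nn.Conv2d(3, 2, kernel_size=3, padding=1)
out = layer(torch.randn(1, 3, 5, 5))
print(out.shape)  # torch.Size([1, 2, 5, 5])
```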

python
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')

Example

This example creates a Conv2d layer and applies it to a random input tensor simulating a batch of one RGB image of size 5x5 pixels.

python
import torch
import torch.nn as nn

# Create a Conv2d layer: 3 input channels, 2 output channels, 3x3 kernel
conv_layer = nn.Conv2d(in_channels=3, out_channels=2, kernel_size=3, stride=1, padding=1)

# Create a random input tensor: batch size 1, 3 channels, 5x5 image
input_tensor = torch.randn(1, 3, 5, 5)

# Apply convolution
output_tensor = conv_layer(input_tensor)

print('Input shape:', input_tensor.shape)
print('Output shape:', output_tensor.shape)
Output
Input shape: torch.Size([1, 3, 5, 5])
Output shape: torch.Size([1, 2, 5, 5])
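Because padding=1 compensates for the 3x3 kernel, the spatial size stays at 5x5 here. To see how the parameters change that, here is a short sketch comparing no padding and a stride of 2 on the same 5x5 input:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 5, 5)

# No padding: each 3x3 kernel position must fit inside the input, so 5 -> 3
no_pad = nn.Conv2d(3, 2, kernel_size=3)
print(no_pad(x).shape)  # torch.Size([1, 2, 3, 3])

# Stride 2 with padding 1 roughly halves the spatial size: 5 -> 3
strided = nn.Conv2d(3, 2, kernel_size=3, stride=2, padding=1)
print(strided(x).shape)  # torch.Size([1, 2, 3, 3])
```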

Common Pitfalls

  • Wrong input shape: Conv2d expects input as (batch_size, channels, height, width). Passing (batch_size, height, width, channels) will cause errors.
  • Ignoring padding: Without padding, the output shrinks spatially. Use padding to keep the output size the same as the input.
  • Kernel size too large: Kernel size larger than input spatial dimensions causes errors.
  • Forgetting batch dimension: Input must include batch size, even if 1.
python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 1, 3)

# Wrong input shape (channels last) - will error
try:
    wrong_input = torch.randn(1, 5, 5, 3)
    conv(wrong_input)
except Exception as e:
    print('Error with wrong input shape:', e)

# Correct input shape
correct_input = torch.randn(1, 3, 5, 5)
output = conv(correct_input)
print('Output shape with correct input:', output.shape)
Output
Error with wrong input shape: Expected 4-dimensional input for 4-dimensional weight [1, 3, 3, 3], but got 4-dimensional input of size [1, 5, 5, 3] instead
Output shape with correct input: torch.Size([1, 1, 3, 3])
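For the missing-batch-dimension pitfall, the usual fix is unsqueeze(0), which adds a batch axis of size 1 at the front. (Recent PyTorch versions also accept unbatched 3D input to Conv2d, but adding the batch dimension explicitly is unambiguous.) A quick sketch:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 1, 3)

# A single image without a batch dimension: (channels, height, width)
image = torch.randn(3, 5, 5)

# unsqueeze(0) inserts a batch dimension of size 1 at the front
batched = image.unsqueeze(0)
print(batched.shape)        # torch.Size([1, 3, 5, 5])
print(conv(batched).shape)  # torch.Size([1, 1, 3, 3])
```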

Quick Reference

| Parameter | Description | Default |
| --- | --- | --- |
| in_channels | Number of input channels | Required |
| out_channels | Number of output channels (filters) | Required |
| kernel_size | Size of convolution kernel (int or tuple) | Required |
| stride | Stride of the convolution | 1 |
| padding | Zero-padding added to both sides | 0 |
| dilation | Spacing between kernel elements | 1 |
| groups | Number of blocked connections | 1 |
| bias | If True, adds a learnable bias | True |
| padding_mode | Type of padding: 'zeros', 'reflect', etc. | 'zeros' |
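As one illustration of the less common parameters above: setting groups equal to in_channels gives a depthwise convolution, where each input channel is filtered independently by its own kernel. A minimal sketch:

```python
import torch
import torch.nn as nn

# Depthwise convolution: groups=3 with in_channels=3 gives each channel its own filter
depthwise = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, padding=1, groups=3)

x = torch.randn(1, 3, 5, 5)
print(depthwise(x).shape)      # torch.Size([1, 3, 5, 5])

# Each filter sees in_channels/groups = 1 channel, so the weight is (3, 1, 3, 3)
print(depthwise.weight.shape)  # torch.Size([3, 1, 3, 3])
```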

Key Takeaways

  • Use nn.Conv2d to create 2D convolution layers for image data in PyTorch.
  • Input tensor must have shape (batch_size, channels, height, width).
  • Adjust kernel_size, stride, and padding to control output size and features.
  • Common errors come from a wrong input shape or a forgotten batch dimension.
  • Padding keeps the output size the same as the input; without it, the output shrinks.