Image preprocessing improves image quality and standardizes input data, making it easier for models to learn patterns effectively.
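As a minimal sketch of such preprocessing in plain PyTorch (the image tensor and the choice of per-channel standardization are illustrative assumptions, not from the original):

```python
import torch

# fake 3-channel 28x28 uint8 image standing in for real input data
image = torch.randint(0, 256, (3, 28, 28), dtype=torch.uint8)

x = image.float() / 255.0                # rescale pixel values to [0, 1]
mean = x.mean(dim=(1, 2), keepdim=True)  # per-channel mean
std = x.std(dim=(1, 2), keepdim=True)    # per-channel standard deviation
x = (x - mean) / std                     # zero mean, unit variance per channel
```

In practice the mean and std are usually dataset-wide statistics (e.g. ImageNet's) rather than per-image, but the standardization step is the same.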
```python
import torch
import torch.nn as nn

# batch_size=1, channels=1, height=28, width=28
input_tensor = torch.randn(1, 1, 28, 28)
conv = nn.Conv2d(in_channels=1, out_channels=10, kernel_size=3, stride=1, padding=1)
output = conv(input_tensor)
output.shape  # torch.Size([1, 10, 28, 28])
```
Padding of 1 pixel on each side with a 3x3 kernel and stride 1 keeps the height and width unchanged: (28 - 3 + 2*1)/1 + 1 = 28. The number of output channels equals the number of filters (10), so the output shape is (1, 10, 28, 28).
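To see the effect of padding directly, the same input can be run through a padded and an unpadded convolution (this comparison is illustrative, not from the original):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)
same = nn.Conv2d(1, 10, kernel_size=3, stride=1, padding=1)(x)
valid = nn.Conv2d(1, 10, kernel_size=3, stride=1, padding=0)(x)
same.shape   # torch.Size([1, 10, 28, 28]): (28 - 3 + 2*1)/1 + 1 = 28
valid.shape  # torch.Size([1, 10, 26, 26]): (28 - 3 + 0)/1 + 1 = 26
```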
Feature Pyramid Networks (FPN) enhance CNNs by combining features at multiple scales, improving detection of objects of different sizes.
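The core FPN operation is a top-down pathway with lateral connections: each backbone feature map is projected to a common channel width with a 1x1 convolution, and coarser maps are upsampled and added to finer ones. A hand-rolled sketch of one such merge step (the feature shapes and channel width of 64 are illustrative assumptions, not from the original):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

c3 = torch.randn(1, 256, 32, 32)  # finer, lower-level backbone features
c4 = torch.randn(1, 512, 16, 16)  # coarser, higher-level backbone features

lateral3 = nn.Conv2d(256, 64, kernel_size=1)  # 1x1 convs project both maps
lateral4 = nn.Conv2d(512, 64, kernel_size=1)  # to a shared channel width

p4 = lateral4(c4)
# upsample the coarse map and fuse it with the finer lateral projection
p3 = lateral3(c3) + F.interpolate(p4, scale_factor=2, mode="nearest")
```

The resulting maps p3 and p4 carry both high-level semantics and scale-appropriate resolution, which is what helps detect objects of different sizes.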
IoU (Intersection over Union) measures how well the predicted segmentation overlaps with the true segmentation: the area of their intersection divided by the area of their union. Because it directly scores region overlap, it is a standard evaluation metric for segmentation tasks.
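A sketch of how IoU could be computed for binary masks in plain PyTorch (the function name and example masks are illustrative, not from the original):

```python
import torch

def iou(pred: torch.Tensor, target: torch.Tensor) -> float:
    # pred and target are boolean masks of the same shape
    intersection = (pred & target).sum().item()
    union = (pred | target).sum().item()
    return intersection / union if union > 0 else 1.0

pred = torch.tensor([[1, 1, 0], [0, 1, 0]], dtype=torch.bool)
target = torch.tensor([[1, 0, 0], [0, 1, 1]], dtype=torch.bool)
iou(pred, target)  # 2 overlapping pixels / 4 pixels in the union = 0.5
```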
```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3)            # no padding: 32x32 -> 30x30
        self.fc = nn.Linear(16 * 30 * 30, 10)

    def forward(self, x):
        x = self.conv(x)
        x = torch.relu(x)
        x = x.view(-1, 16 * 30 * 30)               # flatten to 14400 features
        x = self.fc(x)
        return x

model = SimpleCNN()
input_tensor = torch.randn(4, 3, 32, 32)
output = model(input_tensor)
```
Without padding, the 3x3 convolution reduces the spatial dimensions from 32x32 to 30x30 ((32 - 3 + 0)/1 + 1 = 30). The view flattens each sample to 16*30*30 = 14400 features, which matches exactly what the Linear layer expects, so the code runs without error and produces an output of shape (4, 10).
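This dimension arithmetic can be checked with a small helper implementing the standard output-size formula (the function name is illustrative):

```python
def conv_out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    # output size of a convolution along one spatial dimension
    return (size - kernel + 2 * padding) // stride + 1

conv_out_size(32, 3)              # 30: no padding shrinks the map
conv_out_size(28, 3, padding=1)   # 28: padding=1 preserves the size
16 * conv_out_size(32, 3) ** 2    # 14400 flattened features per sample
```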