Why CNNs detect spatial patterns in PyTorch - Challenge Your Understanding

Challenge - 5 Problems
🧠 Conceptual
intermediate
Why do CNNs use filters?

Why do convolutional neural networks (CNNs) use filters (also called kernels) when processing images?

A. Filters scan the image to detect local patterns like edges or textures by focusing on small regions.
B. Filters randomly change pixel values to increase image diversity.
C. Filters convert images into text descriptions for easier processing.
D. Filters remove all colors to simplify the image into black and white.
💡 Hint

Think about how looking at small parts of a picture helps you recognize shapes.

Predict Output
intermediate
Output shape after convolution

Given a grayscale image tensor of shape (1, 1, 28, 28) and a convolution layer with 6 filters of size 3x3, stride 1, and padding 0, what is the output shape after applying the convolution?

PyTorch
import torch
import torch.nn as nn

image = torch.randn(1, 1, 28, 28)  # batch=1, channels=1, height=28, width=28
conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3, stride=1, padding=0)
output = conv(image)
print(output.shape)
A. torch.Size([1, 6, 26, 26])
B. torch.Size([1, 6, 28, 28])
C. torch.Size([1, 1, 26, 26])
D. torch.Size([6, 1, 28, 28])
💡 Hint

Output size = floor((Input size - Kernel size + 2 * Padding) / Stride) + 1, applied separately to height and width.

Model Choice
advanced
Choosing CNN for spatial pattern detection

You want to build a model to recognize handwritten digits from images. Which model type is best suited to detect spatial patterns like edges and curves?

A. A linear regression model that predicts digits directly from pixel values.
B. A simple feedforward neural network with fully connected layers only.
C. A convolutional neural network (CNN) because it captures local spatial features using filters.
D. A recurrent neural network (RNN) designed for sequential data like text.
💡 Hint

Think about which model type is designed to understand images by looking at small parts.

Hyperparameter
advanced
Effect of kernel size on spatial pattern detection

How does increasing the kernel size in a CNN layer affect the spatial patterns the model can detect?

A. Larger kernels always improve model accuracy without any drawbacks.
B. Larger kernels capture bigger spatial patterns but, with no padding, shrink the output more.
C. Larger kernels ignore spatial patterns and treat the image as a flat vector.
D. Larger kernels decrease the number of filters in the layer automatically.
💡 Hint

Think about how a bigger window sees more of the image at once.

Metrics
expert
Interpreting CNN training loss and accuracy

During CNN training on image data, you observe the training loss steadily decreases but the validation accuracy stops improving and fluctuates. What does this indicate?

A. The training data is corrupted and causing unstable validation results.
B. The model is underfitting and needs more training epochs.
C. The model has perfect generalization and no further tuning is needed.
D. The model is overfitting the training data and not generalizing well to new data.
💡 Hint

Think about what it means when training improves but validation does not.

Practice

1. Why do CNNs use small filters that slide over an image?
easy
A. To detect local spatial patterns like edges and textures
B. To reduce the image size drastically in one step
C. To convert images into text data
D. To randomly change pixel colors

Solution

  1. Step 1: Understand the role of filters in CNNs

    Filters slide over small parts of the image to focus on local details like edges or shapes.
  2. Step 2: Connect filter behavior to spatial pattern detection

    By scanning the image locally, filters learn to recognize important spatial features that help in tasks like image recognition.
  3. Final Answer:

    To detect local spatial patterns like edges and textures -> Option A
  4. Quick Check:

    Filters detect local patterns = A [OK]
Hint: Filters scan small areas to find edges and shapes [OK]
Common Mistakes:
  • Thinking filters change image size drastically in one step
  • Believing CNNs convert images to text directly
  • Assuming filters randomly alter pixel colors
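
To make the idea of a sliding filter concrete, here is a minimal sketch. The 6x6 toy image and the hand-written Sobel-like kernel are assumptions chosen for illustration; a trained CNN learns its own filter weights rather than using fixed ones.

```python
import torch
import torch.nn.functional as F

# Toy 6x6 grayscale image (illustrative): left half dark (0), right half bright (1).
image = torch.zeros(1, 1, 6, 6)
image[:, :, :, 3:] = 1.0

# Hand-written 3x3 vertical-edge filter (Sobel-like), shaped (out_ch, in_ch, kH, kW).
kernel = torch.tensor([[-1.0, 0.0, 1.0],
                       [-2.0, 0.0, 2.0],
                       [-1.0, 0.0, 1.0]]).reshape(1, 1, 3, 3)

# Slide the filter over the image with stride 1 and no padding.
response = F.conv2d(image, kernel)

print(response.shape)   # torch.Size([1, 1, 4, 4])
print(response[0, 0])   # every row is [0., 4., 4., 0.]
```

The response is nonzero only in the columns where the 3x3 window straddles the dark-to-bright boundary, which is what "detecting a local pattern" means here.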
2. Which PyTorch code correctly creates a 2D convolutional layer with a 3x3 filter?
easy
A. torch.nn.Conv2d(in_channels=1, out_channels=10, kernel_size=3)
B. torch.nn.Conv1d(in_channels=1, out_channels=10, kernel_size=3)
C. torch.nn.Linear(in_features=3, out_features=10)
D. torch.nn.Conv2d(in_channels=1, out_channels=10, kernel_size=5)

Solution

  1. Step 1: Identify the correct convolution layer type

    For images, 2D convolution (Conv2d) is used, not Conv1d or Linear layers.
  2. Step 2: Check the kernel size matches 3x3

    kernel_size=3 means a 3x3 filter, so torch.nn.Conv2d(in_channels=1, out_channels=10, kernel_size=3) is correct; torch.nn.Conv2d(in_channels=1, out_channels=10, kernel_size=5) uses 5x5.
  3. Final Answer:

    torch.nn.Conv2d(in_channels=1, out_channels=10, kernel_size=3) -> Option A
  4. Quick Check:

    Conv2d with kernel_size=3 = A [OK]
Hint: Use Conv2d and kernel_size=3 for 3x3 filters [OK]
Common Mistakes:
  • Using Conv1d instead of Conv2d for images
  • Confusing Linear layers with convolution layers
  • Setting wrong kernel size for the filter
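
One way to confirm what `kernel_size` means is to inspect the layer itself. The sketch below builds the layers from options A and D and prints their weight shapes; the variable names `conv3` and `conv5` are illustrative only.

```python
import torch.nn as nn

conv3 = nn.Conv2d(in_channels=1, out_channels=10, kernel_size=3)
conv5 = nn.Conv2d(in_channels=1, out_channels=10, kernel_size=5)

# An integer kernel_size is expanded to a square (height, width) pair.
print(conv3.kernel_size)    # (3, 3)

# Weight layout is (out_channels, in_channels, kernel_height, kernel_width).
print(conv3.weight.shape)   # torch.Size([10, 1, 3, 3])
print(conv5.weight.shape)   # torch.Size([10, 1, 5, 5])
```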
3. Given this PyTorch code snippet, what is the output shape after the convolution?
import torch
conv = torch.nn.Conv2d(1, 1, kernel_size=3)
input = torch.randn(1, 1, 5, 5)
output = conv(input)
print(output.shape)
medium
A. torch.Size([1, 1, 5, 5])
B. torch.Size([1, 3, 3, 3])
C. torch.Size([1, 1, 7, 7])
D. torch.Size([1, 1, 3, 3])

Solution

  1. Step 1: Understand convolution output size formula

    Output size = Input size - Kernel size + 1 (assuming stride=1, padding=0). Here, 5 - 3 + 1 = 3.
  2. Step 2: Apply formula to each spatial dimension

    Both height and width become 3, so output shape is (1 batch, 1 channel, 3 height, 3 width).
  3. Final Answer:

    torch.Size([1, 1, 3, 3]) -> Option D
  4. Quick Check:

    Output size = 5-3+1 = 3 [OK]
Hint: Output size = input - kernel + 1 if no padding [OK]
Common Mistakes:
  • Assuming output size equals input size without padding
  • Confusing batch and channel dimensions
  • Misapplying kernel size in output calculation
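
The formula from the solution can be checked against PyTorch directly. The helper `conv_output_size` below is a hypothetical name introduced for this sketch; it assumes dilation 1 and generalizes the formula to arbitrary stride and padding.

```python
import torch
import torch.nn as nn

def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output size along one spatial dimension, assuming dilation 1."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# The case from the question: 5x5 input, 3x3 kernel, stride 1, no padding.
expected = conv_output_size(5, 3)
actual = nn.Conv2d(1, 1, kernel_size=3)(torch.randn(1, 1, 5, 5)).shape
print(expected, tuple(actual))              # 3 (1, 1, 3, 3)

# A second case with stride and padding: 28x28 input, stride 2, padding 1.
expected_strided = conv_output_size(28, 3, stride=2, padding=1)
strided = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1)(torch.randn(1, 1, 28, 28))
print(expected_strided, tuple(strided.shape))  # 14 (1, 1, 14, 14)
```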
4. What is wrong with this PyTorch code for a convolutional layer?
import torch
conv = torch.nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3)
input = torch.randn(1, 1, 28, 28)
output = conv(input)
print(output.shape)
medium
A. Output channels must be less than input channels
B. Kernel size is too large for the input
C. Input channels do not match the layer's in_channels
D. Batch size must be greater than 1

Solution

  1. Step 1: Check input and layer channel compatibility

    The layer expects 3 input channels, but input has only 1 channel, causing a mismatch error.
  2. Step 2: Confirm other parameters are valid

    Kernel size 3 is valid for 28x28 input, output channels can be any positive number, batch size 1 is allowed.
  3. Final Answer:

    Input channels do not match the layer's in_channels -> Option C
  4. Quick Check:

    Input channels mismatch = C [OK]
Hint: Input channels must match Conv2d in_channels [OK]
Common Mistakes:
  • Ignoring channel mismatch errors
  • Thinking kernel size is invalid for input
  • Believing batch size must be >1
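
The mismatch can be reproduced and then fixed in a few lines. This is a sketch of one possible fix (rebuilding the layer for 1 input channel); the names `gray`, `mismatch_raised`, and `fixed_conv` are illustrative, and converting the input to 3 channels would be an equally valid alternative.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3)
gray = torch.randn(1, 1, 28, 28)   # 1 channel, but the layer expects 3

try:
    conv(gray)
    mismatch_raised = False
except RuntimeError as err:
    # PyTorch reports the expected and actual channel counts in the message.
    mismatch_raised = True
    print("RuntimeError:", err)

# One possible fix: build the layer for 1 input channel instead.
fixed_conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3)
print(fixed_conv(gray).shape)      # torch.Size([1, 6, 26, 26])
```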
5. How does using multiple convolutional layers help CNNs detect complex spatial patterns?
hard
A. Layers randomly shuffle pixels to create new patterns
B. Each layer learns higher-level features by combining simpler patterns from previous layers
C. Multiple layers reduce the image size to zero quickly
D. Each layer independently detects the same simple edges

Solution

  1. Step 1: Understand feature hierarchy in CNNs

    Early layers detect simple features like edges; later layers combine these to form complex shapes and objects.
  2. Step 2: Explain how multiple layers build complexity

    Stacking layers lets the network learn spatial patterns at increasing levels of abstraction, improving recognition.
  3. Final Answer:

    Each layer learns higher-level features by combining simpler patterns from previous layers -> Option B
  4. Quick Check:

    Layer stacking builds complex features = B [OK]
Hint: Layers build complexity by combining simpler features [OK]
Common Mistakes:
  • Thinking layers just reduce image size quickly
  • Believing layers shuffle pixels randomly
  • Assuming all layers detect the same simple edges
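
One measurable side of this hierarchy is the receptive field: each extra layer lets a single output value depend on a larger patch of the input. The sketch below checks this for two stacked 3x3 convolutions by backpropagating from one output value. The all-ones weights, the 9x9 input, and the missing activation are simplifying assumptions that keep the check deterministic; real layers learn their weights and usually have a nonlinearity between them.

```python
import torch
import torch.nn as nn

# Two stacked 3x3 convolutions, no padding, no bias.
stack = nn.Sequential(
    nn.Conv2d(1, 1, kernel_size=3, bias=False),
    nn.Conv2d(1, 1, kernel_size=3, bias=False),
)

# Fixed all-ones weights so every connection carries a nonzero gradient.
with torch.no_grad():
    for layer in stack:
        layer.weight.fill_(1.0)

image = torch.randn(1, 1, 9, 9, requires_grad=True)
output = stack(image)

# Backpropagate from the single centre output value and see which pixels it used.
output[0, 0, 2, 2].backward()
touched = image.grad[0, 0] != 0

print(output.shape)         # torch.Size([1, 1, 5, 5])
print(int(touched.sum()))   # 25, i.e. a 5x5 patch of the input
```

A single 3x3 layer sees 9 input pixels per output value; after the second layer each output value draws on a 5x5 patch, which is how later layers can respond to larger, more complex patterns.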