
Why CNNs dominate image classification in Computer Vision - Challenge Your Understanding

Challenge - 5 Problems
🧠 Conceptual
intermediate
Why do CNNs perform better than fully connected networks on images?

Which reason best explains why Convolutional Neural Networks (CNNs) usually outperform fully connected networks on image classification tasks?

A. CNNs use local connections and shared weights, capturing spatial patterns efficiently.
B. Fully connected networks have fewer parameters, so they underfit images.
C. CNNs ignore pixel relationships, making them faster but less accurate.
D. Fully connected networks use convolution layers that reduce image size too much.
💡 Hint

Think about how images have nearby pixels related to each other.

Predict Output
intermediate
Output shape after convolution layer

Given a grayscale image of size 28x28 and a convolution layer with 6 filters of size 5x5, stride 1, and no padding, what is the output shape?

import torch
import torch.nn as nn

input_tensor = torch.randn(1, 1, 28, 28)  # batch=1, channels=1, height=28, width=28
conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, stride=1, padding=0)
output = conv(input_tensor)
print(output.shape)
A. torch.Size([6, 1, 28, 28])
B. torch.Size([1, 6, 28, 28])
C. torch.Size([1, 6, 24, 24])
D. torch.Size([1, 1, 24, 24])
💡 Hint

Output size = floor((Input size - Kernel size) / Stride) + 1 when no padding is used.

Model Choice
advanced
Choosing CNN architecture for complex image classification

You want to classify high-resolution images with many classes. Which CNN architecture choice is best to improve accuracy while managing training time?

A. Use a CNN with residual connections (ResNet) to allow deeper networks without vanishing gradients.
B. Use a very deep CNN with many layers and no pooling to keep all details.
C. Use a shallow CNN with large fully connected layers at the end to capture complexity.
D. Use only convolution layers with large kernel sizes (e.g., 11x11) to cover more area.
💡 Hint

Think about how very deep networks can be trained effectively.

Hyperparameter
advanced
Effect of kernel size on CNN feature detection

How does increasing the convolution kernel size from 3x3 to 7x7 generally affect a CNN's ability to detect features in images?

A. Larger kernels always improve accuracy without drawbacks.
B. Larger kernels capture more global features but increase parameters and risk overfitting.
C. Larger kernels decrease the number of parameters and speed up training.
D. Larger kernels reduce the receptive field and miss important details.
💡 Hint

Consider the trade-off between detail and context in feature detection.

Metrics
expert
Interpreting CNN training metrics for image classification

During CNN training on image classification, you observe training accuracy steadily increasing but validation accuracy plateaus and then decreases. What does this indicate?

A. The dataset is too small to train any model.
B. The model is underfitting and needs more layers.
C. The learning rate is too low, causing slow training.
D. The model is overfitting the training data and not generalizing well.
💡 Hint

Think about what it means when training improves but validation worsens.

Practice

1. Why are Convolutional Neural Networks (CNNs) especially good for image classification?
easy
A. Because they only work with black and white images
B. Because they use random guessing to classify images
C. Because they ignore image details and focus on text
D. Because they scan small parts of images to find important patterns

Solution

  1. Step 1: Understand CNN scanning method

    CNNs look at small parts of an image called patches to detect patterns like edges or shapes.
  2. Step 2: Connect scanning to image classification

    By scanning patches, CNNs learn important features that help tell one image from another.
  3. Final Answer:

    Because they scan small parts of images to find important patterns -> Option D
  4. Quick Check:

    CNN scanning = small parts pattern detection [OK]
Hint: Remember CNNs focus on small image parts to find patterns [OK]
Common Mistakes:
  • Thinking CNNs guess randomly
  • Believing CNNs ignore image details
  • Assuming CNNs only work on black and white images
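The scanning described in the solution above can be sketched in plain Python, without any library. The image, the hand-written edge kernel, and the helper name scan_patches are all made up for illustration. A real CNN learns its kernel values during training instead of having them written by hand.

```python
# Plain-Python sketch of how a convolution filter scans an image patch by patch.
# The image and kernel values below are illustrative assumptions, not learned weights.

def scan_patches(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the response map."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    response = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum over the k x k patch whose top-left corner is (i, j).
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(k)
                for dj in range(k)
            )
            row.append(total)
        response.append(row)
    return response

# 4x4 image: dark (0) on the left half, bright (1) on the right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written 2x2 vertical-edge detector: right column minus left column.
kernel = [
    [-1, 1],
    [-1, 1],
]

response = scan_patches(image, kernel)
for row in response:
    print(row)  # each row is [0, 2, 0]: the filter fires only where dark meets bright
```

The same four kernel values are reused at every position. This weight sharing is what lets one small filter find the edge wherever it appears in the image.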
2. Which of the following is the correct way to describe the pooling operation in CNNs?
easy
A. Pooling increases the image size to add more details
B. Pooling shrinks the image while keeping important information
C. Pooling removes all colors from the image
D. Pooling randomly changes pixel values

Solution

  1. Step 1: Define pooling in CNNs

    Pooling reduces the size of the image or feature map but keeps the key features intact.
  2. Step 2: Identify correct description

    Pooling does not increase size or remove colors; it shrinks the image while preserving important info.
  3. Final Answer:

    Pooling shrinks the image while keeping important information -> Option B
  4. Quick Check:

    Pooling = shrink + keep key info [OK]
Hint: Pooling shrinks images but keeps what matters [OK]
Common Mistakes:
  • Thinking pooling makes images bigger
  • Believing pooling removes colors
  • Assuming pooling changes pixels randomly
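As a rough illustration of "shrink + keep key info", here is a plain-Python sketch of 2x2 max pooling with stride 2. The helper max_pool2d and the feature-map values are hypothetical. It is meant to mirror what nn.MaxPool2d(kernel_size=2) computes on a single channel, assuming no padding.

```python
# Plain-Python sketch of max pooling on one channel (no padding).
# `max_pool2d` is a hypothetical helper written for this example.

def max_pool2d(feature_map, kernel_size=2, stride=2):
    """Return the max of each kernel_size x kernel_size window, moving by `stride`."""
    h, w = len(feature_map), len(feature_map[0])
    out_h = (h - kernel_size) // stride + 1
    out_w = (w - kernel_size) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + di][j * stride + dj]
                for di in range(kernel_size)
                for dj in range(kernel_size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Made-up 4x4 feature map; larger numbers stand for stronger filter responses.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]

pooled = max_pool2d(feature_map)
print(pooled)  # [[4, 2], [2, 8]]
```

The 4x4 map shrinks to 2x2, and each output keeps the strongest response from its window.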
3. Given this simple CNN layer code snippet in Python using PyTorch:
import torch
import torch.nn as nn
conv = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=3)
input_tensor = torch.randn(1, 3, 5, 5)
output = conv(input_tensor)
print(output.shape)

What will be the shape of the output tensor?
medium
A. torch.Size([1, 1, 3, 3])
B. torch.Size([1, 3, 3, 3])
C. torch.Size([1, 1, 5, 5])
D. torch.Size([3, 1, 3, 3])

Solution

  1. Step 1: Understand Conv2d output size formula

    With the default stride of 1 and padding of 0, output size = input size - kernel size + 1. Here the input is 5x5 and the kernel is 3x3, so each spatial dimension becomes 5 - 3 + 1 = 3 and the output is 3x3.
  2. Step 2: Check channels and batch size

    The batch size is 1 and out_channels is 1, so the output shape is (1, 1, 3, 3).
  3. Final Answer:

    torch.Size([1, 1, 3, 3]) -> Option A
  4. Quick Check:

    Output shape = (1, 1, 3, 3) [OK]
Hint: Output size = input - kernel + 1 with default stride [OK]
Common Mistakes:
  • Confusing input and output channels
  • Forgetting batch size dimension
  • Assuming output size equals input size
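The size rule used in the solution above generalizes to other strides and paddings. The sketch below uses hypothetical helper names (conv_output_size, conv_output_shape) and assumes a square kernel with dilation 1. It is a way to check shapes by hand, not a replacement for running the layer.

```python
# Sketch of the convolution output-size rule, assuming dilation 1 and a square kernel.
# Both helpers are hypothetical names introduced for this example.

def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial output size along one dimension (floor division, as in the usual rule)."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

def conv_output_shape(batch, out_channels, height, width, kernel_size, stride=1, padding=0):
    """Full (N, C_out, H_out, W_out) shape for an input of shape (N, C_in, height, width)."""
    return (
        batch,
        out_channels,
        conv_output_size(height, kernel_size, stride, padding),
        conv_output_size(width, kernel_size, stride, padding),
    )

# The snippet from this question: 5x5 input, 3x3 kernel, 1 output channel, defaults otherwise.
shape = conv_output_shape(batch=1, out_channels=1, height=5, width=5, kernel_size=3)
print(shape)  # (1, 1, 3, 3)

# With padding=1 a 3x3 kernel keeps the spatial size unchanged.
same_size = conv_output_size(5, kernel_size=3, padding=1)
print(same_size)  # 5

# Stride 2 roughly halves the size: 32 -> 16 with kernel 3 and padding 1.
halved = conv_output_size(32, kernel_size=3, stride=2, padding=1)
print(halved)  # 16
```

The number of input channels does not appear in the output shape. It affects only how many weights each filter has.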
4. Identify the error in this CNN pooling layer code snippet:
import torch
import torch.nn as nn
pool = nn.MaxPool2d(kernel_size=2, stride=3)
input_tensor = torch.randn(1, 1, 6, 6)
output = pool(input_tensor)
print(output.shape)

What is the problem with this code?
medium
A. Input tensor shape is invalid for pooling
B. Kernel size must be equal to stride in MaxPool2d
C. Stride is larger than kernel size, causing unexpected output size
D. MaxPool2d does not accept stride as a parameter

Solution

  1. Step 1: Check pooling parameters

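    Stride can differ from kernel size, and this code runs without raising an error. However, a stride larger than the kernel size makes the pooling windows skip parts of the input.
  2. Step 2: Understand effect on output size

    With kernel 2 and stride 3 on a 6x6 input, each dimension gives floor((6 - 2) / 3) + 1 = 2, so the output is 2x2. The windows cover rows and columns 0-1 and 3-4, so rows and columns 2 and 5 are never read and any information there is lost.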
  3. Final Answer:

    Stride is larger than kernel size, causing unexpected output size -> Option C
  4. Quick Check:

    Stride > kernel size affects output size [OK]
Hint: Stride bigger than kernel skips image parts, watch output size [OK]
Common Mistakes:
  • Thinking kernel size must equal stride
  • Believing input shape is invalid
  • Assuming MaxPool2d can't take stride
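To see which parts of the input a pooling layer never reads, the sketch below lists the indices covered along one dimension. The helper pool_window_coverage is hypothetical and assumes no padding, dilation 1, and floor rounding of the output size (PyTorch's default, ceil_mode=False).

```python
# Sketch: which input positions along one dimension are never covered by a pooling window?
# `pool_window_coverage` is a hypothetical helper; it assumes no padding and floor rounding.

def pool_window_coverage(input_size, kernel_size, stride):
    """Return (output_size, skipped_indices) along one spatial dimension."""
    out_size = (input_size - kernel_size) // stride + 1
    covered = set()
    for o in range(out_size):
        start = o * stride
        covered.update(range(start, start + kernel_size))
    skipped = [i for i in range(input_size) if i not in covered]
    return out_size, skipped

# The snippet from this question: 6x6 input, kernel 2, stride 3.
out_size, skipped = pool_window_coverage(input_size=6, kernel_size=2, stride=3)
print(out_size, skipped)  # 2 [2, 5]

# For comparison, stride equal to kernel size covers every position.
full_out, full_skipped = pool_window_coverage(input_size=6, kernel_size=2, stride=2)
print(full_out, full_skipped)  # 3 []
```

With stride 3, every third row and column is dropped before the maximum is taken, which is why stride is usually kept at or below the kernel size.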
5. You want to build a CNN that classifies images of cats and dogs. Which combination best explains why CNNs dominate this task compared to a simple fully connected network?
hard
A. CNNs scan local image parts and use pooling to reduce size, capturing patterns efficiently
B. Fully connected networks scan images in small parts and pool features
C. CNNs ignore image patterns and rely on random weights
D. Fully connected networks use convolution layers to find edges

Solution

  1. Step 1: Compare CNN and fully connected networks

    CNNs scan small parts of images (local receptive fields) and use pooling to keep important info while reducing size.
  2. Step 2: Understand why CNNs are better for images

    Fully connected networks flatten the image and connect every pixel to every neuron with its own weight, so they ignore spatial structure and need far more parameters, making them less efficient for images.
  3. Final Answer:

    CNNs scan local image parts and use pooling to reduce size, capturing patterns efficiently -> Option A
  4. Quick Check:

    CNN local scan + pooling > fully connected for images [OK]
Hint: CNNs scan parts + pool; fully connected treats all pixels equally [OK]
Common Mistakes:
  • Confusing fully connected with convolution layers
  • Thinking CNNs ignore image patterns
  • Believing fully connected networks use pooling
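The efficiency gap can be made concrete by counting weights. The sketch below compares one fully connected layer with one convolution layer on the same input. The image size (64x64 RGB), the 128 hidden units, and the 16 filters are arbitrary assumptions chosen for illustration. The formulas match the parameter counts of nn.Linear and nn.Conv2d with bias enabled.

```python
# Sketch: parameter counts for a fully connected layer vs. a convolution layer.
# Layer sizes are illustrative assumptions, not values from the quiz.

def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a square-kernel convolution layer."""
    return out_channels * in_channels * kernel_size * kernel_size + out_channels

height, width, channels = 64, 64, 3  # assumed RGB input size

# Fully connected: every one of the 64*64*3 inputs has its own weight to each of 128 units.
fc_params = linear_params(height * width * channels, 128)

# Convolution: 16 filters of size 3x3x3, shared across all image positions.
conv_params = conv2d_params(channels, 16, 3)

print(fc_params)    # 1572992
print(conv_params)  # 448
```

The convolution layer's count does not depend on the image height or width, while the fully connected layer's count grows with every added pixel.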