Why architecture design impacts performance in Computer Vision - Challenge Your Understanding

Challenge - 5 Problems
🧠 Conceptual
intermediate
How does increasing model depth affect performance?

In convolutional neural networks, what is the most common effect of increasing the number of layers (depth) on model performance?

A. It reduces performance because deeper models cannot learn complex features.
B. It always improves performance by capturing more complex features without any drawbacks.
C. It can improve performance by learning complex features but may cause vanishing gradients and overfitting.
D. It has no effect on performance since only the number of neurons matters.
💡 Hint

Think about what happens when a model becomes very deep and how training might be affected.

Predict Output
intermediate
Output shape after convolution and pooling layers

Given the following PyTorch model snippet, what is the output shape after the last layer?

import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        return x

model = SimpleCNN()
input_tensor = torch.randn(1, 3, 64, 64)
output = model(input_tensor)
output.shape
A. torch.Size([1, 32, 8, 8])
B. torch.Size([1, 16, 8, 8])
C. torch.Size([1, 16, 16, 16])
D. torch.Size([1, 32, 16, 16])
💡 Hint

Calculate the size after each convolution and pooling step.

Hyperparameter
advanced
Choosing kernel size impact on feature extraction

How does increasing the convolution kernel size from 3x3 to 7x7 typically affect a CNN's ability to extract features?

A. Larger kernels capture more global features but increase parameters and risk overfitting.
B. Larger kernels always improve performance without any drawbacks.
C. Larger kernels reduce the receptive field and limit feature extraction.
D. Kernel size does not affect feature extraction, only the number of filters matters.
💡 Hint

Think about how kernel size relates to the area of the image the filter sees.

Metrics
advanced
Interpreting validation accuracy drop with deeper architecture

A CNN model with 10 layers achieves 85% validation accuracy. Increasing to 50 layers drops validation accuracy to 70%. What is the most likely reason?

A. The dataset is too large for the deeper model to learn.
B. The deeper model suffers from vanishing gradients and overfitting.
C. The deeper model is underfitting due to too few parameters.
D. Validation accuracy always decreases with more layers.
💡 Hint

Consider training difficulties with very deep networks.

🔧 Debug
expert
Identifying cause of exploding gradients in deep CNN

Given a deep CNN training with exploding gradients, which architectural choice is the most likely cause?

A. Using ReLU activation without batch normalization in a very deep network.
B. Using batch normalization and residual connections in the network.
C. Using small kernel sizes like 3x3 in all convolution layers.
D. Using dropout layers after every convolution.
💡 Hint

Think about what helps stabilize training in deep networks.
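
As a rough illustration of why depth makes gradients fragile, the following toy sketch may help. It is not a real training run: it assumes every layer scales the backpropagated gradient by one constant factor (the hypothetical `per_layer_gain`), which real networks only approximate.

```python
def gradient_scale(per_layer_gain: float, depth: int) -> float:
    """Toy estimate of the gradient scale after flowing back through `depth` layers.

    Assumption: each layer multiplies the gradient by the same constant factor.
    """
    scale = 1.0
    for _ in range(depth):
        scale *= per_layer_gain
    return scale


if __name__ == "__main__":
    for depth in (10, 50):
        shrinking = gradient_scale(0.9, depth)  # factor slightly below 1
        growing = gradient_scale(1.1, depth)    # factor slightly above 1
        print(f"depth={depth:2d}  gain 0.9 -> {shrinking:.2e}  gain 1.1 -> {growing:.2e}")
```

Under this assumption, a factor only slightly away from 1 is compounded once per layer, so the gradient either shrinks toward zero or grows very large as depth increases.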

Practice

1. Why does the design of a neural network architecture affect its performance on image tasks?
easy
A. Because it controls the size of the training dataset
B. Because it determines how well the model can learn important features from images
C. Because it decides the file format of the images
D. Because it changes the color of the images

Solution

  1. Step 1: Understand the role of architecture in feature learning

    The architecture defines layers and connections that extract patterns from images.
  2. Step 2: Connect architecture to model performance

    Better feature extraction leads to improved accuracy and generalization on tasks.
  3. Final Answer:

    Because it determines how well the model can learn important features from images -> Option B
  4. Quick Check:

    Architecture affects feature learning = B [OK]
Hint: Think about how model structure helps find image patterns [OK]
Common Mistakes:
  • Confusing architecture with image properties
  • Thinking architecture changes data format
  • Believing architecture controls dataset size
2. Which of the following is the correct way to define a convolutional layer in a deep learning model using Python and PyTorch?
easy
A. nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
B. nn.Linear(in_features=3, out_features=16)
C. nn.Conv1d(in_channels=3, out_channels=16, kernel_size=3)
D. nn.MaxPool2d(kernel_size=2, stride=2)

Solution

  1. Step 1: Identify the convolutional layer syntax

    In PyTorch, Conv2d is used for 2D image convolutions with parameters for channels and kernel size.
  2. Step 2: Check each option's layer type

    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1) correctly uses nn.Conv2d with proper parameters; others define different layers.
  3. Final Answer:

    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1) -> Option A
  4. Quick Check:

    Correct Conv2d syntax = A [OK]
Hint: Look for Conv2d with correct parameters for image layers [OK]
Common Mistakes:
  • Confusing Conv2d with Linear or Conv1d layers
  • Missing stride or padding parameters
  • Choosing pooling layers instead of convolution
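
To see what the arguments of the layer in Option A imply, here is a small sketch in plain Python that counts its learnable parameters. The helper name `conv2d_param_count` is hypothetical, and the count assumes the layer has a bias term, which is the default for `nn.Conv2d`. The 7x7 comparison is an added illustration, not part of the question.

```python
def conv2d_param_count(in_channels: int, out_channels: int,
                       kernel_size: int, bias: bool = True) -> int:
    """Learnable parameters of a 2D convolution with a square kernel."""
    # One kernel_size x kernel_size filter per (input channel, output channel) pair.
    weights = out_channels * in_channels * kernel_size * kernel_size
    # One bias value per output channel, if enabled.
    return weights + (out_channels if bias else 0)


# Layer from Option A: 3 input channels, 16 output channels, 3x3 kernel.
option_a_params = conv2d_param_count(3, 16, 3)

# Hypothetical variant with a 7x7 kernel, for comparison only.
larger_kernel_params = conv2d_param_count(3, 16, 7)

if __name__ == "__main__":
    print("3x3 kernel:", option_a_params)
    print("7x7 kernel:", larger_kernel_params)
```

Stride and padding do not appear in the count: they change the output size, not the number of weights.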
3. Consider this simplified CNN architecture for image classification:
model = nn.Sequential(
  nn.Conv2d(3, 8, 3, padding=1),
  nn.ReLU(),
  nn.MaxPool2d(2),
  nn.Conv2d(8, 16, 3, padding=1),
  nn.ReLU(),
  nn.MaxPool2d(2),
  nn.Flatten(),
  nn.Linear(16*8*8, 10)
)

If the input images are 32x32 pixels, what is the size of the feature map before flattening?
medium
A. 8 channels with 8x8 spatial size
B. 8 channels with 16x16 spatial size
C. 16 channels with 16x16 spatial size
D. 16 channels with 8x8 spatial size

Solution

  1. Step 1: Calculate size after first Conv2d and MaxPool2d

    Input 32x32, Conv2d with padding=1 keeps size 32x32, MaxPool2d(2) halves to 16x16 with 8 channels.
  2. Step 2: Calculate size after second Conv2d and MaxPool2d

    Conv2d keeps size 16x16 with 16 channels, MaxPool2d halves to 8x8 with 16 channels.
  3. Final Answer:

    16 channels with 8x8 spatial size -> Option D
  4. Quick Check:

    Pooling halves size twice = 8x8 with 16 channels [OK]
Hint: Each MaxPool2d(2) halves spatial size [OK]
Common Mistakes:
  • Forgetting padding keeps size after convolution
  • Not halving size after pooling
  • Mixing channel counts with spatial dimensions
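
The two solution steps can be checked with a short shape tracer. This is a sketch in plain Python rather than PyTorch; `trace_shapes` and the tuple layout are made up for illustration. It uses the usual output-size formula `(size + 2 * padding - kernel) // stride + 1` and assumes square inputs and the default pooling stride (equal to the kernel size).

```python
# Each entry: (kind, out_channels, kernel, stride, padding).
# Pooling layers keep the channel count, so out_channels is None for them.
LAYERS = [
    ("conv", 8, 3, 1, 1),     # nn.Conv2d(3, 8, 3, padding=1)
    ("pool", None, 2, 2, 0),  # nn.MaxPool2d(2)
    ("conv", 16, 3, 1, 1),    # nn.Conv2d(8, 16, 3, padding=1)
    ("pool", None, 2, 2, 0),  # nn.MaxPool2d(2)
]


def trace_shapes(channels, size, layers):
    """Return (kind, channels, height, width) after each layer for a square input."""
    shapes = []
    for kind, out_channels, kernel, stride, padding in layers:
        size = (size + 2 * padding - kernel) // stride + 1
        if kind == "conv":
            channels = out_channels
        shapes.append((kind, channels, size, size))
    return shapes


if __name__ == "__main__":
    for step in trace_shapes(channels=3, size=32, layers=LAYERS):
        print(step)
```

The last entry gives the feature map that `nn.Flatten()` receives, and its element count should match the `16*8*8` input size of the final `nn.Linear` layer.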
4. You have a CNN model that overfits training data but performs poorly on new images. Which architecture change can help reduce overfitting?
medium
A. Remove all pooling layers to keep more details
B. Increase the number of convolutional filters drastically
C. Add dropout layers to randomly ignore some neurons during training
D. Use a smaller batch size during training

Solution

  1. Step 1: Understand overfitting and regularization

    Overfitting means the model memorizes training data; dropout helps by randomly ignoring neurons to generalize better.
  2. Step 2: Evaluate options for reducing overfitting

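    Adding dropout (C) is a common fix; drastically increasing the number of filters (B) adds capacity and may worsen overfitting; removing pooling (A) enlarges the feature maps, and with them the first fully connected layer, so it adds parameters; a smaller batch size (D) is a training setting rather than an architecture change and mainly affects training stability.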
  3. Final Answer:

    Add dropout layers to randomly ignore some neurons during training -> Option C
  4. Quick Check:

    Dropout reduces overfitting = C [OK]
Hint: Use dropout to prevent memorizing training data [OK]
Common Mistakes:
  • Thinking bigger models always reduce overfitting
  • Removing pooling layers, which enlarges feature maps and can increase overfitting
  • Confusing batch size effects with architecture changes
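
The idea of "randomly ignoring some neurons during training" can be sketched in plain Python. This is an illustrative version of inverted dropout, not PyTorch's implementation; the function name and the `rng` argument are assumptions. In a PyTorch model you would normally use `nn.Dropout`, which likewise rescales the kept activations during training and does nothing in evaluation mode.

```python
import random


def dropout(values, p=0.5, training=True, rng=None):
    """Inverted dropout on a list of activations.

    During training each value is zeroed with probability p, and the survivors
    are scaled by 1 / (1 - p) so the expected activation stays the same.
    At evaluation time the input is returned unchanged.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(values)
    rng = rng or random.Random()
    keep_prob = 1.0 - p
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


if __name__ == "__main__":
    activations = [1.0] * 8
    print("train:", dropout(activations, p=0.5, rng=random.Random(0)))
    print("eval: ", dropout(activations, p=0.5, training=False))
```

Because a different random subset is dropped on every training step, the network cannot rely on any single activation, which is what discourages memorizing the training set.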
5. You want to design a model for real-time object detection on a mobile device. Which architectural choice best balances accuracy and speed?
hard
A. Use a lightweight architecture like MobileNet with depthwise separable convolutions
B. Use a very deep ResNet with 152 layers for highest accuracy
C. Use a fully connected network without convolutions
D. Use a large kernel size (e.g., 11x11) in all convolution layers

Solution

  1. Step 1: Identify requirements for mobile real-time detection

    Mobile devices need fast, efficient models with good accuracy and low computation.
  2. Step 2: Evaluate architectural options

    MobileNet uses depthwise separable convolutions to reduce computation while keeping accuracy; very deep ResNet is slow; fully connected networks lack spatial understanding; large kernels increase computation.
  3. Final Answer:

    Use a lightweight architecture like MobileNet with depthwise separable convolutions -> Option A
  4. Quick Check:

    MobileNet balances speed and accuracy = A [OK]
Hint: Choose lightweight models designed for mobile use [OK]
Common Mistakes:
  • Picking very deep models ignoring speed constraints
  • Using fully connected layers for images
  • Choosing large kernels that slow down inference
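
To give a feel for why depthwise separable convolutions reduce computation, the sketch below compares weight counts for a single layer. The channel sizes (32 in, 64 out) and the function names are arbitrary choices for illustration, biases are ignored, and the numbers describe one layer only, not MobileNet as a whole.

```python
def standard_conv_weights(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights of a standard convolution: every output channel sees every input channel."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_weights(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights of a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * in_channels  # one spatial filter per input channel
    pointwise = in_channels * out_channels               # 1x1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    standard = standard_conv_weights(32, 64, 3)
    separable = depthwise_separable_weights(32, 64, 3)
    print("standard:", standard)
    print("depthwise separable:", separable)
    print(f"reduction: {standard / separable:.1f}x")
```

The multiply-accumulate count per output position shrinks by the same ratio, which is where the speed-up on mobile hardware comes from.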