CNN architecture review in Computer Vision - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Understanding Convolutional Layer Output Size

Given an input image of size 64x64 with 3 color channels, a convolutional layer uses 16 filters of size 3x3, stride 1, and padding 1. What will be the output shape of this convolutional layer?

A. (62, 62, 16)
B. (66, 66, 16)
C. (64, 64, 16)
D. (64, 64, 3)
💡 Hint

Remember that with a 3x3 kernel and stride 1, a padding of 1 keeps the spatial dimensions unchanged.
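
The hint follows from the stride-1 output rule: output = input - kernel + 2*padding + 1. As a minimal sketch, assuming an odd square kernel, the padding that preserves the size can be computed as below. The helper names `same_padding` and `stride1_output` are illustrative only and do not come from any library.

```python
def same_padding(kernel_size: int) -> int:
    """Padding per side that preserves spatial size at stride 1.

    Assumes an odd kernel size; `same_padding` is an illustrative name.
    """
    if kernel_size % 2 == 0:
        raise ValueError("this sketch only handles odd kernel sizes")
    return (kernel_size - 1) // 2


def stride1_output(size: int, kernel_size: int, padding: int) -> int:
    """Output length along one axis for a stride-1 convolution."""
    return size - kernel_size + 2 * padding + 1


for k in (3, 5, 7):
    p = same_padding(k)
    # With this padding the output length equals the input length (32 here).
    print(k, p, stride1_output(32, k, p))
```

Larger kernels need proportionally more padding to keep the same size, which is why the "padding of 1" rule applies to 3x3 kernels specifically.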

Predict Output
intermediate
Output Shape After Max Pooling

What is the output shape after applying a 2x2 max pooling layer with stride 2 on an input tensor of shape (32, 32, 10)?

A. (16, 16, 10)
B. (15, 15, 10)
C. (31, 31, 10)
D. (32, 32, 10)
💡 Hint

When the pooling window size equals the stride, max pooling divides each spatial dimension by the stride; the number of channels is unchanged.
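
As a rough sketch of this rule, assuming a channels-last (height, width, channels) shape and no padding, the pooled shape can be computed with plain integer arithmetic. The function name `pool_output_shape` is hypothetical.

```python
def pool_output_shape(shape, pool_size=2, stride=2):
    """Shape after 2D max pooling on a (height, width, channels) tensor.

    Assumes no padding; a partial window at the border is dropped.
    """
    height, width, channels = shape
    out_h = (height - pool_size) // stride + 1
    out_w = (width - pool_size) // stride + 1
    # Pooling works per channel, so the channel count is unchanged.
    return (out_h, out_w, channels)


print(pool_output_shape((28, 28, 8)))  # (14, 14, 8)
print(pool_output_shape((7, 7, 8)))    # (3, 3, 8): the odd last row/column is dropped
```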

Hyperparameter
advanced
Choosing Kernel Size for Edge Detection

You want to design a CNN layer to detect edges in images. Which kernel size is most appropriate for this task?

A. 3x3 kernel
B. 1x1 kernel
C. 7x7 kernel
D. 15x15 kernel
💡 Hint

Edge detection usually requires small kernels to capture local gradients.
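
To make "local gradients" concrete, here is a small pure-Python sketch that slides the classic hand-designed Sobel kernel over a toy image. A trained CNN learns its own filter weights, so this is only an illustration; the helper name `filter2d_valid` and the toy image are assumptions made for the example.

```python
# Sobel kernel that responds to horizontal intensity changes (vertical edges).
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]


def filter2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists) with no padding, stride 1.

    This is cross-correlation, which is what CNN layers compute.
    """
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# 5x5 toy image: dark left part (0), bright right part (1).
image = [[0, 0, 0, 1, 1] for _ in range(5)]
response = filter2d_valid(image, SOBEL_X)
for row in response:
    print(row)  # each row is [0, 4, 4]: zero in the flat region, large at the edge
```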

Metrics
advanced
Interpreting CNN Training Accuracy

A CNN model training on image classification shows training accuracy of 98% but validation accuracy of 75%. What is the most likely explanation?

A. The model is underfitting the training data
B. The model is overfitting the training data
C. The validation data is corrupted
D. The training data is too small
💡 Hint

High training accuracy but low validation accuracy usually means the model memorizes training data.

🔧 Debug
expert
Identifying Error in CNN Layer Definition

Consider this PyTorch CNN layer definition:

nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, stride=2, padding=2)

What is the spatial output size for a 64x64 input image, and is the padding appropriate if the layer is meant to halve the resolution?

A. Output size is 64x64; stride 2 has no effect
B. Output size is 32x32; padding is correct for stride 2
C. Output size is 31x31; padding is too small causing output to shrink
D. Output size is 33x33; padding is too large causing output to be bigger than expected
💡 Hint

Use the formula: floor((Input - Kernel + 2*Padding) / Stride) + 1. The division is rounded down before adding 1.
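
The rounding-down step matters whenever the stride does not divide evenly. A minimal sketch of the formula, assuming a square kernel and no dilation (the function name `conv_output_size` is illustrative):

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output length along one axis: floor((size - kernel + 2*padding) / stride) + 1."""
    # Integer division (//) performs the floor for these non-negative values.
    return (size - kernel + 2 * padding) // stride + 1


print(conv_output_size(7, 3, stride=2))             # 3
print(conv_output_size(8, 3, stride=2))             # 3, the leftover column is dropped
print(conv_output_size(8, 3, stride=2, padding=1))  # 4
```

Plugging the layer's own kernel, stride, and padding values into the same function gives the size asked about in the question.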

Practice

1. What is the main purpose of a Convolutional Neural Network (CNN) in computer vision?
easy
A. To perform text translation
B. To sort numbers in a list
C. To generate random images
D. To detect patterns and features in images

Solution

  1. Step 1: Understand CNN function

    CNNs scan images to find important patterns like edges and shapes.
  2. Step 2: Match purpose to options

    Only option D, "To detect patterns and features in images," describes a CNN's main job; the other options are unrelated tasks.
  3. Final Answer:

    To detect patterns and features in images -> Option D
  4. Quick Check:

    CNN purpose = detect image patterns [OK]
Hint: CNNs find image features, not unrelated tasks like sorting [OK]
Common Mistakes:
  • Confusing CNNs with general neural networks
  • Thinking CNNs generate images
  • Mixing CNNs with text processing models
2. Which of the following is the correct way to add a 2D convolutional layer in Keras?
easy
A. Dense(units=32, activation='relu')
B. Conv1D(filters=32, kernel_size=3, activation='relu')
C. Conv2D(filters=32, kernel_size=(3,3), activation='relu')
D. MaxPooling2D(pool_size=(2,2))

Solution

  1. Step 1: Identify Conv2D syntax

    Conv2D takes the number of filters and a kernel_size (an int or a tuple of two ints); the activation argument is optional.
  2. Step 2: Compare options

    Conv2D(filters=32, kernel_size=(3,3), activation='relu') matches Conv2D syntax correctly; others are different layers or wrong dimensions.
  3. Final Answer:

    Conv2D(filters=32, kernel_size=(3,3), activation='relu') -> Option C
  4. Quick Check:

    Conv2D syntax = Conv2D(filters=32, kernel_size=(3,3), activation='relu') [OK]
Hint: For image data use Conv2D; kernel_size can be a tuple such as (3,3) or a single int [OK]
Common Mistakes:
  • Using Conv1D instead of Conv2D for images
  • Confusing Dense layer with Conv2D
  • Wrong kernel_size format
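
On the kernel_size format: Keras documents kernel_size for Conv2D as an int or a tuple/list of two integers, with an int applied to both spatial dimensions. The sketch below mimics that interpretation in plain Python. `normalize_kernel_size` is a hypothetical helper written for illustration, not Keras's internal code.

```python
def normalize_kernel_size(kernel_size):
    """Return a (height, width) tuple from an int or a pair of ints.

    Hypothetical helper that mimics how an int kernel_size is interpreted;
    it is not Keras's internal code.
    """
    if isinstance(kernel_size, int):
        return (kernel_size, kernel_size)
    kernel_size = tuple(kernel_size)
    if len(kernel_size) != 2:
        raise ValueError("a 2D convolution needs exactly two kernel dimensions")
    return kernel_size


print(normalize_kernel_size(3))       # (3, 3)
print(normalize_kernel_size((3, 5)))  # (3, 5)
```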
3. Given this Keras CNN snippet, what is the output shape after the Conv2D layer?
model = Sequential()
model.add(Conv2D(16, (3,3), input_shape=(28,28,1)))
medium
A. (26, 26, 16)
B. (28, 28, 16)
C. (30, 30, 16)
D. (28, 28, 1)

Solution

  1. Step 1: Calculate output size after Conv2D

    With the default 'valid' padding and the default stride of 1, a kernel size of 3 gives output dims = input - kernel + 1 = 28 - 3 + 1 = 26.
  2. Step 2: Determine output channels

    Filters=16 means output depth is 16 channels.
  3. Final Answer:

    (26, 26, 16) -> Option A
  4. Quick Check:

    Output shape = (26,26,16) [OK]
Hint: Output size = input - kernel + 1 with 'valid' padding and stride 1 [OK]
Common Mistakes:
  • Assuming output size equals input size without padding
  • Confusing number of filters with spatial dimensions
  • Forgetting default padding is 'valid'
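
To contrast the two Keras padding modes, here is a small sketch of the size rule for each: 'valid' uses no padding, while 'same' pads so that the output length is ceil(input / stride). The function name `keras_style_output` is an assumption for this example.

```python
import math


def keras_style_output(size, kernel, stride=1, padding="valid"):
    """Output length along one axis under Keras-style padding modes."""
    if padding == "valid":
        # No padding: only positions where the kernel fits entirely.
        return (size - kernel) // stride + 1
    if padding == "same":
        # Padded so that the output depends only on input size and stride.
        return math.ceil(size / stride)
    raise ValueError(f"unknown padding mode: {padding}")


print(keras_style_output(32, 5))                            # 28
print(keras_style_output(32, 5, padding="same"))            # 32
print(keras_style_output(32, 5, stride=2, padding="same"))  # 16
```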
4. Identify the error in this CNN model code snippet:
model = Sequential()
model.add(Conv2D(32, (3,3), activation='relu', input_shape=(28,28)))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))
medium
A. Dense layer should come before Flatten
B. input_shape missing channel dimension
C. Activation function 'relu' is invalid
D. Conv2D filters must be 64 or more

Solution

  1. Step 1: Check input_shape format

    Conv2D expects input_shape with 3 dimensions: height, width, channels. Here channels are missing.
  2. Step 2: Validate other parts

    Activation 'relu' is valid, Flatten before Dense is correct, filters can be any positive integer.
  3. Final Answer:

    input_shape missing channel dimension -> Option B
  4. Quick Check:

    Input shape must include channels [OK]
Hint: Conv2D input_shape needs (height, width, channels) [OK]
Common Mistakes:
  • Ignoring channel dimension in input_shape
  • Misordering Flatten and Dense layers
  • Thinking filters must be >=64
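
A minimal sketch of the fix, assuming channels-last ordering and a grayscale image (one channel): append the missing channel dimension to the 2D shape before passing it as input_shape. The helper `with_channel_dim` is hypothetical and exists only for this illustration.

```python
def with_channel_dim(shape, channels=1):
    """Return (height, width, channels) for a channels-last image shape.

    Assumes a 2D shape is a single image without its channel axis.
    """
    if len(shape) == 3:
        return tuple(shape)  # already has a channel dimension
    if len(shape) == 2:
        return (shape[0], shape[1], channels)
    raise ValueError("expected a shape with 2 or 3 dimensions")


print(with_channel_dim((28, 28)))     # (28, 28, 1)
print(with_channel_dim((64, 64, 3)))  # (64, 64, 3)
```

The image data itself needs the same extra axis, so that each sample has shape (height, width, channels).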
5. You want to build a CNN for classifying 64x64 RGB images into 5 classes. Which architecture choice is best?
hard
A. Conv2D(32, (3,3)) + MaxPooling2D + Conv2D(64, (3,3)) + Flatten + Dense(5, softmax)
B. Dense(128) + Dense(64) + Dense(5, softmax)
C. Conv1D(32, 3) + Flatten + Dense(5, softmax)
D. Flatten + Dense(5, softmax)

Solution

  1. Step 1: Identify suitable layers for image data

    Conv2D layers extract spatial features from 2D images; MaxPooling reduces size; Flatten prepares for Dense.
  2. Step 2: Evaluate options

    Conv2D(32, (3,3)) + MaxPooling2D + Conv2D(64, (3,3)) + Flatten + Dense(5, softmax) uses Conv2D and pooling correctly for images. The Dense-only option lacks feature extraction, Conv1D is unsuitable for 2D images, and Flatten + Dense skips convolutions.
  3. Final Answer:

    Conv2D(32, (3,3)) + MaxPooling2D + Conv2D(64, (3,3)) + Flatten + Dense(5, softmax) -> Option A
  4. Quick Check:

    Use Conv2D + pooling for images [OK]
Hint: Use Conv2D layers for images, not Dense-only or Conv1D [OK]
Common Mistakes:
  • Using Dense layers only for image input
  • Applying Conv1D to 2D images
  • Skipping pooling layers for downsampling
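
To see how the sizes evolve through the architecture in Option A, the sketch below traces shapes layer by layer. It assumes Keras defaults (stride 1 and 'valid' padding for Conv2D, a 2x2 pooling window with stride 2) and uses a made-up layer-spec format; `trace_shapes` is not a library function.

```python
def trace_shapes(input_shape, layers):
    """Return the output shape after each layer of a simple channels-last stack.

    Layer specs (hypothetical format for this sketch):
      ("conv", filters, kernel)  stride 1, no padding
      ("pool", size)             stride equal to size, no padding
      ("flatten",)
    """
    shape = tuple(input_shape)
    shapes = []
    for layer in layers:
        kind = layer[0]
        if kind == "conv":
            filters, kernel = layer[1], layer[2]
            h, w, _ = shape
            shape = (h - kernel + 1, w - kernel + 1, filters)
        elif kind == "pool":
            size = layer[1]
            h, w, c = shape
            shape = (h // size, w // size, c)
        elif kind == "flatten":
            h, w, c = shape
            shape = (h * w * c,)
        else:
            raise ValueError(f"unknown layer kind: {kind}")
        shapes.append(shape)
    return shapes


option_a = [("conv", 32, 3), ("pool", 2), ("conv", 64, 3), ("flatten",)]
for shape in trace_shapes((64, 64, 3), option_a):
    print(shape)
# (62, 62, 32)
# (31, 31, 32)
# (29, 29, 64)
# (53824,)
```

The flattened vector of 53824 values then feeds the final Dense(5, softmax) layer.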