
CNN architecture review in Computer Vision - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is the main purpose of convolutional layers in a CNN?
Convolutional layers detect local patterns like edges or textures by sliding small filters over the input image, helping the model learn important features.
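As a rough illustration of the sliding-filter idea, here is a minimal pure-Python sketch. The function name `conv2d_valid`, the toy image, and the hand-picked `vertical_edge` filter are assumptions made for this example; a real CNN learns its filter values during training.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists) with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel with the patch under it and sum the products.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# Toy 4x4 image: dark (0) on the left half, bright (1) on the right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked vertical-edge filter (illustrative assumption, not a learned filter).
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, vertical_edge)
for row in feature_map:
    print(row)  # each row is [0, 2, 0]
```

The filter responds only in the middle column, where dark pixels meet bright ones; flat regions give 0.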
beginner
Why do CNNs use pooling layers?
Pooling layers reduce the size of feature maps, making the model faster and less sensitive to small shifts or distortions in the input.
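A small sketch of 2x2 max pooling may help make both effects concrete. The helper `max_pool_2x2` and the sample values are assumptions made for illustration only.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a list-of-lists feature map."""
    pooled = []
    for i in range(0, len(feature_map) - 1, 2):
        row = []
        for j in range(0, len(feature_map[0]) - 1, 2):
            window = [
                feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1],
            ]
            row.append(max(window))  # keep only the strongest response
        pooled.append(row)
    return pooled

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 2], [2, 8]]

# Move the strongest value one pixel up inside its window: the pooled output is unchanged.
shifted = [row[:] for row in feature_map]
shifted[0][0], shifted[1][0] = shifted[1][0], shifted[0][0]
print(max_pool_2x2(shifted) == pooled)  # True
```

The 4x4 map shrinks to 2x2, and a one-pixel shift that stays inside a pooling window does not change the result.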
beginner
What role do fully connected layers play in a CNN?
Fully connected layers combine all extracted features to make final predictions, like classifying the image into categories.
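The sketch below shows one way to picture this step: a fully connected layer followed by softmax. The feature values, weights, and biases are made up for illustration; in a trained network they would come from the earlier layers and from training.

```python
import math

def dense(features, weights, biases):
    """Fully connected layer: every output is a weighted sum of all input features."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical flattened features and made-up weights for 3 classes.
features = [1.0, 0.0, 2.0]
weights = [
    [0.5, 0.1, 0.2],   # class 0
    [0.1, 0.3, 1.0],   # class 1
    [-0.2, 0.4, 0.1],  # class 2
]
biases = [0.0, 0.1, 0.0]

scores = dense(features, weights, biases)
probs = softmax(scores)
predicted_class = probs.index(max(probs))
print(predicted_class)  # 1
```

Each class score depends on every input feature, which is what "fully connected" means here.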
intermediate
Explain the concept of 'stride' in convolutional layers.
Stride is how many pixels the filter moves at each step when sliding over the input. Larger strides reduce output size but may skip details.
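To see the effect of stride, the following sketch lists where a filter starts along one dimension when no padding is used. The helper name `filter_positions` and the sizes (input 7, kernel 3) are assumptions chosen for the example.

```python
def filter_positions(input_size, kernel_size, stride):
    """Start indices where a filter fits along one dimension without padding."""
    return list(range(0, input_size - kernel_size + 1, stride))

for stride in (1, 2, 3):
    positions = filter_positions(input_size=7, kernel_size=3, stride=stride)
    print(f"stride={stride}: positions={positions}, output size={len(positions)}")
```

With these sizes the output length drops from 5 (stride 1) to 3 (stride 2) to 2 (stride 3), and the larger strides never place the filter at the skipped start positions.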
intermediate
What is the benefit of using multiple convolutional layers stacked together?
Stacking layers lets the CNN learn complex features step-by-step, from simple edges in early layers to detailed shapes in deeper layers.
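One way to quantify this is the receptive field: how many input pixels (per side) a single unit can see after several layers. The card does not use that term, so treat the sketch below as a supplementary illustration; the helper `receptive_field` and the layer stack are assumptions.

```python
def receptive_field(layers):
    """Input pixels (per side) seen by one unit after a stack of (kernel_size, stride) layers."""
    field, jump = 1, 1
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump  # each layer widens the view
        jump *= stride                     # strides spread later layers further apart
    return field

stack = [(3, 1), (3, 1), (3, 1)]  # three stacked 3x3 convolutions, stride 1
for depth in range(1, len(stack) + 1):
    print(depth, receptive_field(stack[:depth]))  # 1 3, then 2 5, then 3 7
```

A unit in the first layer sees a 3x3 patch, while a unit in the third layer sees 7x7, which is why deeper layers can respond to larger shapes.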
What does a convolutional filter primarily detect in an image?
A. The entire image color
B. Local patterns like edges
C. Random noise
D. Image file size
Which layer type reduces the spatial size of feature maps in a CNN?
A. Dropout layer
B. Fully connected layer
C. Pooling layer
D. Convolutional layer
What does the 'stride' parameter control in a convolutional layer?
A. The number of filters
B. The learning rate
C. The color channels
D. How far the filter moves each step
Why are multiple convolutional layers stacked in a CNN?
A. To learn complex features progressively
B. To increase image size
C. To reduce training data
D. To avoid overfitting
What is the main function of fully connected layers in CNNs?
A. Combine features for final prediction
B. Detect edges in images
C. Reduce image size
D. Normalize input data
Describe the main components of a CNN architecture and their roles.
Think about how the model processes images step-by-step.
    Explain why stacking multiple convolutional layers helps a CNN learn better.
    Consider how deeper layers build on earlier layers' outputs.

      Practice

      1. What is the main purpose of a Convolutional Neural Network (CNN) in computer vision?
      easy
      A. To perform text translation
      B. To sort numbers in a list
      C. To generate random images
      D. To detect patterns and features in images

      Solution

      1. Step 1: Understand CNN function

        CNNs scan images to find important patterns like edges and shapes.
      2. Step 2: Match purpose to options

        Only option D, "To detect patterns and features in images", describes a CNN's main job; the other options are unrelated tasks.
      3. Final Answer:

        To detect patterns and features in images -> Option D
      4. Quick Check:

        CNN purpose = detect image patterns [OK]
      Hint: CNNs find features in images; they are not built for unrelated tasks like sorting [OK]
      Common Mistakes:
      • Confusing CNNs with general neural networks
      • Thinking CNNs generate images
      • Mixing CNNs with text processing models
      2. Which of the following is the correct way to add a 2D convolutional layer in Keras?
      easy
      A. Dense(units=32, activation='relu')
      B. Conv1D(filters=32, kernel_size=3, activation='relu')
      C. Conv2D(filters=32, kernel_size=(3,3), activation='relu')
      D. MaxPooling2D(pool_size=(2,2))

      Solution

      1. Step 1: Identify Conv2D syntax

        Conv2D takes the number of filters and a kernel_size (an integer or a tuple of two integers); the activation argument is optional.
      2. Step 2: Compare options

        Conv2D(filters=32, kernel_size=(3,3), activation='relu') is the only 2D convolutional layer; the others are different layer types (Dense, MaxPooling2D) or the wrong dimensionality (Conv1D).
      3. Final Answer:

        Conv2D(filters=32, kernel_size=(3,3), activation='relu') -> Option C
      4. Quick Check:

        Conv2D syntax = Conv2D(filters=32, kernel_size=(3,3), activation='relu') [OK]
      Hint: Images need Conv2D with a 2D kernel; kernel_size=(3,3) and kernel_size=3 are equivalent [OK]
      Common Mistakes:
      • Using Conv1D instead of Conv2D for images
      • Confusing Dense layer with Conv2D
      • Wrong kernel_size format
      3. Given this Keras CNN snippet, what is the output shape after the Conv2D layer?
      model = Sequential()
      model.add(Conv2D(16, (3,3), input_shape=(28,28,1)))
      medium
      A. (26, 26, 16)
      B. (28, 28, 16)
      C. (30, 30, 16)
      D. (28, 28, 1)

      Solution

      1. Step 1: Calculate output size after Conv2D

        With default 'valid' padding and kernel size 3, output dims = input - kernel + 1 = 28 - 3 + 1 = 26.
      2. Step 2: Determine output channels

        Filters=16 means output depth is 16 channels.
      3. Final Answer:

        (26, 26, 16) -> Option A
      4. Quick Check:

        Output shape = (26,26,16) [OK]
      Hint: Output size = input - kernel + 1 with 'valid' padding [OK]
      Common Mistakes:
      • Assuming output size equals input size without padding
      • Confusing number of filters with spatial dimensions
      • Forgetting default padding is 'valid'
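The rule used in the solution is a special case of the usual output-size formula for one spatial dimension. The symbols below are chosen here for illustration: n is the input size, k the kernel size, p the padding added on each side, and s the stride.

```latex
o = \left\lfloor \frac{n + 2p - k}{s} \right\rfloor + 1
\qquad\Rightarrow\qquad
o = \left\lfloor \frac{28 + 2\cdot 0 - 3}{1} \right\rfloor + 1 = 26
```

With p = 1 instead (one pixel of padding per side, stride 1), the same formula gives 28, which is how a 3x3 kernel can keep the output the same size as the input.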
      4. Identify the error in this CNN model code snippet:
      model = Sequential()
      model.add(Conv2D(32, (3,3), activation='relu', input_shape=(28,28)))
      model.add(Flatten())
      model.add(Dense(10, activation='softmax'))
      medium
      A. Dense layer should come before Flatten
      B. input_shape missing channel dimension
      C. Activation function 'relu' is invalid
      D. Conv2D filters must be 64 or more

      Solution

      1. Step 1: Check input_shape format

        Conv2D expects input_shape with 3 dimensions: height, width, channels. Here the channel dimension is missing; a 28x28 grayscale image needs input_shape=(28,28,1).
      2. Step 2: Validate other parts

        Activation 'relu' is valid, Flatten before Dense is correct, filters can be any positive integer.
      3. Final Answer:

        input_shape missing channel dimension -> Option B
      4. Quick Check:

        Input shape must include channels [OK]
      Hint: Conv2D input_shape needs (height, width, channels) [OK]
      Common Mistakes:
      • Ignoring channel dimension in input_shape
      • Misordering Flatten and Dense layers
      • Thinking filters must be >=64
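A small guard like the one below can catch this mistake before a model is built. `ensure_channel_dim` is a hypothetical helper written for this example, not part of Keras, and it assumes the channels-last layout (height, width, channels) used in the question.

```python
def ensure_channel_dim(input_shape, channels=1):
    """Return (height, width, channels); append `channels` if only (height, width) is given."""
    if len(input_shape) == 3:
        return tuple(input_shape)
    if len(input_shape) == 2:
        return tuple(input_shape) + (channels,)
    raise ValueError(f"expected 2 or 3 dimensions, got {len(input_shape)}")

print(ensure_channel_dim((28, 28)))     # (28, 28, 1)
print(ensure_channel_dim((64, 64, 3)))  # (64, 64, 3)
```

The image data itself needs the same trailing channel axis, so a batch of grayscale images would have shape (batch, 28, 28, 1).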
      5. You want to build a CNN for classifying 64x64 RGB images into 5 classes. Which architecture choice is best?
      hard
      A. Conv2D(32, (3,3)) + MaxPooling2D + Conv2D(64, (3,3)) + Flatten + Dense(5, softmax)
      B. Dense(128) + Dense(64) + Dense(5, softmax)
      C. Conv1D(32, 3) + Flatten + Dense(5, softmax)
      D. Flatten + Dense(5, softmax)

      Solution

      1. Step 1: Identify suitable layers for image data

        Conv2D layers extract spatial features from 2D images; MaxPooling reduces size; Flatten prepares for Dense.
      2. Step 2: Evaluate options

        Conv2D(32, (3,3)) + MaxPooling2D + Conv2D(64, (3,3)) + Flatten + Dense(5, softmax) uses Conv2D and pooling correctly for images. The Dense-only option lacks feature extraction, Conv1D is unsuitable for 2D images, and Flatten + Dense skips convolutions.
      3. Final Answer:

        Conv2D(32, (3,3)) + MaxPooling2D + Conv2D(64, (3,3)) + Flatten + Dense(5, softmax) -> Option A
      4. Quick Check:

        Use Conv2D + pooling for images [OK]
      Hint: Use Conv2D layers for images, not Dense-only or Conv1D [OK]
      Common Mistakes:
      • Using Dense layers only for image input
      • Applying Conv1D to 2D images
      • Skipping pooling layers for downsampling
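To check how Option A transforms a 64x64 RGB image, the shapes can be traced by hand. This sketch assumes Keras defaults that the option does not spell out: 'valid' padding and stride 1 for Conv2D, and a 2x2 window for MaxPooling2D. The helper names are made up for the example.

```python
def conv2d_shape(shape, filters, kernel=3):
    """Shape after a 'valid' Conv2D with stride 1 (channels-last)."""
    h, w, _ = shape
    return (h - kernel + 1, w - kernel + 1, filters)

def max_pool_shape(shape, pool=2):
    """Shape after pooling with a pool x pool window and matching stride."""
    h, w, c = shape
    return (h // pool, w // pool, c)

shape = (64, 64, 3)                      # 64x64 RGB input
shape = conv2d_shape(shape, filters=32)  # (62, 62, 32)
shape = max_pool_shape(shape)            # (31, 31, 32)
shape = conv2d_shape(shape, filters=64)  # (29, 29, 64)
flat_features = shape[0] * shape[1] * shape[2]
dense_params = flat_features * 5 + 5     # weights plus one bias per class
print(shape, flat_features, dense_params)  # (29, 29, 64) 53824 269125
```

Under these assumptions most of the parameters sit in the final Dense layer, which is one reason a second pooling layer is often added before Flatten.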