A CNN (Convolutional Neural Network) helps computers see and understand images by looking at small parts step-by-step.
CNN architecture review in Computer Vision
Introduction
Syntax
model = Sequential([
    Conv2D(filters, kernel_size, activation='relu', input_shape=(height, width, channels)),
    MaxPooling2D(pool_size=pool_size),
    Flatten(),
    Dense(units, activation='relu'),
    Dense(num_classes, activation='softmax')
])
Conv2D looks at small image parts to find features.
MaxPooling2D shrinks the image to keep important info and reduce size.
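To make the two descriptions above concrete, here is a minimal pure-Python sketch of what a convolution and a max-pooling step compute on a tiny single-channel image. It is an illustration, not Keras's implementation: the helper names `conv2d_valid` and `max_pool2d`, the 5x5 image, and the hand-picked kernel are all assumptions made for this example, and a real Conv2D layer also learns its kernels, adds a bias, and handles multiple channels.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (2D lists) with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 5x5 image with a vertical edge: dark (0) left, bright (1) right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-picked 2x2 kernel that responds where brightness rises from left to right.
kernel = [[-1, 1],
          [-1, 1]]

features = conv2d_valid(image, kernel)  # 4x4 feature map
pooled = max_pool2d(features)           # 2x2 map after pooling
print(features)
print(pooled)
```

The feature map is non-zero only in the column where the edge sits, and pooling halves each spatial dimension while keeping that strong response.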
Examples
Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1))
MaxPooling2D((2, 2))
Dense(128, activation='relu')
Dense(10, activation='softmax')
Sample Model
This code builds a small CNN that classifies 28x28 grayscale images into 10 classes. It trains on random data for 1 epoch, so the predictions are not meaningful, and prints the predicted classes for the first 5 images.
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Build a simple CNN model
model = Sequential([
    Conv2D(16, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(32, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Create dummy data: 100 grayscale images 28x28 and labels
x_train = np.random.random((100, 28, 28, 1))
y_train = np.random.randint(0, 10, 100)

# Train the model for 1 epoch
history = model.fit(x_train, y_train, epochs=1, batch_size=10, verbose=2)

# Make predictions on first 5 images
predictions = model.predict(x_train[:5])
predicted_classes = predictions.argmax(axis=1)
print('Predicted classes for first 5 images:', predicted_classes)
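It can help to trace by hand how the tensor shape and parameter count change through the sample model. The sketch below does that arithmetic in plain Python, assuming stride 1 and the default 'valid' padding for Conv2D and a non-overlapping 2x2 pool. The helper names are hypothetical and not part of Keras; calling `model.summary()` on the model above should report matching numbers.

```python
def conv2d_shape(h, w, kernel, filters):
    """Output shape of a stride-1, 'valid'-padded convolution."""
    return (h - kernel + 1, w - kernel + 1, filters)


def conv2d_params(kernel, in_channels, filters):
    """Each filter has kernel*kernel*in_channels weights plus one bias."""
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(inputs, units):
    """Each unit has one weight per input plus one bias."""
    return (inputs + 1) * units


h, w, c = 28, 28, 1                                       # input image
conv_out = conv2d_shape(h, w, kernel=3, filters=16)       # after Conv2D
pool_out = (conv_out[0] // 2, conv_out[1] // 2, conv_out[2])  # after 2x2 pooling
flat = pool_out[0] * pool_out[1] * pool_out[2]            # after Flatten

total_params = (
    conv2d_params(3, c, 16)     # Conv2D(16, (3, 3))
    + dense_params(flat, 32)    # Dense(32)
    + dense_params(32, 10)      # Dense(10)
)

print(conv_out, pool_out, flat, total_params)
```

Almost all of the parameters sit in the first Dense layer, because Flatten turns the 13x13x16 feature maps into 2704 inputs.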
Important Notes
Start with small filters like 3x3 to capture details.
Pooling layers help reduce image size and computation.
Use activation functions like ReLU to add non-linearity.
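As a rough illustration of the activation functions named in this lesson, the sketch below implements ReLU and the softmax used in the final Dense layer in plain Python. The function names and input values are made up for the example; Keras applies the same formulas element-wise to whole tensors.

```python
import math


def relu(values):
    """Replace negative values with 0 and leave positive values unchanged."""
    return [max(0.0, v) for v in values]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(scores)
    # Subtracting the max does not change the result but avoids overflow.
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


activated = relu([-2.0, 0.5, 3.0])
probs = softmax([2.0, 1.0, 0.1])
print(activated)
print(probs, sum(probs))
```

ReLU is what makes stacked layers non-linear, while softmax turns the last layer's scores into class probabilities, so the largest score becomes the predicted class.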
Summary
CNNs look at images piece by piece to find patterns.
They use layers like Conv2D, Pooling, Flatten, and Dense.
They are great for tasks like image recognition and classification.
Practice
1. What is the main purpose of a Convolutional Neural Network (CNN) in computer vision?
easy
Solution
Step 1: Understand CNN function
CNNs scan images to find important patterns like edges and shapes.
Step 2: Match purpose to options
Only "To detect patterns and features in images" describes a CNN's main job.
Final Answer:
To detect patterns and features in images -> Option D
Quick Check:
CNN purpose = detect image patterns [OK]
Hint: CNNs find image features, not unrelated tasks like sorting
Common Mistakes:
- Confusing CNNs with general neural networks
- Thinking CNNs generate images
- Mixing CNNs with text processing models
2. Which of the following is the correct way to add a 2D convolutional layer in Keras?
easy
Solution
Step 1: Identify Conv2D syntax
Conv2D takes the number of filters and a kernel_size (a tuple such as (3,3), or a single int applied to both dimensions); the activation argument is optional.
Step 2: Compare options
Conv2D(filters=32, kernel_size=(3,3), activation='relu') matches the Conv2D syntax; the others are different layers or have the wrong dimensions.
Final Answer:
Conv2D(filters=32, kernel_size=(3,3), activation='relu') -> Option C
Quick Check:
Conv2D syntax = Conv2D(filters=32, kernel_size=(3,3), activation='relu') [OK]
Hint: For images, use Conv2D with a 2D kernel size such as (3,3)
Common Mistakes:
- Using Conv1D instead of Conv2D for images
- Confusing Dense layer with Conv2D
- Wrong kernel_size format
3. Given this Keras CNN snippet, what is the output shape after the Conv2D layer?
model = Sequential()
model.add(Conv2D(16, (3,3), input_shape=(28,28,1)))
medium
Solution
Step 1: Calculate output size after Conv2D
With the default 'valid' padding, stride 1, and kernel size 3, each output dimension = input - kernel + 1 = 28 - 3 + 1 = 26.
Step 2: Determine output channels
Filters=16 means the output depth is 16 channels.
Final Answer:
(26, 26, 16) -> Option A
Quick Check:
Output shape = (26,26,16) [OK]
Hint: Output size = input - kernel + 1 with 'valid' padding and stride 1
Common Mistakes:
- Assuming output size equals input size without padding
- Confusing number of filters with spatial dimensions
- Forgetting default padding is 'valid'
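The rule used in this solution is a special case of a more general formula. The sketch below is a small helper, with a name and signature invented for this example, that computes one spatial output dimension for both padding modes Keras offers, assuming no dilation: 'valid' gives floor((n - kernel) / stride) + 1 and 'same' gives ceil(n / stride).

```python
import math


def conv_output_size(n, kernel, stride=1, padding="valid"):
    """Output length along one spatial dimension of a convolution."""
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return (n - kernel) // stride + 1
    if padding == "same":
        # Input is padded so that only the stride shrinks the output.
        return math.ceil(n / stride)
    raise ValueError("padding must be 'valid' or 'same'")


print(conv_output_size(28, 3))                    # the case from this question
print(conv_output_size(28, 3, padding="same"))
print(conv_output_size(28, 3, stride=2))
print(conv_output_size(28, 3, stride=2, padding="same"))
```

The first call reproduces the 26 from the solution; switching to 'same' padding keeps the size at 28, which is the case the first common mistake wrongly assumes is the default.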
4. Identify the error in this CNN model code snippet:
model = Sequential()
model.add(Conv2D(32, (3,3), activation='relu', input_shape=(28,28)))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))
medium
Solution
Step 1: Check input_shape format
Conv2D expects input_shape with 3 dimensions: height, width, channels. Here the channel dimension is missing.
Step 2: Validate other parts
Activation 'relu' is valid, Flatten before Dense is correct, and filters can be any positive integer.
Final Answer:
input_shape missing channel dimension -> Option B
Quick Check:
Input shape must include channels [OK]
Hint: Conv2D input_shape needs (height, width, channels)
Common Mistakes:
- Ignoring channel dimension in input_shape
- Misordering Flatten and Dense layers
- Thinking filters must be >=64
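In the snippet from this question, the fix is to write input_shape=(28,28,1). The data has to match as well: a grayscale image stored as a 28x28 grid needs a trailing channel axis before it can be fed to Conv2D. The sketch below shows that reshaping on plain nested lists; `shape_of` and `add_channel_axis` are hypothetical helpers written for this example. With NumPy arrays, `np.expand_dims(x, axis=-1)` does the same job.

```python
def shape_of(nested):
    """Return the shape of a regular nested list, e.g. (28, 28)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


def add_channel_axis(image):
    """Wrap every pixel in its own list: (H, W) becomes (H, W, 1)."""
    return [[[pixel] for pixel in row] for row in image]


gray = [[0.0] * 28 for _ in range(28)]   # placeholder 28x28 grayscale image
with_channel = add_channel_axis(gray)

print(shape_of(gray))
print(shape_of(with_channel))
```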
5. You want to build a CNN for classifying 64x64 RGB images into 5 classes. Which architecture choice is best?
hard
Solution
Step 1: Identify suitable layers for image data
Conv2D layers extract spatial features from 2D images; MaxPooling reduces size; Flatten prepares for Dense.
Step 2: Evaluate options
Conv2D(32, (3,3)) + MaxPooling2D + Conv2D(64, (3,3)) + Flatten + Dense(5, softmax) uses Conv2D and pooling correctly for images. The Dense-only option lacks feature extraction, Conv1D is unsuitable for 2D images, and Flatten + Dense skips convolutions.
Final Answer:
Conv2D(32, (3,3)) + MaxPooling2D + Conv2D(64, (3,3)) + Flatten + Dense(5, softmax) -> Option A
Quick Check:
Use Conv2D + pooling for images [OK]
Hint: Use Conv2D layers for images, not Dense-only or Conv1D
Common Mistakes:
- Using Dense layers only for image input
- Applying Conv1D to 2D images
- Skipping pooling layers for downsampling
