
How to Use CNN for Image Classification in Computer Vision

To use a CNN (Convolutional Neural Network) for image classification, you build a model with convolutional layers that extract image features, followed by dense layers to classify images into categories. You train the CNN on labeled images using a loss function and optimizer, then use it to predict classes of new images.

📐

Syntax

A CNN model for image classification typically includes these parts:

Input layer: Accepts image data (height, width, channels).
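Convolutional layers: Slide learned filters over the image to extract local features such as edges and textures.
Activation functions: Add non-linearity, usually ReLU.
Pooling layers: Downsample feature maps (e.g., keep the maximum of each 2x2 window), reducing computation.
Flatten layer: Converts the 3D feature maps (height, width, channels) into a 1D vector.
Dense (fully connected) layers: Learn to classify based on the extracted features.
Output layer: A dense layer with one unit per class and softmax activation, producing class probabilities.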
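To make the list concrete, here is a minimal NumPy sketch of what each layer type computes on a toy single-channel image. It is an illustration under simplifying assumptions (one channel, one filter, no training), not how Keras implements these layers. The helper names, the vertical-edge filter, and the random dense weights are hypothetical choices for this example.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a 2D kernel over a 2D image (no padding, stride 1).
    Like Keras Conv2D, this is cross-correlation: the kernel is not flipped."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool_2x2(x):
    """Keep the largest value in each non-overlapping 2x2 window.
    An odd trailing row/column is dropped, as with 'valid' padding."""
    h, w = (x.shape[0] // 2) * 2, (x.shape[1] // 2) * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.random((6, 6))                  # toy 6x6 single-channel image
kernel = np.array([[1.0, 0.0, -1.0]] * 3)   # hypothetical vertical-edge filter

feature_map = relu(conv2d_valid(image, kernel))  # 6x6 -> 4x4
pooled = max_pool_2x2(feature_map)               # 4x4 -> 2x2
vector = pooled.flatten()                        # 2x2 -> length-4 vector

weights = rng.standard_normal((4, 3))       # dense layer: 4 inputs -> 3 classes
bias = np.zeros(3)
probs = softmax(vector @ weights + bias)    # class probabilities summing to 1
print(feature_map.shape, pooled.shape, vector.shape, probs.round(3))
```

A real Conv2D layer does the same sliding-window sum across all input channels at once, for many filters in parallel, and learns the kernel values during training.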

python

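from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    # 32 filters of size 3x3 applied to 64x64 RGB images
    Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    MaxPooling2D(pool_size=(2, 2)),          # halve height and width
    Conv2D(64, (3, 3), activation='relu'),   # deeper layer learns more complex features
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),                               # feature maps -> 1D vector
    Dense(128, activation='relu'),           # fully connected hidden layer
    Dense(10, activation='softmax')          # one probability per class (10 classes)
])

# Integer labels (0-9) pair with sparse_categorical_crossentropy
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])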
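It helps to trace how the tensor shape and parameter count change through this model. The sketch below does that arithmetic in plain Python, assuming Keras defaults: Conv2D uses 'valid' padding and stride 1, and MaxPooling2D uses 'valid' padding with a stride equal to the pool size. `trace_cnn` and its layer-spec format are hypothetical, written only for this illustration.

```python
def trace_cnn(input_shape, layers):
    """Return (layer, output_shape, params) for a simple Sequential CNN.

    Assumes Keras defaults: Conv2D with 'valid' padding and stride 1,
    MaxPooling2D with 'valid' padding and stride equal to the pool size."""
    h, w, c = input_shape
    shape = input_shape
    rows = []
    for kind, arg in layers:
        if kind == "conv":                       # arg = (filters, kernel_size)
            filters, k = arg
            params = (k * k * c + 1) * filters   # k*k*c weights + 1 bias per filter
            h, w, c = h - k + 1, w - k + 1, filters
            shape = (h, w, c)
        elif kind == "pool":                     # arg = pool size
            params = 0
            h, w = (h - arg) // arg + 1, (w - arg) // arg + 1
            shape = (h, w, c)
        elif kind == "flatten":
            params = 0
            shape = (h * w * c,)
        elif kind == "dense":                    # arg = number of units
            params = (shape[0] + 1) * arg        # weights + 1 bias per unit
            shape = (arg,)
        else:
            raise ValueError(f"unknown layer kind: {kind}")
        rows.append((kind, shape, params))
    return rows

# The model from the Syntax section
layers = [("conv", (32, 3)), ("pool", 2), ("conv", (64, 3)), ("pool", 2),
          ("flatten", None), ("dense", 128), ("dense", 10)]
rows = trace_cnn((64, 64, 3), layers)
for kind, shape, params in rows:
    print(f"{kind:8s} {str(shape):16s} {params:>10,}")
print("total params:", sum(p for _, _, p in rows))
```

Calling `model.summary()` on the Keras model should report the same output shapes (with a leading `None` batch dimension) and parameter counts. The trace also shows why pooling matters: the first Dense layer, fed by the flattened 12,544-value vector, holds almost all of the model's parameters.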

💻

Example

This example builds, trains, and evaluates a CNN on the CIFAR-10 dataset: 60,000 32x32 color images in 10 classes, such as airplanes and cats, split into 50,000 training and 10,000 test images.

python

import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Load data: images are uint8 arrays, labels are integers 0-9 with shape (n, 1)
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Scale pixel values from 0-255 to 0-1
x_train, x_test = x_train / 255.0, x_test / 255.0

# Build CNN model
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train for 3 epochs, holding out 20% of the training images for validation
history = model.fit(x_train, y_train, epochs=3, validation_split=0.2)

# Evaluate model
loss, accuracy = model.evaluate(x_test, y_test)
print(f'Test accuracy: {accuracy:.4f}')

Output (exact numbers vary between runs because weights are initialized randomly and training data is shuffled)

Epoch 1/3
1250/1250 [==============================] - 22s 17ms/step - loss: 1.4812 - accuracy: 0.4603 - val_loss: 1.1834 - val_accuracy: 0.5798
Epoch 2/3
1250/1250 [==============================] - 21s 17ms/step - loss: 1.0647 - accuracy: 0.6231 - val_loss: 1.0107 - val_accuracy: 0.6464
Epoch 3/3
1250/1250 [==============================] - 21s 17ms/step - loss: 0.9003 - accuracy: 0.6837 - val_loss: 0.9279 - val_accuracy: 0.6750
313/313 [==============================] - 2s 6ms/step - loss: 0.9279 - accuracy: 0.6750
Test accuracy: 0.6750
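The example stops at evaluation. To classify new images, `model.predict` returns one row of 10 softmax probabilities per image, and the predicted class is the index of the largest value (`np.argmax(probs, axis=1)`). New images must first be resized to 32x32 and scaled by 1/255, exactly like the training data. The sketch below assumes the standard CIFAR-10 label order; `top_classes` is a hypothetical helper, not a Keras function, and the probability row is made up to stand in for real model output.

```python
import numpy as np

# CIFAR-10 label order: integer label i corresponds to CIFAR10_CLASSES[i]
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]

def top_classes(probs, class_names=CIFAR10_CLASSES, k=1):
    """Hypothetical helper: map softmax outputs of shape (n_images, n_classes)
    to the k most likely (class name, probability) pairs per image."""
    probs = np.asarray(probs)
    order = np.argsort(probs, axis=1)[:, ::-1][:, :k]  # indices, most likely first
    return [[(class_names[i], float(row[i])) for i in idx]
            for row, idx in zip(probs, order)]

# With the trained model from the example, this would be:
#   probs = model.predict(x_test[:5])   # x_test is already scaled to [0, 1]
# Here a made-up probability row stands in for real model output.
probs = np.full((1, 10), 0.02)
probs[0, 3] = 0.82                      # most probability mass on class 3 ("cat")
print(top_classes(probs, k=2))
```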

⚠️

Common Pitfalls

Common mistakes when using CNNs for image classification include:

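Not normalizing pixel values (leaving them in the 0-255 range), which can slow or destabilize training.
Using too few convolutional layers, which limits the features the model can learn.
Overfitting by training too long without enough data or regularization.
Declaring an input shape that doesn't match the data, which raises shape errors.
Using a loss function that doesn't match the labels, e.g., sparse_categorical_crossentropy with one-hot labels (use categorical_crossentropy for those).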

Always check data preprocessing and model architecture carefully.

python

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Rescaling

# Weaker: raw 0-255 pixels go straight in, and with no pooling
# Flatten passes 30*30*32 = 28,800 values to the Dense layer
model_wrong = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    Flatten(),
    Dense(10, activation='softmax')
])

# Better: scale pixels to [0, 1] inside the model (so feed raw 0-255
# images and don't also divide by 255 beforehand), then pool after the
# convolution to shrink the flattened vector to 15*15*32 = 7,200 values
model_right = Sequential([
    Rescaling(1.0 / 255, input_shape=(32, 32, 3)),
    Conv2D(32, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(10, activation='softmax')
])
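Picking the wrong loss usually comes down to label format. `sparse_categorical_crossentropy` expects integer labels such as CIFAR-10's `y_train` (shape `(n, 1)`), while `categorical_crossentropy` expects one-hot vectors. The NumPy sketch below re-implements both formulas in simplified form (skipping the numerical safeguards Keras adds) to show that they compute the same value when each receives the label format it expects. The function names and the probability values are illustrative only.

```python
import numpy as np

def sparse_cce(y_int, probs):
    """Mean cross-entropy for integer labels of shape (n,) or (n, 1)."""
    y = np.asarray(y_int).ravel()
    return -np.mean(np.log(probs[np.arange(len(y)), y]))  # log-prob of the true class

def categorical_cce(y_onehot, probs):
    """Mean cross-entropy for one-hot labels of shape (n, n_classes)."""
    return -np.mean(np.sum(y_onehot * np.log(probs), axis=1))

probs = np.array([[0.7, 0.2, 0.1],     # softmax outputs for 2 images, 3 classes
                  [0.1, 0.3, 0.6]])
y_int = np.array([[0], [2]])           # integer labels, shaped like CIFAR-10's y_train
y_onehot = np.eye(3)[y_int.ravel()]    # the same labels, one-hot encoded

print(sparse_cce(y_int, probs), categorical_cce(y_onehot, probs))
```

In Keras, integer labels can be converted with `tf.keras.utils.to_categorical(y_train, 10)`; after that conversion, compile with `loss='categorical_crossentropy'` instead.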

📊

Quick Reference

Input shape: (height, width, channels), e.g., (32, 32, 3) for color images.
Conv2D: Extracts features with filters.
MaxPooling2D: Reduces feature map size.
Flatten: Converts multi-dimensional feature maps to a 1D vector.
Dense: Fully connected layer for classification.
Activation: Use ReLU for hidden layers, softmax for output.
Loss function: Use sparse_categorical_crossentropy for integer labels, categorical_crossentropy for one-hot labels.
Optimizer: Adam is a good default choice.
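The input shape must include the channel axis even for grayscale images. A dataset loaded as `(n, 28, 28)` arrays, as MNIST is by `tf.keras.datasets.mnist.load_data()`, needs a trailing channel axis before it matches `input_shape=(28, 28, 1)`. A minimal NumPy sketch, using random stand-in data in place of a real dataset:

```python
import numpy as np

# Random stand-in for grayscale data shaped (n_images, height, width), e.g. MNIST
rng = np.random.default_rng(0)
x_gray = rng.integers(0, 256, size=(100, 28, 28)).astype(np.uint8)

# Add the channel axis Conv2D expects and scale pixels to [0, 1]
x = x_gray[..., np.newaxis].astype("float32") / 255.0

print(x.shape)  # (100, 28, 28, 1) -> use input_shape=(28, 28, 1)
```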

✅

Key Takeaways

Build CNNs with convolution, pooling, flatten, and dense layers for image classification.

Normalize image data before training to improve model performance.

Use softmax activation and appropriate loss for multi-class classification.

Avoid overfitting by using enough data and proper model complexity.

Check input shapes and preprocessing carefully to prevent errors.