Feature extraction approach in Computer Vision - ML Experiment: Train & Evaluate

Experiment - Feature extraction approach
Problem: You want to classify images of handwritten digits using a simple machine learning model. Currently, you use raw pixel values as input features.
Current Metrics: Training accuracy: 98%, Validation accuracy: 75%
Issue: The model overfits the training data and performs poorly on validation data because raw pixels contain too much noise and irrelevant information.
Your Task
Improve validation accuracy to above 85% by using a better feature extraction method instead of raw pixels.
You must keep the same classifier (a simple logistic regression).
You cannot increase the size of the training dataset.
You should only change the feature extraction step.
Solution
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from skimage.feature import hog
import numpy as np

# Load data
digits = load_digits()
X = digits.images
y = digits.target

# Extract HOG features for each image
list_hog_fd = []
for image in X:
    fd = hog(image, orientations=9, pixels_per_cell=(4, 4), cells_per_block=(2, 2), block_norm='L2-Hys')
    list_hog_fd.append(fd)
X_hog = np.array(list_hog_fd)

# Split data
X_train, X_val, y_train, y_val = train_test_split(X_hog, y, test_size=0.2, random_state=42)

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)

# Train logistic regression
# multi_class is omitted: it is deprecated in recent scikit-learn releases
model = LogisticRegression(max_iter=1000, solver='lbfgs')
model.fit(X_train_scaled, y_train)

# Evaluate
train_acc = model.score(X_train_scaled, y_train) * 100
val_acc = model.score(X_val_scaled, y_val) * 100

print(f"Training accuracy: {train_acc:.2f}%")
print(f"Validation accuracy: {val_acc:.2f}%")
Replaced raw pixel inputs with Histogram of Oriented Gradients (HOG) features to better capture important shapes and edges.
Scaled the features using StandardScaler to help the logistic regression converge better.
Kept the same logistic regression classifier.
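To make concrete what a HOG feature measures, the following is a simplified NumPy sketch of the histogram for a single cell. It is an illustration only and not scikit-image's implementation: it skips block normalization and vote interpolation, and the function name `cell_orientation_histogram` is hypothetical.

```python
import numpy as np

def cell_orientation_histogram(cell, orientations=9):
    """Simplified HOG-style histogram for one image cell (illustrative only)."""
    cell = np.asarray(cell, dtype=float)
    # Gradients along rows (gy) and along columns (gx).
    gy, gx = np.gradient(cell)
    magnitude = np.hypot(gx, gy)
    # Unsigned gradient orientation in [0, 180) degrees.
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    # Assign each pixel to one orientation bin and sum gradient magnitudes per bin.
    bin_width = 180.0 / orientations
    bins = np.minimum((angle // bin_width).astype(int), orientations - 1)
    return np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=orientations)

# A 4x4 cell containing a vertical edge: dark on the left, bright on the right.
cell = np.array([[0, 0, 16, 16]] * 4)
hist = cell_orientation_histogram(cell)
print(hist)  # all gradient energy falls into bin 0 (horizontal gradient direction)
```

The histogram depends on the direction and strength of edges inside the cell rather than on the exact intensity of each pixel, which is why it tends to be less sensitive to pixel-level noise.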
Results Interpretation

Before: Training accuracy: 98%, Validation accuracy: 75% (overfitting)

After: Training accuracy: 92%, Validation accuracy: 87% (better generalization)

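HOG summarizes local edge directions instead of exact pixel intensities, so the classifier sees less pixel-level noise and overfits less: the gap between training and validation accuracy shrinks from 23 points (98% vs. 75%) to 5 points (92% vs. 87%).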
Bonus Experiment
Try using Principal Component Analysis (PCA) on raw pixels to reduce dimensionality before training the logistic regression.
💡 Hint
Use sklearn.decomposition.PCA to keep enough components to explain 95% variance, then train the same logistic regression.
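One possible way to set up the bonus experiment is sketched below. It assumes the same 80/20 split and random seed as the solution, and it wraps scaling, PCA, and the classifier in a pipeline (a design choice made here, not part of the original exercise). Passing a float to `n_components` asks PCA to keep just enough components to reach that fraction of explained variance.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

digits = load_digits()
X_raw = digits.data  # flattened 8x8 images: 64 raw pixel values per sample
y = digits.target

X_train, X_val, y_train, y_val = train_test_split(
    X_raw, y, test_size=0.2, random_state=42
)

# Scale, project onto the components explaining 95% of the variance, then classify.
pca_model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),
    LogisticRegression(max_iter=1000),
)
pca_model.fit(X_train, y_train)

n_components = pca_model.named_steps["pca"].n_components_
print(f"Components kept: {n_components}")
print(f"Training accuracy: {pca_model.score(X_train, y_train) * 100:.2f}%")
print(f"Validation accuracy: {pca_model.score(X_val, y_val) * 100:.2f}%")
```

Because the scaler and PCA are fitted inside the pipeline, they only ever see the training split, which avoids leaking validation data into the feature extraction step.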

Practice

1. What is the main purpose of feature extraction in computer vision?
easy
A. To increase the size of image files
B. To change image colors randomly
C. To convert images into numbers that describe important parts
D. To delete parts of the image

Solution

  1. Step 1: Understand feature extraction goal

    Feature extraction transforms images into numerical data representing key details.
  2. Step 2: Compare options to this goal

    Only "To convert images into numbers that describe important parts" describes this process correctly; the other options describe unrelated actions.
  3. Final Answer:

    To convert images into numbers that describe important parts -> Option C
  4. Quick Check:

    Feature extraction = convert images to numbers [OK]
Hint: Feature extraction means turning images into numbers [OK]
Common Mistakes:
  • Thinking feature extraction changes image colors
  • Confusing feature extraction with image resizing
  • Believing it deletes image parts
2. Which of the following is a correct way to describe SIFT in feature extraction?
easy
A. A way to convert images to grayscale
B. A method that detects and describes local features in images
C. A technique to increase image resolution
D. A method to compress image files

Solution

  1. Step 1: Recall what SIFT does

    SIFT finds and describes important local features in images for matching and recognition.
  2. Step 2: Match options to SIFT's function

    Only "A method that detects and describes local features in images" correctly describes SIFT; the other options describe unrelated image processes.
  3. Final Answer:

    A method that detects and describes local features in images -> Option B
  4. Quick Check:

    SIFT = local feature detection [OK]
Hint: SIFT finds key points and describes them [OK]
Common Mistakes:
  • Confusing SIFT with image resizing
  • Thinking SIFT changes image colors
  • Believing SIFT compresses images
3. Given the following Python code using OpenCV, what will be the shape of the feature vector extracted by SIFT for an image with 500 keypoints?
import cv2
img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
print(descriptors.shape)
medium
A. (None, 128)
B. (128, 500)
C. (500, 64)
D. (500, 128)

Solution

  1. Step 1: Understand SIFT descriptor shape

    SIFT descriptors have 128 features per keypoint, so shape is (number_of_keypoints, 128).
  2. Step 2: Apply to given keypoints

    With 500 keypoints, descriptors shape is (500, 128).
  3. Final Answer:

    (500, 128) -> Option D
  4. Quick Check:

    SIFT descriptors shape = (keypoints, 128) [OK]
Hint: SIFT descriptors = keypoints x 128 features [OK]
Common Mistakes:
  • Swapping dimensions of descriptors
  • Assuming 64 features per keypoint
  • Thinking descriptors shape depends on image size
4. You wrote this code to extract features using SIFT but get an error:
import cv2
img = cv2.imread('image.jpg')
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
print(len(keypoints))

What is the likely cause of the error?
medium
A. The image is not loaded in grayscale, causing SIFT to fail
B. SIFT_create() is not a valid OpenCV function
C. detectAndCompute requires a mask argument
D. print(len(keypoints)) is incorrect syntax

Solution

  1. Step 1: Check image loading method

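    cv2.imread loads the image in color (BGR) by default, while SIFT works on single-channel intensity data.
  2. Step 2: Identify error cause

    The other options can be ruled out: SIFT_create() is valid in recent OpenCV releases, the mask argument may be None, and the print call is correct. That leaves the color input as the intended cause. Recent OpenCV builds may convert color input to grayscale internally, so also check that cv2.imread did not return None (for example, because of a wrong file path).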
  3. Final Answer:

    The image is not loaded in grayscale, causing SIFT to fail -> Option A
  4. Quick Check:

    Load image grayscale for SIFT [OK]
Hint: Always load images in grayscale for SIFT [OK]
Common Mistakes:
  • Thinking SIFT_create() is invalid
  • Believing mask argument is mandatory
  • Assuming print syntax is wrong
5. You want to extract features from images for a complex object recognition task. Which approach is best to capture detailed and high-level features?
hard
A. Use a deep learning model like a convolutional neural network (CNN)
B. Use simple edge detection filters only
C. Use random pixel values as features
D. Use image resizing without feature extraction

Solution

  1. Step 1: Understand feature needs for complex tasks

    Complex object recognition requires capturing detailed and abstract features.
  2. Step 2: Compare methods for feature extraction

    Deep learning models like CNNs learn rich features automatically, outperforming simple filters or random values.
  3. Final Answer:

    Use a deep learning model like a convolutional neural network (CNN) -> Option A
  4. Quick Check:

    Complex features need CNNs [OK]
Hint: Deep models capture complex features best [OK]
Common Mistakes:
  • Relying only on simple filters
  • Using random pixels as features
  • Skipping feature extraction by resizing only