Computer Vision · ML · ~20 mins

Text recognition pipeline in Computer Vision - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Key stages in a text recognition pipeline

Which of the following lists the correct main stages of a typical text recognition pipeline in order?

A. Text detection → Text segmentation → Text recognition → Post-processing
B. Text recognition → Text detection → Text segmentation → Post-processing
C. Text segmentation → Text detection → Text recognition → Post-processing
D. Post-processing → Text detection → Text segmentation → Text recognition
💡 Hint

Think about first finding where text is, then breaking it down, then reading it.
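
The flow described in the hint can be sketched as a chain of small functions. This is a toy illustration only: the "image" is a list of text rows, and every function name here (`detect_text`, `segment_text`, `recognize_text`, `post_process`, `run_pipeline`) is a hypothetical placeholder standing in for a trained model or a real cleanup step.

```python
from typing import List


def detect_text(image: List[str]) -> List[str]:
    """Find where text is: keep only the rows of the toy image that are not empty."""
    return [row for row in image if row.strip()]


def segment_text(region: str) -> List[str]:
    """Break a detected region down into word-level pieces."""
    return region.split()


def recognize_text(piece: str) -> str:
    """Read a piece. A real recognizer would be a model; this toy one returns it as-is."""
    return piece


def post_process(words: List[str]) -> str:
    """Clean up the raw reading. The fix table is an assumption standing in for a lexicon."""
    fixes = {"0CR": "OCR"}
    return " ".join(fixes.get(word, word) for word in words)


def run_pipeline(image: List[str]) -> str:
    words = []
    for region in detect_text(image):
        for piece in segment_text(region):
            words.append(recognize_text(piece))
    return post_process(words)


toy_image = ["", "  0CR reads", "", "text  "]
result = run_pipeline(toy_image)
print(result)  # OCR reads text
```

Each stage consumes only the output of the one before it, which is why the ordering matters: a reader cannot run before there is a located, isolated piece of text to read.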

Predict Output
intermediate
Output shape of CNN feature extractor in text recognition

Given the following PyTorch CNN feature extractor code for text recognition, what is the shape of the output tensor if the input image batch has shape (8, 1, 32, 128)?

import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # halves H and W
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2)   # halves H and W again
        )

    def forward(self, x):
        return self.conv(x)

model = FeatureExtractor()
input_tensor = torch.randn(8, 1, 32, 128)
output = model(input_tensor)
print(output.shape)  # a bare expression prints nothing when run as a script
A. (8, 128, 8, 32)
B. (8, 128, 16, 64)
C. (8, 64, 8, 32)
D. (8, 64, 16, 64)
💡 Hint

Each MaxPool2d halves height and width. Calculate step by step.
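
The step-by-step calculation can be sketched without running PyTorch at all. The helper below uses the standard output-size formula for convolution and pooling layers with dilation 1, `floor((size + 2*padding - kernel) / stride) + 1`. The function names and the layer-description format are illustrative assumptions, and the input size is deliberately different from the one in the question so that the method, not the answer, is what is shown.

```python
def out_size(size, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shape(shape, layers):
    """Follow an (N, C, H, W) shape through a list of (kind, params) layer descriptions."""
    n, c, h, w = shape
    for kind, params in layers:
        if kind == "conv":
            c = params["out_channels"]  # a convolution sets the channel count
        # Conv and pool layers both change H and W by the same formula.
        h = out_size(h, params["kernel"], params["stride"], params["padding"])
        w = out_size(w, params["kernel"], params["stride"], params["padding"])
        print(f"{kind:4s} -> {(n, c, h, w)}")
    return (n, c, h, w)


# Same layer settings as the FeatureExtractor; ReLU is omitted because it keeps the shape.
layers = [
    ("conv", {"out_channels": 64, "kernel": 3, "stride": 1, "padding": 1}),
    ("pool", {"kernel": 2, "stride": 2, "padding": 0}),
    ("conv", {"out_channels": 128, "kernel": 3, "stride": 1, "padding": 1}),
    ("pool", {"kernel": 2, "stride": 2, "padding": 0}),
]

final_shape = trace_shape((4, 1, 64, 256), layers)
print(final_shape)  # (4, 128, 16, 64)
```

A 3x3 convolution with stride 1 and padding 1 leaves height and width unchanged, so only the pooling layers shrink the feature map, while only the convolutions change the channel count.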

Model Choice
advanced
Best model type for recognizing variable-length text sequences

Which model architecture is best suited for recognizing variable-length text sequences in images, such as handwritten words or license plates?

A. K-Nearest Neighbors classifier on raw pixel values
B. Simple feedforward neural network with fixed-size input and output
C. Convolutional Neural Network (CNN) followed by a Recurrent Neural Network (RNN) with CTC loss
D. Support Vector Machine with linear kernel on flattened image
💡 Hint

Think about models that handle sequences and variable lengths.
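
One way a sequence model can turn a fixed number of per-frame predictions into text of varying length is CTC-style greedy decoding: merge consecutive repeats, then drop a special blank symbol. The sketch below works on already-predicted frame labels rather than on a real network's output, and the function name and the `-` blank symbol are illustrative assumptions.

```python
def ctc_greedy_decode(frame_labels, blank="-"):
    """Collapse consecutive repeated labels, then remove blanks."""
    decoded = []
    previous = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame and is not blank.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return "".join(decoded)


# Both inputs have 10 frames, but they decode to words of different lengths.
long_word = ctc_greedy_decode("hh-ell-loo")
short_word = ctc_greedy_decode("c--aa--ttt")
print(long_word)   # hello
print(short_word)  # cat
```

The blank between the two `l` runs in the first input is what lets a genuinely doubled letter survive the merge step.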

Metrics
advanced
Choosing the best metric for text recognition accuracy

Which metric is most appropriate to evaluate the accuracy of a text recognition model that outputs sequences of characters?

A. Mean Squared Error (MSE)
B. Precision and Recall on image regions
C. Accuracy on individual pixels
D. Character Error Rate (CER)
💡 Hint

Consider metrics that compare predicted text sequences to ground truth text.
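
A common way to compare a predicted string with its ground truth is edit distance: the minimum number of character insertions, deletions, and substitutions needed to turn one into the other. Dividing that count by the reference length gives a character-level error rate. The following is a minimal pure-Python sketch; the function names are illustrative, and the normalization by reference length is the usual convention rather than something the page specifies.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance computed with a rolling row of the DP table."""
    previous_row = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current_row = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current_row.append(min(
                previous_row[j] + 1,                      # deletion
                current_row[j - 1] + 1,                   # insertion
                previous_row[j - 1] + substitution_cost,  # match or substitution
            ))
        previous_row = current_row
    return previous_row[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


print(edit_distance("kitten", "sitting"))      # 3
print(character_error_rate("hello", "hallo"))  # 0.2
```

Because insertions count as errors, this rate can exceed 1.0 when the prediction is much longer than the reference.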

🔧 Debug
expert
Identifying the cause of poor text recognition accuracy

A text recognition model trained on clear printed text images performs poorly on handwritten text images. Which is the most likely cause?

A. The loss function used is Mean Squared Error instead of Cross-Entropy
B. The training data distribution does not match the test data distribution
C. The optimizer used is incompatible with text recognition tasks
D. The model architecture cannot handle images larger than 64x64 pixels
💡 Hint

Think about differences between training and testing data.