Which of the following best describes the main goal of a text detection model in images?
Think about what 'detection' means in the context of images.
Text detection models focus on finding where text appears in an image, not on reading or translating it.
What is the output of the following Python code snippet using OpenCV's EAST text detector after processing an image?
import cv2
import numpy as np

# Assume 'image' is a loaded image
net = cv2.dnn.readNet('frozen_east_text_detection.pb')
blob = cv2.dnn.blobFromImage(image, 1.0, (320, 320),
                             (123.68, 116.78, 103.94), True, False)
net.setInput(blob)
scores, geometry = net.forward(['feature_fusion/Conv_7/Sigmoid',
                                'feature_fusion/concat_3'])

# Process scores and geometry to get boxes
conf_threshold = 0.5
boxes = []
for y in range(scores.shape[2]):
    for x in range(scores.shape[3]):
        score = scores[0, 0, y, x]
        if score < conf_threshold:
            continue
        offsetX, offsetY = x * 4.0, y * 4.0
        angle = geometry[0, 4, y, x]
        cos = np.cos(angle)
        sin = np.sin(angle)
        h = geometry[0, 0, y, x] + geometry[0, 2, y, x]
        w = geometry[0, 1, y, x] + geometry[0, 3, y, x]
        endX = int(offsetX + (cos * geometry[0, 1, y, x]) + (sin * geometry[0, 2, y, x]))
        endY = int(offsetY - (sin * geometry[0, 1, y, x]) + (cos * geometry[0, 2, y, x]))
        startX = int(endX - w)
        startY = int(endY - h)
        boxes.append((startX, startY, endX, endY))
print(len(boxes))
Look at what is appended to boxes and what is printed.
The code collects bounding-box coordinates for text regions whose confidence score exceeds 0.5 and prints how many such boxes were found.
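The counting logic can be illustrated on a small synthetic score map (dummy values standing in for real EAST output, with the geometry decoding omitted):

```python
import numpy as np

# Dummy score map shaped like EAST output: (1, 1, H, W).
scores = np.zeros((1, 1, 2, 3))
scores[0, 0, 0, 1] = 0.9  # confident cell
scores[0, 0, 1, 2] = 0.6  # another confident cell
scores[0, 0, 1, 0] = 0.2  # below threshold, skipped

conf_threshold = 0.5
boxes = []
for y in range(scores.shape[2]):
    for x in range(scores.shape[3]):
        if scores[0, 0, y, x] < conf_threshold:
            continue
        # Cell offsets stand in for the full geometry decoding.
        boxes.append((x * 4.0, y * 4.0))

print(len(boxes))  # 2 cells pass the threshold
```

Only the two cells above the 0.5 threshold contribute entries, so the printed count is the number of accepted detections, just as in the snippet above.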
You want to detect text in natural scene images with varying fonts and orientations. Which model architecture is most suitable?
Consider which model can handle spatial features and rotations for detection.
The EAST detector uses a fully convolutional network to detect text regions, including rotated boxes, making it well suited to natural scenes with varied fonts and orientations.
Which metric is most appropriate to evaluate the quality of text detection bounding boxes compared to ground truth boxes?
Think about how to measure overlap between predicted and actual boxes.
IoU (Intersection over Union) measures how well predicted bounding boxes overlap with ground-truth boxes, which is the key criterion for detection tasks.
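IoU is simple to compute for axis-aligned boxes; a minimal sketch, using the same (startX, startY, endX, endY) convention as the snippet above:

```python
def iou(boxA, boxB):
    """Intersection over Union for boxes given as (startX, startY, endX, endY)."""
    # Coordinates of the intersection rectangle.
    xA = max(boxA[0], boxB[0])
    yA = max(boxA[1], boxB[1])
    xB = min(boxA[2], boxB[2])
    yB = min(boxA[3], boxB[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0, xB - xA) * max(0, yB - yA)
    areaA = (boxA[2] - boxA[0]) * (boxA[3] - boxA[1])
    areaB = (boxB[2] - boxB[0]) * (boxB[3] - boxB[1])
    union = areaA + areaB - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333... (half-overlapping boxes)
```

An IoU of 1.0 means a perfect match, 0.0 means no overlap; detection benchmarks typically count a prediction as correct when its IoU with a ground-truth box exceeds some cutoff, commonly 0.5.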
You run a text detection model on an image but get zero detected boxes, even though the image clearly contains text. Which of the following is the most likely cause?
Consider what happens if the threshold for detection confidence is too strict.
If the confidence threshold is too high, the model may discard all detections, resulting in zero boxes.
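The effect is easy to see with hypothetical per-cell confidences (the values below are made up for illustration):

```python
import numpy as np

scores = np.array([0.3, 0.45, 0.48])  # hypothetical confidences, all moderately low

def count_detections(scores, conf_threshold):
    """Count cells whose confidence meets the threshold."""
    return int(np.sum(scores >= conf_threshold))

print(count_detections(scores, 0.5))   # 0 -- threshold sits above every score
print(count_detections(scores, 0.25))  # 3 -- lowering the threshold recovers them
```

Lowering the threshold is the usual first debugging step: if boxes appear, the model was detecting the text all along and the cutoff was simply too strict.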