Which of the following best describes the main goal of document layout analysis in computer vision?
Think about what parts of a document you want to separate before reading the text.
Document layout analysis focuses on detecting and segmenting different regions like paragraphs, images, and tables to understand the document's structure.
What is the output of the following Python code snippet using OpenCV for detecting contours in a document image?
import cv2
import numpy as np

# Create a blank white image
img = np.ones((100, 100), dtype=np.uint8) * 255

# Draw two black rectangles simulating text blocks
cv2.rectangle(img, (10, 10), (40, 40), 0, -1)
cv2.rectangle(img, (60, 60), (90, 90), 0, -1)

# Threshold the image
_, thresh = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)

# Find contours
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print(len(contours))
Each black rectangle should be detected as one contour.
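The code draws two black rectangles on a white background, then applies an inverted binary threshold (cv2.THRESH_BINARY_INV) so the rectangles become white foreground on a black background. With cv2.RETR_EXTERNAL, each rectangle yields exactly one outer contour, so the output is 2.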
You want to build a model that detects and classifies regions like paragraphs, titles, tables, and figures in scanned documents. Which model architecture is most suitable?
Think about models that can locate and classify multiple objects in an image.
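Models in the R-CNN family fit this task: Faster R-CNN detects and classifies multiple regions with bounding boxes, and Mask R-CNN additionally predicts a segmentation mask for each region, making them suitable for document layout analysis.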
Which metric is most appropriate to evaluate the accuracy of detected layout regions compared to ground truth regions?
Consider a metric that measures overlap between predicted and true regions.
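IoU (Intersection over Union) is the area of overlap between a predicted region and a ground truth region divided by the area of their union. It ranges from 0 (no overlap) to 1 (perfect match), which makes it well suited to evaluating detection and segmentation tasks like layout analysis.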
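To make the overlap idea concrete, here is a minimal sketch of IoU for two axis-aligned boxes. It assumes boxes are given as (x1, y1, x2, y2) corner coordinates. The function name `iou` and that box format are illustrative choices, not part of any particular library.

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped to zero when the boxes do not overlap
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


predicted = (0, 0, 10, 10)
ground_truth = (5, 5, 15, 15)
print(iou(predicted, ground_truth))
```

In this example the intersection is 25 and the union is 100 + 100 - 25 = 175, so the IoU is 25/175, roughly 0.14. A region is often counted as correctly detected when its IoU exceeds a chosen threshold such as 0.5.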
You have a pipeline that extracts text blocks from scanned documents using thresholding and contour detection. Sometimes, it misses small text blocks. Which change is most likely to fix this issue?
Think about how to connect small separated pixels to form bigger blocks.
Morphological dilation expands white (foreground) regions in the thresholded image, so small, separated text pixels merge into larger blocks that contour detection can pick up.