Introduction
Document layout analysis identifies how a page is organized: where text blocks, images, and tables are placed, and which region is which.
1. Input: scanned document image
2. Preprocess image (resize, grayscale)
3. Use a layout analysis model (e.g., Detectron2, LayoutLMv3)
4. Model outputs bounding boxes and labels for layout elements
5. Postprocess to organize elements by reading order
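The postprocessing step (step 5) can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: it assumes a single-column page and boxes given as (x1, y1, x2, y2) pixel tuples, and the function name sort_reading_order and the row_tolerance parameter are hypothetical. Multi-column pages would need column detection first.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def sort_reading_order(boxes: List[Box], row_tolerance: float = 10.0) -> List[Box]:
    """Order boxes top-to-bottom, then left-to-right within each row.

    Assumes a single-column layout; row_tolerance is the maximum vertical
    offset (in pixels) for two boxes to count as the same row.
    """
    rows: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b[1]):
        # Join the current row if this box starts close to the row's first box.
        if rows and abs(box[1] - rows[-1][0][1]) <= row_tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])

    ordered: List[Box] = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda b: b[0]))
    return ordered


# Two blocks side by side at the top, one wide block underneath.
boxes = [(300, 52, 500, 200), (50, 50, 250, 200), (50, 220, 500, 400)]
ordered = sort_reading_order(boxes)
print(ordered)
```

The two top blocks differ by only 2 pixels vertically, so they are treated as one row and ordered left to right before the lower block.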
Use a pre-trained layout detection model to find text blocks and images in a PDF page image.
Apply OCR only on detected text regions after layout analysis.
Detect tables and extract their structure for spreadsheet conversion.
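For the table case, one simple way to go from detected cells to a spreadsheet is to group cell boxes into rows by vertical position and write them out as CSV. The sketch below assumes the cells have already been detected and their text recognized; the Cell tuple layout, the function names, and the row_tolerance value are illustrative assumptions, and merged or spanning cells are not handled.

```python
import csv
import io
from typing import List, Tuple

Cell = Tuple[float, float, float, float, str]  # (x1, y1, x2, y2, text)


def cells_to_rows(cells: List[Cell], row_tolerance: float = 8.0) -> List[List[str]]:
    """Group cells into rows by their top edge, then sort each row by x."""
    rows: List[List[Cell]] = []
    for cell in sorted(cells, key=lambda c: c[1]):
        # A cell joins the current row if its top edge is near the row's first cell.
        if rows and abs(cell[1] - rows[-1][0][1]) <= row_tolerance:
            rows[-1].append(cell)
        else:
            rows.append([cell])
    return [[c[4] for c in sorted(row, key=lambda c: c[0])] for row in rows]


def rows_to_csv(rows: List[List[str]]) -> str:
    """Serialize rows as CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical cells from a 2x2 table, listed in arbitrary order.
cells = [
    (120, 10, 200, 30, "Qty"),
    (10, 12, 100, 30, "Item"),
    (10, 40, 100, 60, "Paper"),
    (120, 41, 200, 60, "3"),
]
table_rows = cells_to_rows(cells)
csv_text = rows_to_csv(table_rows)
print(csv_text)
```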
import cv2
import matplotlib.pyplot as plt
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2 import model_zoo

# Load the page image (OpenCV reads it in BGR channel order)
image_bgr = cv2.imread('sample_document.jpg')
if image_bgr is None:
    raise FileNotFoundError('Could not read sample_document.jpg')

# Set up the Detectron2 config for layout detection
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 5  # PubLayNet layout classes
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # drop detections scored below 0.5
cfg.MODEL.WEIGHTS = "https://dl.fbaipublicfiles.com/detectron2/PubLayNet/mask_rcnn_R_50_FPN_3x/164590034/model_final_ba5f84.pkl"
predictor = DefaultPredictor(cfg)

# Run prediction; DefaultPredictor expects BGR input by default (cfg.INPUT.FORMAT)
outputs = predictor(image_bgr)

# Extract bounding boxes (x1, y1, x2, y2) and class ids
boxes = outputs['instances'].pred_boxes.tensor.cpu().numpy()
classes = outputs['instances'].pred_classes.cpu().numpy()

# Draw each box and its class id on the image
for box, cls in zip(boxes, classes):
    x1, y1, x2, y2 = box.astype(int)
    cv2.rectangle(image_bgr, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(image_bgr, str(cls), (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

# Convert to RGB only for display, since matplotlib expects RGB
plt.imshow(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
plt.axis('off')
plt.show()
print(f'Found {len(boxes)} layout elements.')
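Once boxes and class ids are available, a small postprocessing step can turn them into labeled elements and pick out the regions worth sending to OCR. The sketch below is hedged: the LABELS mapping assumes the usual PubLayNet class order (text, title, list, table, figure) and should be checked against the checkpoint actually loaded, and the helper names to_elements and ocr_crop_boxes are hypothetical. Sample values stand in for real model output so the block runs on its own.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Assumed PubLayNet class order; verify against the weights you load.
LABELS: Dict[int, str] = {0: "text", 1: "title", 2: "list", 3: "table", 4: "figure"}


def to_elements(boxes: Sequence[Sequence[float]], classes: Sequence[int],
                labels: Dict[int, str] = LABELS) -> List[dict]:
    """Pair each box with a readable label instead of a bare class id."""
    elements = []
    for box, cls in zip(boxes, classes):
        x1, y1, x2, y2 = (int(v) for v in box)
        elements.append({"label": labels.get(int(cls), "unknown"),
                         "box": (x1, y1, x2, y2)})
    return elements


def ocr_crop_boxes(elements: List[dict], image_size: Tuple[int, int], pad: int = 4,
                   text_labels: Tuple[str, ...] = ("text", "title", "list")
                   ) -> List[Tuple[int, int, int, int]]:
    """Return padded crop rectangles for text-like regions, clamped to the image."""
    width, height = image_size
    crops = []
    for element in elements:
        if element["label"] not in text_labels:
            continue  # skip figures and tables; OCR is applied to text regions only
        x1, y1, x2, y2 = element["box"]
        crops.append((max(0, x1 - pad), max(0, y1 - pad),
                      min(width, x2 + pad), min(height, y2 + pad)))
    return crops


# Sample values standing in for the model's boxes and classes.
boxes = [[10.0, 2.0, 200.0, 40.0], [10.0, 50.0, 200.0, 300.0], [220.0, 50.0, 400.0, 300.0]]
classes = [1, 0, 4]
elements = to_elements(boxes, classes)
counts = Counter(element["label"] for element in elements)
crops = ocr_crop_boxes(elements, image_size=(400, 320))
print(counts, crops)
```

Each crop rectangle can then be cut from the page with array slicing such as image_bgr[y1:y2, x1:x2] and passed to an OCR engine of your choice (for example pytesseract.image_to_string, which is an assumption here and not part of the original example).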
A model path that starts with lp:// is a LayoutParser catalog identifier, so such a model is loaded through LayoutParser's Detectron2LayoutModel wrapper. Detectron2 itself has no detectron2.layout module from which a LayoutModel could be imported.

import layoutparser as lp

model = lp.Detectron2LayoutModel('lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config')
outputs = model.detect(image)
print(len(outputs))

Here len(outputs) is the number of layout elements detected on the page.
Calling model.detect() with no argument raises an error, because detect needs the page image as its input.