Table extraction from images in Computer Vision - ML Experiment: Train & Evaluate

Experiment - Table extraction from images
Problem: Extract tables from images to convert them into structured data like CSV or JSON.
Current Metrics: The current model detects tables with 85% precision but only 60% recall on validation images.
Issue: The model misses many tables (low recall), causing incomplete extraction.
Your Task
Improve recall to at least 80% while keeping precision above 80% for table detection.
Keep the model architecture based on Faster R-CNN.
Do not increase training time by more than 50%.
Use only publicly available datasets and libraries.
Solution
import torchvision
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms import functional as F
import torch
from torch.utils.data import DataLoader, Dataset
import cv2
import numpy as np

class TableDataset(Dataset):
    def __init__(self, images, targets, transforms=None):
        self.images = images
        self.targets = targets
        self.transforms = transforms

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img = self.images[idx]
        target = self.targets[idx]
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = F.to_tensor(img)
        if self.transforms:
            img = self.transforms(img)
        return img, target

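# Load a COCO-pretrained Faster R-CNN model
# (torchvision >= 0.13 takes `weights`; older releases used pretrained=True)
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
num_classes = 2  # background and table
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor(in_features, num_classes)

# Adjust anchors to better fit tables. The FPN backbone has five feature levels,
# so `sizes` needs one tuple per level. Assigning to anchor_generator.sizes on an
# existing generator does not reliably rebuild its anchors, so replace the generator.
# Three aspect ratios per level keep the pretrained RPN head (3 anchors per location) usable.
# Tune sizes and aspect_ratios (height / width) to the table shapes in your data.
from torchvision.models.detection.anchor_utils import AnchorGenerator
model.rpn.anchor_generator = AnchorGenerator(
    sizes=((32,), (64,), (128,), (256,), (512,)),
    aspect_ratios=((0.5, 1.0, 2.0),) * 5,
)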

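# Example training loop snippet
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=0.0005)
num_epochs = 15

# Assume train_loader and val_loader are built from TableDataset with
# collate_fn=lambda batch: tuple(zip(*batch)), since each image has its own target dict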
for epoch in range(num_epochs):
    model.train()
    for images, targets in train_loader:
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())
        optimizer.zero_grad()
        losses.backward()
        optimizer.step()

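# Post-processing: lower the NMS IoU threshold (torchvision default is 0.5)
# so overlapping duplicate boxes are suppressed more aggressively
model.roi_heads.nms_thresh = 0.3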

# Evaluate on validation set to compute precision and recall
# (Evaluation code omitted for brevity but includes IoU matching and metric calculation)
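Added data augmentation to training images to increase variety (applied through the dataset's transforms argument; the augmentations themselves are not shown in the code).
Adjusted anchor box sizes to better match typical table dimensions.
Increased training epochs from 10 to 15 for better learning, which reaches but does not exceed the 50% training-time limit if time per epoch is unchanged.
Lowered the non-maximum suppression IoU threshold to 0.3 so that more overlapping duplicate detections are removed. This protects precision; it does not by itself recover missed tables.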
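
To make the effect of the NMS threshold concrete, here is a minimal pure-Python sketch of greedy non-maximum suppression. It is not torchvision's implementation; the helper names `iou` and `nms` and the example boxes and scores are made up for illustration. It follows the usual convention that a box is dropped when its IoU with an already kept box is greater than the threshold.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh):
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap any already kept box by more than iou_thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: boxes 0 and 1 overlap with IoU 0.4, box 2 is separate
boxes = [(0, 0, 100, 100), (0, 0, 100, 40), (200, 0, 300, 100)]
scores = [0.9, 0.8, 0.7]

kept_at_05 = nms(boxes, scores, 0.5)  # all three boxes survive
kept_at_03 = nms(boxes, scores, 0.3)  # box 1 is suppressed by box 0
print(kept_at_05, kept_at_03)
```

With the threshold at 0.5 all three boxes are kept; at 0.3 the lower-scoring overlapping box is dropped. A lower threshold therefore yields fewer detections, which tends to help precision and can cost recall when two real tables sit close together.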
Results Interpretation

Before: Precision 85%, Recall 60%
After: Precision 82%, Recall 81%

Improving recall often requires tuning model parameters and data augmentation. Balancing precision and recall is key for reliable table extraction.
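
The evaluation step omitted from the solution can be sketched roughly as follows. This is one common way to compute detection precision and recall, not necessarily the one used to produce the numbers above: predictions are matched greedily, one-to-one, to ground-truth boxes at an assumed IoU threshold of 0.5. The function names and example boxes are hypothetical. For a whole validation set you would accumulate the true-positive, false-positive, and false-negative counts across images before dividing.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truths, iou_thresh=0.5):
    """predictions: list of (box, score); ground_truths: list of boxes.
    Each ground-truth box can be matched by at most one prediction."""
    matched = set()
    tp = 0
    for box, _score in sorted(predictions, key=lambda p: p[1], reverse=True):
        best_j, best_iou = None, iou_thresh
        for j, gt in enumerate(ground_truths):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    fp = len(predictions) - tp      # detections with no matching table
    fn = len(ground_truths) - tp    # tables the model missed
    precision = tp / (tp + fp) if predictions else 0.0
    recall = tp / (tp + fn) if ground_truths else 0.0
    return precision, recall


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical image: three tables, two detections, one of which is correct
gts = [(0, 0, 100, 100), (200, 0, 300, 100), (0, 200, 100, 300)]
preds = [((5, 5, 100, 100), 0.9), ((400, 400, 500, 500), 0.6)]
p, r = precision_recall(preds, gts)
print(p, r)
```

Applied to the reported metrics, `f1(0.85, 0.60)` is about 0.70 and `f1(0.82, 0.81)` is about 0.81, so the small drop in precision is outweighed by the gain in recall.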
Bonus Experiment
Try using a segmentation-based model like Mask R-CNN to extract table boundaries more precisely.
💡 Hint
Mask R-CNN can provide pixel-level masks for tables, which may improve extraction quality over bounding boxes.

Practice

1. What is the main goal of table extraction from images in computer vision?
easy
A. Create new tables from scratch
B. Convert images of tables into editable and structured data
C. Enhance the colors of table images
D. Compress table images to save space

Solution

  1. Step 1: Understand the purpose of table extraction

    Table extraction aims to transform images containing tables into a format that can be edited and analyzed, such as spreadsheets.
  2. Step 2: Compare options to the goal

    Options A, C, and D do not relate to converting image content into editable data, but B does.
  3. Final Answer:

    Convert images of tables into editable and structured data -> Option B
  4. Quick Check:

    Table extraction = Editable data from images [OK]
Hint: Focus on converting images to editable data [OK]
Common Mistakes:
  • Confusing image enhancement with data extraction
  • Thinking table extraction creates tables from nothing
  • Assuming compression is the goal
2. Which of the following is the correct step to start table extraction from an image using Python libraries?
easy
A. Use OCR to read text directly without detecting table structure
B. Resize the image to a smaller size and save it
C. Detect table boundaries and cells before applying OCR
D. Apply color filters to change table colors

Solution

  1. Step 1: Identify the correct workflow for table extraction

    First, detecting the table structure (boundaries and cells) is essential to know where text is located.
  2. Step 2: Understand the role of OCR

    OCR reads text inside detected cells after structure detection, so applying OCR first is incorrect.
  3. Final Answer:

    Detect table boundaries and cells before applying OCR -> Option C
  4. Quick Check:

    Detect structure first, then OCR [OK]
Hint: Detect table layout before reading text [OK]
Common Mistakes:
  • Applying OCR before detecting table cells
  • Focusing on image color changes instead of structure
  • Skipping structure detection
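
Once cells have been detected and read, the structure is what turns the text into a table. The sketch below assumes each cell is already available as a hypothetical `(x, y, w, h, text)` tuple (for example from a cell detector followed by OCR). It groups cells into rows by vertical center, orders each row left to right, and writes CSV. The `row_tol` tolerance of 10 pixels is an illustrative value, not a recommended setting.

```python
import csv
import io


def cells_to_rows(cells, row_tol=10):
    """Group (x, y, w, h, text) cells into rows by vertical center,
    then sort each row left to right and return the text grid."""
    rows = []
    for cell in sorted(cells, key=lambda c: c[1] + c[3] / 2):
        cy = cell[1] + cell[3] / 2
        if rows and abs(cy - rows[-1]["cy"]) <= row_tol:
            rows[-1]["cells"].append(cell)   # same row as the previous cell
        else:
            rows.append({"cy": cy, "cells": [cell]})  # start a new row
    return [[c[4] for c in sorted(r["cells"], key=lambda c: c[0])] for r in rows]


def rows_to_csv(rows):
    """Serialize a list of text rows to a CSV string."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Hypothetical detected cells, deliberately out of reading order
cells = [
    (110, 12, 80, 30, "Qty"), (10, 10, 90, 30, "Item"),
    (10, 50, 90, 30, "Pen"), (110, 51, 80, 30, "3"),
]
table = cells_to_rows(cells)
print(rows_to_csv(table))
```

Running OCR on the whole image without this step would return the same words but lose which row and column each one belongs to.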
3. Given the following Python snippet using OpenCV and pytesseract for table extraction, what will be the output type of cells_text?
import cv2
import pytesseract

image = cv2.imread('table.png', 0)
_, thresh = cv2.threshold(image, 128, 255, cv2.THRESH_BINARY_INV)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cells_text = []
for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)
    cell_img = image[y:y+h, x:x+w]
    text = pytesseract.image_to_string(cell_img, config='--psm 6')
    cells_text.append(text.strip())
print(type(cells_text))
medium
A. <class 'list'>
B.
C.
D.

Solution

  1. Step 1: Analyze the code snippet

    The variable cells_text is initialized as an empty list and text from each detected cell is appended to it.
  2. Step 2: Determine the type of cells_text

    Since cells_text collects multiple strings in a list, its type remains list.
  3. Final Answer:

    <class 'list'> -> Option A
  4. Quick Check:

    Appending text to list = list type [OK]
Hint: Check variable initialization and append usage [OK]
Common Mistakes:
  • Confusing the output of print(type())
  • Assuming OCR returns a dict or int
  • Ignoring the list append operation
4. You run a table extraction pipeline but notice that some table cells are merged incorrectly, causing wrong text grouping. What is the most likely cause?
medium
A. Incorrect contour detection merging nearby cells
B. OCR engine misreading characters inside cells
C. Image color enhancement applied before extraction
D. Saving the output file in wrong format

Solution

  1. Step 1: Identify the problem source

    Merged cells usually happen when contour detection groups multiple cells as one shape.
  2. Step 2: Rule out other options

    OCR misreading affects text accuracy but not cell merging. Color enhancement and file format do not cause merging issues.
  3. Final Answer:

    Incorrect contour detection merging nearby cells -> Option A
  4. Quick Check:

    Cell merging = contour detection error [OK]
Hint: Check contour detection for cell boundaries [OK]
Common Mistakes:
  • Blaming OCR for cell merging
  • Ignoring image preprocessing effects
  • Assuming file format affects cell detection
5. You want to extract tables from scanned invoices with varying layouts. Which approach best improves accuracy of table extraction?
hard
A. Apply fixed thresholding and contour detection without training
B. Manually crop each table region before extraction
C. Use only OCR on the full invoice image without detecting tables
D. Train a deep learning model to detect table structures and cells before OCR

Solution

  1. Step 1: Understand the challenge of varying layouts

    Invoices have different table styles, so fixed rules may fail to detect tables accurately.
  2. Step 2: Evaluate approaches for adaptability

    Training a deep learning model can learn diverse table structures and generalize better than fixed methods or manual cropping.
  3. Final Answer:

    Train a deep learning model to detect table structures and cells before OCR -> Option D
  4. Quick Check:

    Varying layouts = train model for detection [OK]
Hint: Use learning models for diverse table layouts [OK]
Common Mistakes:
  • Relying on fixed thresholding for all layouts
  • Skipping table detection and using only OCR
  • Relying on manual cropping, which does not scale