0
0
Computer Visionml~20 mins

Table extraction from images in Computer Vision - ML Experiment: Train & Evaluate

Choose your learning style9 modes available
Experiment - Table extraction from images
Problem:Extract tables from images to convert them into structured data like CSV or JSON.
Current Metrics:Current model detects tables with 85% precision but only 60% recall on validation images.
Issue:The model misses many tables (low recall), causing incomplete extraction.
Your Task
Improve recall to at least 80% while keeping precision above 80% for table detection.
Keep the model architecture based on Faster R-CNN.
Do not increase training time by more than 50%.
Use only publicly available datasets and libraries.
Hint 1
Hint 2
Hint 3
Hint 4
Solution
Computer Vision
import torchvision
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms import functional as F
import torch
from torch.utils.data import DataLoader, Dataset
import cv2
import numpy as np

class TableDataset(Dataset):
    def __init__(self, images, targets, transforms=None):
        self.images = images
        self.targets = targets
        self.transforms = transforms

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img = self.images[idx]
        target = self.targets[idx]
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = F.to_tensor(img)
        if self.transforms:
            img = self.transforms(img)
        return img, target

# Load pretrained Faster R-CNN model
model = fasterrcnn_resnet50_fpn(pretrained=True)
num_classes = 2  # background and table
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor(in_features, num_classes)

# Adjust anchor sizes to better fit tables
model.rpn.anchor_generator.sizes = ((32, 64, 128, 256, 512),)

# Example training loop snippet
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=0.0005)
num_epochs = 15

# Assume train_loader and val_loader are defined with TableDataset
for epoch in range(num_epochs):
    model.train()
    for images, targets in train_loader:
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())
        optimizer.zero_grad()
        losses.backward()
        optimizer.step()

# Post-processing: adjust NMS threshold
model.roi_heads.nms_thresh = 0.3

# Evaluate on validation set to compute precision and recall
# (Evaluation code omitted for brevity but includes IoU matching and metric calculation)
Added data augmentation to training images to increase variety.
Adjusted anchor box sizes to better match typical table dimensions.
Increased training epochs from 10 to 15 for better learning.
Lowered non-maximum suppression threshold to 0.3 to reduce missed detections.
Results Interpretation

Before: Precision 85%, Recall 60%
After: Precision 82%, Recall 81%

Improving recall often requires tuning model parameters and data augmentation. Balancing precision and recall is key for reliable table extraction.
Bonus Experiment
Try using a segmentation-based model like Mask R-CNN to extract table boundaries more precisely.
💡 Hint
Mask R-CNN can provide pixel-level masks for tables, which may improve extraction quality over bounding boxes.