Computer Vision · ML · ~20 mins

Document layout analysis in Computer Vision - ML Experiment: Train & Evaluate

Experiment - Document layout analysis
Problem: You want to teach a computer to recognize the different parts of a document page, such as titles, paragraphs, images, and tables.
Current Metrics: Training accuracy 98%, validation accuracy 70%, validation loss 1.2
Issue: The model is overfitting: it performs very well on the training data but poorly on new, unseen documents.
Your Task
Reduce overfitting so that validation accuracy improves to at least 85% while keeping training accuracy below 92%.
You can only change the model architecture and training parameters.
You cannot add more training data.
Solution
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import EarlyStopping

# Sample simplified model for document layout analysis
model = models.Sequential([
    layers.Conv2D(32, (3,3), activation='relu', input_shape=(256, 256, 3)),
    layers.MaxPooling2D(2,2),
    layers.Dropout(0.3),
    layers.Conv2D(64, (3,3), activation='relu'),
    layers.MaxPooling2D(2,2),
    layers.Dropout(0.3),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.4),
    layers.Dense(5, activation='softmax')  # 5 classes: title, paragraph, image, table, other
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Early stopping to stop training when validation loss stops improving
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

# Assuming X_train, y_train, X_val, y_val are prepared image and label datasets
# model.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_val, y_val), callbacks=[early_stop])
Key Changes
- Added dropout layers after the convolution and dense blocks to reduce overfitting.
- Lowered the learning rate from 0.001 to 0.0005 for smoother training.
- Added an early-stopping callback that halts training when validation loss stops improving.
Results Interpretation

Before: Training accuracy 98%, Validation accuracy 70%, Validation loss 1.2

After: Training accuracy 90%, Validation accuracy 87%, Validation loss 0.6

Adding dropout and early stopping helps the model generalize better by reducing overfitting, improving validation accuracy while slightly lowering training accuracy.
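The improvement can be quantified as the gap between training and validation accuracy, a simple overfitting indicator. A minimal Python sketch using the metrics reported above (the values are the reported numbers, not live training output):

```python
# Train/validation accuracy gap as a simple overfitting indicator.
# Values are the metrics reported above, not live training output.
before = {"train_acc": 0.98, "val_acc": 0.70}
after = {"train_acc": 0.90, "val_acc": 0.87}

def overfit_gap(metrics):
    """Difference between training and validation accuracy."""
    return round(metrics["train_acc"] - metrics["val_acc"], 2)

print(overfit_gap(before))  # 0.28 before regularization
print(overfit_gap(after))   # 0.03 after dropout + early stopping
```

A shrinking gap, together with a falling validation loss, is the signal that the regularization is working rather than the model simply getting worse at everything.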
Bonus Experiment
Try using data augmentation techniques like random rotations, zooms, or flips on the training images to further improve validation accuracy.
💡 Hint
Use TensorFlow's tf.image functions or ImageDataGenerator (note: ImageDataGenerator is deprecated in recent TensorFlow releases in favor of Keras preprocessing layers) to apply augmentations during training.
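One way such augmentations could be wired in is with tf.image inside a tf.data pipeline. This is a sketch, not the exercise's reference solution; the specific augmentations and crop/resize sizes are illustrative choices:

```python
import tensorflow as tf

def augment(image, label):
    """Randomly flip, brighten, and lightly zoom one training image."""
    image = tf.image.random_flip_left_right(image)          # horizontal flip
    image = tf.image.random_brightness(image, max_delta=0.1)
    # Simulate a small random zoom: crop a slightly smaller window, resize back.
    image = tf.image.random_crop(image, size=[230, 230, 3])
    image = tf.image.resize(image, [256, 256])
    return image, label

# Applied on the fly during training (X_train/y_train as in the solution above):
# train_ds = (tf.data.Dataset.from_tensor_slices((X_train, y_train))
#             .map(augment, num_parallel_calls=tf.data.AUTOTUNE)
#             .shuffle(1000)
#             .batch(32)
#             .prefetch(tf.data.AUTOTUNE))
```

Because the augmentations run inside the input pipeline, each epoch sees slightly different versions of the same images, which acts as another regularizer on top of dropout and early stopping.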