0
0
Computer Visionml~20 mins

Text detection in images in Computer Vision - ML Experiment: Train & Evaluate

Choose your learning style9 modes available
Experiment - Text detection in images
Problem:Detect and locate text regions in images using a deep learning model.
Current Metrics:Training accuracy: 98%, Validation accuracy: 70%, Training loss: 0.05, Validation loss: 0.35
Issue:The model is overfitting: training accuracy is very high but validation accuracy is much lower.
Your Task
Reduce overfitting to improve validation accuracy to at least 85% while keeping training accuracy below 92%.
You can only modify the model architecture and training hyperparameters.
Do not change the dataset or input image preprocessing.
Hint 1
Hint 2
Hint 3
Hint 4
Solution
Computer Vision
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Define data augmentation
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest'
)

val_datagen = ImageDataGenerator(rescale=1./255)

# Assume train_generator and val_generator are created from directories using train_datagen and val_datagen respectively

# Build model with dropout
model = models.Sequential([
    layers.Conv2D(32, (3,3), activation='relu', input_shape=(128,128,3)),
    layers.MaxPooling2D(2,2),
    layers.Dropout(0.25),
    layers.Conv2D(64, (3,3), activation='relu'),
    layers.MaxPooling2D(2,2),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005),
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Early stopping callback
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

# Train model
history = model.fit(
    train_generator,
    epochs=50,
    validation_data=val_generator,
    callbacks=[early_stop]
)
Added dropout layers after convolutional and dense layers to reduce overfitting.
Implemented data augmentation to increase training data variety.
Reduced learning rate from default to 0.0005 for smoother training.
Added early stopping to stop training when validation loss stops improving.
Results Interpretation

Before: Training accuracy 98%, Validation accuracy 70%, Training loss 0.05, Validation loss 0.35

After: Training accuracy 90%, Validation accuracy 87%, Training loss 0.18, Validation loss 0.25

Adding dropout and data augmentation helps reduce overfitting, improving validation accuracy and making the model generalize better to new images.
Bonus Experiment
Try using a pre-trained model like MobileNetV2 as a feature extractor for text detection and fine-tune it.
💡 Hint
Use transfer learning by freezing the base model layers and training only the top layers first, then unfreeze some layers for fine-tuning.