Computer Visionml~20 mins

Text detection in images in Computer Vision - ML Experiment: Train & Evaluate

Choose your learning style10 modes available

Learn Why Deep Model Try Challenge Experiment Recall Metrics

Start learning this pattern below

Jump into concepts and practice - no test required

Recommended

Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong

Experiment - Text detection in images

Problem:Detect and locate text regions in images using a deep learning model.

Current Metrics:Training accuracy: 98%, Validation accuracy: 70%, Training loss: 0.05, Validation loss: 0.35

Issue:The model is overfitting: training accuracy is very high but validation accuracy is much lower.

Your Task

Reduce overfitting to improve validation accuracy to at least 85% while keeping training accuracy below 92%.

You can only modify the model architecture and training hyperparameters.

Do not change the dataset or input image preprocessing.

Hint 1

Hint 2

Hint 3

Hint 4

Solution

Computer Vision

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Define data augmentation
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest'
)

val_datagen = ImageDataGenerator(rescale=1./255)

# Assume train_generator and val_generator are created from directories using train_datagen and val_datagen respectively

# Build model with dropout
model = models.Sequential([
    layers.Conv2D(32, (3,3), activation='relu', input_shape=(128,128,3)),
    layers.MaxPooling2D(2,2),
    layers.Dropout(0.25),
    layers.Conv2D(64, (3,3), activation='relu'),
    layers.MaxPooling2D(2,2),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005),
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Early stopping callback
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

# Train model
history = model.fit(
    train_generator,
    epochs=50,
    validation_data=val_generator,
    callbacks=[early_stop]
)

Added dropout layers after convolutional and dense layers to reduce overfitting.

Implemented data augmentation to increase training data variety.

Reduced learning rate from default to 0.0005 for smoother training.

Added early stopping to stop training when validation loss stops improving.

Results Interpretation

Before: Training accuracy 98%, Validation accuracy 70%, Training loss 0.05, Validation loss 0.35

After: Training accuracy 90%, Validation accuracy 87%, Training loss 0.18, Validation loss 0.25

Adding dropout and data augmentation helps reduce overfitting, improving validation accuracy and making the model generalize better to new images.

Bonus Experiment

Try using a pre-trained model like MobileNetV2 as a feature extractor for text detection and fine-tune it.

💡 Hint

Use transfer learning by freezing the base model layers and training only the top layers first, then unfreeze some layers for fine-tuning.

Practice

(1/5)

1. What is the main goal of text detection in images?

easy

A. To find where text appears in an image

B. To translate text from one language to another

C. To change the font style of text in images

D. To remove text from images

Text detection in images in Computer Vision - ML Experiment: Train & Evaluate

Start learning this pattern below

Practice

Solution

Step 1: Understand the purpose of text detection

Step 2: Differentiate from other text-related tasks

Final Answer:

Quick Check:

Solution

Step 1: Identify libraries related to text detection

Step 2: Exclude unrelated libraries

Final Answer:

Quick Check:

Solution

Step 1: Understand the code flow

Step 2: Predict output for a clear text image

Final Answer:

Quick Check:

Solution

Step 1: Check input type for pytesseract.image_to_string

Step 2: Verify the code

Final Answer:

Quick Check:

Solution

Step 1: Understand multi-language text detection

Step 2: Evaluate other options

Final Answer:

Quick Check: