
Why OCR digitizes text from images in Computer Vision - Model Pipeline Impact

Model Pipeline - Why OCR digitizes text from images

OCR (Optical Character Recognition) converts images of text into machine-readable text. Once digitized, the text can be searched, edited, and stored like any other text.

Data Flow - 6 Stages
1. Input Image
   Input: 1 image (e.g., 600 x 400 pixels, grayscale)
   Operation: Load image containing text
   Output: 1 image (600 x 400 pixels, grayscale)
   Example: Photo of a printed page with letters and numbers
2. Preprocessing
   Input: 1 image (600 x 400 pixels, grayscale)
   Operation: Convert to grayscale, remove noise, adjust brightness
   Output: 1 cleaned image (600 x 400 pixels, grayscale)
   Example: Clearer image with less background noise
3. Text Detection
   Input: 1 cleaned image (600 x 400 pixels, grayscale)
   Operation: Find areas likely containing text
   Output: Multiple text regions (e.g., 5 boxes)
   Example: Boxes around words or lines in the image
4. Character Segmentation
   Input: Text region images (varied sizes)
   Operation: Split text regions into individual characters
   Output: Multiple character images (e.g., 50 characters)
   Example: Small images each containing one letter or number
5. Character Recognition
   Input: Character images (28 x 28 pixels each)
   Operation: Use ML model to identify each character
   Output: Sequence of characters (e.g., 'HELLO123')
   Example: Predicted letters and numbers from images
6. Postprocessing
   Input: Sequence of characters
   Operation: Correct errors, format text
   Output: Clean text string
   Example: 'HELLO 123' as editable text
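
To make stages 2 and 4 concrete, here is a minimal sketch in plain Python. It is an illustration under simplifying assumptions, not a production pipeline: it assumes dark text on a light background, a single text line, and at least one blank pixel column between characters. The function names `binarize` and `segment_characters` and the fixed threshold of 128 are hypothetical choices made for this example.

```python
def binarize(gray, threshold=128):
    """Mark each pixel True if it is 'ink' (darker than the threshold)."""
    return [[pixel < threshold for pixel in row] for row in gray]


def segment_characters(ink):
    """Split one text line into (start, end) column ranges, one per character.

    Uses a vertical projection profile: a column with no ink is treated
    as a gap between characters. The end index is exclusive.
    """
    has_ink = [any(column) for column in zip(*ink)]  # one flag per column
    spans, start = [], None
    for x, filled in enumerate(has_ink):
        if filled and start is None:
            start = x                      # a character begins here
        elif not filled and start is not None:
            spans.append((start, x))       # the character ended at x - 1
            start = None
    if start is not None:                  # character touching the right edge
        spans.append((start, len(has_ink)))
    return spans


# Toy 5 x 12 grayscale "text line": 255 = white paper, 0 = black ink.
WHITE, BLACK = 255, 0
line = [[WHITE] * 12 for _ in range(5)]
for y in range(1, 4):
    for x in (1, 2):
        line[y][x] = BLACK                 # first blob, columns 1-2
for y in range(5):
    line[y][5] = BLACK                     # second blob, column 5
for y in range(2, 4):
    for x in (8, 9, 10):
        line[y][x] = BLACK                 # third blob, columns 8-10

spans = segment_characters(binarize(line))
print(spans)  # [(1, 3), (5, 6), (8, 11)]
```

Real pages need more than this: touching or overlapping characters defeat a simple projection profile, which is one reason practical systems add a dedicated text detection stage and learned segmentation.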
Training Trace - Epoch by Epoch

Loss
1.2 |****
1.0 |***
0.8 |**
0.6 |**
0.4 |*
0.2 |*
0.0 +----------------
      1 2 3 4 5 Epochs
Epoch  Loss ↓  Accuracy ↑  Observation
1      1.2     0.45        Model starts learning basic character shapes
2      0.8     0.65        Recognition accuracy improves as model learns
3      0.5     0.80        Model correctly identifies most characters
4      0.3     0.90        Loss decreases steadily, accuracy reaches 90%
5      0.2     0.94        Model converges with high accuracy
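
A per-epoch trace like this one comes from a training loop that records loss and accuracy on every pass over the data. The sketch below is a toy stand-in, not the model behind the table: it trains a logistic-regression classifier in plain Python on a made-up two-class dataset of 4-pixel "glyphs". The dataset, the names `make_dataset` and `train`, the learning rate, and the seeds are all assumptions for illustration, so the printed numbers will not match the table.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_dataset(rng, n_per_class=20, noise=0.1):
    """Hypothetical 4-pixel glyphs: class 0 inks the left pixels, class 1 the right."""
    prototypes = {0: [1.0, 1.0, 0.0, 0.0], 1: [0.0, 0.0, 1.0, 1.0]}
    data = []
    for label, proto in prototypes.items():
        for _ in range(n_per_class):
            pixels = [p + rng.uniform(-noise, noise) for p in proto]
            data.append((pixels, label))
    return data


def train(data, epochs=5, lr=0.1, seed=0):
    """Run SGD and return a list of (epoch, mean loss, accuracy) tuples.

    Loss and accuracy are measured while training, before each update.
    """
    rng = random.Random(seed)
    weights, bias = [0.0] * 4, 0.0
    history = []
    for epoch in range(1, epochs + 1):
        rng.shuffle(data)
        total_loss, correct = 0.0, 0
        for pixels, label in data:
            p = sigmoid(sum(w * x for w, x in zip(weights, pixels)) + bias)
            # Binary cross-entropy for this sample (clamped to avoid log(0)).
            p_safe = min(max(p, 1e-12), 1 - 1e-12)
            total_loss -= label * math.log(p_safe) + (1 - label) * math.log(1 - p_safe)
            correct += int((p >= 0.5) == (label == 1))
            # Gradient step: d(loss)/d(logit) = p - label.
            error = p - label
            weights = [w - lr * error * x for w, x in zip(weights, pixels)]
            bias -= lr * error
        history.append((epoch, total_loss / len(data), correct / len(data)))
    return history


history = train(make_dataset(random.Random(0)))
print("Epoch  Loss   Accuracy")
for epoch, loss, acc in history:
    print(f"{epoch:<6} {loss:.3f}  {acc:.2f}")
```

The pattern to look for is the same as in the table: loss falls from epoch to epoch while accuracy rises and then levels off.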
Prediction Trace - 6 Layers
Layer 1: Input Image
Layer 2: Preprocessing
Layer 3: Text Detection
Layer 4: Character Segmentation
Layer 5: Character Recognition
Layer 6: Postprocessing
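
As an illustration of the final layer, the sketch below turns the raw recognizer output 'HELLO123' into 'HELLO 123'. The rule it applies (insert a space wherever a letter run meets a digit run) is a hypothetical formatting rule chosen to reproduce that example; real postprocessing typically also uses dictionaries or language models to correct misread characters.

```python
import re


def postprocess(raw):
    """Normalize whitespace and separate letter runs from digit runs."""
    text = " ".join(raw.split())                       # collapse stray whitespace
    text = re.sub(r"(?<=[A-Za-z])(?=\d)", " ", text)   # letter followed by digit
    text = re.sub(r"(?<=\d)(?=[A-Za-z])", " ", text)   # digit followed by letter
    return text


print(postprocess("HELLO123"))  # HELLO 123
```

A rule this blunt would also split identifiers such as 'A4', so in practice it would be limited to contexts where letters and digits are known to be separate tokens.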
Model Quiz - 3 Questions
Test your understanding
Why does OCR preprocess the image before detecting text?
A. To remove noise and improve text clarity
B. To add colors to the image
C. To increase image size
D. To convert text into numbers
Key Insight
OCR works by turning images into clear text through steps that clean the image, find text areas, split characters, and recognize them. Training improves the model's ability to read characters accurately, making text from images usable for computers.
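
The recognition step can be illustrated with the simplest possible classifier: nearest-template matching. This is a stand-in for a trained model, not how modern OCR recognizes characters. The 3 x 3 templates and the names `TEMPLATES`, `hamming`, and `recognize` are hypothetical; each glyph is assigned the character whose template differs from it in the fewest pixels.

```python
# Hypothetical 3 x 3 binary templates (1 = ink), flattened row by row.
TEMPLATES = {
    "I": (0, 1, 0,
          0, 1, 0,
          0, 1, 0),
    "L": (1, 0, 0,
          1, 0, 0,
          1, 1, 1),
    "T": (1, 1, 1,
          0, 1, 0,
          0, 1, 0),
}


def hamming(a, b):
    """Count the pixel positions where two glyphs differ."""
    return sum(x != y for x, y in zip(a, b))


def recognize(glyph):
    """Return the character whose template is closest to the glyph."""
    return min(TEMPLATES, key=lambda ch: hamming(TEMPLATES[ch], glyph))


noisy_l = (1, 0, 0,
           1, 0, 0,
           1, 1, 0)   # an 'L' with one pixel missing

glyphs = [TEMPLATES["T"], TEMPLATES["I"], noisy_l]
print("".join(recognize(g) for g in glyphs))  # TIL
```

A trained model plays the same role as the template lookup here, but it learns what distinguishes characters from many labeled examples instead of relying on one fixed template per character.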