
Why OCR digitizes text from images in Computer Vision - Why Metrics Matter

Metrics & Evaluation - Why OCR digitizes text from images
Which metric matters for this concept and WHY

For OCR (Optical Character Recognition), the key metric is Character Error Rate (CER). CER is the number of character-level edits (substitutions, deletions, and insertions) needed to turn the OCR output into the true text, divided by the number of characters in the true text. This matters because OCR's goal is to turn images of text into exact digital text. A low CER means the OCR reads text accurately, which is crucial for tasks like digitizing books or reading signs.

Other important metrics include Word Error Rate (WER), which looks at whole words instead of characters, and Accuracy, which shows the percentage of correctly recognized characters or words. These metrics help us know how well the OCR is doing its job.
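
As a minimal sketch of how CER and WER can be computed, the snippet below uses a plain Levenshtein edit distance. The function names (edit_distance, cer, wer) and the sample strings are illustrative assumptions, not part of any particular OCR library, and the reference text is assumed to be non-empty.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference item missing from output
                curr[j - 1] + 1,     # insertion: extra item in output
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character edits divided by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word edits divided by number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


truth = "the quick brown fox"        # hypothetical ground truth
ocr_output = "the qu1ck brown f0x"   # hypothetical OCR result
print(f"CER: {cer(truth, ocr_output):.3f}")  # CER: 0.105
print(f"WER: {wer(truth, ocr_output):.3f}")  # WER: 0.500
```

Two wrong characters out of 19 give a CER of about 10.5%, but because they fall in two different words, half of the four words are wrong. This is why WER is usually higher than CER on the same output.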

Confusion matrix or equivalent visualization (ASCII)
Confusion matrix for OCR characters (example):

             Predicted
             A    B    C   ...
Actual  A   90    2    1   ...
        B    3   85    4   ...
        C    0    5   88   ...
       ...  ...  ...  ...  ...

- Diagonal numbers (e.g., 90, 85, 88) show correct recognitions (True Positives).
- Off-diagonal numbers show errors where one character was mistaken for another.

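Total characters = sum of all numbers in the matrix.
Substitution error rate = (Sum of off-diagonal numbers) / (Total characters).
A confusion matrix only records substitutions. The full Character Error Rate (CER) also counts characters the OCR dropped (deletions) or added (insertions), divided by the number of characters in the true text.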
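
To make the arithmetic concrete, here is a small sketch that applies it to the three rows shown in the example matrix. It assumes the "..." entries are absent, so the numbers cover only the characters A, B, and C.

```python
labels = ["A", "B", "C"]

# Rows are actual characters, columns are predicted characters
# (values copied from the example matrix, "..." entries left out).
confusion = [
    [90, 2, 1],
    [3, 85, 4],
    [0, 5, 88],
]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(labels)))  # diagonal
errors = total - correct                                    # off-diagonal

error_rate = errors / total
print(total, correct, errors)   # 278 263 15
print(f"{error_rate:.4f}")      # 0.0540
```

On these numbers about 5.4% of characters were substituted. Deletions and insertions do not appear in a confusion matrix, so an edit-distance-based CER on the same data could be higher.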
Precision vs Recall tradeoff with concrete examples

In OCR, precision and recall can be thought of as:

  • Precision: Of all characters the OCR says are a certain letter, how many are correct?
  • Recall: Of all actual characters of that letter in the image, how many did the OCR find?

For example, if OCR reads a document and often mistakes the letter 'O' for the digit '0', recall for 'O' drops because many actual 'O's are not recognized as 'O'. At the same time, precision for '0' drops because many of the predicted '0's are really 'O's.

High precision but low recall means OCR is careful but misses many characters. High recall but low precision means OCR finds many characters but makes many mistakes. The best OCR balances both for accurate digitization.
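
The sketch below computes per-character precision and recall for an 'O' versus '0' confusion. The helper name per_char_precision_recall and the counts are hypothetical, and it assumes the actual and predicted characters are already aligned one-to-one (no dropped or extra characters).

```python
from collections import Counter


def per_char_precision_recall(pairs):
    """pairs: iterable of (actual, predicted) characters, aligned one-to-one."""
    true_pos, predicted, actual = Counter(), Counter(), Counter()
    for a, p in pairs:
        actual[a] += 1
        predicted[p] += 1
        if a == p:
            true_pos[a] += 1
    scores = {}
    for ch in set(actual) | set(predicted):
        # None marks a score that is undefined (character never predicted / never present).
        precision = true_pos[ch] / predicted[ch] if predicted[ch] else None
        recall = true_pos[ch] / actual[ch] if actual[ch] else None
        scores[ch] = (precision, recall)
    return scores


# Hypothetical data: 10 letter O's (4 misread as zero) and 10 zeros (all correct).
pairs = [("O", "O")] * 6 + [("O", "0")] * 4 + [("0", "0")] * 10
scores = per_char_precision_recall(pairs)

print(scores["O"])  # (1.0, 0.6)
print(scores["0"])  # precision 10/14, recall 1.0
```

Every predicted 'O' is correct (precision 1.0), but only 6 of 10 real 'O's were found (recall 0.6). The misread letters land in the '0' column, so precision for '0' falls to 10/14 even though every real '0' was found.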

What "good" vs "bad" metric values look like for this use case

Good OCR:

  • Character Error Rate (CER) below 5% (roughly 95%+ of characters correct)
  • Word Error Rate (WER) below 10%
  • High precision and recall close to 1.0 (or 100%)

Bad OCR:

  • CER above 20% (many characters wrong)
  • WER above 30% (many words incorrect)
  • Low precision or recall, causing many wrong or missed characters

Good OCR means text is readable and usable without much correction. Bad OCR means lots of errors, making the text hard to understand or useless. These thresholds are rules of thumb; the acceptable error rate depends on the application.

Metrics pitfalls (accuracy paradox, data leakage, overfitting indicators)
  • Accuracy paradox: If most of the text is a single character (for example, spaces), a model that always guesses that character can score high accuracy while recognizing almost nothing useful.
  • Data leakage: If OCR is tested on images it has already seen during training, metrics look better than they will be on new images.
  • Overfitting: OCR trained too much on one font or style may fail on others, causing poor metrics on new data.
  • Ignoring context: OCR errors may be small but change meaning (e.g., '1' vs 'l'), so metrics should consider impact on usability.
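
The accuracy paradox from the list above can be shown with a tiny sketch. The "page" and the always-space baseline are made up for illustration; no real OCR model is involved.

```python
# Hypothetical page: 5 real characters followed by 95 spaces of padding.
truth = "TOTAL" + " " * 95

# A useless baseline that predicts a space at every position.
baseline = " " * len(truth)

matches = sum(t == p for t, p in zip(truth, baseline))
accuracy = matches / len(truth)

# Accuracy restricted to the characters that actually carry information.
non_space = [(t, p) for t, p in zip(truth, baseline) if t != " "]
non_space_accuracy = sum(t == p for t, p in non_space) / len(non_space)

print(f"Overall accuracy:   {accuracy:.2f}")            # 0.95
print(f"Non-space accuracy: {non_space_accuracy:.2f}")  # 0.00
```

The baseline scores 95% accuracy without reading a single letter, which is why per-character metrics or CER on the meaningful text are more informative than raw accuracy.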
Self-check: Your model has 98% accuracy but 12% recall on fraud. Is it good?

This question is about a fraud detection model, not OCR, but it teaches an important lesson.

98% accuracy sounds good, but 12% recall means the model finds only 12% of actual fraud cases. This is very low and means most fraud is missed.

For fraud detection, recall is critical because missing fraud is costly. So despite high accuracy, this model is not good for production.

Similarly, for OCR, high overall accuracy can hide low precision or recall on specific characters. The model still misreads or misses those characters, so it is not reliable for them.
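
One set of counts that produces exactly these numbers is sketched below. The counts are invented for illustration (10,000 transactions, 200 of them fraud); many other combinations would give the same accuracy and recall.

```python
# Hypothetical confusion counts for a fraud detector.
tp, fn = 24, 176     # fraud caught / fraud missed (200 fraud cases in total)
tn, fp = 9776, 24    # legitimate passed / legitimate flagged as fraud

total = tp + fn + tn + fp
accuracy = (tp + tn) / total
recall = tp / (tp + fn)
precision = tp / (tp + fp)

print(f"accuracy:  {accuracy:.2%}")   # 98.00%
print(f"recall:    {recall:.2%}")     # 12.00%
print(f"precision: {precision:.2%}")  # 50.00%
```

Because only 2% of the transactions are fraud, the 9,776 correctly passed legitimate transactions dominate the accuracy figure, while 176 of the 200 fraud cases slip through.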

Key Result
Character Error Rate (CER) is the key metric for measuring OCR accuracy; a low CER means accurate text digitization.

Practice

1. Why does OCR (Optical Character Recognition) convert images of text into digital text?
easy
A. To make the text editable and searchable on computers
B. To change the image colors
C. To compress the image size
D. To create new images from text

Solution

  1. Step 1: Understand OCR's main function

    OCR reads text from images and converts it into a format computers can edit and search.
  2. Step 2: Identify the purpose of digitizing text

    Making text editable and searchable helps users work with written content easily on digital devices.
  3. Final Answer:

    To make the text editable and searchable on computers -> Option A
  4. Quick Check:

    OCR digitizes text to edit/search it [OK]
Hint: OCR turns pictures of words into editable text [OK]
Common Mistakes:
  • Thinking OCR changes image colors
  • Confusing OCR with image compression
  • Believing OCR creates new images
2. Which of the following is the correct way to describe OCR's output?
easy
A. A new image with highlighted text
B. Editable and searchable text extracted from an image
C. A compressed version of the original image
D. A handwritten note scanned into a PDF

Solution

  1. Step 1: Identify OCR output type

    OCR outputs text that can be edited and searched, not images or compressed files.
  2. Step 2: Compare options to OCR output

    Only option B, "Editable and searchable text extracted from an image", correctly describes what OCR produces.
  3. Final Answer:

    Editable and searchable text extracted from an image -> Option B
  4. Quick Check:

    OCR output = editable/searchable text [OK]
Hint: OCR outputs text, not images or compressed files [OK]
Common Mistakes:
  • Confusing OCR output with image files
  • Thinking OCR compresses images
  • Assuming OCR creates PDFs
3. Consider this Python snippet using an OCR library:
import pytesseract
from PIL import Image
img = Image.open('receipt.jpg')
text = pytesseract.image_to_string(img)
print(text)
What will this code output?
medium
A. An error because 'image_to_string' is not a valid function
B. The image 'receipt.jpg' displayed on screen
C. The text content found in the image 'receipt.jpg'
D. A compressed version of 'receipt.jpg'

Solution

  1. Step 1: Understand the code's purpose

    The code uses pytesseract to extract text from an image file named 'receipt.jpg'.
  2. Step 2: Identify the output of image_to_string

    image_to_string returns the text found in the image, which is then printed.
  3. Final Answer:

    The text content found in the image 'receipt.jpg' -> Option C
  4. Quick Check:

    pytesseract.image_to_string outputs text [OK]
Hint: pytesseract.image_to_string extracts text from images [OK]
Common Mistakes:
  • Thinking it displays the image
  • Believing image_to_string is invalid
  • Expecting image compression output
4. This code tries to extract text from an image but fails:
import pytesseract
from PIL import Image
img = Image.open('document.png')
text = pytesseract.image_to_text(img)
print(text)
What is the error, and how do you fix it?
medium
A. Image.open cannot open PNG files
B. Image file 'document.png' does not exist
C. Missing import for pytesseract
D. Function name is wrong; use image_to_string instead of image_to_text

Solution

  1. Step 1: Identify the function error

    The function pytesseract.image_to_text does not exist; the correct function is image_to_string.
  2. Step 2: Fix the function call

    Replace image_to_text with image_to_string to correctly extract text from the image.
  3. Final Answer:

    Function name is wrong; use image_to_string instead of image_to_text -> Option D
  4. Quick Check:

    Correct function = image_to_string [OK]
Hint: Use image_to_string, not image_to_text [OK]
Common Mistakes:
  • Using wrong function name
  • Assuming image file missing without checking
  • Thinking PNG files can't be opened
5. You want to digitize a large collection of scanned books using OCR. Which of these steps is most important to improve OCR accuracy before digitizing?
hard
A. Enhance image quality by cleaning noise and adjusting brightness
B. Convert images to grayscale without any other preprocessing
C. Resize images to very small dimensions to save space
D. Skip preprocessing and run OCR directly on raw images

Solution

  1. Step 1: Understand OCR accuracy factors

    OCR works best on clear, clean images with good contrast and minimal noise.
  2. Step 2: Identify preprocessing to improve OCR

    Enhancing image quality by removing noise and adjusting brightness helps OCR read text more accurately.
  3. Final Answer:

    Enhance image quality by cleaning noise and adjusting brightness -> Option A
  4. Quick Check:

    Better image quality = better OCR accuracy [OK]
Hint: Clean and brighten images before OCR for best results [OK]
Common Mistakes:
  • Ignoring image preprocessing
  • Reducing image size too much
  • Assuming grayscale alone is enough
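
The preprocessing idea from question 5 (clean noise, then adjust brightness and contrast) can be sketched in plain Python on a tiny grayscale grid. The function names and the 5x5 "page" are illustrative assumptions. A real pipeline would more likely use an imaging library, and the right filters depend on the scans.

```python
def median_filter(pixels):
    """3x3 median filter on a 2D list of grayscale values (0-255).

    Replaces each pixel with the median of its neighborhood, which removes
    isolated specks while keeping larger shapes such as text strokes.
    """
    height, width = len(pixels), len(pixels[0])
    out = [row[:] for row in pixels]
    for y in range(height):
        for x in range(width):
            window = sorted(
                pixels[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            )
            out[y][x] = window[len(window) // 2]
    return out


def stretch_contrast(pixels):
    """Linearly rescale values so the darkest pixel is 0 and the brightest is 255."""
    flat = [v for row in pixels for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in pixels]
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in pixels]


# Hypothetical dull scan: grey background (180), a dark text band (80),
# and one black noise speck (0) in the background.
page = [
    [180, 180, 180, 180, 180],
    [180, 180,   0, 180, 180],
    [180, 180, 180, 180, 180],
    [ 80,  80,  80,  80,  80],
    [ 80,  80,  80,  80,  80],
]

cleaned = median_filter(page)
enhanced = stretch_contrast(cleaned)
for row in enhanced:
    print(row)
```

After filtering, the speck is gone, and after stretching, the background becomes 255 and the text band 0. Running the filter first matters here: stretching the noisy page would treat the speck as the darkest value and leave the real text less distinct.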