Tesseract OCR in Computer Vision - Model Metrics & Evaluation

Metrics & Evaluation - Tesseract OCR
Which metric matters for Tesseract OCR and WHY

Tesseract OCR turns images of text into machine-readable text, and the goal is to reproduce the source text exactly. The key metrics are therefore Character Error Rate (CER) and Word Error Rate (WER): the number of character-level (or word-level) substitutions, deletions, and insertions needed to turn the OCR output into the true text, divided by the length of the true text.

Lower CER and WER mean better OCR quality. Plain accuracy is also reported, but CER and WER give a clearer picture because they count missing and extra characters or words, not just wrong ones.
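
As a rough illustration of how these two metrics are computed, here is a minimal sketch in plain Python. It assumes CER and WER are defined as Levenshtein (edit) distance divided by the reference length; the helper names `edit_distance`, `cer`, and `wer` and the sample strings are made up for this example and are not part of Tesseract or pytesseract.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character edits / reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference text must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical ground truth and OCR output ("l" misread as "1" twice).
truth = "hello world"
ocr_output = "hel1o wor1d"
print(f"CER = {cer(truth, ocr_output):.3f}")  # 2 edits / 11 characters
print(f"WER = {wer(truth, ocr_output):.3f}")  # both words contain an error
```

Two wrong characters give a CER of about 18%, but because each falls in a different word the WER is 100%. This is why WER is usually the harsher of the two numbers.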

Confusion matrix or equivalent visualization

For OCR, a confusion matrix can show how often one character is mistaken for another. For example:

      Actual \ Predicted |  a  |  o  |  e  |  l  |  i  
      ---------------------------------------------
                 a      | 90  |  5  |  2  |  1  |  2  
                 o      |  3  | 85  |  5  |  4  |  3  
                 e      |  2  |  4  | 88  |  3  |  3  
                 l      |  1  |  2  |  3  | 90  |  4  
                 i      |  2  |  3  |  3  |  5  | 87  
    

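Each row is the actual character and each column is what Tesseract predicted. The diagonal values are correct predictions (true positives for each character); off-diagonal values are confusions, such as an actual "a" read as "o" 5 times.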
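
To show how such a matrix can be read numerically, the sketch below derives per-character recall (diagonal divided by row total) and precision (diagonal divided by column total) from the example table. The counts are the illustrative ones from the table above; the function name `per_class_metrics` is an assumption made for this example.

```python
labels = ["a", "o", "e", "l", "i"]

# Rows = actual character, columns = predicted character (counts from the table).
matrix = [
    [90, 5, 2, 1, 2],
    [3, 85, 5, 4, 3],
    [2, 4, 88, 3, 3],
    [1, 2, 3, 90, 4],
    [2, 3, 3, 5, 87],
]


def per_class_metrics(matrix, labels):
    """Return {label: {"precision": ..., "recall": ...}} for a square confusion matrix."""
    metrics = {}
    for k, label in enumerate(labels):
        true_positive = matrix[k][k]
        actual_total = sum(matrix[k])                     # row sum
        predicted_total = sum(row[k] for row in matrix)   # column sum
        metrics[label] = {
            "recall": true_positive / actual_total,
            "precision": true_positive / predicted_total,
        }
    return metrics


for label, m in per_class_metrics(matrix, labels).items():
    print(f"{label}: precision={m['precision']:.3f} recall={m['recall']:.3f}")
```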

Precision vs Recall tradeoff with examples

In OCR, precision means how many recognized characters are actually correct. Recall means how many true characters were found by the OCR.

If precision is high but recall is low, the OCR is very sure about the characters it outputs but misses many characters (like skipping hard-to-read words).

If recall is high but precision is low, the OCR tries to read everything but makes many mistakes.

Example: For reading handwritten notes, high recall is important to capture all words, even if some are wrong. For legal documents, high precision is critical to avoid errors.
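
One way to see this tradeoff is to drop low-confidence words and watch both numbers move. The sketch below uses hand-written, hypothetical data: each tuple is (recognized word, confidence from 0 to 100, true word), and the words are assumed to be already aligned one-to-one with the ground truth. The function name `precision_recall` and the threshold values are illustrative assumptions.

```python
# Hypothetical aligned OCR results: (recognized_word, confidence, true_word).
results = [
    ("invoice", 96, "invoice"),
    ("total", 91, "total"),
    ("arnount", 55, "amount"),
    ("due", 88, "due"),
    ("2O24", 40, "2024"),
    ("paid", 62, "paid"),
]


def precision_recall(results, min_conf):
    """Keep only words with confidence >= min_conf, then score them.

    precision = correct kept words / kept words
    recall    = correct kept words / all true words
    """
    kept = [(word, truth) for word, conf, truth in results if conf >= min_conf]
    correct = sum(1 for word, truth in kept if word == truth)
    precision = correct / len(kept) if kept else 0.0
    recall = correct / len(results)
    return precision, recall


for threshold in (0, 60, 80):
    p, r = precision_recall(results, threshold)
    print(f"min_conf={threshold:>2}: precision={p:.2f} recall={r:.2f}")
```

With this data, raising the threshold to 80 removes every wrong word (precision 1.0) but also discards the correct low-confidence word "paid", so recall falls to 0.5.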

What "good" vs "bad" metric values look like for Tesseract OCR

Good OCR:

  • Character Error Rate (CER) below 5%
  • Word Error Rate (WER) below 10%
  • Precision and recall both above 90%

Bad OCR:

  • CER above 20%
  • WER above 30%
  • Precision or recall below 70%
  • Many confused characters or missing words
Common pitfalls in OCR metrics
  • Accuracy paradox: High overall accuracy can be misleading when most text is easy to read and the few errors fall on critical content.
  • Ignoring context: Metrics may not capture whether the recognized text makes sense.
  • Data leakage: Testing on images that overlap with, or closely resemble, the training data can inflate scores.
  • Overfitting: OCR tuned too much on one font or style may fail on others.
  • Ignoring layout: OCR may get characters right but fail to preserve reading order.
Self-check question

Your OCR model has 98% accuracy but a 12% recall on rare handwritten words. Is it good for production? Why or why not?

Answer: No, it is not good. Even though overall accuracy is high, the low recall on handwritten words means many words are missed. This can cause important information loss, especially if handwritten text is critical.
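
The arithmetic behind this answer can be sketched with made-up numbers. The word counts below are purely hypothetical (10,000 words, of which 200 are handwritten), and the sketch assumes every printed word is read correctly; only the 12% recall figure comes from the question.

```python
def overall_accuracy(total_words, rare_words, rare_recall):
    """Accuracy when all common words are correct and only a fraction of rare words is."""
    common_correct = total_words - rare_words        # assumption: no errors on printed text
    rare_correct = round(rare_recall * rare_words)   # rare words actually recovered
    return (common_correct + rare_correct) / total_words, rare_correct


accuracy, recovered = overall_accuracy(total_words=10_000, rare_words=200, rare_recall=0.12)
print(f"handwritten words recovered: {recovered} of 200")
print(f"overall accuracy: {accuracy:.2%}")
```

Under these assumptions the model still scores about 98% overall while losing 176 of the 200 handwritten words, which is exactly the situation the headline accuracy hides.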

Key Result
Character Error Rate (CER) and Word Error Rate (WER) are the key metrics for Tesseract OCR quality because they count every wrong, missing, and extra character or word rather than reporting a single accuracy figure.

Practice

1. What is the main purpose of Tesseract OCR in computer vision?
easy
A. To enhance image resolution
B. To detect objects in images
C. To convert images containing text into editable text
D. To classify images into categories

Solution

  1. Step 1: Understand Tesseract OCR's function

    Tesseract OCR is designed to read text from images and convert it into editable text format.
  2. Step 2: Compare options with Tesseract's purpose

    Image enhancement, object detection, and image classification relate to other computer vision tasks but not text extraction, which is Tesseract's main use.
  3. Final Answer:

    To convert images containing text into editable text -> Option C
  4. Quick Check:

    Tesseract OCR = Text extraction
Hint: Remember OCR means Optical Character Recognition
Common Mistakes:
  • Confusing OCR with image enhancement
  • Thinking Tesseract detects objects
  • Assuming it classifies images
2. Which Python function is used to extract text from an image using Tesseract?
easy
A. pytesseract.image_to_string()
B. pytesseract.extract_text()
C. pytesseract.read_image()
D. pytesseract.text_from_image()

Solution

  1. Step 1: Recall the correct pytesseract function

    The official function to get text from an image is image_to_string().
  2. Step 2: Verify other options

    The other options are not pytesseract functions; calling any of them raises an AttributeError.
  3. Final Answer:

    pytesseract.image_to_string() -> Option A
  4. Quick Check:

    Function for text extraction = image_to_string()
Hint: Use image_to_string() to get text from images
Common Mistakes:
  • Using non-existent pytesseract functions
  • Confusing function names with similar words
  • Forgetting parentheses in function call
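
For reference, a typical call might look like the sketch below. It assumes pytesseract and the Tesseract binary are installed; the wrapper names `build_config` and `ocr_image` are hypothetical helpers written for this example, while `--psm` (page segmentation mode) and `--oem` (OCR engine mode) are standard Tesseract command-line options passed through the `config` argument.

```python
def build_config(psm=6, oem=3):
    """Build a Tesseract option string.

    psm=6 treats the page as a single uniform block of text;
    oem=3 lets Tesseract pick its default engine.
    """
    return f"--oem {oem} --psm {psm}"


def ocr_image(image, lang="eng", psm=6):
    """Run OCR on a PIL image or an image file path and return the stripped text."""
    # Imported here so this file can be loaded even where pytesseract is not installed.
    import pytesseract

    text = pytesseract.image_to_string(image, lang=lang, config=build_config(psm=psm))
    return text.strip()


# Example usage (requires pytesseract, Tesseract, and an existing image file):
# print(ocr_image("image.png"))
```
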
3. What will be the output of this Python code snippet using pytesseract?
from PIL import Image
import pytesseract

# Create a blank white 100x30 image with no text drawn on it.
img = Image.new('RGB', (100, 30), color=(255, 255, 255))
text = pytesseract.image_to_string(img)
print(text.strip())
medium
A. Random characters
B. Empty string
C. Error: Image not found
D. Whitespace characters

Solution

  1. Step 1: Analyze the image content

    The image is blank white with no text drawn on it.
  2. Step 2: Understand pytesseract output on blank images

    Since the image contains no text, pytesseract typically returns an empty string or only whitespace, which strip() reduces to an empty string.
  3. Final Answer:

    Empty string -> Option B
  4. Quick Check:

    Blank image text output = empty string
Hint: Blank images give empty text output
Common Mistakes:
  • Expecting error due to no text
  • Assuming random characters appear
  • Not stripping whitespace before print
4. Identify the error in this code snippet using pytesseract:
import pytesseract
text = pytesseract.image_to_string('image.png')
print(text)
medium
A. No error, code runs fine
B. Missing import for PIL Image
C. Incorrect function name used
D. Passing a filename string instead of an image object

Solution

  1. Step 1: Check function argument requirements

    image_to_string() accepts both PIL Image objects and strings representing image file paths.
  2. Step 2: Verify the code

    Passing a filename string 'image.png' is valid assuming the file exists and pytesseract is configured.
  3. Final Answer:

    No error, code runs fine -> Option A
  4. Quick Check:

    image_to_string() accepts file paths
Hint: pytesseract.image_to_string() accepts both image objects and file paths
Common Mistakes:
  • Thinking only PIL Image objects are accepted
  • Assuming PIL import is required for file paths
  • Believing the function cannot read files directly
5. You want to improve Tesseract OCR accuracy on a scanned document image with noise and skew. Which combination of preprocessing steps is best before using pytesseract.image_to_string()?
hard
A. Apply random color filters
B. Increase image brightness only
C. Resize image to smaller dimensions
D. Convert to grayscale, apply thresholding, and deskew the image

Solution

  1. Step 1: Understand common OCR preprocessing

    Grayscale conversion removes color information, thresholding separates dark text from a light background, and deskewing straightens tilted lines of text; together these steps improve OCR accuracy.
  2. Step 2: Evaluate other options

    Increasing brightness alone or resizing smaller can reduce quality; random color filters add noise, hurting OCR.
  3. Final Answer:

    Convert to grayscale, apply thresholding, and deskew the image -> Option D
  4. Quick Check:

    Preprocessing for OCR = grayscale + threshold + deskew
Hint: Clean and straighten image before OCR for best results
Common Mistakes:
  • Skipping deskewing step
  • Using color filters that add noise
  • Reducing image size too much
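
To make the first two preprocessing steps concrete, here is a dependency-free sketch of grayscale conversion followed by Otsu thresholding on a tiny hand-made image. It is only an illustration: in practice these steps are normally done with an image library, deskewing is not shown here, and the function names and the sample pixel values are assumptions made for this example.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (R, G, B) tuples to rows of 0-255 luminance values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_rows
    ]


def otsu_threshold(gray_rows):
    """Pick the threshold that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in gray_rows:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below t
    sum0 = 0  # summed intensity of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance is undefined
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(gray_rows, t):
    """Map pixels above t to white (255) and the rest to black (0)."""
    return [[255 if v > t else 0 for v in row] for row in gray_rows]


# Hypothetical 3x4 image: two dark "ink" pixels on a light background.
dark, light = (30, 30, 30), (220, 220, 220)
rgb = [
    [light, light, light, light],
    [light, dark, dark, light],
    [light, light, light, light],
]
gray = to_grayscale(rgb)
t = otsu_threshold(gray)
binary = binarize(gray, t)
print(binary[1])  # middle row: white, black, black, white
```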