Computer Vision · ~8 mins

Text recognition pipeline in Computer Vision - Model Metrics & Evaluation

Which metric matters for Text recognition pipeline and WHY

In text recognition, the main goal is to correctly read characters or words from images. The key metrics are Character Error Rate (CER) and Word Error Rate (WER). These measure how many characters or words the model got wrong compared to the true text. Lower CER and WER mean better recognition.

Accuracy is also used, but CER and WER give a clearer picture because they count insertions, deletions, and substitutions of characters or words, which are common errors in text recognition.
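Both metrics reduce to the Levenshtein (edit) distance between the predicted and true text, divided by the length of the true text. A minimal sketch in Python (the function names `edit_distance`, `cer`, and `wer` are illustrative, not from a specific library):

```python
def edit_distance(ref, hyp):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn hyp into ref (classic dynamic programming)."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[m][n]

def cer(reference, hypothesis):
    # Character Error Rate: character-level edit distance / reference length.
    return edit_distance(reference, hypothesis) / len(reference)

def wer(reference, hypothesis):
    # Word Error Rate: same computation on word sequences.
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

For example, `cer("hello world", "hella world")` is one substitution over 11 characters, about 0.091, while `wer` on the same pair is 0.5 because one of the two words is wrong.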

Confusion matrix or equivalent visualization

For text recognition, a confusion matrix can show how often one character is mistaken for another: each row is the true character and each column the predicted one. For example, the letter 'O' is often confused with the digit '0'.

      Confusion Matrix (Characters):
      ---------------------------------
      |     | O | 0 | I | l | ...       |
      |-----|---|---|---|---|-----------|
      | O   |50 | 5 | 0 | 0 | ...       |
      | 0   | 3 |45 | 1 | 0 | ...       |
      | I   | 0 | 1 |48 | 2 | ...       |
      | l   | 0 | 0 | 3 |47 | ...       |
      ---------------------------------
    

This matrix helps identify which characters are often mixed up, guiding improvements.
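One way to collect such counts is to tally (true, predicted) character pairs. This is a simplified sketch that assumes the strings are already aligned position by position; real pipelines align them first via edit-distance backtracking, and `char_confusions` is a hypothetical helper, not a library function:

```python
from collections import Counter

def char_confusions(pairs):
    """pairs: iterable of (true_text, predicted_text) strings,
    assumed pre-aligned to equal length."""
    counts = Counter()
    for truth, pred in pairs:
        for t, p in zip(truth, pred):
            counts[(t, p)] += 1  # row = true char, column = predicted char
    return counts

confusions = char_confusions([("O0Il", "00Il"), ("IO", "lO")])
```

Here `confusions[("O", "0")]` counts how often a true 'O' was read as '0', exactly the off-diagonal cells in the matrix above.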

Precision vs Recall tradeoff with concrete examples

In text recognition, precision is the fraction of recognized characters that are actually correct, while recall is the fraction of true characters that the model managed to recover.

For example, if the model reads extra characters not in the image (false positives), precision drops. If it misses characters (false negatives), recall drops.

In some cases, like reading license plates, high precision is important to avoid wrong characters. In others, like digitizing books, high recall is important to capture all text even if some errors occur.
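A rough sketch of character-level precision and recall, counting matched characters with Python's `difflib` alignment. This is an illustration under that matching assumption, not a standard OCR evaluation tool:

```python
from difflib import SequenceMatcher

def char_precision_recall(truth, pred):
    # Count characters that align between the two strings.
    matched = sum(block.size
                  for block in SequenceMatcher(None, truth, pred).get_matching_blocks())
    precision = matched / len(pred) if pred else 0.0   # extra chars lower this
    recall = matched / len(truth) if truth else 0.0    # missed chars lower this
    return precision, recall

# One hallucinated extra character: precision drops, recall stays perfect.
p, r = char_precision_recall("PLATE123", "PLATE1234")
```

In the license-plate example, the extra '4' gives a precision of 8/9 while recall remains 1.0; dropping characters instead would leave precision at 1.0 and lower recall.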

What "good" vs "bad" metric values look like for Text recognition

Good: CER and WER below 5% mean the model reads almost all characters and words correctly. Precision and recall close to 1.0 show very few mistakes.

Bad: CER or WER above 20% means many errors. Precision or recall below 0.7 means the model often misses or wrongly adds characters.
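These rough bands can be expressed as a simple, hypothetical quality gate (the thresholds are the ones stated above; tune them to your application):

```python
def quality_band(cer, wer):
    """Classify a model using the rough CER/WER bands above."""
    if cer < 0.05 and wer < 0.05:
        return "good"
    if cer > 0.20 or wer > 0.20:
        return "bad"
    return "borderline"
```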

Common pitfalls in metrics for Text recognition
  • Ignoring insertions and deletions: Just counting correct characters misses errors where characters are added or skipped.
  • Data leakage: Testing on images very similar to training can give overly optimistic metrics.
  • Overfitting: Very low error on training but high error on new images means the model memorizes instead of learning.
  • Ignoring context: Some errors are more serious (e.g., confusing '1' and 'l' in a word) but metrics treat all errors equally.
Self-check question

Your text recognition model has 98% accuracy but a 15% Character Error Rate (CER). Is it good for production? Why or why not?

Answer: No, it is not ready for production. The 98% figure is likely exact-match accuracy per sample, which hides how badly the failing samples are read. A 15% CER means roughly one character in seven is wrong, enough to corrupt names, numbers, and codes. You want both high exact-match accuracy and low CER for reliable text recognition.
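A toy illustration of how the two numbers can diverge: many short, perfectly read samples plus a few long, badly read ones. The sample strings here are made up for the demonstration:

```python
# 98 easy samples read perfectly, 2 long samples read completely wrong.
samples = [("ok", "ok")] * 98 + [("serialnumber", "xxxxxxxxxxxx")] * 2

# Exact-match accuracy: fraction of samples reproduced verbatim.
exact_match_acc = sum(t == p for t, p in samples) / len(samples)

# Pairs here have equal length, so errors are just positional mismatches.
errors = sum(sum(a != b for a, b in zip(t, p)) for t, p in samples)
chars = sum(len(t) for t, _ in samples)
cer_value = errors / chars

print(exact_match_acc)       # 0.98
print(round(cer_value, 3))   # 0.109
```

Accuracy looks excellent because the errors are concentrated in a few samples, yet more than 10% of all characters are wrong.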

Key Result
Character Error Rate (CER) and Word Error Rate (WER) are key metrics showing how accurately text is recognized, with lower values indicating better performance.