Computer Vision · ~8 mins

Why OCR digitizes text from images in Computer Vision - Why Metrics Matter

Which metric matters for this concept and WHY

For OCR (Optical Character Recognition), the key metric is Character Error Rate (CER). CER measures how many characters the OCR got wrong compared to the true text. This matters because OCR's goal is to turn images of text into exact digital text. A low CER means the OCR reads text accurately, which is crucial for tasks like digitizing books or reading signs.

Other important metrics include Word Error Rate (WER), which looks at whole words instead of characters, and Accuracy, which shows the percentage of correctly recognized characters or words. These metrics help us know how well the OCR is doing its job.
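The metrics above can be sketched in code. This is a minimal, self-contained example (hand-rolled Levenshtein distance rather than a library; the sample strings are made up for illustration):

```python
# Minimal sketch: computing CER and WER with plain edit distance.

def levenshtein(ref, hyp):
    """Edit distance between two sequences (substitutions, insertions, deletions)."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[m][n]

def cer(reference, hypothesis):
    """Character Error Rate: character-level edits / reference length."""
    return levenshtein(reference, hypothesis) / len(reference)

def wer(reference, hypothesis):
    """Word Error Rate: word-level edits / number of reference words."""
    return levenshtein(reference.split(), hypothesis.split()) / len(reference.split())

truth = "the quick brown fox"
ocr_out = "the quick br0wn fox"    # OCR confused 'o' with '0'
print(f"CER: {cer(truth, ocr_out):.3f}")   # 1 substitution out of 19 characters
print(f"WER: {wer(truth, ocr_out):.3f}")   # 1 wrong word out of 4
```

The same `levenshtein` function handles both metrics because it works on any sequence: characters for CER, word lists for WER.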

Confusion matrix or equivalent visualization (ASCII)
Confusion matrix for OCR characters (example):

          Predicted
          A   B   C   ...
Actual A  90   2   1   ...
       B   3  85   4   ...
       C   0   5  88   ...
       ... ... ... ...  ...

- Diagonal numbers (e.g., 90, 85, 88) show correct recognitions (True Positives).
- Off-diagonal numbers show errors where one character was mistaken for another.

Total characters = sum of all numbers.
Character Error Rate (CER) = (sum of off-diagonal errors) / (total characters).
Note: the matrix view only captures substitution errors. The full CER is computed with edit distance: CER = (substitutions + insertions + deletions) / (reference length).
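A small sketch of this computation, using the example matrix above (assuming predictions are aligned one-to-one with ground-truth characters, so only substitution errors appear):

```python
# Simplified error rate from a character confusion matrix.
# confusion[actual][predicted] = count, matching the ASCII table above.
confusion = {
    "A": {"A": 90, "B": 2, "C": 1},
    "B": {"A": 3, "B": 85, "C": 4},
    "C": {"A": 0, "B": 5, "C": 88},
}

total = sum(sum(row.values()) for row in confusion.values())
correct = sum(confusion[ch][ch] for ch in confusion)   # diagonal entries
errors = total - correct                               # off-diagonal entries

print(f"total={total} correct={correct} errors={errors}")
print(f"error rate = {errors / total:.3f}")
```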
Precision vs Recall tradeoff with concrete examples

In OCR, precision and recall can be thought of as:

  • Precision: Of all characters the OCR says are a certain letter, how many are correct?
  • Recall: Of all actual characters of that letter in the image, how many did the OCR find?

For example, if OCR often reads the digit '0' as the letter 'O', precision for 'O' drops because many predicted 'O's are actually zeros. If OCR reads some real 'O's as other characters, recall for 'O' drops.

High precision but low recall means OCR is careful but misses many characters. High recall but low precision means OCR finds many characters but makes many mistakes. The best OCR balances both for accurate digitization.
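Per-character precision and recall follow directly from these counts. A short sketch for the letter 'O', with hypothetical counts chosen for illustration:

```python
# Hypothetical counts for the letter 'O' in one document:
tp = 80   # real 'O's correctly read as 'O'
fp = 15   # other characters (e.g. the digit '0') wrongly read as 'O'
fn = 20   # real 'O's read as something else

precision = tp / (tp + fp)   # of predicted 'O's, fraction that are correct
recall = tp / (tp + fn)      # of actual 'O's, fraction that were found

print(f"precision={precision:.3f} recall={recall:.3f}")
```

Here the '0'-as-'O' confusions (false positives) pull precision down, while the missed 'O's (false negatives) pull recall down, matching the tradeoff described above.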

What "good" vs "bad" metric values look like for this use case

Good OCR:

  • Character Error Rate (CER) below 5% (meaning 95%+ characters correct)
  • Word Error Rate (WER) below 10%
  • High precision and recall close to 1.0 (or 100%)

Bad OCR:

  • CER above 20% (many characters wrong)
  • WER above 30% (many words incorrect)
  • Low precision or recall, causing many wrong or missed characters

Good OCR means text is readable and usable without much correction. Bad OCR means lots of errors, making the text hard to understand or useless.

Metrics pitfalls (accuracy paradox, data leakage, overfitting indicators)
  • Accuracy paradox: If most of the text is a single character (like many spaces), a model that always guesses that character can score high accuracy while recognizing nothing useful.
  • Data leakage: If OCR is tested on images it has seen before, metrics look better than reality.
  • Overfitting: OCR trained too much on one font or style may fail on others, causing poor metrics on new data.
  • Ignoring context: OCR errors may be small but change meaning (e.g., '1' vs 'l'), so metrics should consider impact on usability.
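The accuracy paradox is easy to demonstrate with a contrived example (the whitespace-heavy "page" below is made up for illustration):

```python
# Accuracy paradox sketch: a degenerate "model" that predicts a space
# for every position scores high accuracy on whitespace-heavy text
# while missing every actual letter.
truth = "x" + " " * 99    # a "page" that is 99% whitespace
pred = " " * 100          # the model always outputs a space

accuracy = sum(t == p for t, p in zip(truth, pred)) / len(truth)
print(f"accuracy={accuracy:.2f}")   # 0.99, yet every letter was missed
```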
Self-check: Your model has 98% accuracy but 12% recall on fraud. Is it good?

This question is about a fraud detection model, not OCR, but it teaches an important lesson.

98% accuracy sounds good, but 12% recall means the model finds only 12% of actual fraud cases. This is very low and means most fraud is missed.

For fraud detection, recall is critical because missing fraud is costly. So despite high accuracy, this model is not good for production.
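One way the self-check numbers could arise, sketched with a hypothetical confusion matrix (counts chosen so accuracy lands near 98% while recall is 12%):

```python
# Hypothetical fraud-detection confusion matrix (10,000 transactions):
tp = 12      # fraud correctly flagged
fn = 88      # fraud missed
tn = 9816    # legitimate, correctly passed
fp = 84      # legitimate, wrongly flagged

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.3f} recall={recall:.2f}")
```

Because legitimate transactions dominate, the many true negatives inflate accuracy even though only 12 of 100 fraud cases were caught.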

Similarly, for OCR, a high accuracy but low recall or precision means the model misses or misreads many characters, so it is not reliable.

Key Result
Character Error Rate (CER) is key to measure OCR accuracy; low CER means accurate text digitization.