Grooper Help - Version 25.0
25.0.0017 2,127

Azure OCR

OCR Engine Grooper.Cloud

Recognizes machine print and handprint using the Microsoft Azure Computer Vision API.

Remarks

This OCR engine combines the Azure Read API (a convolutional neural network-based recognizer) with a traditional OCR engine (such as Transym) to maximize both text accuracy and spatial precision.

Key Features:

  • Runs Azure OCR and traditional OCR in parallel, then compares and merges results.
  • Uses a voting and tie-breaking system to select the most reliable text for each segment.
  • Leverages a vocabulary lexicon and preferred regex patterns to resolve disagreements between engines.
  • Processes continuous-tone images directly, with no binarization or pre-cleanup required.
  • Optionally filters out handwriting segments.
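
The tie-breaking idea in the list above can be illustrated with a minimal sketch. This is not Grooper's implementation: the `Candidate` type, the `pick_text` function, the order of the checks, and the rule that ties go to Azure are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    """One engine's reading of a segment (hypothetical type)."""
    text: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


def pick_text(azure, traditional, patterns=(), vocabulary=frozenset()):
    """Choose between two engine readings of the same segment.

    Assumed order of preference: preferred regex patterns,
    then vocabulary membership, then confidence.
    """
    if azure.text == traditional.text:
        return azure  # engines agree, nothing to resolve

    def matches_pattern(c):
        return any(re.fullmatch(p, c.text) for p in patterns)

    def in_vocabulary(c):
        words = c.text.lower().split()
        return bool(words) and all(w in vocabulary for w in words)

    # A test settles the disagreement only if exactly one reading passes it.
    for test in (matches_pattern, in_vocabulary):
        a, t = test(azure), test(traditional)
        if a != t:
            return azure if a else traditional

    # Fall back on confidence; a tie goes to Azure (an assumption).
    return azure if azure.confidence >= traditional.confidence else traditional
```

For example, with the preferred pattern `\d{4}-\d{2}-\d{2}`, a high-confidence Azure reading of `2O24-01-15` (letter O) would lose to a lower-confidence traditional reading of `2024-01-15`, because only the latter matches the pattern.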

Validation and Repair Logic (via OcrChopper):

  • Each line detected by Azure is validated by comparing it to overlapping segments from the traditional OCR engine.
  • If Azure and traditional OCR disagree, the engine:
    • Prefers text matching user-supplied patterns or vocabulary.
    • Considers confidence scores from both engines.
    • Repairs or discards lines/words that are likely misreads (e.g., low-confidence Azure results with high-confidence traditional OCR overlap).
    • Suppresses handwriting lines if configured, especially when overlapping traditional OCR segments suggest a misread.
  • Deduplicates words that Azure may detect in multiple lines.
  • Replaces homoglyphs to normalize character data.
  • Supplements missing segments from traditional OCR if Azure omits them.
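
Three of the repair steps above (deduplication, homoglyph replacement, and supplementing missing segments) can be sketched as follows. This is a simplified illustration, not the OcrChopper logic: the `Word` type, the overlap thresholds, and the small `HOMOGLYPHS` table are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """A recognized word with its bounding box (hypothetical type)."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


# Tiny sample table: Cyrillic A and e, fullwidth zero (illustrative only).
HOMOGLYPHS = str.maketrans({"\u0410": "A", "\u0435": "e", "\uff10": "0"})


def normalize_homoglyphs(text):
    """Replace look-alike characters with their plain equivalents."""
    return text.translate(HOMOGLYPHS)


def overlap_ratio(a, b):
    """Intersection area divided by the area of the smaller box."""
    w = min(a.right, b.right) - max(a.left, b.left)
    h = min(a.bottom, b.bottom) - max(a.top, b.top)
    if w <= 0 or h <= 0:
        return 0.0
    smaller = min((a.right - a.left) * (a.bottom - a.top),
                  (b.right - b.left) * (b.bottom - b.top))
    return (w * h) / smaller if smaller > 0 else 0.0


def dedupe(words, threshold=0.8):
    """Drop a word if an earlier word has the same text and nearly the same box."""
    kept = []
    for w in words:
        if not any(k.text == w.text and overlap_ratio(k, w) >= threshold
                   for k in kept):
            kept.append(w)
    return kept


def supplement(azure_words, traditional_words, threshold=0.5):
    """Append traditional-OCR words that no Azure word overlaps."""
    extras = [t for t in traditional_words
              if all(overlap_ratio(a, t) < threshold for a in azure_words)]
    return list(azure_words) + extras
```

Dividing by the smaller box area is a design choice of this sketch: it lets a short word fully contained in a longer line segment still count as overlapping.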

Usage Notes:

  • Configure the OCR Profile with no IP Profile, and disable both Synthesis and Junk Filtering.
  • For best results, provide a vocabulary lexicon and preferred patterns relevant to your documents.
  • See the supported languages list for the languages this engine can recognize.

Example Workflow:

  1. Azure and traditional OCR engines process the same image.
  2. Results are aligned and compared using OcrChopper.
  3. Segments are validated, repaired, and merged.
  4. The final output is the merged text, chosen for both text accuracy and spatial precision.
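
The four steps above can be traced end to end in a short sketch. The engines themselves are not called here; their outputs are passed in as lists. The `Segment` type, the `merge_results` function, and the `min_conf` threshold are assumptions for illustration and do not reflect how OcrChopper is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A recognized text segment (hypothetical type)."""
    text: str
    confidence: float
    box: tuple  # (left, top, right, bottom)


def boxes_overlap(a, b):
    """True if two (left, top, right, bottom) boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_results(azure, traditional, min_conf=0.5):
    """Align, validate, repair, and merge two engines' segments."""
    merged = []
    used = set()  # indexes of traditional segments matched to an Azure segment
    for seg in azure:
        # Step 2: align by finding overlapping traditional segments.
        partners = [(i, t) for i, t in enumerate(traditional)
                    if boxes_overlap(seg.box, t.box)]
        used.update(i for i, _ in partners)
        best = max((t for _, t in partners),
                   key=lambda t: t.confidence, default=None)
        # Step 3: repair a low-confidence Azure reading that disagrees
        # with a high-confidence traditional reading.
        if (best is not None and seg.text != best.text
                and seg.confidence < min_conf <= best.confidence):
            merged.append(best)
        else:
            merged.append(seg)
    # Supplement with traditional segments that Azure did not cover.
    merged.extend(t for i, t in enumerate(traditional) if i not in used)
    # Step 4: emit in top-to-bottom, left-to-right order.
    merged.sort(key=lambda s: (s.box[1], s.box[0]))
    return merged
```

A real merge would also apply the pattern and vocabulary preferences described earlier; this sketch uses confidence alone to keep the control flow visible.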

Properties

Name    Type    Description
General
Filtering
Voting

See Also

Used By

Notification