Grooper Help - Version 25.0
25.0.0017 2,127

OCR Profile

Node Grooper.OCR

Defines a configurable profile for performing optical character recognition (OCR) on images in Grooper.

Remarks

An OCR Profile encapsulates the settings and logic required to extract text from images using OCR. It controls the entire OCR workflow, including image preprocessing, character recognition, segmentation, synthesis, filtering, and result repair.

Overview

  • An OCR Profile determines how images are processed and recognized, from initial cleanup to final text output.
  • It is referenced by the Recognize activity and other configuration objects that require OCR.
  • The profile is highly configurable, supporting a wide range of document types, layouts, and quality requirements.

OCR Processing Workflow

The OCR process, as orchestrated by an OCR Profile, consists of several key stages:

  1. Image Preprocessing

    • If an IP Profile is specified, it is applied to the image for temporary cleanup (e.g., noise removal, line detection).
    • The original image is not permanently altered.
    • The 'Region of Interest' property can restrict OCR to a specific area of the page.
  2. Segmentation and Synthesis

    • The selected OCR Engine is used to recognize text.
    • If 'Synthesis Method' is enabled, Grooper re-synthesizes the engine's output to improve text flow and structure.
    • Bound regions (e.g., boxes) can be processed independently using 'Image Segmentation'.
  3. Iterative Processing

    • If 'Iterations' is greater than 1, OCR is performed in multiple passes. Characters recognized in one pass are dropped out of the image before the next pass, which improves accuracy on the text that remains.
    • 'Cell Validation' divides the image into a grid, performing OCR on each cell and merging results to handle complex layouts.
  4. Filtering and Repair

    • Results are filtered based on confidence, size, font, edge proximity, and symbol ratio.
    • Junk filtering removes stray marks and artifacts.
    • Segments below the 'Reprocessing Threshold' can be automatically reprocessed for improved accuracy.
    • OCR Repair Options can be applied to correct common recognition errors.
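
The iterative and filtering stages above can be illustrated with a small sketch. This is not Grooper's implementation or API: the `Char` structure, the `Engine` callable, `run_passes`, `filter_by_confidence`, and the stand-in `fake_engine` are all hypothetical names invented for illustration. The sketch only shows the general idea of running several passes with recognized characters dropped out between them, then discarding low-confidence results.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass(frozen=True)
class Char:
    """One recognized character (hypothetical structure, not Grooper's)."""
    text: str
    box: Box
    confidence: float  # 0.0 to 1.0


# Hypothetical engine: given the boxes already dropped out, return the
# characters it can recognize in what remains of the image.
Engine = Callable[[Set[Box]], List[Char]]


def run_passes(engine: Engine, iterations: int) -> List[Char]:
    """Run up to `iterations` passes, dropping out recognized characters."""
    dropped: Set[Box] = set()
    results: List[Char] = []
    for _ in range(max(1, iterations)):
        found = [c for c in engine(dropped) if c.box not in dropped]
        if not found:
            break  # nothing new was recognized, so later passes cannot help
        results.extend(found)
        dropped.update(c.box for c in found)
    return results


def filter_by_confidence(chars: List[Char], min_confidence: float) -> List[Char]:
    """Keep only characters at or above the confidence floor."""
    return [c for c in chars if c.confidence >= min_confidence]


def fake_engine(dropped: Set[Box]) -> List[Char]:
    """Stand-in engine: the faint 'B' is readable only once 'A' is dropped out."""
    a = Char("A", (0, 0, 10, 10), 0.95)
    b = Char("B", (12, 0, 22, 10), 0.80)
    speck = Char("'", (40, 0, 41, 2), 0.20)  # stray mark with low confidence
    if a.box not in dropped:
        return [a, speck]
    return [b]


single = run_passes(fake_engine, iterations=1)
double = run_passes(fake_engine, iterations=2)
kept = filter_by_confidence(double, min_confidence=0.5)
print([c.text for c in kept])
```

In this toy setup a single pass finds only "A" and the stray mark, a second pass picks up "B", and the confidence filter then removes the stray mark. The actual filters (size, font, edge proximity, symbol ratio) and the 'Reprocessing Threshold' behavior are configured on the profile and are not modeled here.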

Configuration Guidance

  • OCR Engine: Choose the engine best suited for your documents.
  • IP Profile: Use for image cleanup, but avoid commands that alter image size or resolution.
  • Synthesis and Segmentation: Enable synthesis for improved text flow; use segmentation for forms or boxed layouts.
  • Filtering: Adjust confidence, size, and junk filtering to balance accuracy and completeness.
  • Cell Validation: Enable for documents with columns or irregular layouts.
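
To make the grid idea behind 'Cell Validation' concrete, here is a minimal sketch of dividing a page into cells and mapping a result found inside one cell back to page coordinates. It is an assumption-laden illustration, not Grooper's algorithm: `grid_cells` and `to_page_coords` are hypothetical helpers, and how Grooper sizes cells or merges per-cell results is not described in this topic.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def grid_cells(width: int, height: int, rows: int, cols: int) -> List[Box]:
    """Split a page into rows x cols rectangles that tile it exactly."""
    cells: List[Box] = []
    for r in range(rows):
        top = r * height // rows
        bottom = (r + 1) * height // rows
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            cells.append((left, top, right, bottom))
    return cells


def to_page_coords(cell: Box, local: Box) -> Box:
    """Shift a box found inside a cell back into page coordinates."""
    left, top, _, _ = cell
    return (local[0] + left, local[1] + top, local[2] + left, local[3] + top)


cells = grid_cells(width=100, height=50, rows=2, cols=2)
word_on_page = to_page_coords(cells[3], (5, 5, 15, 10))
print(cells)
print(word_on_page)
```

Each cell would be passed to the OCR engine separately, and the per-cell results would be shifted with something like `to_page_coords` before being merged into one page-level result.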

Best Practices

  • Test your OCR Profile on representative samples to fine-tune settings.
  • Use the diagnostic and annotation features to visualize regions, cells, and filtering effects.
  • Validate the profile to ensure all referenced resources are properly configured.

Properties

Name    Type    Description
General
Synthesis Options
Iterative Processing
Results Filtering

Design Tabs

  • General: View or edit properties of a node.
  • Reports: View reports for a node.
  • Tester: Test the OCR Profile and view diagnostics from the OCR process.
  • Advanced: View or edit advanced details about a node.

See Also

Used By

Notification