Grooper Help - Version 25.0
25.0.0017

Read Zone

Value Extractor Grooper.Extract

Extracts text content from a specified rectangular region (zone) of a document.

Remarks

The Read Zone extractor enables zonal extraction by reading the content of a defined region on a document page. It supports both direct text extraction and OCR-based extraction, with options for image cleanup, orientation correction, and further value extraction using a nested Value Extractor.

Overview

Read Zone is designed for scenarios where the location of a value on a document is known or can be anchored to a label or pattern. It reads text from a rectangular region, which can be fixed or dynamically positioned using an anchor. Extraction can be performed directly from text or via OCR, with optional image preprocessing.

  • The 'Location' property defines the region to extract from. This can be a fixed rectangle or an anchored region that moves based on a detected label or pattern.
  • If an OCR Profile is specified, the region is processed using OCR. You can also apply an IP Profile for image cleanup before OCR, and use the 'Orientation' property to correct for rotated text.
  • The 'Value Extractor' property allows you to run another extractor (such as Pattern Match or List Match) on the extracted text, enabling targeted value extraction within the zone.
  • Output options include combining multiple values ('Value Separator'), customizing line breaks ('Line Separator'), and choosing whether to output the full region or just the text bounds ('Output Full Region').
  • When using anchored regions, you can exclude the anchor text from the output with 'Exclude Anchor'.
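
As an illustration of the difference between a fixed and an anchored 'Location', the following Python sketch resolves a zone rectangle. It is a conceptual model only, not Grooper's implementation or API: the `Rect` type, the `resolve_zone` function, and the convention that an anchored zone is stored as an offset from the anchor's top-left corner are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in page units (hypothetical type, not a Grooper class)."""
    left: float
    top: float
    width: float
    height: float


def resolve_zone(location: Rect, anchor: Optional[Rect] = None) -> Rect:
    """Return the rectangle to read.

    Without an anchor, `location` is taken as absolute page coordinates.
    With an anchor, `location.left` and `location.top` are treated as offsets
    from the anchor's top-left corner, so the zone moves with the label.
    """
    if anchor is None:
        return location
    return Rect(
        anchor.left + location.left,
        anchor.top + location.top,
        location.width,
        location.height,
    )


# A label detected at (1.0, 2.0); the zone begins 1.5 units to its right.
label = Rect(1.0, 2.0, 1.2, 0.25)
zone = resolve_zone(Rect(1.5, 0.0, 3.0, 0.25), anchor=label)
print(zone)
```

If the label shifts on another page, only `label` changes and the zone follows it, which is the behavior an anchored region is meant to provide.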

How It Works

  1. The extraction region is determined using the 'Location' property. This may be a static rectangle or dynamically anchored to a label.
  2. If an OCR Profile is specified, the region is first cleaned up with an IP Profile (when one is configured), then processed with the selected OCR Profile.
  3. The extracted text is optionally passed to a nested Value Extractor for further processing, such as pattern matching or list lookup.
  4. Output formatting options allow you to control how multiple values and line breaks are represented.
  5. The final result includes the extracted value(s), a confidence score, and region information.
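
The steps above can be sketched in Python for the direct-text case (no OCR). This is a simplified model under stated assumptions, not Grooper's code: the `Word` type and `read_zone` function are hypothetical, each word's position is reduced to its top-left point, and a regular expression stands in for the nested Value Extractor.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """A word with the position of its top-left corner (hypothetical type)."""
    text: str
    x: float
    y: float


def read_zone(words, zone, pattern=None, line_separator="\n", value_separator="; "):
    """Read text inside `zone`, given as (left, top, right, bottom)."""
    left, top, right, bottom = zone
    # Step 1: keep only the words whose top-left corner falls inside the zone.
    inside = [w for w in words if left <= w.x < right and top <= w.y < bottom]
    # Group words into lines by y, ordered top to bottom and left to right.
    lines = {}
    for w in sorted(inside, key=lambda w: (w.y, w.x)):
        lines.setdefault(w.y, []).append(w.text)
    # Step 4 (line breaks): join the lines with the chosen line separator.
    text = line_separator.join(" ".join(parts) for parts in lines.values())
    if pattern is None:
        return text
    # Steps 3 and 4: run the nested pattern, then combine multiple values.
    return value_separator.join(re.findall(pattern, text))


words = [
    Word("Invoice", 0.5, 1.0),
    Word("INV-1001", 2.0, 1.0),
    Word("INV-1002", 2.0, 1.3),
    Word("Total", 0.5, 5.0),
]
zone = (1.5, 0.8, 4.0, 1.5)

raw = read_zone(words, zone)
values = read_zone(words, zone, pattern=r"INV-\d+")
print(repr(raw))
print(values)
```

The pattern contains no capture groups, so `re.findall` returns the full matches. Confidence scoring and the OCR and image cleanup steps are left out of this sketch.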

Usage Scenarios

  • Fixed Zone Extraction:
    Extract values from known positions on structured forms, such as account number boxes or signature fields.
  • Anchored Zone Extraction:
    Define a region relative to a detected label (anchor), and extract the value next to it, optionally excluding the label text.
  • OCR with Preprocessing:
    Clean up a noisy image region with an IP Profile, then extract text using OCR and further process it with a Value Extractor.
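
The anchored scenario, including the effect of excluding the label, can be approximated at the text level. The sketch below is illustrative only: `read_after_label` is a hypothetical helper, the "zone" is simplified to the rest of the line after the anchor, and the `exclude_anchor` flag merely mirrors the idea behind the 'Exclude Anchor' property.

```python
import re


def read_after_label(line, anchor_pattern, exclude_anchor=True):
    """Return the text that follows an anchor label on a line.

    Returns None when the anchor is not found, since an anchored zone
    cannot be positioned without its anchor.
    """
    match = re.search(anchor_pattern, line)
    if match is None:
        return None
    # Start after the label when it is excluded, or at the label otherwise.
    start = match.end() if exclude_anchor else match.start()
    return line[start:].strip()


line = "Account No: 12345"
print(read_after_label(line, r"Account No:"))
print(read_after_label(line, r"Account No:", exclude_anchor=False))
```

Returning `None` for a missing anchor makes a failed match visible to the caller instead of silently reading from the wrong place.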

Configuration Guidance

  • Define the 'Location' region accurately, because extraction is only as reliable as the region it reads. If using anchors, make sure the anchor pattern matches consistently across the documents you expect to process.
  • Select an OCR Profile for image-based or scanned documents, and consider using an IP Profile to improve OCR accuracy.
  • Use a nested Value Extractor to target specific values or patterns within the zone's text.
  • Adjust output options to match your downstream requirements, such as combining multiple values or formatting line breaks.
  • Use 'Output Full Region' to visualize or debug the actual OCR area.
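
To make the distinction behind 'Output Full Region' concrete, the following sketch compares the configured zone with the tight bounds of the text found inside it. It is an assumption-based illustration, not Grooper's implementation: the `output_bounds` function is hypothetical, boxes are `(left, top, right, bottom)` tuples, and falling back to the full zone when no text is found is a choice made for the example.

```python
def output_bounds(zone, word_boxes, output_full_region):
    """Return the region to report for an extraction.

    `zone` and each entry of `word_boxes` are (left, top, right, bottom).
    """
    # Report the whole configured zone when requested, or when there is
    # no text from which to compute tighter bounds.
    if output_full_region or not word_boxes:
        return zone
    # Otherwise report the smallest rectangle enclosing all word boxes.
    return (
        min(box[0] for box in word_boxes),
        min(box[1] for box in word_boxes),
        max(box[2] for box in word_boxes),
        max(box[3] for box in word_boxes),
    )


zone = (1.0, 1.0, 5.0, 2.0)
boxes = [(1.5, 1.2, 2.5, 1.4), (2.7, 1.2, 3.9, 1.45)]

print(output_bounds(zone, boxes, output_full_region=True))
print(output_bounds(zone, boxes, output_full_region=False))
```

Comparing the two results shows how much of the configured zone is empty space, which helps when tightening a region that picks up neighboring text.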

Diagnostics

When diagnostic logging is enabled, Read Zone records information about the extraction region, OCR/image processing steps, and output results. This can be used to troubleshoot configuration issues, validate region definition, and optimize extractor setup.

Diagnostic Artifacts

  • Region Image:
    An image of the extracted region is saved for each extraction, annotated with the region bounds.
  • Extraction Details:
    Diagnostic logs include information about the region definition, OCR/image processing steps, and the final output value(s).
  • Annotations:
    When anchors are used, the anchor region is highlighted in the diagnostic image.

These artifacts are accessible via the diagnostic interface and can be used to validate and tune your Read Zone configuration.
