Grooper Help - Version 25.0
25.0.0024

Geometric

Section Extract Method Grooper.Extract

Defines a rectangular region using anchors and extracts the character data bounded by the region as a Section Instance.

Remarks

The Geometric section extraction method enables precise extraction of content from a document by defining a rectangular region using a combination of extractors and edge adjustments. This method is ideal for scenarios where the content to be extracted is consistently located within a specific area of the page, such as form fields, table regions, or boxed sections.
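The following Python sketch roughly illustrates "character data bounded by the region" by keeping only the characters whose bounding boxes fall entirely inside a rectangle. The `Rect`, `Char`, and `chars_in_region` names are hypothetical and are not part of Grooper's API. The full-containment rule is also an assumption: an engine could instead test character center points or partial overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in page coordinates (y grows downward)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, other: "Rect") -> bool:
        # True only when `other` lies entirely inside this rectangle
        return (self.left <= other.left and self.top <= other.top
                and other.right <= self.right and other.bottom <= self.bottom)


@dataclass(frozen=True)
class Char:
    """One recognized character and its bounding box (hypothetical model)."""
    text: str
    box: Rect


def chars_in_region(chars: list[Char], region: Rect) -> str:
    """Concatenate, in input order, the characters fully bounded by `region`."""
    return "".join(c.text for c in chars if region.contains(c.box))


chars = [
    Char("A", Rect(10, 10, 20, 20)),
    Char("B", Rect(22, 10, 32, 20)),
    Char("D", Rect(35, 10, 45, 20)),  # crosses the right edge, so it is excluded
    Char("C", Rect(60, 10, 70, 20)),
]
print(chars_in_region(chars, Rect(0, 0, 40, 30)))  # AB
```

Characters are joined in the order they are supplied. Reading-order sorting and line breaks are not modeled.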

Extraction proceeds as follows:

  1. The 'Main Extractor' property establishes the initial region for each Section Instance.
  2. Each edge of the region can be further adjusted using the 'Left Adjustment', 'Top Adjustment', 'Right Adjustment', and 'Bottom Adjustment' properties. These adjustments can be based on text anchors, page boundaries, or manual rules.
  3. Optionally, the region can be expanded to nearby detected lines using the 'Line Detect Limit' property, which helps snap the region to visible lines or borders in the document image.
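The first two steps above can be sketched as a small region pipeline. This is a minimal Python model built on several assumptions. The initial rectangle stands in for the result of the 'Main Extractor'. Each edge adjustment is modeled as a hypothetical function that moves an edge to an anchor or page coordinate, or shifts it by a manual offset. Coordinates are in inches with y growing downward. None of these names (`adjust_region`, `to_anchor`, `to_page_edge`, `offset`) come from Grooper, and line snapping (step 3) is left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    right: float
    bottom: float


# Hypothetical model: an adjustment maps the current edge coordinate to a new one.
Adjustment = Callable[[float], float]


def to_anchor(anchor_coord: float) -> Adjustment:
    # Move the edge to a coordinate located by an anchor extractor
    return lambda _edge: anchor_coord


def to_page_edge(page_coord: float) -> Adjustment:
    # Move the edge to a page boundary (e.g. 8.5 for the right edge of a letter page)
    return lambda _edge: page_coord


def offset(delta: float) -> Adjustment:
    # Manual rule: shift the edge by a fixed amount
    return lambda edge: edge + delta


def adjust_region(region: Rect,
                  left: Optional[Adjustment] = None,
                  top: Optional[Adjustment] = None,
                  right: Optional[Adjustment] = None,
                  bottom: Optional[Adjustment] = None) -> Rect:
    """Apply optional per-edge adjustments to the initial region."""
    return Rect(
        left=left(region.left) if left is not None else region.left,
        top=top(region.top) if top is not None else region.top,
        right=right(region.right) if right is not None else region.right,
        bottom=bottom(region.bottom) if bottom is not None else region.bottom,
    )


# Usage: widen to the right page edge, extend down to an anchor, nudge the left edge.
initial = Rect(left=1.0, top=2.0, right=3.0, bottom=2.5)
adjusted = adjust_region(initial, left=offset(-0.25),
                         right=to_page_edge(8.5), bottom=to_anchor(4.0))
print(adjusted)  # Rect(left=0.75, top=2.0, right=8.5, bottom=4.0)
```

Edges without an adjustment keep the position set by the main extractor, which mirrors the optional nature of the adjustment properties.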

> Note: This method cannot be used when the section content spans multiple pages. Only single-page regions are supported; any results that span more than one page are discarded.
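To illustrate the single-page restriction, this hypothetical sketch builds a region from a top anchor and a bottom anchor. It returns `None` when the two anchors were found on different pages, which corresponds to the result being discarded. The `Anchor`, `Region`, and `build_region` names are illustrative assumptions, not Grooper types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Anchor:
    page: int   # 1-based page number where the anchor text was found
    y: float    # vertical position of the anchor on that page


@dataclass(frozen=True)
class Region:
    page: int
    top: float
    bottom: float


def build_region(top_anchor: Anchor, bottom_anchor: Anchor) -> Optional[Region]:
    """Build a single-page region, or return None if the anchors are on different pages."""
    if top_anchor.page != bottom_anchor.page:
        # A region spanning pages cannot be represented, so the result is dropped.
        return None
    return Region(page=top_anchor.page, top=top_anchor.y, bottom=bottom_anchor.y)


print(build_region(Anchor(1, 2.0), Anchor(1, 5.0)))  # Region(page=1, top=2.0, bottom=5.0)
print(build_region(Anchor(1, 9.0), Anchor(2, 1.0)))  # None
```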

Configuration Guidance

  • Use the 'Main Extractor' to define the primary region of interest. This is typically a value extractor that identifies a bounding box or anchor point.
  • Apply edge adjustments to fine-tune the region, especially if the content is not perfectly aligned or if dynamic anchors are needed.
  • Enable line detection with the 'Line Detect Limit' property to expand the region to nearby detected lines, which is useful for forms or documents with visible borders.
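Line snapping can be approximated as follows. Each edge moves outward to the closest detected line within a distance limit, and stays in place if no line is close enough. This sketch assumes pixel coordinates with y growing downward. It also assumes the 'Line Detect Limit' acts as a maximum snapping distance, so the real property's semantics and units may differ. `snap_to_lines` and `_nearest_outward` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    right: float
    bottom: float


def _nearest_outward(edge: float, lines: Iterable[float], limit: float, direction: int) -> float:
    """Return the closest line at or beyond `edge` in `direction` (-1 or +1) within `limit`.

    If no detected line is close enough, the edge is returned unchanged.
    """
    best = edge
    best_dist = None
    for pos in lines:
        dist = (pos - edge) * direction  # non-negative when the line lies outside the region
        if 0 <= dist <= limit and (best_dist is None or dist < best_dist):
            best, best_dist = pos, dist
    return best


def snap_to_lines(region: Rect, vertical_xs: Iterable[float],
                  horizontal_ys: Iterable[float], limit: float) -> Rect:
    """Expand each edge of `region` outward to a nearby detected line."""
    xs, ys = list(vertical_xs), list(horizontal_ys)  # allow generators to be reused
    return Rect(
        left=_nearest_outward(region.left, xs, limit, -1),
        top=_nearest_outward(region.top, ys, limit, -1),
        right=_nearest_outward(region.right, xs, limit, +1),
        bottom=_nearest_outward(region.bottom, ys, limit, +1),
    )


# Usage: the left, right, and top edges sit just inside a drawn box; the bottom line is too far.
region = Rect(left=100, top=200, right=300, bottom=400)
print(snap_to_lines(region, vertical_xs=[95, 50, 310], horizontal_ys=[190, 460], limit=15))
# Rect(left=95, top=190, right=310, bottom=400)
```

Expanding only outward keeps the original content inside the region. A design that also allowed inward snapping could clip text near the borders.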

This method is commonly used for extracting structured data from forms, invoices, or other documents where the layout is predictable and regions can be defined geometrically.

Properties

Name | Type | Description

See Also

Used By

Notification