Grooper Help - Version 25.0
25.0.0017 2,127

Layout Data

Grooper.IP

Defines a collection of information about the layout of a document page.

Remarks

The Layout Data class encapsulates all detected structural features of a document page, providing a unified, serializable representation of its layout. This object is central to Grooper's document analysis pipeline, serving as the primary container for lines, boxes, barcodes, shapes, and other layout elements extracted from an image. It is used by downstream processes such as data extraction, classification, and validation, enabling advanced document understanding and automation.

Layout Data Components

The following types of layout data are stored:

  • Horizontal Lines: Collection of all detected horizontal line segments on the page. Used for table, row, and region detection.

  • Vertical Lines: Collection of all detected vertical line segments. Supports column, boundary, and region analysis.

  • OMR Boxes: List of detected OMR (Optical Mark Recognition) boxes, representing checkboxes, bubbles, or other markable regions.

  • Barcodes: List of detected barcodes, including their symbology, value, and position. Supports data extraction and routing.

  • Shapes: List of detected geometric shapes, such as circles, rectangles, or custom templates. Used for advanced region analysis.

  • Underlines: Subset of horizontal lines that are not connected to any vertical lines, typically representing underlines or separators.
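
As a rough illustration of how these components relate, the container can be pictured as one object with a list per element type, with underlines derived from the two line collections. The sketch below is not Grooper's class: the names `PageLayout`, `HLine`, `VLine`, and `touches`, and the 3-pixel tolerance, are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HLine:
    """Horizontal segment spanning x1..x2 at height y (page pixels)."""
    x1: float
    x2: float
    y: float


@dataclass(frozen=True)
class VLine:
    """Vertical segment spanning y1..y2 at horizontal position x."""
    x: float
    y1: float
    y2: float


def touches(h, v, tol):
    """True when the two segments meet or cross, allowing tol pixels of slack."""
    return (h.x1 - tol <= v.x <= h.x2 + tol) and (v.y1 - tol <= h.y <= v.y2 + tol)


@dataclass
class PageLayout:
    """Hypothetical stand-in for the layout container, one list per element type."""
    horizontal_lines: list = field(default_factory=list)
    vertical_lines: list = field(default_factory=list)
    omr_boxes: list = field(default_factory=list)
    barcodes: list = field(default_factory=list)
    shapes: list = field(default_factory=list)

    def underlines(self, tol=3.0):
        """Horizontal lines that touch no vertical line (assumed tolerance)."""
        return [h for h in self.horizontal_lines
                if not any(touches(h, v, tol) for v in self.vertical_lines)]


# A boxed region (two horizontals joined by two verticals) plus one free-standing rule.
layout = PageLayout(
    horizontal_lines=[HLine(10, 200, 50), HLine(10, 200, 100), HLine(300, 400, 75)],
    vertical_lines=[VLine(10, 50, 100), VLine(200, 50, 100)],
)
print(layout.underlines())  # only the free-standing rule remains
```

Treating underlines as a derived view rather than a separately stored list is a design choice of this sketch; how the product stores them is not stated here.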

How Layout Data Is Generated

Layout data is generated and saved during execution of the Image Processing and Recognize activities, when the configured IP Profile contains IP Commands that detect layout data. It is built by aggregating the results of layout detection commands such as Line Detection, Box Detection, Barcode Detection, and Shape Detection, or their dropout variants (Line Removal, Box Removal, Barcode Removal, and Shape Removal).

Each detection command contributes its results, which are merged into the existing Page Layout. The layout data is therefore a composite view of all detected elements.
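
The merge step can be pictured with a small accumulator that several detection steps write into in turn. This is only a sketch: the `CompositeLayout` class, the string keys, and the tuple-shaped results are hypothetical, and skipping exact duplicates is an assumption of the example rather than documented behavior.

```python
from collections import defaultdict


class CompositeLayout:
    """Hypothetical accumulator; element kinds are plain string keys."""

    def __init__(self):
        self.elements = defaultdict(list)

    def merge(self, kind, detected):
        """Add one command's results, skipping exact duplicates (assumed policy)."""
        existing = self.elements[kind]
        for item in detected:
            if item not in existing:
                existing.append(item)


layout = CompositeLayout()
# Illustrative tuples only, not real detector output.
layout.merge("horizontal_lines", [(10, 200, 50)])        # first line-detecting step
layout.merge("barcodes", [("Code 128", "INV-0042")])     # barcode-detecting step
layout.merge("horizontal_lines", [(10, 200, 50), (10, 200, 100)])  # later step, one repeat

print(dict(layout.elements))
```

After the three calls the layout holds two horizontal lines and one barcode, regardless of which step reported them.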

How Layout Data Is Used

Several downstream data extraction features in Grooper depend on layout data. The following examples show how it is used throughout the platform:

  • Table Extraction and Adjustment:
    Table Extract Methods and related components (such as Table Row Adjuster and Table Header Detector) use layout data to precisely align and segment table rows and columns. For example, horizontal and vertical lines from the layout are used to snap row and header boundaries, detect dividing lines between cells, and resolve ambiguous table structures.

  • Text Preprocessing and Tab Detection:
    The Horizontal Tab Marker leverages layout data to insert tab characters in extracted text where vertical lines or underlines are present. This enables reliable column separation in tabular data and fill-in-the-blank forms. When the 'Vertical Lines' detection option is enabled, the marker queries the layout for vertical lines between adjacent words, inserting a tab if a dividing line is found. When the 'Underline' option is enabled, the marker checks for underlines beneath whitespace gaps using the layout’s underlines, suppressing tab insertion for fill-in regions.

  • Paragraph and Region Segmentation:
    The Paragraph Marker and its supporting logic use layout data to analyze line structure and segment text into paragraphs. Layout data helps determine paragraph starts, line wrapping, and the presence of horizontal rules or indents by referencing detected lines and their positions. For example, a horizontal line between two text lines may indicate a paragraph break.

  • Region Snapping and Field Alignment:
    Line Snap Options use layout data to automatically adjust the edges of rectangular regions (such as fields or zones) so they align with detected lines. For each edge, the nearest line is found using methods like GetLineWest, GetLineNorth, GetLineEast, and GetLineSouth, ensuring that extracted regions match the printed structure of the page.

  • AI and LLM Integration:
    The Layout Objects quoting method serializes layout data—including lines, barcodes, checkboxes, and text regions—into a structured format for use by AI models. This enables advanced workflows such as layout analysis, table extraction, and visual question answering, where the AI requires both content and spatial context.

  • Barcode Value Extraction:
    The Find Barcode value extractor uses layout data to locate and extract barcode values that were previously detected during image processing. Instead of reading barcodes directly from the image at extraction time, it searches the layout data for barcodes matching the configured symbology and, if specified, within a defined region of interest. This enables fast and reliable extraction of barcode values for indexing, validation, or routing, leveraging the results of prior Barcode Detection or Barcode Removal commands.
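
Three of the uses above (region snapping, tab insertion at vertical lines, and barcode lookup) reduce to simple geometric queries against stored layout elements. The sketch below shows one way such queries might look. Every name in it (`Rect`, `Barcode`, `snap_rect`, `join_words`, `find_barcodes`) is hypothetical, it does not call GetLineWest or the other methods named above, and it simplifies by treating each line as a single coordinate rather than a segment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, other):
        return (self.left <= other.left and self.top <= other.top
                and self.right >= other.right and self.bottom >= other.bottom)


@dataclass(frozen=True)
class Barcode:
    symbology: str
    value: str
    bounds: Rect


def snap(value, candidates, max_dist=10.0):
    """Nearest candidate within max_dist of value, else value unchanged."""
    near = [c for c in candidates if abs(c - value) <= max_dist]
    return min(near, key=lambda c: abs(c - value)) if near else value


def snap_rect(rect, h_line_ys, v_line_xs, max_dist=10.0):
    """Snap each edge of rect to the nearest detected line position."""
    return Rect(
        left=snap(rect.left, v_line_xs, max_dist),
        top=snap(rect.top, h_line_ys, max_dist),
        right=snap(rect.right, v_line_xs, max_dist),
        bottom=snap(rect.bottom, h_line_ys, max_dist),
    )


def join_words(words, v_line_xs):
    """Join (text, left, right) words; use a tab where a vertical line sits in the gap."""
    if not words:
        return ""
    out = words[0][0]
    for (_, _, prev_right), (text, left, _) in zip(words, words[1:]):
        divided = any(prev_right < x < left for x in v_line_xs)
        out += ("\t" if divided else " ") + text
    return out


def find_barcodes(barcodes, symbology, region=None):
    """Values of stored barcodes matching symbology, optionally limited to region."""
    return [b.value for b in barcodes
            if b.symbology == symbology and (region is None or region.contains(b.bounds))]


# Illustrative data: a zone drawn slightly off the printed box, three words, two barcodes.
zone = snap_rect(Rect(12, 47, 196, 103), h_line_ys=[50, 100], v_line_xs=[10, 200])
row = join_words([("Qty", 10, 40), ("5", 60, 70), ("Widget", 120, 180)], v_line_xs=[100])
stored = [
    Barcode("Code 128", "INV-0042", Rect(20, 20, 120, 40)),
    Barcode("QR Code", "SHIP-7", Rect(400, 500, 460, 560)),
]
header_codes = find_barcodes(stored, "Code 128", region=Rect(0, 0, 300, 100))

print(zone)          # Rect(left=10, top=50, right=200, bottom=100)
print(repr(row))     # 'Qty 5\tWidget'
print(header_codes)  # ['INV-0042']
```

Note that `find_barcodes` only filters values that are already stored, which mirrors the point above that the extractor searches the layout data instead of re-reading the image.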
