Grooper Help - Version 25.0

Row Match

Table Extract Method (Grooper.Extract)

Extracts tabular data by matching entire rows using a 2D extraction approach.

Remarks

The Row Match class provides a flexible method for extracting tables from documents where traditional column-based extraction is impractical or headers are absent. It is especially useful for tables with irregular layouts or no headers, and for tables whose structure is simple and consistent.

Row Match works by applying a single row-level extractor to the document content. Each match from this extractor is treated as a table row, and cell values are populated based on named groups or child extractors that correspond to the table's columns. This approach allows for both simple and complex table structures to be extracted, including those with variable row formats or without explicit column headers.

How It Works

  • The user configures a 'Row Extractor' that defines how to identify and parse each table row. This can be a regular expression with named groups matching column names, or a Data Type with child extractors for each column.
  • Optionally, header and footer extractors can be set to define the start and end of the table region. These can also be specified using Label Sets for more advanced scenarios.
  • Additional properties allow for filtering rows based on vertical spacing, alignment, and other criteria to ensure only valid table rows are included.
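The flow described above can be illustrated outside Grooper with a minimal Python sketch. It is not Grooper code or configuration: the HEADER, FOOTER, and ROW patterns, the extract_rows helper, and the sample page are all hypothetical, and the sketch works on plain text lines rather than on Grooper's 2D page layout. It only shows the general idea of bounding a table region with a header and footer and treating each row-level match inside that region as one table row.

```python
import re

# Hypothetical patterns for illustration only; not Grooper settings.
HEADER = re.compile(r"^Qty\s+Unit Price\s+Total$")
FOOTER = re.compile(r"^Subtotal\b")
# Python spells named groups (?P<Name>...); .NET-style regex uses (?<Name>...).
ROW = re.compile(
    r"^(?P<Quantity>\d+)\s+(?P<Unit_Price>[\d,.]+)\s+(?P<Total>[\d,.]+)$"
)


def extract_rows(lines):
    """Return one dict per row matched between the header and footer lines."""
    rows = []
    in_table = False
    for line in lines:
        text = line.strip()
        if not in_table:
            # Ignore everything until the header line is seen.
            in_table = bool(HEADER.match(text))
            continue
        if FOOTER.match(text):
            # The footer closes the table region.
            break
        match = ROW.match(text)
        if match:
            # Each row-level match becomes one row; group names act as columns.
            rows.append(match.groupdict())
    return rows


page = [
    "Invoice 1001",
    "3 1.00 3.00",  # looks like a row, but sits above the header
    "Qty   Unit Price   Total",
    "2 10.00 20.00",
    "Thank you for your business",  # inside the region, but not a row
    "1 5.50 5.50",
    "Subtotal 25.50",
    "9 9.99 89.91",  # below the footer, so never considered
]
rows = extract_rows(page)
```

In this sketch, lines above the header or below the footer are never treated as rows even when they match the row pattern, which is the role the optional header and footer extractors play in bounding the table.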

Row Match is label set-aware, meaning it can leverage header and footer labels to further refine table boundaries. This makes it suitable for a wide range of tabular extraction tasks, from simple lists to complex, unstructured tables.

Configuration Guidance

  • Use a regular expression with named groups (e.g., (?<Quantity>\d+) (?<Unit_Price>[\d,.]+) (?<Total>[\d,.]+)) to map extracted values to columns.
  • For tables with more complex row structures, consider using a Data Type with child extractors and 2D collation methods.
  • Adjust header, footer, and spacing properties to fine-tune which rows are included, especially in documents with variable layouts.
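As a rough illustration of the named-group mapping in the first item, the sketch below applies the example pattern to a small block of text in Python. This is an assumption-laden stand-in rather than Grooper behavior: the COLUMNS list, the rows_from_text helper, and the sample text are hypothetical, and the group syntax is rewritten as (?P<Name>...) because Python's re module does not accept the (?<Name>...) form shown above.

```python
import re

# Hypothetical column names; in practice they would mirror the table's columns.
COLUMNS = ["Quantity", "Unit_Price", "Total"]

# The example pattern from the guidance above, in Python's named-group syntax.
ROW_PATTERN = re.compile(
    r"(?P<Quantity>\d+) (?P<Unit_Price>[\d,.]+) (?P<Total>[\d,.]+)"
)


def rows_from_text(text):
    """Build one row per regex match, keyed by column name."""
    table = []
    for match in ROW_PATTERN.finditer(text):
        groups = match.groupdict()
        # A column with no same-named group would be left empty (None).
        table.append({column: groups.get(column) for column in COLUMNS})
    return table


sample = "4 12.50 50.00\n10 1,200.00 12,000.00\nno numbers here"
table = rows_from_text(sample)
```

Because values are assigned to columns purely by name, a group name that does not match a column name leaves that column empty, so keeping the two in sync is the main thing to check when a column comes back blank.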

Row Match is ideal for scenarios where table structure is not strictly defined by headers or columns, or where rapid configuration is needed for simple tabular data.

Properties

Name    Type    Description

See Also

Used By
