Grooper Help - Version 25.0
25.0.0017 2,127

Table Extract Method

Embedded Object Grooper.Core

Defines an extraction method for use with Data Table objects, enabling flexible capture of tabular data from documents.

Remarks

The Table Extract Method class is the base class for all table extraction strategies in Grooper. It defines the common interface and extensibility points for capturing repeating data arranged in rows and columns, across a wide range of table layouts and document types.

Extraction methods derived from this class determine how table rows and columns are detected, how values are mapped to cells, and how special cases such as headers, footers, and multi-line rows are handled. Each method is designed to address specific document scenarios, from highly structured grid tables to loosely formatted lists or pattern-based data.

How Table Extraction Works

  1. The configured Table Extract Method is used to locate and segment rows within the document.
  2. Each row is analyzed to extract values for the defined columns, using value extractors, pattern matching, or layout analysis.
  3. Additional logic may be applied to handle headers, footers, multi-line rows, or special formatting.
  4. The resulting Table Instance contains Table Row Instance objects, each populated with cell values for the configured columns.
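The four steps above can be sketched in a few lines of Python. This is only an illustration of the general flow, not Grooper's implementation: the `TableInstance` and `TableRowInstance` classes, the `ROW_PATTERN` regular expression, and the `extract_table` function are all hypothetical stand-ins, and the sketch assumes each row sits on a single line of text.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stand-ins for the Table Instance and Table Row Instance
# objects named above. These are not Grooper types.
@dataclass
class TableRowInstance:
    cells: dict

@dataclass
class TableInstance:
    rows: list = field(default_factory=list)

# Assumed row shape for this illustration: quantity, description, amount.
ROW_PATTERN = re.compile(
    r"^(?P<qty>\d+)\s+(?P<description>.+?)\s+(?P<amount>\d+\.\d{2})$"
)

def extract_table(lines):
    table = TableInstance()
    for line in lines:
        # Steps 1 and 2: test the line against the row pattern and
        # capture one value per configured column.
        match = ROW_PATTERN.match(line.strip())
        if match is None:
            # Step 3: header, footer, and other non-row lines are skipped.
            continue
        # Step 4: store the captured cells as a row of the table.
        table.rows.append(TableRowInstance(cells=match.groupdict()))
    return table

sample = [
    "Qty  Description      Amount",
    "2    Widget           19.98",
    "1    Gadget, large    5.00",
    "Total                 24.98",
]
table = extract_table(sample)
for row in table.rows:
    print(row.cells)
```

Here the header and total lines fail the pattern and are dropped, so only the two data lines become rows. A real extraction method would typically also use layout information (positions on the page), which this text-only sketch ignores.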

Configuration Guidance

  • Choose the extraction method that best matches your document's table layout. For example, use Tabular Layout for grid-based tables, Row Match for pattern-based rows, or Delimited Extract for CSV/TSV data.
  • Configure column definitions and value extractors to match the expected data in each row.
  • Use method-specific options to fine-tune detection of headers, footers, and multi-line content.
  • Test extraction against sample documents that cover the layout variations you expect, and confirm that rows and cell values are captured correctly.
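To make the idea of column definitions concrete, the following sketch maps the headers of a delimited file to target columns using Python's standard `csv` module. It does not show how Delimited Extract is configured in Grooper; the `COLUMNS` mapping, the converter functions, and `read_delimited` are hypothetical names chosen for this example.

```python
import csv
import io

# Hypothetical column definitions:
# source header -> (target column name, converter for the cell value).
COLUMNS = {
    "Item": ("description", str.strip),
    "Qty": ("quantity", int),
    "Price": ("unit_price", float),
}

def read_delimited(text, delimiter=","):
    # DictReader takes the first line as the header row.
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for record in reader:
        row = {}
        for header, (name, convert) in COLUMNS.items():
            row[name] = convert(record[header])
        rows.append(row)
    return rows

csv_text = "Item,Qty,Price\nWidget,2,9.99\nGadget,1,5.00\n"
tsv_text = csv_text.replace(",", "\t")

csv_rows = read_delimited(csv_text)
tsv_rows = read_delimited(tsv_text, delimiter="\t")
print(csv_rows)
```

The same column definitions serve both the comma-separated and tab-separated inputs; only the delimiter changes. Keeping the mapping separate from the parsing logic is one way to reuse definitions across similar documents.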

Best Practices

  • Use diagnostic output to review detected rows, columns, and cell values for tuning and troubleshooting.
  • Leverage per-column options (such as required columns or custom extractors) to improve data completeness and accuracy.
  • Combine multiple extraction methods in a single solution to handle diverse table formats within a document set.
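As a rough illustration of the first two practices, the sketch below checks extracted rows against a set of required columns and records a diagnostic message for each row. It assumes that a row missing a required value is rejected, which is one plausible interpretation and not a statement of Grooper's behavior; `REQUIRED` and `review_rows` are hypothetical names.

```python
# Hypothetical set of required columns for this example.
REQUIRED = {"description", "amount"}

def review_rows(rows, required=REQUIRED):
    kept, diagnostics = [], []
    for index, row in enumerate(rows, start=1):
        # A column counts as missing when it is absent or empty.
        missing = sorted(name for name in required if not row.get(name))
        if missing:
            diagnostics.append(
                f"row {index}: rejected, missing {', '.join(missing)}"
            )
        else:
            kept.append(row)
            diagnostics.append(f"row {index}: accepted with {len(row)} cells")
    return kept, diagnostics

rows = [
    {"qty": "2", "description": "Widget", "amount": "19.98"},
    {"qty": "1", "description": "", "amount": "5.00"},
    {"qty": "", "description": "Gadget", "amount": ""},
]
kept, diagnostics = review_rows(rows)
for message in diagnostics:
    print(message)
```

Reviewing a per-row log like this makes it easier to see whether a poor result comes from row detection or from an individual column extractor.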

For more information, see the documentation for Data Table, Tabular Layout, Row Match, and related extraction objects.

Derived Types

There are 7 implementations of Table Extract Method.

  • AI Table Reader: Uses generative AI to extract tabular data from document content.
  • Delimited Extract: Extracts table data from delimited (CSV) files using configurable mapping and parsing options.
  • Fixed Width: Reads tabular data from a fixed-width text document using a predefined record layout.
  • Fluid Layout: Extracts semi-tabular information that can alternate between tabular and flow layout from one document type to the next.
  • Grid Layout: Extracts tables which have both row and column headers, by inferring a grid from the header positions.
  • Row Match: Extracts tabular data by matching entire rows using a 2D extraction approach.
  • Tabular Layout: Detects the layout of a table automatically using header labels and value extractors.

Used By

Notification