Grooper Help - Version 25.0
25.0.0024 2,166

Grid Layout

Table Extract Method Grooper.Extract

Extracts tables which have both row and column headers, by inferring a grid from the header positions.

Remarks

The Grid Layout extract method is designed for extracting tabular data from documents where both row and column headers are present, and the table structure can be inferred from the positions of these headers. This method is especially useful for financial statements, cross-tab reports, or any layout where data is organized in a matrix with labeled axes.

How It Works

  • Configure the X Axis Extractor to match the column headers (typically at the top of the table).
  • Configure the Y Axis Extractor to match the row headers (typically on the left side of the table).
  • Specify the 'Header Column' to indicate which Data Column should receive the row header values.
  • The extractor will infer a grid by intersecting the detected header positions, mapping each cell in the document to the corresponding row and column in your Data Table.

The same approach works when the table is transposed on the document (i.e., rows and columns are swapped). In that case, enable the 'Transpose' option so that data maps correctly to the output table.
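The grid inference described above can be pictured with a small conceptual sketch. This is not Grooper's implementation: the `Box` type, the `infer_grid` function, and the coordinates are all hypothetical and chosen only for illustration. It assumes each header hit and each data word is available as a bounding box, and assigns a word to the cell whose column header it overlaps horizontally and whose row header it overlaps vertically.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical bounding box for a header hit or a data word."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


def overlaps(a_start, a_end, b_start, b_end):
    """True when two 1-D intervals share any extent."""
    return a_start < b_end and b_start < a_end


def infer_grid(col_headers, row_headers, words):
    """Map each word to the (row label, column label) cell it falls in."""
    grid = {}
    for word in words:
        # Column: the header whose horizontal span overlaps the word.
        col = next((c for c in col_headers
                    if overlaps(word.left, word.right, c.left, c.right)), None)
        # Row: the header whose vertical span overlaps the word.
        row = next((r for r in row_headers
                    if overlaps(word.top, word.bottom, r.top, r.bottom)), None)
        if col is not None and row is not None:
            grid[(row.text, col.text)] = word.text
    return grid


# Illustrative positions only (assumed, not taken from a real document).
col_headers = [Box("Revenue", 100, 0, 160, 10),
               Box("Expenses", 200, 0, 270, 10),
               Box("Profit", 300, 0, 350, 10)]
row_headers = [Box("January", 0, 20, 60, 30),
               Box("February", 0, 40, 70, 50)]
words = [Box("10,000", 110, 20, 160, 30), Box("11,000", 220, 20, 270, 30),
         Box("12,000", 300, 20, 350, 30), Box("6,000", 120, 40, 160, 50),
         Box("6,000", 230, 40, 270, 50), Box("6,000", 310, 40, 350, 50)]

grid = infer_grid(col_headers, row_headers, words)

# The 'Header Column' ("Month" here) receives the row header text.
table = [{"Month": r.text,
          **{c.text: grid.get((r.text, c.text), "") for c in col_headers}}
         for r in row_headers]
```

A word that overlaps no column header or no row header is simply left out of the grid in this sketch; the real extractor may treat such cases differently.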

Example 1: Standard Table Layout

Input Table

            Revenue   Expenses   Profit
January     10,000    11,000     12,000
February    6,000     6,000      6,000
March       4,000     5,000      6,000

Output Data

Month       Revenue   Expenses   Profit
January     10,000    11,000     12,000
February    6,000     6,000      6,000
March       4,000     5,000      6,000

Example 2: Transposed Table Layout

If the data is transposed on the document, enabling the 'Transpose' option will correctly map data to the output table.

Input Table

            January   February   March
Revenue     10,000    6,000      4,000
Expenses    11,000    6,000      5,000
Profit      12,000    6,000      6,000

Output Data

Month       Revenue   Expenses   Profit
January     10,000    11,000     12,000
February    6,000     6,000      6,000
March       4,000     5,000      6,000
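The effect of 'Transpose' on Example 2 can be sketched as a swap of the two axes before output rows are built. The function `grid_to_rows` and its parameters are hypothetical names used for illustration, not part of Grooper; the cell values are those from the example above.

```python
def grid_to_rows(col_labels, row_labels, cells, header_column, transpose=False):
    """Build output rows from a grid keyed by (row label, column label).

    With transpose=True, the document's column headers become output rows
    and the document's row headers become output columns.
    """
    if transpose:
        col_labels, row_labels = row_labels, col_labels
        cells = {(c, r): v for (r, c), v in cells.items()}
    return [{header_column: r,
             **{c: cells.get((r, c), "") for c in col_labels}}
            for r in row_labels]


# Grid as it appears on the document in Example 2 (months across the top).
doc_cols = ["January", "February", "March"]
doc_rows = ["Revenue", "Expenses", "Profit"]
cells = {
    ("Revenue", "January"): "10,000", ("Revenue", "February"): "6,000", ("Revenue", "March"): "4,000",
    ("Expenses", "January"): "11,000", ("Expenses", "February"): "6,000", ("Expenses", "March"): "5,000",
    ("Profit", "January"): "12,000", ("Profit", "February"): "6,000", ("Profit", "March"): "6,000",
}

rows = grid_to_rows(doc_cols, doc_rows, cells, header_column="Month", transpose=True)
for row in rows:
    print(row)
```

Each printed row has a "Month" entry plus one entry per metric, which is the same shape as the Output Data table in both examples.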


Grid Layout Options Extension

The Grid Layout Options extension configures extraction settings for each Data Column in a Data Table that uses the Grid Layout extract method. It gives per-column control over how cell values are extracted: you can override the default extraction method (for example, switching between text, OCR, or OMR extraction) and specify whether a column must contain data for a row to be included in the output.

Use Grid Layout Options to:

  • Set the extraction method for a column using the 'Read Method' property (e.g., extract from text, force OCR, or use OMR for checkboxes).
  • Mark columns as required using the 'Required' property, so that only rows with data in those columns are extracted.

Additional Notes

  • This table extraction method does not support tables whose data spans multiple pages.
  • See the documentation for the 'X Axis Extractor' and 'Y Axis Extractor' properties for guidance on constructing header extractors.
  • If multiple instances of the table are found on the document, all rows from all instances will be combined into a single output table.
  • To filter out empty rows, mark one or more Data Column objects as primary columns. Rows with an empty value in any of the primary columns will be discarded.
  • This extraction mechanism depends on Layout Data generated by the Line Detection / Line Removal IP commands during Image Processing or Recognize. Ensure that input documents have been processed accordingly before using this method.
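Two of the notes above, combining rows from multiple table instances and discarding rows that are empty in a primary column, can be illustrated with a short sketch. The function `combine_and_filter` and the sample rows are hypothetical; they show the described behavior only and make no claim about how Grooper implements it.

```python
def combine_and_filter(instances, primary_columns):
    """Concatenate rows from every table instance, then drop any row
    that has an empty value in at least one primary column."""
    combined = [row for rows in instances for row in rows]
    return [row for row in combined
            if all(str(row.get(col, "")).strip() for col in primary_columns)]


# Two table instances found on the same document (assumed sample data).
first = [{"Month": "January", "Revenue": "10,000"},
         {"Month": "February", "Revenue": ""}]
second = [{"Month": "March", "Revenue": "4,000"}]

result = combine_and_filter([first, second], primary_columns=["Revenue"])
```

Here the February row is discarded because its "Revenue" value is empty, and the remaining rows from both instances end up in a single list.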

Properties

Name        Type        Description
General
Snap To Lines
Cell Extraction

See Also

Used By

Notification