Grooper Help - Version 25.0
25.0.0017 2,127

Table Row Detector

Embedded Object Grooper.Extract

Detects table rows in tabular data using value extractors and layout analysis.

Remarks

The Table Row Detector is a core component of the Tabular Layout extraction method, responsible for identifying and segmenting table rows within a document. It works in conjunction with the Table Header Detector and column configuration to accurately map document content into structured rows and columns.

How Row Detection Works

Row detection operates by analyzing the document's text lines and applying value extractors to locate column values. For a line to be considered a valid table row:

  • It must contain values for all columns marked as required in Tabular Layout Options.
  • The number of column values detected on the line must meet or exceed the 'Minimum Cell Count' property.
  • The horizontal spacing between detected values must not exceed the 'Maximum Gap' property.
  • The first row must appear within the distance specified by 'Maximum Header Distance' from the header.
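The criteria above can be sketched as a simple check. This is an illustrative model only, not Grooper's implementation or API: the `CellValue` class, both function names, and the coordinate units are assumptions made for the example, and the gap is assumed to be measured between horizontally adjacent values.

```python
from dataclasses import dataclass


@dataclass
class CellValue:
    """A value located on a text line by a column's value extractor (hypothetical model)."""
    column: str
    left: float   # left edge, in arbitrary page units
    right: float  # right edge, in the same units


def row_rejection_reasons(cells, required_columns, minimum_cell_count, maximum_gap):
    """Return the reasons a line fails the row criteria; an empty list means it qualifies."""
    reasons = []

    # Every required column must have a value on the line.
    found = {c.column for c in cells}
    missing = sorted(set(required_columns) - found)
    if missing:
        reasons.append("missing required columns: " + ", ".join(missing))

    # The line must contain at least 'Minimum Cell Count' values.
    if len(cells) < minimum_cell_count:
        reasons.append(f"only {len(cells)} cells, need {minimum_cell_count}")

    # Assumption: the gap is the space between horizontally adjacent values.
    ordered = sorted(cells, key=lambda c: c.left)
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = nxt.left - prev.right
        if gap > maximum_gap:
            reasons.append(f"gap between {prev.column} and {nxt.column} exceeds maximum")

    return reasons


def first_row_within_header_distance(header_bottom, first_row_top, maximum_header_distance):
    """Check that the first row starts close enough below the header."""
    return (first_row_top - header_bottom) <= maximum_header_distance
```

Returning the reasons rather than a single true/false value makes it easier to see which setting to adjust when a line is rejected.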

The detector supports a wide range of table layouts, including:

  • Tables with variable or missing columns.
  • Multi-line rows (when used with Multiline Row Settings).
  • Tables split across multiple regions or pages.

Configuration Guidance

  • Minimum Cell Count:
    Increase this value to require more columns for a row to be detected, reducing false positives in noisy documents. Decrease it for sparse or irregular tables.

  • Maximum Gap:
    Adjust to control how far apart column values can be and still be grouped as a single row. Use higher values for wide tables or those with inconsistent spacing.

  • Maximum Header Distance:
    Set to allow for extra space or blank lines between the header and the first row. Increase for documents with decorative or whitespace lines after the header.

  • Find Column Positions:
    Enable to dynamically adjust header cell boundaries based on detected values, improving alignment in documents with shifting or misaligned columns.

  • Merge Multiple Instances:
    Use to combine table regions that are separated by non-row content or page breaks into a single logical table.
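As a rough illustration of the idea behind 'Find Column Positions', the sketch below replaces each header cell's horizontal span with the extent of the values detected for that column. The function name, the dictionary shapes, and the rule itself (taking the minimum left edge and maximum right edge) are assumptions for the example; the product's actual adjustment logic is not described here.

```python
def adjust_column_bounds(header_bounds, detected_values):
    """Recompute column spans from detected values (hypothetical sketch).

    header_bounds:   {column name: (left, right)} taken from the header row
    detected_values: {column name: [(left, right), ...]} spans of values found in rows
    """
    adjusted = {}
    for column, bounds in header_bounds.items():
        spans = detected_values.get(column, [])
        if spans:
            # Assumed rule: cover the full horizontal extent of the detected values.
            adjusted[column] = (min(l for l, _ in spans), max(r for _, r in spans))
        else:
            # No values detected for this column: keep the header's own span.
            adjusted[column] = bounds
    return adjusted
```

With a rule like this, a column whose values sit slightly to the right of its header label would have its span shifted to match the values rather than the label.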

Example Scenarios

  • Invoice Line Items:
    Detects each line item as a row, even when values for some non-required columns are occasionally missing.

  • Multi-Page Tables:
    With 'Merge Multiple Instances' enabled, a table that continues across pages will be treated as a single table.

  • Tables with Blank Lines:
    Increase 'Maximum Header Distance' to allow for blank or decorative lines between the header and first row.
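The multi-page scenario can be pictured as concatenating the rows of each detected table region in reading order. The sketch below is a conceptual model only: `TableInstance` and `merge_instances` are hypothetical names, and ordering by page and then vertical position is an assumption about what "a single logical table" means.

```python
from dataclasses import dataclass, field


@dataclass
class TableInstance:
    """One detected table region (hypothetical model)."""
    page: int
    top: float                      # vertical position of the region on its page
    rows: list = field(default_factory=list)


def merge_instances(instances):
    """Combine the rows of several table regions into one list, in reading order."""
    ordered = sorted(instances, key=lambda t: (t.page, t.top))
    merged = []
    for instance in ordered:
        merged.extend(instance.rows)
    return merged
```

Content between the regions, such as page footers or repeated headers, is simply absent from the merged rows because it was never detected as a row.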

Troubleshooting & Best Practices

  • If extra or partial lines are detected as rows, increase 'Minimum Cell Count' or mark more columns as required.
  • If rows are missed due to wide spacing, increase 'Maximum Gap'.
  • Use diagnostic output to review which lines are detected as rows and adjust settings accordingly.
  • Test with a variety of sample documents to ensure robust extraction across different layouts.

Related Concepts

For more information, see the documentation for Tabular Layout, Data Table, Table Header Detector, and Tabular Layout Options.

Properties

Name    Type    Description

Used By
