Grooper Help - Version 25.0
25.0.0017 2,127

Table Header Detector

Embedded Object Grooper.Extract

Detects table headers in tabular data using extractors, label sets, and layout analysis.

Remarks

The Table Header Detector identifies header rows in tabular data during extraction with the Tabular Layout method. It supports several header detection strategies (value extractors, label sets, and column header extractors) so that it can accommodate a wide range of table formats and document layouts.
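The strategies differ mainly in how a single header cell is recognized. The following Python sketch is a conceptual illustration only, not Grooper's API: the `Word` type and the `label_set_strategy` and `value_extractor_strategy` functions are hypothetical, and a regular expression stands in for a value extractor. Each strategy receives one line of OCR words and returns the words it treats as header cells.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Word:
    text: str
    x0: float  # left edge on the page (hypothetical units)
    x1: float  # right edge on the page


# A strategy inspects one line of words and returns the words it treats as header cells.
Strategy = Callable[[List[Word]], List[Word]]


def label_set_strategy(labels: List[str]) -> Strategy:
    """Hypothetical label-driven detection: exact, case-insensitive label matches."""
    wanted = {label.casefold() for label in labels}

    def detect(line: List[Word]) -> List[Word]:
        return [w for w in line if w.text.casefold() in wanted]

    return detect


def value_extractor_strategy(pattern: str) -> Strategy:
    """Hypothetical pattern-based detection: a regex stands in for a value extractor."""
    rx = re.compile(pattern, re.IGNORECASE)

    def detect(line: List[Word]) -> List[Word]:
        # fullmatch: the whole word must fit the pattern, not just a substring.
        return [w for w in line if rx.fullmatch(w.text)]

    return detect
```

For a line containing "Qty", "Description", and "Amount", `label_set_strategy(["qty", "amount"])` returns only the two listed labels, while the pattern `r"q(ty|uantity)|amount|desc(ription)?"` matches all three words. This shows how a pattern can cover spelling variants that a fixed label list would miss.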

Header detection is a critical step in tabular extraction because the header establishes the structure and boundaries of the table, which makes accurate mapping of data to columns and rows possible. The detector can handle single-line and multi-line headers, merged or stacked header cells, and header positions that vary across pages or table regions.
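The sketch below illustrates two of these ideas under simplifying assumptions: stacked header cells (for example "Unit" above "Price") are merged when they overlap horizontally, and each data word is then assigned to the header column whose horizontal centre is nearest. The `Cell` type and the `merge_stacked` and `assign_columns` functions are hypothetical and do not reflect Grooper's internal algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Cell:
    text: str
    x0: float  # left edge
    x1: float  # right edge


def overlaps(a: Cell, b: Cell) -> bool:
    """True when two cells share any horizontal extent."""
    return a.x0 < b.x1 and b.x0 < a.x1


def merge_stacked(upper: List[Cell], lower: List[Cell]) -> List[Cell]:
    """Combine a two-line header: lower cells under an upper cell join its text."""
    merged: List[Cell] = []
    used = set()
    for top in upper:
        parts, x0, x1 = [top.text], top.x0, top.x1
        for i, bottom in enumerate(lower):
            if i not in used and overlaps(top, bottom):
                parts.append(bottom.text)
                x0, x1 = min(x0, bottom.x0), max(x1, bottom.x1)
                used.add(i)
        merged.append(Cell(" ".join(parts), x0, x1))
    # Lower-line cells with nothing above them become headers on their own.
    merged += [c for i, c in enumerate(lower) if i not in used]
    return sorted(merged, key=lambda c: c.x0)


def assign_columns(row: List[Cell], header: List[Cell]) -> Dict[str, str]:
    """Map each data word to the header whose centre is horizontally nearest."""
    out: Dict[str, List[str]] = {}
    for word in row:
        mid = (word.x0 + word.x1) / 2
        col = min(header, key=lambda h: abs((h.x0 + h.x1) / 2 - mid))
        out.setdefault(col.text, []).append(word.text)
    return {name: " ".join(words) for name, words in out.items()}
```

With `upper = [Cell("Item", 0, 30), Cell("Unit", 100, 130)]` and `lower = [Cell("Price", 98, 132), Cell("Qty", 200, 220)]`, `merge_stacked` yields the headers "Item", "Unit Price", and "Qty". Nearest-centre assignment is a deliberate simplification; wide or right-aligned columns would need a more careful boundary rule.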

Configuration Guidance

  • Choose a detection strategy that matches your document layout: use a value extractor for pattern-based headers, or a label set for label-driven headers.
  • Adjust the 'Minimum Cell Count' and 'Maximum Line Spacing' properties to fine-tune header recognition for your specific table format.
  • Use the 'Repair Threshold' to control how aggressively the detector attempts to fill in missing or incomplete header cells.
  • Set 'Run Global' to true if headers are consistent across the entire document, or false to detect headers separately for each table region.
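One way to picture how 'Minimum Cell Count', 'Maximum Line Spacing', and 'Repair Threshold' could interact is the Python model below. It is a hypothetical illustration, not a description of Grooper's implementation: the `Line` and `HeaderResult` types and the `detect_header` function are invented for this sketch, and so is the interpretation of the repair threshold as the fraction of expected labels that must be found before the remaining ones are treated as present.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Line:
    words: List[str]
    top: float     # top edge of the line (page units, increasing downward)
    bottom: float  # bottom edge of the line


@dataclass
class HeaderResult:
    first_line: int     # index of the first header line
    last_line: int      # index of the last header line (multi-line headers)
    found: Set[str]     # expected labels actually seen (case-folded)
    repaired: Set[str]  # expected labels assumed present although not seen


def detect_header(lines: List[Line], labels: List[str], min_cell_count: int,
                  max_line_spacing: float, repair_threshold: float) -> Optional[HeaderResult]:
    """Hypothetical model; assumes labels is non-empty."""
    expected = {label.casefold() for label in labels}

    def hits(i: int) -> Set[str]:
        return expected & {w.casefold() for w in lines[i].words}

    for start in range(len(lines)):
        found = hits(start)
        if not found:
            continue
        end = start
        # Absorb following lines that are close enough AND carry more header words,
        # so closely spaced data rows are not swallowed into the header.
        while (end + 1 < len(lines)
               and lines[end + 1].top - lines[end].bottom <= max_line_spacing
               and hits(end + 1)):
            end += 1
            found |= hits(end)
        if len(found) >= min_cell_count:
            ratio = len(found) / len(expected)
            # Assumed semantics: a lower threshold repairs missing cells more eagerly.
            repaired = expected - found if ratio >= repair_threshold else set()
            return HeaderResult(start, end, found, repaired)
    return None
```

In this model, raising `repair_threshold` makes repair more conservative, and shrinking `max_line_spacing` below the gap between two header lines prevents them from being combined, so a header that meets the minimum cell count only across both lines is no longer detected.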

Best Practices

  • Test header detection on a variety of sample documents to ensure robust extraction.
  • Avoid overlapping or ambiguous header labels, as these can reduce detection accuracy.
  • Use diagnostic output to review detected headers and adjust configuration as needed.
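Testing on a variety of samples is easier with a small harness that compares the detected header position against a known answer and prints one diagnostic line per document. In the Python sketch below, `simple_detect` is a hypothetical stand-in for whatever detector configuration is under test, and the sample data is invented for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (document name, lines of words, expected header line index or None)
Sample = Tuple[str, List[List[str]], Optional[int]]


def simple_detect(lines: List[List[str]], labels: List[str], min_cells: int) -> Optional[int]:
    """Stand-in detector: index of the first line containing at least min_cells labels."""
    wanted = {label.casefold() for label in labels}
    for i, words in enumerate(lines):
        if len({w.casefold() for w in words} & wanted) >= min_cells:
            return i
    return None


def evaluate(samples: List[Sample],
             detect: Callable[[List[List[str]]], Optional[int]]) -> Dict[str, bool]:
    """Run detection on every sample and print a diagnostic line per document."""
    results: Dict[str, bool] = {}
    for name, lines, expected in samples:
        got = detect(lines)
        ok = got == expected
        results[name] = ok
        print(f"{'PASS' if ok else 'FAIL'} {name}: expected line {expected}, detected {got}")
    return results


if __name__ == "__main__":
    samples: List[Sample] = [
        ("invoice_a", [["Invoice", "1001"], ["Qty", "Description", "Amount"], ["2", "Widgets", "19.98"]], 1),
        ("invoice_b", [["Qty", "Amount"], ["1", "5.00"]], 0),
        ("no_table", [["Dear", "customer"]], None),
    ]
    labels = ["Qty", "Description", "Amount"]
    evaluate(samples, lambda ls: simple_detect(ls, labels, min_cells=2))
```

Re-running the same samples after a configuration change (for example, raising `min_cells` from 2 to 3) shows immediately which documents regress; here "invoice_b", whose header has only two labels, would fail.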

For more information, see the documentation for Tabular Layout, Data Table, and Label Set.
