Grooper Help - Version 25.0
25.0.0017 2,127

Line Periodicity Detector

Embedded Object Grooper.Extract

Detects repeating line-based patterns in document content, enabling the identification of periodic sections such as transactions or repeating data blocks.

Remarks

The Line Periodicity Detector analyzes a sequence of lines within a document and identifies repeating patterns based on configurable similarity and alignment criteria. This is especially useful for extracting structured data from documents where information is organized in repeating rows or sections.

How It Works

The detector compares lines of text to find groups that are sufficiently similar, as determined by the 'Similarity Threshold' and 'Minimum Alignment' properties. When a repeating pattern is detected, the system can treat each occurrence as a distinct section or transaction, enabling downstream extraction and processing.
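As a rough illustration of the idea (not Grooper's actual algorithm, which this page does not specify), the Python sketch below reduces each line to a coarse "shape" and looks for the smallest period at which every line resembles the line that many rows later. The function names (`line_shape`, `similarity`, `detect_period`, `split_sections`), the position-by-position comparison, and the 0.8 threshold are all assumptions made for the example.

```python
import re


def line_shape(line: str) -> str:
    """Reduce a line to a coarse shape: digits become 9, letters become A."""
    shape = re.sub(r"[0-9]", "9", line)
    return re.sub(r"[A-Za-z]", "A", shape)


def similarity(a: str, b: str) -> float:
    """Fraction of character positions whose shapes agree (hypothetical measure)."""
    shape_a, shape_b = line_shape(a), line_shape(b)
    longest = max(len(shape_a), len(shape_b))
    if longest == 0:
        return 1.0
    matches = sum(1 for x, y in zip(shape_a, shape_b) if x == y)
    return matches / longest


def detect_period(lines, similarity_threshold=0.8, max_period=10):
    """Return the smallest period p at which every line resembles the line
    p rows later, or None if no such period is found."""
    for period in range(1, min(max_period, len(lines) // 2) + 1):
        pairs = zip(lines, lines[period:])
        if all(similarity(a, b) >= similarity_threshold for a, b in pairs):
            return period
    return None


def split_sections(lines, period):
    """Group lines into consecutive sections of `period` lines each."""
    return [lines[i:i + period] for i in range(0, len(lines), period)]


# Invented sample data: three transactions of three lines each.
sample_lines = [
    "2024-01-05  INV-1001", "Widget assembly kit", "$120.00",
    "2024-01-17  INV-1002", "Copper pipe fitting", "$245.50",
    "2024-02-03  INV-1003", "Rubber gasket seals", "$318.75",
]

period = detect_period(sample_lines)
print(period)  # 3
print(split_sections(sample_lines, period)[1])
```

In this sketch the threshold plays the role the text assigns to 'Similarity Threshold': at 0.8 the description lines still match despite differing word lengths, while raising it to 1.0 makes `detect_period` return `None` for the same sample.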

Configuration Guidance

  • Adjust the 'Similarity Threshold' to control how closely lines must match to be considered part of the same repeating pattern.
  • Use 'Minimum Alignment' and 'Minimum Segment Length' to fine-tune the sensitivity of pattern detection, especially in documents with variable content or formatting.
  • Optionally, assign a 'Background Extractor' to filter out known background elements that should not influence periodicity analysis.
  • The 'Page Depth' property limits the number of pages analyzed, which can improve performance on large documents.

For example, with 'Page Depth' set to 5, only the first 5 pages of the document are analyzed for periodicity.
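The effect of 'Page Depth' and 'Background Extractor' on the input to the analysis can be pictured with a small Python sketch. This is an assumption-laden model rather than Grooper's API: `DetectorSettings`, `lines_to_analyze`, and the use of a regular expression to stand in for a background extractor are hypothetical, and the page-footer pattern is invented sample data.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DetectorSettings:
    # Hypothetical stand-ins for the properties named above; not Grooper's API.
    page_depth: int                           # number of leading pages to analyze
    background_pattern: Optional[str] = None  # regex standing in for a background extractor


def lines_to_analyze(pages: List[List[str]], settings: DetectorSettings) -> List[str]:
    """Collect lines from the first `page_depth` pages, skipping background lines."""
    background = (
        re.compile(settings.background_pattern)
        if settings.background_pattern
        else None
    )
    selected = []
    for page in pages[: settings.page_depth]:
        for line in page:
            if background and background.search(line):
                continue  # background lines do not influence the analysis
            selected.append(line)
    return selected


# Invented sample: seven pages, each with a footer line and one data row.
pages = [[f"Page {n} of 7", f"row {n}"] for n in range(1, 8)]
settings = DetectorSettings(page_depth=5, background_pattern=r"^Page \d+ of \d+$")
print(lines_to_analyze(pages, settings))
```

With these settings only rows from the first five pages are passed on, and the repeating footer lines are excluded so they cannot be mistaken for part of the periodic pattern.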

Notes

  • The Line Periodicity Detector is typically used as part of a section extraction or transaction detection workflow.
  • Proper configuration is essential for accurate detection; test with representative samples to ensure optimal results.
  • For more information, see the documentation for Section Extract Methods and Value Extractors.

Properties

Name    Type    Description

See Also

Used By
