Grooper Help - Version 25.0
25.0.0017 2,127

Transaction Detection

Section Extract Method Grooper.Extract

Detects periodic transactions in a document and generates a Section Instance for each transaction.

Remarks

The Transaction Detection section extractor splits a document into a sequence of transactions, with each transaction represented as a Section Instance. It is commonly used to break up documents such as an Explanation of Benefits (EOB) into individual claims, so that each claim can be extracted and processed as a distinct unit.

The extractor supports multiple strategies for identifying transaction boundaries, allowing it to adapt to a wide range of document layouts and transaction formats.

How Transaction Detection Works

The extractor uses two primary methods to detect transaction boundaries:

  • Standard Method (Periodicity Detection):
    This method analyzes the document for repeating groups of text lines that indicate periodic transactions. The goal is to find the split in which each group contains exactly one value for each binding field. Binding field values therefore drive the choice among candidate groupings. For example, if a document contains a repeating series of claim numbers, the extractor attempts to split it so that each Section Instance contains exactly one claim number.

  • Label-Assisted Method:
    If a Label Set is defined for the document's Content Type, and it includes header or footer labels for the relevant Data Section, these labels are used to determine where Section Instances start and end. The logic adapts based on whether only a header, only a footer, or both are present. For example, if only a header label is defined, each occurrence of the header marks the start of a new section, extending to the next header or the end of the document.
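
The idea behind the Standard Method can be illustrated with a small sketch. This is not Grooper's implementation: it is a toy model that assumes every transaction has the same number of lines, tries each candidate group size, and accepts the first one where every group contains exactly one binding value. The names `split_by_period`, `binding_counts`, and `best_period`, and the "Claim Number" pattern, are hypothetical.

```python
import re
from typing import List, Optional, Pattern


def split_by_period(lines: List[str], period: int) -> List[List[str]]:
    """Cut the lines into consecutive groups of `period` lines."""
    return [lines[i:i + period] for i in range(0, len(lines), period)]


def binding_counts(groups: List[List[str]], pattern: Pattern[str]) -> List[int]:
    """Count how many lines in each group contain a binding value."""
    return [sum(1 for line in group if pattern.search(line)) for group in groups]


def best_period(lines: List[str], pattern: Pattern[str],
                min_lines: int = 2) -> Optional[int]:
    """Return the smallest group size where every group holds exactly one
    binding value, or None if no such size exists.

    Assumption (toy model only): all transactions have the same line count,
    so only group sizes that divide the document evenly are considered.
    """
    for period in range(min_lines, len(lines) + 1):
        if len(lines) % period != 0:
            continue
        groups = split_by_period(lines, period)
        if all(count == 1 for count in binding_counts(groups, pattern)):
            return period
    return None


# Hypothetical binding field pattern and sample text lines.
claim_number = re.compile(r"Claim Number:\s*\w+")
sample = [
    "Claim Number: A1", "Patient: X", "Paid: 10",
    "Claim Number: B2", "Patient: Y", "Paid: 20",
]
period = best_period(sample, claim_number)
```

A group size of 2 is rejected here because the third group would contain no claim number, which is the role the binding field plays in choosing among candidate groupings.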

The extractor can also handle cases where transactions span multiple pages, combining sections when binding values are repeated across page breaks and the 'Detect Page Spanning' property is enabled.
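
As a rough illustration of page spanning, the sketch below merges consecutive sections that carry the same binding value and sit on adjacent pages. It is an assumption about how such merging could work, not a description of Grooper's internals; the `Section` class and `merge_page_spanning` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Section:
    binding_value: str   # e.g. a claim number
    first_page: int
    last_page: int
    lines: List[str]


def merge_page_spanning(sections: List[Section]) -> List[Section]:
    """Merge a section into the previous one when both share a binding
    value and the new section starts on the page right after the previous
    section ends. The input list is left unmodified."""
    merged: List[Section] = []
    for sec in sections:
        prev = merged[-1] if merged else None
        if (prev is not None
                and prev.binding_value == sec.binding_value
                and sec.first_page == prev.last_page + 1):
            # Same transaction continued on the next page: extend it.
            merged[-1] = Section(prev.binding_value, prev.first_page,
                                 sec.last_page, prev.lines + sec.lines)
        else:
            merged.append(Section(sec.binding_value, sec.first_page,
                                  sec.last_page, list(sec.lines)))
    return merged
```

In this model, a claim whose binding value appears at the bottom of page 1 and again at the top of page 2 ends up as one section covering both pages.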

Configuration Guidance

To use the Transaction Detection extractor effectively:

  • Assign the extractor to a Data Section that represents a transaction or repeating group in your document.
  • Configure the 'Binding Fields' property to specify which Data Fields should be used as unique identifiers for each transaction. These fields are critical for accurate sectioning, especially when multiple possible groupings are detected.
  • Adjust the 'Minimum Line Count' and 'Minimum Confidence' properties to control the sensitivity and quality threshold for detected sections.
  • If your documents use visual separators (such as horizontal rules) or have complex layouts, configure the 'Horizontal Rule Detection' and 'Periodicity Detection' properties as needed.
  • If your Content Type includes a Label Set with header or footer labels, ensure these are correctly defined to enable label-assisted detection.
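
To make the effect of the two thresholds concrete, here is a minimal sketch that assumes each detected section carries a line list and a confidence score between 0 and 1. How Grooper actually computes and applies these values is not stated here, so the `Candidate` class, the `filter_candidates` function, and the default values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    lines: List[str]
    confidence: float  # hypothetical score in the range 0 to 1


def filter_candidates(candidates: List[Candidate],
                      min_line_count: int = 2,
                      min_confidence: float = 0.5) -> List[Candidate]:
    """Keep only candidates that meet both thresholds."""
    return [
        c for c in candidates
        if len(c.lines) >= min_line_count and c.confidence >= min_confidence
    ]
```

Raising either threshold in this model discards more candidates, which trades missed sections for fewer false positives.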

Example Scenario

Suppose you have an EOB document where each claim starts with a line labeled "Claim Number:" and ends with a horizontal rule. You would configure the Data Section for claims, assign the Transaction Detection extractor, set the 'Binding Fields' to include the "Claim Number" field, and (optionally) define header/footer labels in the Label Set. The extractor will then split the document into Section Instances, each containing a single claim.
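
The scenario above can be sketched on plain text lines. The code is an illustrative approximation only: it treats a line starting with "Claim Number:" as the header, stands in for the horizontal rule with a line made of dashes, underscores, or equals signs, and ignores any text outside a claim. The function names `is_rule` and `split_claims` are hypothetical and are not part of Grooper.

```python
from typing import List, Optional


def is_rule(line: str) -> bool:
    """Treat a line of 3+ dash, underscore, or equals characters as a rule."""
    stripped = line.strip()
    return len(stripped) >= 3 and set(stripped) <= set("-_=")


def split_claims(lines: List[str], header: str = "Claim Number:") -> List[List[str]]:
    """Start a section at each header line and end it at the next rule,
    the next header, or the end of the document."""
    sections: List[List[str]] = []
    current: Optional[List[str]] = None
    for line in lines:
        if line.strip().startswith(header):
            if current is not None:
                sections.append(current)  # previous claim had no closing rule
            current = [line]
        elif current is not None and is_rule(line):
            sections.append(current)      # rule closes the claim; rule is dropped
            current = None
        elif current is not None:
            current.append(line)
    if current is not None:
        sections.append(current)
    return sections
```

Because a new header also closes an open section, the same function covers the header-only case in which no rule is present.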

Additional Notes

  • The extractor is robust to variations in document structure, but accurate configuration of binding fields and labels is essential for best results.
  • When 'Detect Page Spanning' is enabled, transactions that are split across pages (with repeated binding values) will be merged into a single section.
  • The extractor produces detailed diagnostic output (when enabled) to assist with troubleshooting and optimization.
