Grooper Help - Version 25.0
25.0.0023 2,165

AI Transaction Detection

Section Extract Method Grooper.GPT

Identifies the boundaries between transactions in a document and optionally extracts structured data from each transaction using generative AI.

Remarks

The AI Transaction Detection extract method enables Grooper to automatically segment documents that contain repeating data structures, such as payroll reports or Explanation of Benefits (EOB) forms, into individual transactions. It works by detecting consistent features (anchors) that mark the start of each transaction. It can also extract structured data from each detected transaction using generative AI, supporting both simple and highly complex document layouts.

Overview

This method is designed for documents containing multiple, similarly structured transactions that may not be separated by explicit page breaks or static delimiters. By configuring anchor detection and extraction options, users can automate the identification and extraction of transaction data with minimal manual setup.

Boundary Detection

The core of this method is anchor-based boundary detection. Anchors are features that reliably indicate the start of a transaction, such as static text labels (e.g., Employee:), regular expressions for structured values (e.g., dates, IDs), or other repeatable patterns. The method scans the document for these anchors and uses their positions to segment the content into discrete transaction sections.

  • Anchor Types:
    The generative AI model (LLM) may generate two types of anchors:

    • Static text labels (preferred for reliability)
    • Regular expressions for numeric or date values
  • Anchor Configuration:
    Anchors can be defined manually or inferred from sample data. Each anchor includes a value (string or regex), a flag indicating if it is a regular expression, and a line offset specifying its position relative to the transaction start.

  • Boundary Identification:
    When anchors are detected, the method creates a new transaction section at each anchor position. This enables Grooper to process documents with repeating records, even if the layout varies or explicit separators are absent.
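
The anchor-based segmentation described above can be illustrated with a minimal Python sketch. It is not Grooper's implementation: the `Anchor` class, the function names, and the sample lines are hypothetical, and it assumes the line offset counts how many lines below the transaction start the anchor appears.

```python
import re
from dataclasses import dataclass


@dataclass
class Anchor:
    """Hypothetical anchor record with the three parts described above."""
    value: str        # literal label or regular expression pattern
    is_regex: bool    # True when `value` is a regular expression
    line_offset: int  # assumed: lines between the transaction start and the anchor


def find_starts(lines, anchor):
    """Return the line indexes where a transaction is assumed to start."""
    source = anchor.value if anchor.is_regex else re.escape(anchor.value)
    pattern = re.compile(source)
    starts = []
    for i, line in enumerate(lines):
        if pattern.search(line):
            start = max(i - anchor.line_offset, 0)
            # Skip a start that does not advance past the previous one.
            if not starts or start > starts[-1]:
                starts.append(start)
    return starts


def split_transactions(lines, anchor):
    """Cut the lines into one section per detected start.

    Lines before the first start (for example a report title) are left out.
    """
    starts = find_starts(lines, anchor)
    ends = starts[1:] + [len(lines)]
    return [lines[s:e] for s, e in zip(starts, ends)]


SAMPLE = [
    "Payroll Report",
    "Employee: Ada Park",
    "Gross: 1200.00",
    "Employee: Lee Stone",
    "Gross: 950.00",
]

# A static label on the first line of each transaction (offset 0).
sections = split_transactions(SAMPLE, Anchor("Employee:", False, 0))

# A regex anchor one line below the start yields the same sections.
regex_sections = split_transactions(
    SAMPLE, Anchor(r"^Gross: \d+\.\d{2}$", True, 1)
)
```

In this sketch both anchors split the sample into two sections, one per employee. A static label is usually the simpler choice, which matches the preference for static text labels noted above.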

Data Extraction

In addition to boundary detection, this method can extract structured data from each detected transaction using generative AI. Extraction is performed according to the configuration of included Data Elements, and can be customized for specific fields, tables, or sections.

  • Extraction Workflow:

    • For each detected transaction, a quote message is generated to provide context to the AI model.
    • Extraction instructions can be customized to guide the AI for specialized or non-standard layouts.
    • Data extraction is performed in parallel for efficiency, with options to control batch size and concurrency.
  • Included Elements:
    Users can limit extraction to specific fields, tables, or sections by configuring the 'Included Elements' property. This reduces prompt complexity and focuses the AI on only the required data.

  • Parallelism and Performance:
    The 'Max Degree of Parallelism' property controls concurrency and the 'Transactions Per Operation' property controls batch size, allowing users to balance extraction throughput against resource usage.
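
The relationship between batch size and concurrency can be sketched in Python as follows. This is an illustration only, not Grooper's code: `extract_batch` is a hypothetical stand-in for one AI request, and the parameter names simply mirror the 'Transactions Per Operation' and 'Max Degree of Parallelism' properties.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(transactions, transactions_per_operation):
    """Group transactions into consecutive batches of at most the given size."""
    n = transactions_per_operation
    return [transactions[i:i + n] for i in range(0, len(transactions), n)]


def extract_batch(batch):
    """Hypothetical placeholder for one AI request covering a whole batch.

    A real call would send the batch to a model; here each transaction
    just yields a record with its text length.
    """
    return [{"length": len(text)} for text in batch]


def extract_all(transactions, transactions_per_operation=2,
                max_degree_of_parallelism=4):
    """Run the batches concurrently and return records in input order."""
    batches = make_batches(transactions, transactions_per_operation)
    with ThreadPoolExecutor(max_workers=max_degree_of_parallelism) as pool:
        # Executor.map yields results in the order the batches were submitted,
        # so each record can be matched back to its transaction by position.
        results = list(pool.map(extract_batch, batches))
    return [record for batch_result in results for record in batch_result]


records = extract_all(["aa", "b", "cccc", "dd", "e"],
                      transactions_per_operation=2,
                      max_degree_of_parallelism=2)
```

Larger batches mean fewer requests but longer prompts, while a higher degree of parallelism raises throughput at the cost of more simultaneous requests.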

Diagnostics and Logging

The AI Transaction Detection method generates detailed diagnostic information throughout both boundary detection and data extraction. These diagnostics are essential for configuration, troubleshooting, and auditing, and are accessible through the Grooper diagnostic interface.

  • Boundary Detection Diagnostics:

    • Logs the process of anchor detection, including which anchors were matched, their positions, and any issues encountered.
    • Records the number of detected transactions and the boundaries identified within the document.
    • Captures details about anchor configuration, detection attempts, and fallback logic if anchors are missing or ambiguous.
  • Data Extraction Diagnostics:

    • Logs each extraction operation, including the content of prompt messages sent to the AI model and the responses received.
    • Tracks the mapping of extracted data back to each transaction section.
    • Records timing and performance metrics for extraction and data import steps.
    • Captures errors, warnings, or alignment issues encountered during extraction or data import.

Tip:
After configuring boundary detection and extraction, review the diagnostic logs to ensure anchors are correctly matched, transactions are properly segmented, and data extraction is accurate. Use these logs to refine anchor selection, extraction instructions, and performance settings.
