Grooper Help - Version 25.0
25.0.0024 2,166

Boundary Detector

Embedded Object Grooper.GPT

Detects transaction boundaries in a document by identifying anchor features using generative AI.

Remarks

Boundary Detector automatically locates the start of each transaction or repeating section within a document by analyzing its content for consistent anchor features. It is a core component of AI Transaction Detection, providing the intelligence and configuration needed to segment documents into meaningful, extractable units.

Overview

The Boundary Detector is designed for documents containing multiple, similarly structured transactions, such as payroll records, explanations of benefits (EOBs), or repeating form entries, where each transaction begins with a recognizable pattern or set of features. By configuring anchor detection parameters, users can automate the identification of transaction boundaries, even in documents with complex or variable layouts.

How It Works

  1. Detection Window Selection:
    The detector selects a portion of document content to analyze, based on the configured detection window size. This window should be large enough to expose at least 10 repeating transactions to the LLM during boundary detection.

  2. Anchor Detection via LLM:
    The detector constructs a prompt for the large language model (LLM), including a job description, schema, and any user-supplied instructions. The LLM is asked to identify a set of anchor features (such as static text labels, regular expressions, or structured field patterns) that reliably indicate the start of each transaction. Both the number of anchors and the number of detection attempts are configurable, which makes detection more robust.

  3. Anchor Matching and Scoring:
    The detected anchors are matched against each line of the document. Each line is scored by the number of anchors it matches, and a minimum score threshold determines which lines are treated as transaction boundaries. The detector also repairs and adjusts hit counts to account for ambiguous or overlapping matches, which improves segmentation accuracy.

  4. Section Instance Creation:
    For each detected boundary, a new section instance is created, representing a single transaction. These instances are returned for further processing or data extraction.

  5. Iterative Detection and Fallback:
    If no valid boundaries are found, the detector can retry anchor detection with alternative strategies or updated instructions, up to the configured maximum number of attempts. Anchor sets from earlier unsuccessful attempts are provided to the LLM to improve subsequent results.
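
The matching, section creation, and retry steps above can be illustrated with a minimal Python sketch. This is not Grooper's implementation: the Anchor class, the function names, the sample payroll lines, and the stub_proposer function (which stands in for the LLM call) are all hypothetical, and the hit-count repair step is omitted. It also assumes every anchor is a regular expression and that content before the first boundary is discarded.

```python
import re
from dataclasses import dataclass


@dataclass
class Anchor:
    """Hypothetical anchor: a regex expected on the first line of a transaction."""
    name: str
    pattern: str


def score_lines(lines, anchors):
    """Score each line by the number of anchors whose pattern matches it."""
    compiled = [re.compile(a.pattern) for a in anchors]
    return [sum(1 for rx in compiled if rx.search(line)) for line in lines]


def find_boundaries(lines, anchors, min_score):
    """Return indexes of lines whose score meets the minimum score."""
    scores = score_lines(lines, anchors)
    return [i for i, score in enumerate(scores) if score >= min_score]


def split_sections(lines, boundaries):
    """Create one section per boundary, running to the next boundary or the end."""
    ends = boundaries[1:] + [len(lines)]
    return [lines[start:end] for start, end in zip(boundaries, ends)]


def detect(lines, propose_anchors, min_score, max_attempts):
    """Retry anchor proposal until boundaries are found or attempts run out."""
    failed = []  # earlier unsuccessful anchor sets, passed back to the proposer
    for _ in range(max_attempts):
        anchors = propose_anchors(failed)
        boundaries = find_boundaries(lines, anchors, min_score)
        if boundaries:
            return split_sections(lines, boundaries)
        failed.append(anchors)
    return []


# Hypothetical sample content: two payroll transactions.
LINES = [
    "Employee: Ada Lovelace   ID: 1001",
    "Gross Pay: 4,200.00",
    "Net Pay: 3,150.00",
    "Employee: Alan Turing   ID: 1002",
    "Gross Pay: 3,900.00",
    "Net Pay: 2,980.00",
]

GOOD_ANCHORS = [
    Anchor("employee label", r"^Employee:"),
    Anchor("id field", r"ID: \d+"),
]


def stub_proposer(failed):
    """Stand-in for the LLM: a poor first guess, then a better anchor set."""
    if not failed:
        return [Anchor("invoice label", r"^Invoice No")]
    return GOOD_ANCHORS


sections = detect(LINES, stub_proposer, min_score=2, max_attempts=3)
for number, section in enumerate(sections, start=1):
    print(number, section[0])
```

In this sketch the first attempt matches nothing, so the failed anchor set is handed back to the proposer and the second attempt yields one section per "Employee:" line.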

Diagnostics and Logging

The Boundary Detector generates detailed diagnostic artifacts to support configuration, troubleshooting, and validation:

  • Anchor Detection Logs:
    Chronological logs of each detection attempt, including anchor details, line offsets, regex status, reasons, and hit counts.

  • Anchor Detection.txt:
    A text file showing line-by-line anchor match scores and boundary decisions for the analyzed content.

  • Response Schema.json:
    The JSON schema used for anchor detection responses, aiding in prompt engineering and debugging.

  • Timing Metrics:
    Timers for anchor matching operations, supporting performance analysis.

  • Hierarchical Diagnostics:
    Child logs for each detection attempt, allowing users to review and compare multiple strategies and outcomes.

These diagnostics are accessible through the Grooper diagnostic interface and are essential for refining anchor selection, validating boundary detection, and optimizing performance.

Best Practices

  • Use clear, consistent field labels in your documents to maximize anchor detection reliability.
  • Adjust the minimum score and anchor count to balance sensitivity and specificity for your document type.
  • Provide targeted instructions to the LLM if initial detection attempts are unsuccessful or produce ambiguous results.
  • Review diagnostic logs to understand how anchors are matched and to troubleshoot segmentation issues.
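
The trade-off between sensitivity and specificity can be seen by sweeping the minimum score over a small sample. The sketch below is illustrative only: the anchor patterns, the sample EOB-style lines, and the boundaries_at function are assumptions, not Grooper settings or APIs.

```python
import re

# Hypothetical anchors for an EOB-style document.
ANCHORS = [r"^Claim Number:", r"Patient:", r"\d{2}/\d{2}/\d{4}"]

# Hypothetical sample lines; only lines 0 and 2 start a transaction.
LINES = [
    "Claim Number: 88231  Patient: J. Doe  01/05/2024",
    "Service date 01/02/2024  Office visit  120.00",
    "Claim Number: 88232  Patient: R. Roe  01/06/2024",
    "Service date 01/03/2024  Lab work  75.00",
    "Patient: R. Roe balance forward",
]


def boundaries_at(lines, patterns, min_score):
    """Return indexes of lines matching at least min_score anchor patterns."""
    compiled = [re.compile(p) for p in patterns]
    hits = []
    for index, line in enumerate(lines):
        score = sum(1 for rx in compiled if rx.search(line))
        if score >= min_score:
            hits.append(index)
    return hits


# Try every threshold from 1 up to the number of anchors.
sweep = {m: boundaries_at(LINES, ANCHORS, m) for m in range(1, len(ANCHORS) + 1)}
for min_score, hits in sweep.items():
    print(min_score, hits)
```

With these sample lines, a minimum score of 1 flags every line because each one matches at least one anchor, while a minimum score of 2 or 3 keeps only the two lines that begin a claim.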

For more information on configuring and using the Boundary Detector, refer to the Grooper documentation on AI Transaction Detection and Anchor-Based Segmentation.

Properties

Name    Type    Description

See Also

Used By
