Grooper Help - Version 25.0
25.0.0023 2,165
  • Overview
  • Help Status

Transaction Extractor

Embedded Object Grooper.GPT

When enabled, extracts structured data from each transaction using generative AI.

Remarks

Transaction Extractor provides AI powered data extraction for individual transactions detected by AI Transaction Detection.

This object configures and orchestrates the extraction workflow, including document quoting, prompt construction, batching, parallel execution, and data import. It is highly configurable, allowing users to tailor extraction behavior for different document types and business requirements.

Role in Grooper

  • Used as part of AI Transaction Detection to automate the extraction of structured data from unstructured or semi-structured documents.
  • Integrates with Data Model schemas, allowing you to target specific fields, tables, or sections for extraction.
  • Supports custom instructions and quoting methods to optimize LLM understanding and output accuracy.
  • Handles batching and parallelism to maximize throughput and efficiency for large or complex documents.

How It Works

  1. Document Quoting:
    The extractor formats and annotates document content using the configured quoting method, providing the LLM with clear context for each transaction.

  2. Prompt Construction:
    For each batch of transactions, a prompt is built that includes the quoted content, extraction schema, and any user-supplied instructions.

  3. Batching and Parallelism:
    Transactions are grouped into batches according to the 'Transactions Per Operation' property. Multiple extraction operations can run in parallel, controlled by 'Max Degree of Parallelism', to improve performance.

  4. LLM Extraction:
    The LLM receives the prompt and returns structured data for each transaction, matching the configured schema and instructions.

  5. Data Import:
    Extracted data is mapped back to the document, aligning each transaction with its source content for review, validation, and downstream processing.

Configuration and Usage

  • Select a quoting method that best preserves the structure and meaning of your document content.
  • Use 'Extraction Instructions' to clarify requirements, handle edge cases, or provide document-specific guidance.
  • Limit extraction to specific Data Elements using the 'Included Elements' property to improve accuracy and performance.
  • Adjust batching and parallelism settings to balance throughput and resource usage for your environment.

Diagnostics and Logging

The Transaction Extractor generates detailed diagnostic artifacts to support configuration, troubleshooting, and validation:

  • Extraction Logs:
    Chronological logs of each extraction operation, including prompts, responses, and timing metrics.
  • Response Data.json:
    The raw JSON data returned by the LLM for each batch.
  • Chat Log.jsonl:
    The full chat conversation with the LLM, including all prompts and responses.
  • Data Import Timers:
    Timing information for data import and mapping operations.

These diagnostics are accessible through the Grooper diagnostic interface and are essential for refining extraction settings, troubleshooting issues, and validating results.

Best Practices

  • Start with default settings and refine quoting, instructions, and batching based on extraction diagnostics.
  • Use smaller batches for documents with highly variable or complex transactions.
  • Review diagnostic logs to troubleshoot extraction issues and optimize prompt design.
  • Monitor system performance and adjust parallelism to avoid overloading resources.

For more information, see the documentation for AI Transaction Detection, Data Model, and Quoting Method.

Properties

NameTypeDescription

See Also

Used By

Notification