Grooper Help - Version 25.0
25.0.0017 2,127

AI Extract

Data Fill Method Grooper.GPT

Populates descendant data elements by asking an AI chatbot to read them from the document content.

Remarks

The AI Extract fill method leverages a large language model (LLM) to extract structured data from document content, using the schema defined by descendant Data Elements such as Data Fields, Data Sections, and Data Tables.

This approach enables highly flexible, context-aware extraction that adapts to a wide range of document layouts and language patterns, making it ideal for unstructured or semi-structured documents where traditional extractors may fall short.

How It Works

  • Prompt Construction: AI Extract builds a prompt for the LLM that includes:
    • Optional user-provided instructions (via the 'Instructions' property) to guide extraction.
    • A schema representing the target Data Elements and their structure.
    • The 'Description' property of each Data Element, which is provided to the LLM as part of the schema. Use Data Element descriptions to give instructions that are specific to a single element.
    • Quoted content from the document, as determined by the 'Document Quoting' property.
  • LLM Interaction: The prompt is sent to the selected LLM model (configured via the 'Model' and 'Parameters' properties). The model responds with JSON data matching the schema.
  • Data Ingestion: The returned JSON is parsed and used to populate the corresponding Data Fields, Data Sections, and Data Tables in the Grooper document.
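
The three steps above can be illustrated with a minimal Python sketch. This is not Grooper's implementation: FieldSpec, build_prompt, and ingest are hypothetical names, the prompt layout is an assumption, and the model's reply is represented by a hard-coded JSON string instead of a real LLM call.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FieldSpec:
    """Hypothetical stand-in for a Data Field; not a Grooper type."""
    name: str
    description: str = ""


def build_prompt(instructions: str, fields: List[FieldSpec], quoted_text: str) -> str:
    """Combine optional instructions, a schema, and quoted content into one prompt."""
    schema = {
        "type": "object",
        "properties": {
            f.name: {"type": "string", "description": f.description} for f in fields
        },
    }
    parts = []
    if instructions:  # instructions are optional
        parts.append("Instructions:\n" + instructions)
    parts.append("Schema:\n" + json.dumps(schema, indent=2))
    parts.append("Document:\n" + quoted_text)
    return "\n\n".join(parts)


def ingest(response_json: str, fields: List[FieldSpec]) -> Dict[str, Optional[str]]:
    """Parse the model's JSON reply and keep only values for known fields."""
    data = json.loads(response_json)
    return {f.name: data.get(f.name) for f in fields}


fields = [FieldSpec("invoice_number", "The invoice identifier"), FieldSpec("total")]
prompt = build_prompt("Extract invoice data.", fields, "Invoice INV-001 Total: $45.00")
# Assumed model reply; a real reply would come from the configured LLM.
reply = '{"invoice_number": "INV-001", "total": "$45.00", "extra": "ignored"}'
values = ingest(reply, fields)
```

In this sketch, keys in the reply that do not correspond to a known field are ignored, and fields missing from the reply are left as None.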

Alignment Strategies

AI Extract supports several alignment modes to map extracted values back to their location in the original document:

  • Quoted: Aligns values to their OCR-extracted text.
  • Labeled: Associates values with label-like context found in the document.
  • Labeled and Quoted: Combines label context and quoted values for more robust alignment.
  • Geometric: Uses page number and bounding box information to visually locate values.

Alignment is especially useful for document review, allowing users to see where each extracted value appears on the page. For automated workflows where alignment is not needed, these options can be disabled for improved performance.
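
To make the idea of aligning a value to its text concrete, here is a rough Python sketch of a quoted-style match. It is an assumption about how such matching could work, not a description of Grooper's alignment logic: align_quoted and the (text, box) word layout are hypothetical, and only exact whitespace-separated matches are handled.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # left, top, right, bottom (assumed pixel units)


def align_quoted(value: str, words: List[Tuple[str, Box]]) -> Optional[Box]:
    """Find the first run of OCR words equal to `value` and return its bounding box."""
    target = value.split()
    if not target:
        return None
    texts = [text for text, _ in words]
    for start in range(len(texts) - len(target) + 1):
        end = start + len(target)
        if texts[start:end] == target:
            boxes = [box for _, box in words[start:end]]
            # Union of the matched word boxes.
            return (
                min(b[0] for b in boxes),
                min(b[1] for b in boxes),
                max(b[2] for b in boxes),
                max(b[3] for b in boxes),
            )
    return None  # value not found; it stays unaligned


ocr_words = [("Total:", (10, 10, 50, 20)), ("$45.00", (55, 10, 95, 20))]
single = align_quoted("$45.00", ocr_words)
phrase = align_quoted("Total: $45.00", ocr_words)
```

A search like this adds work for every extracted value, which is one way to picture why disabling alignment can improve performance when locations are not needed.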

Configuration Guidance

  • Model: Select the LLM to use for extraction. This determines the language understanding and output capabilities.
  • Parameters: Fine-tune model behavior (e.g., temperature, max tokens) for your use case.
  • Instructions: Provide custom instructions to influence how the LLM interprets and extracts data (e.g., "Extract all invoice totals and dates").
  • Document Quoting: Choose how much and what part of the document content is included in the prompt. This can affect both accuracy and cost.
  • Included Elements: Restrict extraction to specific Data Elements. If left empty, all visible, non-computed descendants are included.
  • Use Structured Output: Enable for models that support OpenAI's structured JSON output mode. Disable for legacy or non-OpenAI models.
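
The default behavior of 'Included Elements' can be pictured with a short Python sketch. The Element class and select_elements function are hypothetical and exist only to illustrate the rule stated above; they are not part of Grooper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    """Hypothetical stand-in for a Data Element; not a Grooper type."""
    name: str
    visible: bool = True
    computed: bool = False


def select_elements(descendants: List[Element], included: List[str]) -> List[Element]:
    """Return the elements to extract, following the 'Included Elements' rule."""
    if included:
        # An explicit list restricts extraction to the named elements.
        wanted = set(included)
        return [e for e in descendants if e.name in wanted]
    # Empty list: fall back to all visible, non-computed descendants.
    return [e for e in descendants if e.visible and not e.computed]


elements = [
    Element("Invoice Number"),
    Element("Internal Id", visible=False),
    Element("Line Total", computed=True),
]
default_names = [e.name for e in select_elements(elements, [])]
explicit_names = [e.name for e in select_elements(elements, ["Line Total"])]
```

Whether an explicit list can include hidden or computed elements is not stated in this topic; the sketch assumes it can.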

Use Cases

  • Unstructured Document Extraction: Extract data from contracts, correspondence, or other documents with variable layouts.
  • AI-Assisted Data Entry: Automate data population for fields that are difficult to extract with rules or regular expressions.
  • Review and Validation: Provide reviewers with context-aware, aligned data for efficient validation and correction.

Notes

  • The quality of extraction depends on the LLM, the prompt design, and the clarity of the schema.
  • For best results, provide clear instructions and include only the necessary document content in the prompt.
  • Alignment options may increase processing time; disable them if not required for your workflow.

Properties

Name    Type    Description
General
Options

See Also

Notification