Grooper Help - Version 25.0
25.0.0017 2,127
  • Overview
  • Help Status

AI Section Reader

Section Extract Method Grooper.GPT

Extracts a Section Instance from a document using generative AI.

Remarks

The AI Section Reader provides advanced extraction of single-instance Data Sections from documents using generative AI, leveraging large language models (LLMs) such as OpenAI's GPT series. It is designed to handle complex, variable, or ambiguous document layouts where traditional extraction methods may be insufficient.

How It Works

The extraction workflow consists of several coordinated steps:

  1. Prompt Construction:
    The reader selects and formats the relevant document content using the configured quoting method. A standard prefix and extraction schema (defining the expected JSON structure) are included in the prompt, along with any optional instructions to guide the LLM's response.

  2. LLM Completion:
    The constructed prompt is sent to the selected LLM model. If structured output mode is enabled, the LLM is instructed to return a strict JSON object matching the schema. Otherwise, the schema is provided as guidance and the LLM returns a best-effort JSON response. The model analyzes the quoted content and generates a JSON object containing the extracted values.

  3. Data Mapping and Alignment:
    The returned JSON is parsed and mapped to the corresponding Data Elements in the section. Each value is imported into its matching Data Field, Data Table, or Data Section instance. The alignment mode determines how the extracted section instance is associated with the original document structure, such as pages or regions.

  4. Diagnostics and Logging:
    Throughout the extraction process, detailed diagnostics are generated to support troubleshooting, validation, and prompt engineering.

Diagnostics and Logging

The following diagnostic artifacts are produced and can be reviewed or exported using Grooper's diagnostic tools:

  • Schema.json: The JSON schema provided to the LLM for each extraction operation.
  • Response Data.json: The raw JSON response returned by the LLM.
  • Chat Log.jsonl: The complete chat conversation, including all prompts and responses.
  • Operation Log Entries: Chronological logs of key steps and events.
  • Error Messages: Details of any errors encountered during extraction or data mapping.
  • Performance Timers: Timing data for LLM completion and other critical operations.

These diagnostics provide transparency into the extraction process, making it easier to diagnose issues, refine prompts, and ensure reliable results.

Configuration Guidance

  • Model Selection:
    Choose the LLM model that best fits your extraction needs and available resources. More capable models improve accuracy for complex or ambiguous documents.

  • Quoting Method:
    Select a quoting method that matches your document structure. Targeted quoting can improve extraction accuracy and reduce prompt size.

  • Instructions:
    Provide clear, concise instructions to guide the LLM, especially for documents with inconsistent layouts or special formatting requirements.

  • Structured Output:
    Enable structured output mode for complex or nested data to improve reliability and parsing.

  • Alignment Mode:
    Adjust alignment mode if you have specific requirements for mapping extracted data to document locations.

Usage Scenarios

  • Extracting key-value pairs, summary fields, or single records from semi-structured or unstructured documents.
  • Handling documents with variable layouts, ambiguous content, or inconsistent formatting.
  • Rapidly adapting extraction logic to new document types using prompt engineering and schema configuration.

LLM Connector Requirement

This extractor requires a properly-configured LLM Connector on the repository Root to communicate with the LLM service. Ensure the connector is set up in your environment.

Properties

NameTypeDescription
General
Options

Derived Types

There are 1 implementations of AI Section Reader.

AI Collection Reader Extracts a Section Instance Collection from a document using generative AI.

See Also

Used By

Notification