Grooper Help - Version 25.0
25.0.0017 2,127
  • Overview
  • Help Status

AI Table Reader

Table Extract Method Grooper.GPT

Uses generative AI to extract tabular data from document content.

Remarks

The AI Table Reader enables advanced extraction of tabular data from documents using generative AI, powered by large language models (LLMs) such as OpenAI's GPT series. It is designed to interpret semi-structured or unstructured content and transform it into structured table instances, even when the table layout is ambiguous or not explicitly formatted.

How It Works

The extraction workflow consists of several coordinated steps:

  1. Prompt Construction:
    The reader selects and formats the relevant document content using the configured quoting method. A system prompt, extraction schema (defining the expected table structure), and any optional instructions are included to guide the LLM's response. You may filter which columns are included and specify row alignment strategies.

  2. LLM Completion:
    The constructed prompt is sent to the selected LLM model. If structured output mode is enabled, the LLM is instructed to return a strict JSON object matching the schema. Otherwise, the schema is provided as guidance and the LLM returns a best-effort JSON response. The model analyzes the quoted content and generates a JSON object containing the extracted table data.

  3. Data Mapping and Row Alignment:
    The returned JSON is parsed and mapped to the corresponding Data Columns and rows in the table. The row alignment mode determines how extracted rows are associated with the original document content, supporting features such as geometric alignment, header/footer detection, and multiline row handling.

  4. Diagnostics and Logging:
    Throughout the extraction process, detailed diagnostics are generated to support troubleshooting, validation, and prompt engineering.

Diagnostics and Logging

The following diagnostic artifacts are produced and can be reviewed or exported using Grooper's diagnostic tools:

  • Chat Log.json: The complete chat conversation, including all prompts and responses.
  • Response Data.json: The raw JSON response returned by the LLM.
  • Operation Log Entries: Chronological logs of key steps and events.
  • Error Messages: Details of any errors encountered during extraction or data mapping.
  • Performance Timers: Timing data for LLM completion and other critical operations.

These diagnostics provide transparency into the extraction process, making it easier to diagnose issues, refine prompts, and ensure reliable results.

Usage Scenarios

  • Extracting line items, transaction logs, or repeating records from semi-structured or unstructured documents.
  • Handling tables with variable layouts, ambiguous content, or inconsistent formatting.
  • Rapidly adapting extraction logic to new table types using prompt engineering and schema configuration.

LLM Connector Requirement

This extractor requires a properly-configured LLM Connector on the repository Root to communicate with the LLM service. Ensure the connector is set up in your environment.

Properties

NameTypeDescription
General
Options

See Also

Used By

Notification