Grooper Help - Version 25.0
25.0.0017 2,127

Ask AI

Value Extractor Grooper.GPT.OpenAI.Chat

Executes a completion using a large language model (LLM) and returns one hit for each choice in the response.

Remarks

Overview

The Ask AI extractor enables advanced natural language extraction from document content using generative AI. It sends a prompt, composed of user instructions and document context, to an LLM service, then parses the response into output values that are returned as Data Instance results. This supports both simple and structured data extraction scenarios, including text, tables, and multi-field objects.

How It Works

The prompt sent to the LLM is built from the current data instance, optionally filtered by a context extractor and preprocessed for clarity (such as inserting tabs for table structure). The 'Instructions' property defines the user query or directive, which should be tailored to the extraction or transformation needed (e.g., "List the names of the parties" or "What is the effective date?").
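As a rough illustration of the prompt assembly described above, the sketch below joins an instruction with document context and converts wide gaps between columns into tabs. The function names, the two-space gap threshold, and the "---" separator are assumptions made for this example; they are not Grooper's actual implementation or prompt format.

```python
import re


def insert_tabs(text: str, min_gap: int = 2) -> str:
    """Replace each run of `min_gap` or more spaces with one tab.

    Hypothetical preprocessing step: it makes column gaps explicit so
    table structure survives in the prompt text.
    """
    return re.sub(r" {%d,}" % min_gap, "\t", text)


def build_prompt(instructions: str, context: str) -> str:
    """Combine user instructions with preprocessed document context."""
    cleaned = "\n".join(insert_tabs(line.rstrip()) for line in context.splitlines())
    # The separator between instructions and context is an assumption.
    return f"{instructions.strip()}\n\n---\n{cleaned}"


if __name__ == "__main__":
    page = "Party        Role\nAcme Corp    Lessor\nJane Doe     Lessee"
    print(build_prompt("List the names of the parties.", page))
```

Single spaces inside a value (such as "Acme Corp") are left alone, so only the gaps between columns become tabs.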

The LLM’s response is parsed into output values. If 'Parse JSON Response' is enabled, the extractor expects the LLM to return JSON, which is then converted into a Data Instance hierarchy. This supports reliable extraction of structured information, such as tables or objects, and is compatible with downstream table and section extract methods. Optionally, a JSON Schema can be provided to guide the LLM to use a precise, schema-compliant structure.
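To make the JSON-to-hierarchy idea concrete, here is a minimal sketch that converts a JSON response into a tree of named nodes. The `Instance` class is a hypothetical stand-in for a Data Instance, and the naming of array items (`parties[0]`, `parties[1]`) is an assumption for illustration only.

```python
import json
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Instance:
    """Hypothetical stand-in for a Data Instance: a named value with children."""
    name: str
    value: Optional[str] = None
    children: List["Instance"] = field(default_factory=list)


def to_instance(name: str, node: Any) -> Instance:
    """Recursively convert parsed JSON into an Instance tree."""
    if isinstance(node, dict):
        # Each object property becomes a named child.
        return Instance(name, children=[to_instance(k, v) for k, v in node.items()])
    if isinstance(node, list):
        # Each array item becomes an indexed child (naming is an assumption).
        return Instance(name, children=[to_instance(f"{name}[{i}]", v) for i, v in enumerate(node)])
    # Scalars become leaf values; JSON null maps to no value.
    return Instance(name, value=None if node is None else str(node))


def parse_response(raw: str) -> Instance:
    """Parse a raw JSON response string into an Instance tree rooted at 'root'."""
    return to_instance("root", json.loads(raw))
```

For a response such as `{"parties": [{"name": "Acme Corp"}]}`, this yields a `root` node with a `parties` child, whose single item holds a `name` leaf.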

Key Features

  • Supports both simple text and structured JSON extraction.
  • Allows preprocessing and context filtering to improve prompt quality.
  • Can extract multiple values or structured arrays from a single LLM response.
  • Supports JSONPath selectors for fine-grained control over extracted data.
  • Handles content that exceeds the model’s context length by truncating or chunking input.
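The last feature above, truncating or chunking oversized input, can be sketched as follows. This example measures length in characters as a stand-in for model tokens, and the optional overlap between chunks is an illustrative choice; how Grooper actually measures and splits content is not specified here.

```python
from typing import List


def truncate(text: str, max_chars: int) -> str:
    """Keep only the first `max_chars` characters of the input."""
    return text[:max_chars]


def chunk(text: str, max_chars: int, overlap: int = 0) -> List[str]:
    """Split text into pieces of at most `max_chars` characters.

    Consecutive pieces share `overlap` characters (an assumption for this
    sketch) so that content cut at a boundary appears in both pieces.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in the range [0, max_chars)")
    step = max_chars - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # This piece already reaches the end of the text.
        start += step
    return chunks
```

Truncation sends a single request and discards the tail, while chunking preserves all content at the cost of one request per piece.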

Configuration Guidance

  • Use the 'Instructions' property to clearly specify the extraction goal.
  • Configure a context extractor to limit the prompt to relevant document sections.
  • Enable 'Parse JSON Response' and provide a JSON Schema for structured output.
  • Use the 'Selector' property to extract a specific portion of the JSON response.
  • Adjust 'Max Response Length' to control the size of the LLM’s output.
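To show what a selector does with a parsed JSON response, the sketch below resolves a small subset of JSONPath: the root `$`, `.name` property access, `[n]` array indexes, and `[*]` over arrays. It is a simplified illustration written for this page, not the selector engine Grooper uses, and it leaves out most of the JSONPath syntax.

```python
import re
from typing import Any, List

# One path step: .name, [n], or [*]
_TOKEN = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]|\[(\*)\]")


def select(data: Any, path: str) -> List[Any]:
    """Return every node matched by `path` (minimal JSONPath subset)."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    current = [data]
    pos = 1
    while pos < len(path):
        match = _TOKEN.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported selector syntax at position {pos}")
        key, index, wildcard = match.groups()
        matched: List[Any] = []
        for node in current:
            if key is not None and isinstance(node, dict) and key in node:
                matched.append(node[key])
            elif index is not None and isinstance(node, list) and int(index) < len(node):
                matched.append(node[int(index)])
            elif wildcard is not None and isinstance(node, list):
                matched.extend(node)
        current = matched
        pos = match.end()
    return current
```

With a response of `{"assignors": [{"name": "A"}, {"name": "B"}]}`, the path `$.assignors[*].name` selects both names, while `$.assignors[1].name` selects only the second.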

Example Scenarios

  • Extracting a single value: "What is the invoice number?"
  • Summarizing a document: "Summarize this lease in 3 bullet points."
  • Structured data extraction: "Return a list of assignors as JSON."
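For the structured scenario, a JSON Schema along the following lines could describe the expected shape of the response. The property names (`assignors`, `name`) are assumptions chosen for this example, and the checking function is a hand-rolled test for this one shape rather than a general JSON Schema validator.

```python
import json

# Illustrative schema: an object holding an array of assignor objects.
ASSIGNORS_SCHEMA = {
    "type": "object",
    "properties": {
        "assignors": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"name": {"type": "string"}},
                "required": ["name"],
            },
        }
    },
    "required": ["assignors"],
}


def matches_assignors_shape(raw: str) -> bool:
    """Check that a raw response has the shape described by ASSIGNORS_SCHEMA."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    items = data.get("assignors")
    if not isinstance(items, list):
        return False
    return all(isinstance(i, dict) and isinstance(i.get("name"), str) for i in items)


if __name__ == "__main__":
    print(json.dumps(ASSIGNORS_SCHEMA, indent=2))
    print(matches_assignors_shape('{"assignors": [{"name": "Acme Corp"}]}'))
```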

LLM Connector Requirement

This extractor requires a properly configured LLM Connector on the repository Root to communicate with the LLM service. Confirm that the connector is set up in your environment before using Ask AI.

Diagnostics

The Ask AI extractor supports diagnostic logging to help users understand and troubleshoot the extraction process. When diagnostics are enabled, the following artifacts may be generated:

  • Chat Log: A file named Chat Log.jsonl captures the full conversation between the extractor and the LLM, which is useful for reviewing prompt effectiveness and model responses.
  • JSON Schema: When using a JSON schema, a file named JSON Schema.json is generated containing the schema provided to the LLM.
  • Response Data: When using a JSON schema, a file named Response Data.json is produced containing the formatted JSON returned by the LLM.
  • Context Extractor Diagnostics: If a 'Context Extractor' is configured, its diagnostic output is included as a child diagnostic artifact.

Diagnostic files can be reviewed in Grooper's diagnostic tools or exported for further analysis.

> Tip: Use diagnostics to refine prompt instructions, troubleshoot extraction issues, and validate the structure of LLM responses. Reviewing the chat log, schema, and response data can help identify problems with prompt formatting, context selection, schema design, or model output.
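When a chat log has been exported, a short script can help with review. The sketch below relies only on the .jsonl extension, which implies one JSON value per line; the internal structure of each record is not documented here, so the code makes no assumption about field names. The function names are hypothetical.

```python
import json
from typing import Any, Iterable, List


def parse_jsonl(lines: Iterable[str]) -> List[Any]:
    """Parse JSON Lines text: one JSON value per non-blank line."""
    records: List[Any] = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # Skip blank lines.
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number} is not valid JSON: {exc}") from exc
    return records


def read_chat_log(path: str) -> List[Any]:
    """Read an exported chat log file, for example 'Chat Log.jsonl'."""
    with open(path, encoding="utf-8") as handle:
        return parse_jsonl(handle)
```

Printing each record with `json.dumps(record, indent=2)` is one simple way to inspect the prompt and response text in sequence.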

Properties

Name    Type    Description
General
Prompt
Response

Derived Types

There is 1 implementation of Ask AI.

  • AI Column Extractor: Extracts structured content from documents with two-column layouts.

See Also

Used By

Notification