Ask AI

Inherits From Value Extractor Namespace Grooper.GPT.OpenAI.Chat

Executes a completion using a large language model (LLM) and returns one hit for each choice in the response.

Remarks

Overview

The Ask AI extractor enables advanced natural language extraction from document content using generative AI. It sends a prompt—composed of user instructions and document context—to an LLM service, then parses the response into output values as Data Instance results. This allows for both simple and structured data extraction scenarios, including text, tables, and multi-field objects.

How It Works

The prompt sent to the LLM is built from the current data instance, optionally filtered by a context extractor and preprocessed for clarity (such as inserting tabs for table structure). The 'Instructions' property defines the user query or directive, which should be tailored to the extraction or transformation needed (e.g., "List the names of the parties" or "What is the effective date?").

The LLM’s response is parsed into output values. If 'Parse JSON Response' is enabled, the extractor expects the LLM to return JSON, which is then converted into a Data Instance hierarchy. This supports reliable extraction of structured information, such as tables or objects, and is compatible with downstream table and section extract methods. Optionally, a JSON Schema can be provided to guide the LLM to use a precise, schema-compliant structure.

Key Features

Supports both simple text and structured JSON extraction.
Allows preprocessing and context filtering to improve prompt quality.
Can extract multiple values or structured arrays from a single LLM response.
Supports JSONPath selectors for fine-grained control over extracted data.
Handles content that exceeds the model’s context length by truncating or chunking input.

Configuration Guidance

Use the 'Instructions' property to clearly specify the extraction goal.
Configure a context extractor to limit the prompt to relevant document sections.
Enable 'Parse JSON Response' and provide a JSON Schema for structured output.
Use the 'Selector' property to extract a specific portion of the JSON response.
Adjust 'Max Response Length' to control the size of the LLM’s output.

Example Scenarios

Extracting a single value: <code>What is the invoice number?</code>
Summarizing a document: <code>Summarize this lease in 3 bullet points.</code>
Structured data extraction: <code>Return a list of assignors as JSON.</code>

LLM Connector Requirement

This extractor requires a properly-configured LLM Connector on the repository Root to communicate with the LLM service. Ensure the connector is set up in your environment.

Diagnostics

The Ask AI extractor supports diagnostic logging to help users understand and troubleshoot the extraction process. When diagnostics are enabled, the following artifacts may be generated:

Chat Log: A file named Chat Log.jsonl captures the full conversation between the extractor and the LLM, which is useful for reviewing prompt effectiveness and model responses.
JSON Schema: When using a JSON schema, a file named JSON Schema.json is generated containing the schema provided to the LLM.
Response Data: When using a JSON schema, a file named Response Data.json is produced containing the formatted JSON returned by the LLM.
Context Extractor Diagnostics: If a 'Context Extractor' is configured, its diagnostic output is included as a child diagnostic artifact.

Diagnostic files can be reviewed in Grooper's diagnostic tools or exported for further analysis.

> Tip: Use diagnostics to refine prompt instructions, troubleshoot extraction issues, and validate the structure of LLM responses. Reviewing the chat log, schema, and response data can help identify problems with prompt formatting, context selection, schema design, or model output.

Properties

Name Type Description

General

Model

String

►

Specifies the large language model (LLM) to use for chat completions.

Parameters

Chat Parameters

►

Configures how large language model (LLM) completions behave.

Can be one of the following types:

Value	Description
Override

The Chat Parameters class defines a set of options for tuning the behavior of LLM completion requests in Grooper. These parameters allow users to control the randomness, diversity, and topicality of generated responses, enabling tailored outputs for a wide range of scenarios.

Typical usage involves adjusting one or more parameters to guide the model’s output toward the desired style or content. For example, increasing randomness for creative tasks, or reducing repetition for summarization.

Configuration Guidance

Adjust either 'Temperature' or 'Top P' to control randomness and diversity, but not both simultaneously.
Use 'Presence Penalty' and 'Frequency Penalty' together to reduce repetition and encourage broader topic coverage.

Usage in Grooper

Chat Parameters are used throughout Grooper to influence LLM completions for document processing, data extraction, and other AI-driven tasks.

Impact

Proper configuration of ChatParameters can significantly affect the quality, creativity, and reliability of LLM outputs. Experimentation is recommended to find the best settings for your specific use case.

Prompt

Instructions

String

►

The instructions or question to include in the prompt sent to the LLM.

Preprocessing

Text Preprocessor

►

Specifies preprocessing options to apply to document content before it is included in the prompt context.

Context Extractor

Value Extractor

►

An optional extractor which filters the document content included in the prompt.

Can be one of the following types:

Value	Description
Reference	Delegates extraction to another configured extractor, enabling reuse and centralization of extraction logic.
AI Column Extractor	Extracts structured content from documents with two-column layouts.
AI Schema Extractor	Extracts structured data from documents using a large language model (LLM) guided by a user-defined JSON schema.
Ask AI	Executes a completion using a large language model (LLM) and returns one hit for each choice in the response.
Detect Signature	Detects a signature within a specified region of a document page by measuring the percentage of the area that is filled.
Entity Recognition	Identifies and categorizes entities such as people, organizations, locations, and quantities in unstructured text.
Field Match	Matches the value stored in a previously-extracted field or table column.
Find Barcode	Searches for barcode values in document Layout Data previously detected during image processing.
Highlight Zone	Defines a region of a document to be visually highlighted, without extracting any data values.
Key Phrase Extraction	Identifies key concepts and topics in text using Azure AI Language key phrase extraction.
Label Match	Matches a list of one or more label values, using matching options defined by a Labeling Behavior.
Labeled OMR	Reads a group of one or more checkboxes located nearby text labels.
Labeled Value	Extracts a field presented as a label-value pair within a document, associating labels and values based on their spatial relationship.
List Match	Extracts values from document text that match any entry in a list of search terms.
Ordered OMR	Reads one or more checkboxes with a consistent order of appearance inside a rectangular region.
Pattern Match	Extracts values from document text that match a specified regular expression pattern.
Pii Entity Recognition	Identifies, categorizes, and redacts sensitive information (PII) in unstructured text using Azure AI Language Services.
Query HTML	Extracts values from an HTML document using a CSS or XPath selector.
Query XML	Extracts values from XML documents using XPATH queries, enabling structured data extraction from XML content in Grooper.
Read Barcode	Extracts barcode values from document images using configurable barcode recognition.
Read Metadata	Reads a metadata value from a document by accessing a property on an attachment or content link.
Read Zone	Extracts text content from a specified rectangular region (zone) of a document.
Select Page	Selects and outputs the full content of one or more pages from a document, based on page number and/or content criteria.
Word Match	Extracts individual words and multi-word phrases (N-grams) from document text for use in classification, data extraction, and normalization.
Zonal OMR	Reads one or more checkboxes using manually-configured zones.

This property allows you to specify a Value Extractor that isolates and filters the document content to be included as context in the prompt sent to the LLM. If not set, the entire document text in scope will be included by default.

Use this feature to narrow the input to only the most relevant portions of the document, improving prompt clarity and model performance. This is particularly useful when working with long documents or when only a specific section (e.g., a header, clause, or table) contains the information needed to answer the question or fulfill the instructions.

For example:

If extracting an effective date, you might use a ClauseDetection extractor to isolate the "Term" section.
If summarizing parties, a pattern-based extractor could be configured to include only the signature blocks.

The filtered content returned by this extractor is preprocessed (via the Preprocessing property) before being passed to the LLM.

Note: The extractor used here should return a single Data Instance or a set of instances that logically concatenate into coherent input for the LLM. Excessive or irrelevant content may reduce output quality due to token limitations. If no value is specified, all text content in scope will be included.

Response

Max Response Length

Nullable Int32?

►

An optional maximum length of the response, in tokens.

JSON Schema

String

►

An optional schema defining JSON output. If set, causes the LLM to use structured output mode, returning only JSON.

The 'JSON Schema' property allows you to specify a JSON Schema that defines the exact structure of the LLM's response. When provided, the LLM is instructed to return output that conforms to this schema, enabling highly reliable and consistent extraction of structured data.

How It Works

The schema must be a valid JSON object with "type": "object" and a "title" property.
All expected properties, their types, and any validation rules should be defined in the schema.
When a schema is set, the LLM is prompted to return only JSON matching the schema, and the extractor will parse and validate the response accordingly.

Configuration Guidance

Use the code editor to author or paste a JSON schema that matches your extraction requirements.
Always include a "title" property (e.g., "title": "InvoiceLineItem").
Define each property with its type (e.g., "type": "string", "type": "number").
Optionally, add required, enum, default, or description fields for more precise control.

Example


{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "InvoiceLineItem",
  "type": "object",
  "properties": {
    "description": { "type": "string" },
    "quantity": { "type": "number" },
    "amount": { "type": "number" }
  },
  "required": ["description", "quantity", "amount"]
}

Validation

The schema is checked at runtime. If "type" is not "object" or "title" is missing, a validation error is raised.
Invalid or incomplete schemas may result in extraction errors or unpredictable LLM output.

Usage Scenarios

Extracting structured records such as parties, line items, or key-value pairs.
Enforcing a consistent output format for downstream processing in tables or sections.
Reducing post-processing by ensuring the LLM returns only the required fields.

Providing a schema is highly recommended for complex or multi-field extraction tasks.

Parse Json Response

Boolean

►

If enabled, JSON returned in the response will be parsed into a Data Instance hierarchy.

Selector

String

►

An optional selector to be used when parsing JSON responses.

Match All

Boolean

►

If enabled, all matching text instances will be returned; otherwise, only the best match is returned.

Used By

Document Type Extract From Data Column Data Field Lexical Rules-Based Spell Corrector Auto Complete Settings Paragraph Marker Metadata Options OCR Layer Line Periodicity Detector Fixed Width Labeled Value Select Page Data Type OCR Reader Divider Anchor Simple

Ask AI

Remarks

Overview

How It Works

Key Features

Configuration Guidance

Example Scenarios

LLM Connector Requirement

Diagnostics

Properties

Configuration Guidance

Example

Configuration Guidance

Usage in Grooper

Impact

Configuration Guidance

Examples

Configuration Guidance

Example Scenario

Configuration Guidance

How It Works

Configuration Guidance

Example

Validation

Usage Scenarios

Primary Goal: Data Table and Section Compatibility

Configuration Guidance

Example Prompts

Output Structure

Usage Tips

Purpose and Usage

How to Construct a Selector

Example

Configuration Guidance

Troubleshooting

Purpose and Behavior

Configuration Guidance

Example Scenarios

See Also

Used By