Grooper Help - Version 25.0
25.0.0024

Data Generator

Embedded Object Grooper.GPT

Defines the generative AI settings used for LLM-powered features in Grooper.

Remarks

The Data Generator object configures how Grooper uses large language models (LLMs) to generate structured data from unstructured document content. It is embedded as a property within various Grooper features that leverage generative AI, such as advanced data extraction, document classification, and content transformation.

Configuration Guidance

  • Choose a model that matches your accuracy, speed, and cost requirements.
  • Enable 'Use Structured Output' whenever the model supports it; this gives the most reliable results.
  • Adjust chat parameters to fine-tune the LLM's behavior for your specific use case.
  • Lower 'Temperature' to make responses more consistent, especially for structured data extraction.
  • Increase 'Max Tries' if you encounter intermittent failures or unparseable responses.

How It Works

When Grooper performs an extraction using a Data Generator:

  1. The extraction schema (as a JSON object) and system/user messages are sent to the LLM.
  2. The LLM is prompted to return data in the required JSON format.
  3. If 'Use Structured Output' is enabled and supported by the model, the LLM is given explicit instructions to return strictly valid JSON matching the schema.
  4. Grooper parses the LLM's response and maps it to the extraction schema.
  5. If the response is invalid, incomplete, or a transient error occurs, Grooper retries the request up to 'Max Tries'.

This approach enables robust, schema-driven extraction from a wide variety of document types, even when traditional rule-based methods are insufficient.
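The parse-and-retry behavior in steps 4 and 5 can be illustrated with a minimal Python sketch. This is not Grooper's implementation: the function name `extract_with_retries`, the `call_llm` callable, and the use of a `required` list in the schema to decide whether a response is complete are all assumptions made for illustration.

```python
import json
from typing import Any, Callable, Optional


def extract_with_retries(
    call_llm: Callable[[dict, list], str],  # hypothetical: sends schema + messages, returns raw text
    schema: dict,
    messages: list,
    max_tries: int = 3,
) -> Optional[dict[str, Any]]:
    """Request JSON from an LLM and retry on invalid or incomplete responses."""
    # Assumption: the schema lists mandatory fields under a "required" key.
    required = schema.get("required", [])
    for _attempt in range(max_tries):
        try:
            raw = call_llm(schema, messages)
            data = json.loads(raw)
        except (json.JSONDecodeError, ConnectionError, TimeoutError):
            # Unparseable response or transient error: try again.
            continue
        # Accept the response only if it is an object with every required field.
        if isinstance(data, dict) and all(key in data for key in required):
            return data
    # All attempts failed.
    return None
```

In this sketch, raising `max_tries` gives the model more chances to produce a parseable, complete response, which mirrors the guidance to increase 'Max Tries' when failures are intermittent.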

Diagnostics and Logging

When an extraction runs, the Data Generator logs diagnostic artifacts to aid in troubleshooting and auditing:

  • Chat Log.jsonl: A line-delimited JSON log of all messages exchanged with the LLM during extraction.
  • Response Data.json: The raw JSON data returned by the LLM for each extraction attempt.
  • LLM Completion Operation: Timing and performance metrics for each LLM request, recorded in the diagnostic log.

These artifacts are available in the extraction diagnostics and can be used to review LLM interactions, analyze failures, and optimize configuration.
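Because Chat Log.jsonl is line-delimited JSON, it can be reviewed outside Grooper with a few lines of Python once exported. The sketch below assumes only that each non-empty line is one JSON value; the field names inside each record are not documented here, so the code does not rely on any. The helper name `read_jsonl` is hypothetical.

```python
import json


def read_jsonl(path: str) -> list:
    """Load a line-delimited JSON file, returning one parsed value per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                # Skip blank lines, e.g. a trailing newline at the end of the file.
                continue
            records.append(json.loads(line))
    return records
```

For example, `len(read_jsonl("Chat Log.jsonl"))` gives the number of logged messages, a quick way to see how many exchanges an extraction needed.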

Properties

Name    Type    Description

See Also

Used By

Notification