Grooper Help - Version 25.0

AI Separate

Separation Provider (Grooper.GPT)

Separates a sequence of pages into documents, using AI to determine where document boundaries fall.

Remarks

AI Separate provides an advanced, AI-driven method for dividing a continuous stream of Batch Pages into discrete documents within a Batch Folder. Leveraging a large language model (LLM), this separation provider analyzes the content and context of each page to intelligently detect document boundaries—minimizing manual configuration and improving accuracy for complex or variable document sets.

Unlike traditional separation methods that rely on fixed rules, barcodes, or control sheets, AI Separate uses natural language understanding to evaluate the meaning and structure of page content. This enables robust separation even when documents lack consistent separators or when page layouts vary significantly.

How It Works

For each page in the batch (sketched in code after this list):

  • A "window" of pages is selected, centered on the page under analysis. The number of neighboring pages included is controlled by the 'Window Extent' property.
  • The content (and optionally metadata) of each page in the window is formatted and sent to the configured LLM, along with custom instructions and a system prompt.
  • The LLM is asked to determine if the center page begins a new document. If document types are configured, the model is also prompted to classify the new document.
  • The model responds in JSON format, indicating whether a new document starts and, if applicable, the document type.
  • The results are used to split the batch into documents and assign types as needed.
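The sketch below illustrates this loop in Python. It is a minimal illustration only: the ask_llm callable, the prompt wording, and the JSON field names are assumptions standing in for the provider's actual prompt and response schema, which are controlled by the properties described below.

    import json

    WINDOW_EXTENT = 2                               # neighboring pages on each side
    DOCUMENT_TYPES = ["Invoice", "Purchase Order"]  # optional classification targets

    def find_boundaries(pages, ask_llm):
        """Walk the batch page by page and record where new documents begin.

        `pages` is a list of page-text strings; `ask_llm` is any callable that
        sends a prompt to an OpenAI-compatible model and returns its reply text.
        """
        boundaries = []
        for i in range(len(pages)):
            # Select a window of pages centered on the page under analysis.
            start = max(0, i - WINDOW_EXTENT)
            end = min(len(pages), i + WINDOW_EXTENT + 1)
            window = pages[start:end]

            prompt = (
                "Consecutive pages from a scanned batch:\n\n"
                + "\n--- page break ---\n".join(window)
                + f"\n\nDoes page {i - start + 1} of this excerpt begin a new document? "
                + f"If so, classify it as one of {DOCUMENT_TYPES}. "
                + 'Answer as JSON: {"new_document": true/false, "document_type": "..."}'
            )

            # The model answers in JSON; parse it and record the decision.
            reply = json.loads(ask_llm(prompt))
            if reply.get("new_document"):
                boundaries.append((i, reply.get("document_type")))
        return boundaries

In the actual provider, the prompt also includes the configured system prompt, any custom Instructions, and (optionally) page metadata, and the model's response is validated against a JSON schema (see Diagnostic Artifacts below).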

Configuration and Usage

To use AI Separate:

  1. Ensure an LLM Connector is configured at the root of the repository, with access to a suitable OpenAI-compatible model.
  2. Set the 'Model' property to specify which LLM to use for analysis.
  3. Optionally, provide custom 'Instructions' to guide the model's decision-making for your specific document set (an example follows this list).
  4. Adjust the 'Window Extent' to control how much context is provided for each decision.
  5. (Optional) Specify one or more Document Types to enable classification in addition to separation.
  6. Configure additional options such as quoting method, inclusion of page metadata, and output formatting as needed.
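As a purely illustrative example (not text supplied by the product), custom 'Instructions' for an invoice batch might read:

    A new document begins on any page that shows an invoice number and a
    "Bill To" block near the top of the page. Pages that repeat the same
    invoice number in the header or footer are continuation pages.
    Terms-and-conditions pages never begin a new document.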

This approach is ideal for scenarios where document boundaries are ambiguous, content-driven, or not easily captured by static rules.

Diagnostic Artifacts

AI Separate can record detailed diagnostic output to support troubleshooting and auditing:

  • Schema.json: The JSON schema provided to the LLM for output validation.
  • Chat Log.jsonl: The full chat log of the conversation with the LLM for each page window.
  • Log Entries: Key decisions, such as boundary detection, classification results, and model reasoning, are written to the diagnostic log.

These artifacts can be reviewed to understand the model's decisions, verify configuration, and refine prompts or instructions for improved results.
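For example, the chat log is a JSON Lines file and can be skimmed with a few lines of Python. The role/content field names below follow the common chat-message convention and are an assumption about the file's layout rather than a documented format:

    import json

    # Print a short preview of each message recorded in the chat log.
    with open("Chat Log.jsonl", encoding="utf-8") as log:
        for line in log:
            message = json.loads(line)
            print(message.get("role", "?"), "|", str(message.get("content", ""))[:120])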

Notes and Considerations

  • If no Document Types are configured, only boundary detection is performed; classification is disabled.
  • The accuracy and cost of separation depend on the selected model, window size, and prompt quality.
  • Including more context (larger window, metadata) may improve results but increases processing time and token usage (see the worked example after this list).
  • Custom instructions can significantly influence the model's behavior—test and refine as needed for best results.
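As a rough, purely illustrative estimate of the cost trade-off: with a 'Window Extent' of 2, each decision sends five pages of text (two before, the center page, and two after). For a 100-page batch at roughly 500 tokens per page, that is 100 calls of about 2,500 input tokens each, or about 250,000 tokens in total; raising the extent to 4 sends nine pages per call, or about 450,000 tokens. Actual counts depend on page content, metadata, and the model's tokenizer.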

Properties

Name | Type | Description

See Also

Used By

Notification