Grooper Help - Version 25.0
25.0.0017 2,127
  • Overview
  • Help Status

LLM Classifier

Classify Method Grooper.GPT

Classifies a document by asking an AI to select the document type from a list.

Remarks

The LLM Classifier leverages a large language model (LLM) to automate document classification in Grooper. This method formulates a structured prompt containing the document’s text (or a quoted excerpt, if configured), along with optional system-level instructions, and requests the LLM to select the most appropriate document type from a predefined list.

Role in Grooper

The classifier is designed for folder-level classification, assigning a Document Type to each Batch Folder based on its content. It is ideal for scenarios where traditional rules-based or feature-based classification is insufficient, and where semantic understanding of document text is required.

Classification is performed by sending the document’s content and a list of candidate Document Types (with their descriptions) to the LLM. The model analyzes the text and selects the best match, enabling robust classification even for complex or variable documents.

> Note: Page-level classification is not supported. Attempts to classify individual Batch Pages will result in an exception.

Configuration and Usage

To use the classifier:

  • Assign a valid chat-capable LLM to the 'Model' property (e.g., OpenAI GPT models).
  • Optionally configure 'Chat Parameters' to adjust temperature, max tokens, and other model settings.
  • Use 'Document Quoting' to focus classification on key excerpts, such as the first and last N pages, or other relevant sections.
  • Provide custom 'Instructions' to guide the LLM for complex or domain-specific classification tasks.
  • Ensure each Document Type has a meaningful and distinct description, as these are included in the prompt and help the LLM distinguish between types.

The classifier is typically invoked as part of the Classify activity in a Batch Process, or manually via commands in the Grooper UI. It is suitable for both attended and unattended workflows.

How It Works

  1. The classifier gathers the document text (or quoted excerpt).
  2. It builds a prompt including the document content, system instructions, and a list of candidate Document Types with their descriptions.
  3. The prompt is sent to the LLM, which returns the name of the best-matching document type.
  4. The selected type is assigned to the Batch Folder, enabling downstream extraction, validation, and export activities.

Diagnostic Artifacts

When diagnostics are enabled, the classifier logs the following artifacts for review and troubleshooting:

  • Schema.json: The JSON schema sent to the LLM, describing the expected response format and candidate types.
  • Chat Log.jsonl: The full conversation with the LLM, including system and user messages, and the model’s response.

These artifacts are attached to the Batch Folder and can be reviewed in the Grooper UI or exported for audit and analysis.

Best Practices

  • Use clear, concise, and distinctive descriptions for each Document Type.
  • Tailor 'Instructions' to address edge cases, ambiguous categories, or domain-specific requirements.
  • Limit the number of candidate types to those relevant for the classification context.
  • Use 'Document Quoting' to reduce token usage and focus the LLM on the most informative content.
  • Review diagnostic artifacts to understand model decisions and improve configuration.

For more information, see the documentation for Classify Method, Document Type, Batch Folder, and Batch Process.

Properties

NameTypeDescription

See Also

Used By

Notification