Grooper Help - Version 25.0
25.0.0017 2,127

Search Classifier

ESP Classify Method Grooper.GPT

Classifies documents by searching for similar items in a vector-based index.

Remarks

The Search Classifier automates document classification in Grooper by comparing the content of each incoming document to a pre-indexed set of documents using vector embeddings. It is intended for users who want fast, deterministic classification based on content similarity, without relying on a generative language model to make the decision.

How It Works

  • When a document or page is processed, the Search Classifier generates a vector embedding of its content using the configured embedding provider.
  • It submits a vector query to the selected Azure Cognitive Search index, retrieving the most similar indexed documents or chunks.
  • Results are ranked by cosine similarity, and the most probable Content Type is assigned based on the top matches.
  • For page-level classification, chunk data is used (when available) to identify the most likely originating page in multi-page matches.
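The ranking step above can be illustrated with a minimal Python sketch. It is not Grooper's implementation: the in-memory index, the `classify` helper, and the rule of summing the similarity scores of the top matches per Content Type are all assumptions made for illustration, since this page does not specify how the top matches are combined.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(query_vector, indexed, k=3):
    """Return (best_content_type, top_matches).

    `indexed` is a list of (content_type, vector) pairs standing in for
    the search index. Summing scores per Content Type is an assumed
    aggregation rule, not a documented one.
    """
    scored = [(cosine_similarity(query_vector, vec), ctype)
              for ctype, vec in indexed]
    # Sort by score descending, then by name so ties resolve the same way
    # on every run.
    scored.sort(key=lambda item: (-item[0], item[1]))
    top = scored[:k]

    totals = defaultdict(float)
    for score, ctype in top:
        totals[ctype] += score
    best = min(totals, key=lambda c: (-totals[c], c))
    return best, top


# Toy three-dimensional "embeddings"; real embeddings have far more dimensions.
index = [
    ("Invoice", [0.9, 0.1, 0.0]),
    ("Invoice", [0.8, 0.2, 0.1]),
    ("Purchase Order", [0.1, 0.9, 0.2]),
    ("Receipt", [0.0, 0.2, 0.9]),
]
best_type, top_matches = classify([1.0, 0.1, 0.0], index, k=3)
print(best_type)
```

Because the sketch involves no sampling or randomness, calling `classify` again with the same vector and the same index returns the same result.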

Configuration and Usage

  • The Search Classifier is configured on a Content Model via its classification method.
  • The associated Indexing Behavior must have vector search enabled and be connected to a compatible Azure Cognitive Search index.
  • Optionally, restrict the candidate results with a filter expression and set the number of nearest neighbors considered.
  • This method is suitable for scenarios where classification should be based strictly on indexed data, ensuring repeatable results.
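To make the filter and nearest-neighbor settings concrete, the sketch below builds a vector query body in the general shape accepted by the search service's public REST API (`vectorQueries`, `k`, `filter`). The request Grooper actually sends is not documented on this page, and the field names `contentVector`, `id`, and `contentType`, as well as the helper `build_vector_query`, are hypothetical.

```python
import json


def build_vector_query(embedding, k=5, filter_expression=None,
                       vector_field="contentVector"):
    """Build a search request body for a k-nearest-neighbor vector query.

    `vector_field` and the selected field names are placeholders; a real
    index defines its own schema.
    """
    body = {
        "vectorQueries": [
            {
                "kind": "vector",
                "vector": list(embedding),
                "fields": vector_field,
                "k": k,  # number of nearest neighbors to return
            }
        ],
        "select": "id,contentType",
    }
    if filter_expression:
        # OData filter expression that narrows the candidate documents.
        body["filter"] = filter_expression
    return body


request_body = build_vector_query(
    [0.12, -0.03, 0.88],
    k=3,
    filter_expression="contentType ne 'Unclassified'",
)
print(json.dumps(request_body, indent=2))
```

A smaller `k` returns fewer candidates and is cheaper to evaluate, while a larger `k` gives the classifier more matches to weigh. A filter removes candidates before ranking.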

Deterministic Behavior

  • Unlike LLM-based classifiers, the Search Classifier does not use a generative language model to choose a Content Type; the only model involved is the embedding provider that produces the vectors.
  • Classification is based solely on vector similarity to indexed content, making results predictable and easy to audit.

Diagnostics and Logging

  • During classification, diagnostic artifacts may be generated for review and troubleshooting:
    • "Search Results.json": Contains the raw results returned from the search index for each classification attempt.
    • Log entries: Summarize match scores and candidate Content Types for each document or page.
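When reviewing a saved "Search Results.json" file, a short script can summarize which Content Types appear among the matches. The schema of that file is not described on this page; the sketch assumes it mirrors a standard search response, with a `value` array whose items carry an `@search.score`, and it assumes a hypothetical `contentType` field. The sample data is made up for illustration.

```python
import json
from collections import defaultdict


def summarize_results(raw, type_field="contentType"):
    """Return (content_type, hit_count, best_score) rows, best score first.

    Assumes `raw` looks like {"value": [{"@search.score": ..., ...}, ...]};
    adjust the keys to match the actual file.
    """
    scores_by_type = defaultdict(list)
    for hit in raw.get("value", []):
        ctype = hit.get(type_field, "<unknown>")
        scores_by_type[ctype].append(hit.get("@search.score", 0.0))

    rows = [(ctype, len(scores), max(scores))
            for ctype, scores in scores_by_type.items()]
    rows.sort(key=lambda row: (-row[2], row[0]))
    return rows


# Illustrative sample only; in practice, load the saved file with
# json.load(open(path, encoding="utf-8")).
sample = json.loads("""
{
  "value": [
    {"@search.score": 0.97, "contentType": "Invoice"},
    {"@search.score": 0.91, "contentType": "Invoice"},
    {"@search.score": 0.42, "contentType": "Receipt"}
  ]
}
""")

summary = summarize_results(sample)
for content_type, hits, best in summary:
    print(f"{content_type}: {hits} hit(s), best score {best:.2f}")
```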

Best Practices

  • Ensure the Indexing Behavior is properly configured and the index is populated with representative documents.
  • Tune the filter and nearest neighbor count to balance accuracy and performance.
  • Use diagnostic outputs to validate classification results and refine index content as needed.

For more information, see the documentation for Content Model, Indexing Behavior, and Content Type.

Properties

Name    Type    Description

See Also

Used By

Notification