Grooper Help - Version 25.0

GPT Embed

Code Activity (Grooper.GPT namespace)

Generates and stores vector embeddings for the text content of a document using an OpenAI-compatible model.

Remarks

The 'GPT Embed' activity enables Grooper to create semantic vector representations of document text, supporting advanced AI-powered features such as semantic search, similarity comparison, and context-aware retrieval. When executed as part of a batch process, this activity splits the document's text into segments (chunks), generates embeddings for each chunk using the selected model, and stores the results for downstream use.

How It Works

  • The document text is divided into chunks according to the 'Chunking' property.
  • Each chunk is sent to the configured OpenAI-compatible model for embedding generation.
  • Embeddings are saved as a file on the Batch Object (folder or page), unless 'Overwrite' is false and embeddings already exist.
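The chunk, embed, and store steps above can be sketched as follows. This is an illustrative outline only: the function names, the character-window chunking, and the in-memory store are assumptions, not Grooper's actual API. A real embedding call would POST the chunk text to an OpenAI-compatible `/v1/embeddings` endpoint; a toy deterministic vector is used here so the sketch runs offline.

```python
def chunk_text(text: str, chunk_size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (one simple 'Chunking' strategy)."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # assumes overlap < chunk_size
    return chunks

def embed(chunk: str) -> list[float]:
    """Stand-in for a call to an OpenAI-compatible embeddings endpoint.
    A real call would POST {'model': ..., 'input': chunk} to /v1/embeddings."""
    return [len(chunk) / 1000.0, chunk.count(" ") / 100.0]  # toy vector, offline

def gpt_embed(document_text: str, store: dict, key: str,
              overwrite: bool = False) -> list[list[float]]:
    """Generate and store chunk embeddings, honoring an 'Overwrite'-style flag."""
    if key in store and not overwrite:
        return store[key]          # skip: embeddings exist and Overwrite is false
    vectors = [embed(c) for c in chunk_text(document_text)]
    store[key] = vectors           # in Grooper, persisted as a file on the Batch Object
    return vectors
```

The `overwrite` check mirrors the behavior described above: existing embeddings are reused unless regeneration is explicitly requested, which avoids redundant model calls.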

Configuration & Usage

  • Requires an LLM Connector to be configured at the repository root.
  • The choice of chunking strategy and model can significantly affect search quality, performance, and cost.
  • Embeddings are reusable by other activities or external systems.

How Embeddings Are Used

The embeddings generated and stored by 'GPT Embed' are consumed by the Semantic quoting method and related features in Grooper. When performing semantic search, clause detection, or context-aware extraction, the Semantic method loads these precomputed embeddings for each document chunk and compares them to the embeddings of user-supplied queries or sample clauses.

This enables efficient and accurate similarity scoring, allowing Grooper to identify and extract the most relevant content even when the wording varies. The same mechanism underpins advanced features such as clause detection, policy matching, and AI-driven extraction, making the embeddings generated by this activity a foundational asset for downstream AI and search workflows.
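The comparison step described above can be illustrated with cosine similarity, the standard way to score embedding vectors against each other. The helper names below are hypothetical and the details of Grooper's internal scoring are not documented here; this is a minimal sketch of the general technique.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def best_chunks(query_vec: list[float],
                chunk_vecs: list[list[float]],
                top_k: int = 3) -> list[tuple[int, list[float]]]:
    """Rank stored chunk embeddings against a query embedding, best first."""
    scored = sorted(enumerate(chunk_vecs),
                    key=lambda iv: cosine(query_vec, iv[1]),
                    reverse=True)
    return scored[:top_k]
```

Because the document-side vectors are precomputed by 'GPT Embed', query time involves only one new embedding call (for the query itself) plus cheap vector math over the stored chunks.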

Best Practices

  • Select a chunking strategy that balances context size and search granularity for your documents.
  • Use 'Overwrite' judiciously to avoid unnecessary API calls and costs.
  • Test with representative documents to optimize chunk size and overlap.
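To make the chunk-size trade-off concrete: smaller chunks give finer-grained search but require more embedding calls (and therefore more cost). The numbers below are illustrative, assuming a simple overlapping character-window strategy; tune against your own representative documents.

```python
def chunk_count(doc_len: int, chunk_size: int, overlap: int) -> int:
    """Number of overlapping windows needed to cover doc_len characters.
    Assumes overlap < chunk_size."""
    step = chunk_size - overlap
    return max(1, -(-doc_len // step))  # ceiling division

doc_len = 50_000  # a document of ~50,000 characters
for size, ov in [(500, 50), (1000, 100), (2000, 200)]:
    calls = chunk_count(doc_len, size, ov)
    print(f"chunk_size={size}, overlap={ov}: {calls} embedding calls")
```

Doubling the chunk size roughly halves the number of API calls, at the price of coarser retrieval granularity; testing both ends of that range on real documents is the practical way to pick a setting.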

Properties

Name | Type | Description

General

Processing Options

See Also

Used By

  • Notification