Grooper Help - Version 25.0

Semantic

Quoting Method (Grooper.GPT)

Selects portions of the document that are semantically similar to a set of examples.

Remarks

This quoting method enables precise and context-aware content selection by performing a natural language similarity search across a document. It leverages an LLM-compatible embeddings model to compare defined sample queries to segments (chunks) of the document, identifying the most semantically similar content.

This is accomplished by:

  • Preprocessing the document using a Text Preprocessor to normalize or clean the text.
  • Splitting the text into overlapping segments using the specified Chunk Settings.
  • Embedding both the query and document chunks using the selected Model.
  • Scoring chunks based on cosine similarity to the query embeddings.
  • Clustering top-scoring chunks with configurable parameters via Cluster Parameters.
  • Optionally expanding results using paragraph- and word-level padding to ensure full context.

The final output is a set of extracted quotes representing the most relevant sections of the document, which can be used as input for AI-based extraction tasks (e.g., with AI Extract).
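The pipeline above can be sketched in miniature. This is an illustrative approximation only: Grooper's internals are not public, so the toy bag-of-words `embed` function stands in for a real embeddings model call, and the chunk sizes and names are assumptions.

```python
import math
from collections import Counter

def chunk(text, size=40, overlap=10):
    """Split text into overlapping word windows (a Chunk Settings analogue)."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text, vocab):
    """Toy bag-of-words vector; a real setup would call an embeddings model."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_chunks(document, query, k=3):
    """Score every chunk against the query and return the k best (score, text) pairs."""
    chunks = chunk(document)
    vocab = sorted({w for c in chunks + [query] for w in c.lower().split()})
    query_vec = embed(query, vocab)
    scored = [(cosine(embed(c, vocab), query_vec), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

In the real method, the top-scoring chunks would then be clustered and padded before being emitted as quotes; this sketch stops at the ranking step.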

This method significantly improves the efficiency and focus of LLM operations by minimizing prompt size and maximizing contextual relevance. It also enhances document highlighting accuracy by restricting focus to high-similarity regions.

Additionally, this mechanism underpins the Clause Detection extract method, which applies this same technique to locate and extract clauses within structured documents like contracts.

Pre-Computing Embeddings with GPT Embed

In production environments or when working with large document sets, it is often beneficial to pre-compute and store document embeddings using the GPT Embed activity. When embeddings are precomputed in this way, the Semantic quoting method will automatically load the stored embeddings for each document chunk, rather than recomputing them at runtime. This results in faster, more scalable, and cost-effective semantic search and extraction workflows, and ensures consistency across repeated operations. If precomputed embeddings are not available, Semantic will generate them on demand and may optionally save them for future use.
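The load-or-compute behavior described above can be illustrated with a simple caching pattern. This is a hypothetical sketch, not Grooper's actual storage mechanism: the file-per-chunk JSON store, the key scheme, and all function names are assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path

def chunk_key(chunk_text, model_name):
    """Stable key for a chunk/model pair, so a model change invalidates the cache."""
    return hashlib.sha256(f"{model_name}:{chunk_text}".encode()).hexdigest()

def get_embedding(chunk_text, model_name, compute_fn, cache_dir=Path("embeddings")):
    """Load a stored embedding if one exists; otherwise compute and persist it."""
    cache_dir.mkdir(exist_ok=True)
    path = cache_dir / f"{chunk_key(chunk_text, model_name)}.json"
    if path.exists():
        # Precomputed (e.g. by a GPT Embed-style batch job): no model call needed.
        return json.loads(path.read_text())
    vector = compute_fn(chunk_text)      # fall back to computing on demand
    path.write_text(json.dumps(vector))  # optionally save for future runs
    return vector
```

Batch-embedding documents ahead of time amortizes the model cost once, and every later semantic search against the same chunks becomes a pure lookup.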

Properties

Name      Type      Description
General
Padding
