Grooper Help - Version 25.0
25.0.0024 2,166

Chunk Settings

Embedded Object Grooper.GPT

Configures how text is divided into chunks for natural language search and embedding operations.

Remarks

The Chunk Settings object determines how Grooper splits text content into overlapping segments, or "chunks," for use in natural language search, embedding, and related AI-driven features.

Chunking breaks large documents or passages into manageable pieces so that search and embedding models can process content efficiently and return relevant results.

How Chunking Works

  • Text is divided into a series of chunks, each containing a configurable number of words.
  • Chunks can overlap, allowing for context to be preserved between adjacent segments.
  • The size and overlap of each chunk are controlled by the 'Offset' and 'Overlap' properties.

Configuration Guidance

  • Offset: Controls how many words are included in the non-overlapping portion of each chunk.
  • Overlap: Specifies how many words are borrowed from each neighboring chunk (the previous one and the next one) to provide context.
  • Chunk Size: The total number of words in each chunk, calculated as Offset + Overlap * 2.
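The relationship between these three values can be illustrated with a minimal Python sketch. This is not Grooper's implementation. The function name chunk_words, whitespace-based word splitting, and the trimming of the first and last chunks at the document boundaries are all assumptions made for illustration.

```python
def chunk_words(text, offset, overlap):
    """Split text into word chunks (hypothetical helper, not a Grooper API).

    Each chunk has a core of `offset` words plus up to `overlap` words of
    context borrowed from the previous and the next chunk.
    """
    if offset < 1 or overlap < 0:
        raise ValueError("offset must be >= 1 and overlap must be >= 0")
    words = text.split()  # assumption: words are separated by whitespace
    chunks = []
    # The core of each chunk starts `offset` words after the previous core.
    for start in range(0, len(words), offset):
        lo = max(0, start - overlap)                    # context before the core
        hi = min(len(words), start + offset + overlap)  # context after the core
        chunks.append(" ".join(words[lo:hi]))
    return chunks


sample = "one two three four five six seven eight nine ten"
for chunk in chunk_words(sample, offset=4, overlap=1):
    print(chunk)
# one two three four five
# four five six seven eight nine
# eight nine ten
```

In this sketch the middle chunk holds 6 words, which matches Offset + Overlap * 2 = 4 + 1 * 2. The first and last chunks are shorter only because there is no neighboring text to borrow from at the edges of the sample.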

Adjust these settings to balance context preservation with performance. Higher overlap values increase context but may result in more redundant processing. Lower values may improve speed but reduce the amount of context available to the model.
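To get a rough feel for this trade-off, the short Python sketch below estimates how many words are processed per word of source text for a few Overlap values. The helper names and the sample values (an Offset of 200 words) are hypothetical and are not Grooper defaults. The estimate applies to interior chunks and ignores shorter chunks at the start and end of a document.

```python
def chunk_size(offset, overlap):
    """Total words per chunk, using the formula Offset + Overlap * 2."""
    return offset + overlap * 2


def redundancy_factor(offset, overlap):
    """Approximate words processed per source word (hypothetical metric)."""
    return chunk_size(offset, overlap) / offset


# Sample values chosen for illustration only.
for overlap in (0, 25, 50, 100):
    size = chunk_size(200, overlap)
    factor = redundancy_factor(200, overlap)
    print(f"overlap={overlap:>3}  chunk size={size}  words processed per source word={factor}")
# overlap=  0  chunk size=200  words processed per source word=1.0
# overlap= 25  chunk size=250  words processed per source word=1.25
# overlap= 50  chunk size=300  words processed per source word=1.5
# overlap=100  chunk size=400  words processed per source word=2.0
```

Under these assumptions, an Overlap equal to half the Offset doubles the amount of text sent to the model, while an Overlap of zero adds no redundancy but also no shared context between chunks.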

Usage Scenarios

  • Improve search accuracy by ensuring queries match relevant portions of text, even when content spans chunk boundaries.
  • Optimize embedding generation for large documents by processing them in smaller, context-rich segments.

For best results, experiment with different settings to find the optimal balance for your content and use case.

Properties

Name    Type    Description

Used By

Notification