Grooper Help - Version 25.0

Text Document

Attachment Type Grooper.Extract

Represents a plain text file attachment type for Batch Folders, enabling text-centric processing, parsing, and integration in Grooper.

Remarks

The TextDocument class provides support for plain text files (.txt, text/plain) within Batch Folders. It enables Grooper to recognize, process, and manipulate text files as attachments, supporting downstream data extraction, normalization, and export workflows.

Overview

  • TextDocument is automatically selected for files with the .txt extension or text/plain MIME type.
  • Supports configurable character encoding and byte order mark (BOM) detection for robust text file handling.
  • Integrates with Batch Folders to enable document-centric workflows, including classification, extraction, and export.
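
To make the BOM detection idea concrete, here is a minimal Python sketch of how a reader can prefer a byte order mark over a configured default encoding. It is not Grooper's implementation: the function name decode_text and the default_encoding parameter are hypothetical and chosen only for illustration.

```python
import codecs
from typing import Tuple

# BOM signatures, checked longest first so that a UTF-32 LE BOM
# (FF FE 00 00) is not mistaken for a UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_text(data: bytes, default_encoding: str = "utf-8") -> Tuple[str, str]:
    """Decode raw bytes, letting a BOM override the default encoding.

    Returns the decoded text (without the BOM) and the encoding used.
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding), encoding
    # No BOM found: fall back to the configured encoding.
    return data.decode(default_encoding), default_encoding
```

A file without a BOM falls through to the configured encoding, which is why a configurable default is useful alongside detection.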

Text File Handling

  • Handles text file reading and writing with support for various encodings (UTF-8, Unicode, etc.).
  • Provides options for line skipping, line splitting, and region exclusion during processing.
  • Enables assignment of Content Type to child documents created from splits.
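
The line skipping and line splitting options can be pictured with a short Python sketch. This is an illustration under stated assumptions, not Grooper code: split_lines and split_on_start_tag are hypothetical names, and a plain regular expression stands in for the extractor that would identify split positions.

```python
import re
from typing import List


def split_lines(text: str, lines_per_document: int, skip_lines: int = 0) -> List[str]:
    """Split text into chunks of at most lines_per_document lines.

    The first skip_lines lines (for example, a report header) are dropped.
    """
    if lines_per_document < 1:
        raise ValueError("lines_per_document must be at least 1")
    lines = text.splitlines(keepends=True)[skip_lines:]
    return [
        "".join(lines[i:i + lines_per_document])
        for i in range(0, len(lines), lines_per_document)
    ]


def split_on_start_tag(text: str, start_tag: str) -> List[str]:
    """Start a new chunk at every line that matches the start_tag pattern.

    Lines before the first match are kept together as a leading chunk.
    """
    pattern = re.compile(start_tag)
    chunks: List[List[str]] = []
    for line in text.splitlines(keepends=True):
        if pattern.search(line) or not chunks:
            chunks.append([])
        chunks[-1].append(line)
    return ["".join(chunk) for chunk in chunks]
```

Fixed-size splitting suits files with a regular record length, while content-based splitting suits files where each logical document begins with a recognizable line.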

Available Commands

  • Split: Splits a text document into smaller documents using a fixed line count or a Value Extractor to identify split positions.
    • Configure LinesPerDocument for fixed-size splits, or use StartTag to split based on content.
    • Supports skipping header lines, region exclusion, and assignment of Content Type to child documents.
  • InsertPageBreaks: Inserts page breaks (form feed characters, \f) into a text document based on a regular expression pattern.
    • Use PageBreakPattern to define where breaks should be inserted.
    • Supports inserting breaks before or after matched lines.
    • Supports configurable character encoding and BOM detection.
  • Normalize: Normalizes encoding, whitespace, and control characters in a text document.
    • Options to trim whitespace from lines and pages, remove or insert page breaks, and consolidate blank lines.
    • Supports encoding conversion and BOM detection.
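
As a rough illustration of what the InsertPageBreaks and Normalize commands describe, the following Python sketch inserts form feeds around lines matching a pattern and then tidies whitespace page by page. It is not Grooper's implementation; the function names and the parameters before, trim_lines, and consolidate_blank_lines are hypothetical stand-ins for the documented options.

```python
import re


def insert_page_breaks(text: str, page_break_pattern: str, before: bool = True) -> str:
    """Insert a form feed before or after every line matching the pattern."""
    pattern = re.compile(page_break_pattern)
    out = []
    for line in text.split("\n"):
        if pattern.search(line):
            line = "\f" + line if before else line + "\f"
        out.append(line)
    return "\n".join(out)


def normalize(text: str, trim_lines: bool = True,
              consolidate_blank_lines: bool = True) -> str:
    """Unify line endings, trim trailing whitespace, collapse blank-line runs."""
    # Convert Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    pages = []
    # Process each page separately so page breaks are preserved.
    for page in text.split("\f"):
        lines = page.split("\n")
        if trim_lines:
            lines = [line.rstrip() for line in lines]
        if consolidate_blank_lines:
            collapsed = []
            for line in lines:
                # Skip a blank line that directly follows another blank line.
                if line == "" and collapsed and collapsed[-1] == "":
                    continue
                collapsed.append(line)
            lines = collapsed
        pages.append("\n".join(lines))
    return "\f".join(pages)
```

Note that this sketch places a break before the first line too if that line matches the pattern; a real workflow might want to suppress a leading page break.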

Usage Notes

  • Use TextDocument to enable text-aware processing in Grooper, including splitting, normalization, and extraction.
  • Combine with Content Type and Data Model configuration to extract structured data from text files.
  • Suitable for processing reports, logs, and other plain text content in batch workflows.

For more information, see the documentation for Batch Folder, Content Type, and text extraction in Grooper.

Context Menu Commands

  • Insert Page Breaks: Inserts page breaks into a text document based on a regular expression pattern.
  • Normalize: Normalizes encoding, whitespace, and control characters in a text document.
  • Split: Splits a text document into smaller documents, using an extractor to identify split positions within the text content.