Grooper Help - Version 25.0
25.0.0024 2,166

Text Document - Normalize

Text Document Command Grooper.Extract

Normalizes encoding, whitespace, and control characters in a text document.

Remarks

The Normalize command processes a plain text file to standardize its encoding, whitespace, and control characters, improving consistency for downstream extraction and export.

Overview

  • Trims whitespace from lines and/or pages.
  • Removes or inserts page breaks (\f) based on configuration.
  • Consolidates multiple blank lines into single blank lines.
  • Converts text encoding and handles byte order marks (BOM).
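The encoding and BOM handling can be illustrated with a short Python sketch. This is not Grooper's implementation: the function name decode_text and its parameters fallback_encoding and detect_bom are hypothetical stand-ins for the Encoding and DetectBOM settings, and the sketch assumes that a detected BOM takes precedence over the configured encoding.

```python
import codecs

# (BOM bytes, Python codec name). UTF-32 is checked before UTF-16 because
# the UTF-32 LE BOM begins with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_text(data: bytes, fallback_encoding: str = "utf-8",
                detect_bom: bool = True) -> str:
    """Decode raw bytes, letting a BOM override the fallback encoding.

    Hypothetical helper: the precedence rule (BOM wins) is an assumption.
    """
    if detect_bom:
        for bom, codec_name in _BOMS:
            if data.startswith(bom):
                # Strip the BOM so it does not leak into the text.
                return data[len(bom):].decode(codec_name)
    return data.decode(fallback_encoding)
```

With detection turned off, a UTF-8 BOM would survive as a leading U+FEFF character in the decoded text, which is one reason to enable detection when the source of the files is unknown.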

Workflow

  1. Reads the text file using the specified encoding and BOM settings.
  2. Optionally splits the text into pages using page breaks.
  3. For each page:
    • Optionally trims leading/trailing whitespace.
    • Optionally trims whitespace from each line.
    • Optionally removes double-spacing by collapsing consecutive blank lines into a single blank line.
  4. Optionally removes or inserts page breaks.
  5. Saves the normalized text back to the Batch Folder attachment.
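The per-page steps of the workflow above can be sketched in Python. This is a minimal illustration under stated assumptions, not the command's actual code: the function normalize_text and its keyword arguments are hypothetical, "trimming" is assumed to strip both leading and trailing whitespace, and removed page breaks are assumed to be replaced by a single newline. Reading and saving the attachment are left out.

```python
import re


def normalize_text(text: str, trim_pages: bool = True, trim_lines: bool = True,
                   remove_double_spacing: bool = True,
                   remove_page_breaks: bool = False) -> str:
    """Hypothetical sketch of the per-page normalization steps."""
    # Work with a single newline convention throughout.
    text = text.replace("\r\n", "\n")

    # Step 2: split into pages on the form feed character.
    pages = text.split("\f")

    cleaned = []
    for page in pages:
        if trim_pages:
            # Assumption: trims whitespace at both ends of the page.
            page = page.strip()
        if trim_lines:
            # Assumption: trims whitespace at both ends of each line.
            page = "\n".join(line.strip() for line in page.split("\n"))
        if remove_double_spacing:
            # Three or more newlines means two or more blank lines in a row;
            # collapse them to exactly one blank line.
            page = re.sub(r"\n{3,}", "\n\n", page)
        cleaned.append(page)

    # Step 4: either keep the page breaks or join pages with a newline.
    separator = "\n" if remove_page_breaks else "\f"
    return separator.join(cleaned)
```

Lines are trimmed before blank lines are collapsed so that lines containing only spaces or tabs count as blank.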

Configuration

  • Use TrimLines to trim whitespace from each line.
  • Use TrimPages to trim whitespace from each page.
  • Use RemovePageBreaks to remove all page breaks.
  • Use PageBreakPattern to insert page breaks at lines matching a pattern.
  • Use RemoveDoubleSpacing to consolidate multiple blank lines.
  • Set Encoding and DetectBOM for correct text interpretation.
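As an illustration of how a setting like PageBreakPattern might behave, the following Python sketch inserts a form feed before each line that matches a regular expression. The function name insert_page_breaks is hypothetical, and two details are assumptions rather than documented behavior: the break is placed before the matching line, and a match on the very first line does not produce a break.

```python
import re


def insert_page_breaks(text: str, page_break_pattern: str) -> str:
    """Hypothetical sketch: put a form feed before each matching line."""
    pattern = re.compile(page_break_pattern)
    result = []
    for index, line in enumerate(text.split("\n")):
        # Assumption: the first line never gets a break, since there is
        # no preceding page to separate it from.
        if index > 0 and pattern.search(line):
            result.append("\f" + line)
        else:
            result.append(line)
    return "\n".join(result)


# Example: start a new page at every "Page N" heading line.
paged = insert_page_breaks("Page 1\nfoo\nPage 2\nbar", r"^Page \d+$")
```

A pattern anchored with ^ and $ keeps the match limited to whole heading lines, which avoids breaking pages on body text that merely mentions a page number.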

Usage Notes

  • Recommended as a preprocessing step before extraction or export.
  • Ensures consistent text structure for downstream activities.

Properties

Name    Type    Description
General
Encoding

See Also

Notification