Grooper Help - Version 25.0
25.0.0017 2,127

Split Text

Code Activity Grooper.Activities

Splits a text document into smaller documents using extractors or line counts.

Remarks

The Split Text activity divides a plain text document into multiple child documents based on configurable rules.
It is typically used to break up large text files into logical documents for further processing in Grooper.

How It Works

  • You can split documents by a fixed number of lines, by identifying start tags using an extractor, or by a combination of both.
  • Optionally, you can skip a number of lines at the beginning of the document, or exclude regions using an exclusion tag.
  • Each resulting child document can be assigned a specific Content Type for downstream classification and extraction.
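
As an illustration of the simplest mode, the sketch below splits by a fixed number of lines after skipping a header. It is not Grooper code: the function name and its parameters are hypothetical and only borrow the names of the 'Lines Per Document' and 'Lines To Skip' properties.

```python
from typing import List


def split_by_line_count(lines: List[str], lines_per_document: int,
                        lines_to_skip: int = 0) -> List[List[str]]:
    """Group lines into chunks of at most lines_per_document lines.

    Hypothetical helper for illustration; not part of Grooper.
    """
    if lines_per_document < 1:
        raise ValueError("lines_per_document must be at least 1")
    body = lines[lines_to_skip:]  # drop leading header or metadata lines
    return [body[i:i + lines_per_document]
            for i in range(0, len(body), lines_per_document)]


chunks = split_by_line_count(["header", "a", "b", "c", "d", "e"],
                             lines_per_document=2, lines_to_skip=1)
print(chunks)
```

In this sketch the last chunk is allowed to be shorter than the others when the remaining lines do not fill a whole document.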

Configuration Guidance

  • Use the 'Lines To Skip' property to ignore headers or metadata at the start of the file.
  • Use the 'Lines Per Document' property to split by a fixed number of lines.
  • Use the 'Start Tag' property to split at lines matching a specific pattern or value.
  • Use the 'Exclusion Tag' property to ignore regions of the file that should not be included in any child document.
  • Use the 'Start Offset' property to adjust the split position relative to the start tag.
  • Assign a Content Type to ensure each child document is classified correctly.
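
The sketch below shows one way a start offset could move a split point. This page does not define the unit or sign convention of 'Start Offset', so the code assumes it is a number of lines added to the index of the matching line (a negative value moves the split earlier). The function name is hypothetical and the sample lines are made up.

```python
import re
from typing import List


def split_positions(lines: List[str], start_tag: str,
                    start_offset: int = 0) -> List[int]:
    """Return the line indexes at which a new child document would begin.

    Assumption: start_offset is a line count relative to the matching line.
    """
    pattern = re.compile(start_tag)
    positions = set()
    for index, line in enumerate(lines):
        if pattern.search(line):
            shifted = index + start_offset
            # Clamp so an offset cannot point outside the file.
            positions.add(max(0, min(shifted, len(lines) - 1)))
    return sorted(positions)


sample = ["Account 1", "Statement #1001", "Line A",
          "Account 2", "Statement #1002", "Line B"]
print(split_positions(sample, r"^Statement #"))                   # [1, 4]
print(split_positions(sample, r"^Statement #", start_offset=-1))  # [0, 3]
```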

> Tip: If both 'Lines Per Document' and 'Start Tag' are specified, a split occurs whenever either condition is met, whichever comes first.
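
The "either condition" rule in the tip can be sketched as follows. This is a minimal illustration, not Grooper's implementation: the function name is hypothetical, the start tag is modeled as a regular expression tested against each line, and lines before the first match are kept in the first document, which is an assumption of this sketch.

```python
import re
from typing import List


def split_either(lines: List[str], start_tag: str,
                 lines_per_document: int) -> List[List[str]]:
    """Start a new document at a start-tag match or when the current
    document is full, whichever happens first."""
    pattern = re.compile(start_tag)
    documents: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        is_tag = pattern.search(line) is not None
        is_full = len(current) >= lines_per_document
        if current and (is_tag or is_full):
            documents.append(current)  # close the open document
            current = []
        current.append(line)
    if current:
        documents.append(current)      # flush the final document
    return documents


result = split_either(
    ["Statement #1", "a", "b", "c", "Statement #2", "d"],
    start_tag=r"^Statement #",
    lines_per_document=3,
)
print(result)
```

Here the first document closes because it reaches three lines, and the second closes early because the next line matches the start tag.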

Example

The following example demonstrates how Split Text divides a large text file into individual documents using a start tag extractor.

Before Split Text:

 Batch
  └─📁 Document 1 (Attachment: Statements.txt - 9 lines)
 

Statements.txt:

Header Information
Statement #1001
Line A
Line B
Statement #1002
Line C
Line D
Statement #1003
Line E

After Split Text (using Start Tag: ^Statement #):

 Batch
  └─📁 Document 1
    ├─📄 Statement #1001
    │   Line A
    │   Line B
    ├─📄 Statement #1002
    │   Line C
    │   Line D
    └─📄 Statement #1003
        Line E
 

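In this example, the original text file is split into three child documents, one at each line matching the start tag pattern.
Each child document contains the statement header and the lines that follow it, up to the next match. The "Header Information" line precedes the first match and does not appear in any child document.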
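
The split shown above can be reproduced outside Grooper with a short script. This is a sketch under stated assumptions: the function name is hypothetical, the start tag is treated as a regular expression applied to each line, and lines before the first match are dropped to mirror the example output.

```python
import re
from typing import List

STATEMENTS = """Header Information
Statement #1001
Line A
Line B
Statement #1002
Line C
Line D
Statement #1003
Line E"""


def split_at_start_tag(text: str, start_tag: str) -> List[str]:
    """Split text into documents, each beginning at a line matching start_tag."""
    pattern = re.compile(start_tag)
    documents: List[List[str]] = []
    for line in text.splitlines():
        if pattern.search(line):
            documents.append([line])    # a match opens a new child document
        elif documents:
            documents[-1].append(line)  # otherwise extend the open document
        # Lines before the first match are dropped, as in the example.
    return ["\n".join(doc) for doc in documents]


children = split_at_start_tag(STATEMENTS, r"^Statement #")
for child in children:
    print(child)
    print("---")
```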

For more information, see the documentation for Batch Folder, Content Type, and Value Extractor.

Properties

Name    Type    Description
General
Processing Options

See Also

Used By

Notification