Grooper Help - Version 25.0
25.0.0017 2,127

Pattern-Based

Collation Provider Grooper.Extract

Uses a regular expression to select a sequence of child extractor results.

Remarks

The PatternBasedProvider collation provider uses a regular expression to combine and structure results from multiple child extractors.

How it works:

  • Child extractor results are referenced in the pattern using @ExtractorName syntax.
  • The regular expression is applied to a virtual document where each extractor result is replaced by its variable reference.
  • Named groups in the pattern can be mapped to output or context elements.
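
The steps above can be illustrated with a minimal Python sketch. It is not Grooper's implementation: the child extractor regexes, the `build_virtual_document` helper, and the sample transcript line are all assumptions chosen for illustration. It shows only the general idea of replacing extractor hits with `@ExtractorName` references and then matching a pattern against the resulting virtual text.

```python
import re

# Hypothetical child extractors, keyed by the name used in @references.
# These regexes are illustrative assumptions, not Grooper definitions.
CHILD_EXTRACTORS = {
    "Course_No": r"\b[A-Z]{2,4}\d{3}\b",
    "Decimal": r"\b\d+\.\d{2}\b",
    "Letter_Grade": r"\b[A-F]\b",
}


def build_virtual_document(text):
    """Return text with every child extractor hit replaced by @ExtractorName."""
    hits = []
    for name, pattern in CHILD_EXTRACTORS.items():
        for match in re.finditer(pattern, text):
            hits.append((match.start(), match.end(), name))
    hits.sort()

    parts = []
    cursor = 0
    for start, end, name in hits:
        if start < cursor:
            continue  # ignore a hit that overlaps one already placed
        parts.append(text[cursor:start])  # literal text between hits
        parts.append("@" + name)
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts)


line = "GE140 WORLD CIVILIZATION I 3.00 A 12.00"
virtual = build_virtual_document(line)
print(virtual)  # @Course_No WORLD CIVILIZATION I @Decimal @Letter_Grade @Decimal

# The collation pattern is an ordinary regex over the virtual text.
collation = re.compile(r"@Course_No .*? @Decimal @Letter_Grade @Decimal")
print(collation.fullmatch(virtual) is not None)  # True
```

Because the pattern sees `@Decimal` rather than `3.00`, it can express the order of extractor results without repeating each extractor's own matching logic.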

Configuration:

  • The Pattern property defines the regular expression used for collation.
  • The OutputElement and ContextElement properties specify which named groups to use for output and context.
  • The CaseSensitive property controls case sensitivity of the regular expression.
  • The PreprocessingOptions property allows for text preprocessing before pattern matching.

Use cases:

  • Extracting structured data from semi-structured or tabular text, such as transcript lines, itemized lists, or complex forms.
  • Mapping multiple extractor results into a single structured output using flexible pattern logic.

Example: Consider the tabular data below, which represents information from a college transcript:

 GE140 WORLD CIVILIZATION I 3.00 A 12.00
 PSY212 GENERAL PSYCHOLOGY 3.00 A 12.00
 GE185 HEALTH CONCEPTS 2.00 C 4.00
 

Three extractors are created for the Data Type:

  • Course No – Matches 'GE140', 'PSY212', etc.
  • Decimal – Matches '3.00', '12.00', etc.
  • Letter Grade – Matches 'A', 'C', etc.

The following collation expression could then be used to select the entire line. Note that the references replace the spaces in the extractor names with underscores:

 @Course_No .*? @Decimal @Letter_Grade @Decimal
 

The expression can be further expanded to include group names, mapping values directly to table column names:

 (?<Course_No>@Course_No)
 (?<Description>[^\r]*?)
 (?<Hours>@Decimal)
 (?<Grade>@Letter_Grade)
 (?<Points>@Decimal)
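
To see what the named groups would capture, the expression can be approximated outside Grooper with Python's `re` module. This is a rough sketch under stated assumptions: the child extractor regexes in `CHILD_PATTERNS` and the `to_python_regex` helper are hypothetical, the groups are assumed to be separated by single spaces, and each `@Name` reference is expanded inline instead of being matched against a virtual document as the provider does.

```python
import re

# Hypothetical stand-ins for the child extractors (assumptions for illustration).
CHILD_PATTERNS = {
    "Course_No": r"[A-Z]{2,4}\d{3}",
    "Decimal": r"\d+\.\d{2}",
    "Letter_Grade": r"[A-F]",
}

# The collation pattern from the text, written on one line with single spaces
# between groups (an assumption about how the line breaks are interpreted).
COLLATION = (
    r"(?<Course_No>@Course_No) (?<Description>[^\r]*?) "
    r"(?<Hours>@Decimal) (?<Grade>@Letter_Grade) (?<Points>@Decimal)"
)


def to_python_regex(collation):
    """Translate the collation pattern into a compiled Python regex."""
    # Python spells named groups (?P<name>...). This naive replace would also
    # rewrite lookbehinds such as (?<=...), which this pattern does not use.
    pattern = collation.replace("(?<", "(?P<")
    # Expand longer names first so one name cannot clobber another's prefix.
    for name in sorted(CHILD_PATTERNS, key=len, reverse=True):
        pattern = pattern.replace("@" + name, "(?:" + CHILD_PATTERNS[name] + ")")
    return re.compile(pattern)


TRANSCRIPT = """\
GE140 WORLD CIVILIZATION I 3.00 A 12.00
PSY212 GENERAL PSYCHOLOGY 3.00 A 12.00
GE185 HEALTH CONCEPTS 2.00 C 4.00
"""

regex = to_python_regex(COLLATION)
rows = []
for line in TRANSCRIPT.splitlines():
    match = regex.fullmatch(line.strip())
    if match:
        rows.append(match.groupdict())  # one dict per line, keyed by group name

for row in rows:
    print(row)
```

Each resulting dictionary is keyed by group name (`Course_No`, `Description`, `Hours`, `Grade`, `Points`), which mirrors how the named groups map to table column names. The lazy `Description` group takes whatever text sits between the course number and the first decimal value.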
 
