Grooper Help - Version 25.0
25.0.0017 2,127

Collation Provider

Embedded Object Grooper.Extract

Collation Providers define how the results of multiple extractors within a Data Type are combined into a final result set.

Remarks

A Collation Provider defines how the output from each extractor associated with a Data Type is interpreted and merged.

How Collation Providers Work

  • Each Data Type can reference multiple extractors in the following ways:
    • 'Extractor': A single local extractor for simple cases.
    • 'Extractors': A list of referenced extractors.
    • Direct child extractors (such as Value Readers or Field Classes).
  • At runtime, each extractor produces a list of matches. If there are N extractors, N lists of results are generated.
  • The Collation Provider determines how these lists are combined into the final output.
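
The flow above can be sketched in Python. This is an illustrative model only, not Grooper's implementation: `Match`, `run_extractors`, and `flatten_in_document_order` are hypothetical names, and each extractor is modeled as a plain regular expression.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Match:
    """Hypothetical stand-in for one extractor hit."""
    start: int
    text: str


def run_extractors(patterns: List[str], document: str) -> List[List[Match]]:
    """Run every pattern against the document; N patterns yield N result lists."""
    return [
        [Match(m.start(), m.group()) for m in re.finditer(p, document)]
        for p in patterns
    ]


# A collation step receives the N lists and decides what the final output is.
Collation = Callable[[List[List[Match]]], List[Match]]


def flatten_in_document_order(result_lists: List[List[Match]]) -> List[Match]:
    """One possible collation: keep every match, sorted by position."""
    merged = [m for results in result_lists for m in results]
    return sorted(merged, key=lambda m: m.start)


document = "Invoice 1042 due 2024-05-01, ref A-77"
lists = run_extractors([r"\d{4}-\d{2}-\d{2}", r"[A-Z]-\d+"], document)
collation: Collation = flatten_in_document_order
final = collation(lists)
```

Swapping `flatten_in_document_order` for a different callable changes how the same N lists are interpreted, which is the role the Collation Provider plays on a Data Type.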

Collation Methods

The specific collation method is selected via the 'Collation' property on the Data Type. Examples include:

  • Individual: Returns all results from all extractors, typically used when each extractor matches a different variation of the target data.
  • Other Providers: May interpret extractor results as more complex entities, such as key-value pairs, arrays, table rows, or data regions.

> Note: The choice of Collation Provider directly impacts how extracted data is structured and returned. For example, using a provider that groups results by geometry can enable extraction of 2D data structures like table rows or address blocks.

Example

Suppose a Data Type is configured to extract dates in multiple formats. Each extractor matches a different format (e.g., 01/01/2000, January 1, 2000, 01-JAN-2000). The Collation Provider merges these results into a single, unified result set.
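
A minimal Python sketch of this date example follows. It assumes each extractor can be approximated by one regular expression and that merging means keeping every match in document order; the pattern names and the `merge_all` helper are hypothetical and do not come from Grooper.

```python
import re
from typing import Dict, List, Tuple

# One hypothetical pattern per date format mentioned in the example.
DATE_PATTERNS: Dict[str, str] = {
    "slashes": r"\b\d{2}/\d{2}/\d{4}\b",               # 01/01/2000
    "long_form": r"\b[A-Z][a-z]+ \d{1,2}, \d{4}\b",    # January 1, 2000
    "dashes": r"\b\d{2}-[A-Z]{3}-\d{4}\b",             # 01-JAN-2000
}


def merge_all(text: str, patterns: Dict[str, str]) -> List[str]:
    """Collect every match from every pattern into one list, in document order."""
    hits: List[Tuple[int, str]] = []
    for pattern in patterns.values():
        hits.extend((m.start(), m.group()) for m in re.finditer(pattern, text))
    return [value for _, value in sorted(hits)]


text = "Issued 01/01/2000. Renewed January 1, 2000. Expires 01-JAN-2000."
dates = merge_all(text, DATE_PATTERNS)
```

Each pattern contributes its own list of matches, and the merge step returns them as one result set without favoring any single format.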

See also: 'Collation' property on Data Type.

Derived Types

There are 10 implementations of Collation Provider.

  • AND: Collation provider that returns results only when each extractor produces at least one match.
  • Array: Collation provider that matches and returns arrays (lists) of values arranged in a specific geometric or flow order.
  • Combine: Combines instances from child extractors based on the grouping specified in the Group By property.
  • Individual: Combines the results from all extractors into a single result set.
  • Key-Value List: Matches cases where a key and a list of one or more values occur on the document in a specific layout.
  • Key-Value Pair: Matches cases where a key-value pair occurs on the document in a specific layout.
  • Multi-Column: Outputs a single instance in which the document has been reformatted to reflect the flow of a multi-column document.
  • Ordered Array: Finds sequences of values where one result is present for each extractor, in the order in which they appear.
  • Pattern-Based: Uses a regular expression to select a sequence of child extractor results.
  • Split: Splits the input at each match found by an extractor.
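
To make the differences between a few of these implementations concrete, here is a rough Python sketch of the Individual, AND, and Split behaviors as described above. It is a simplified model, not Grooper's code: results are plain strings, the function names are hypothetical, and the Split sketch assumes each match begins a new segment, which the description above does not specify.

```python
import re
from typing import List


def individual(result_lists: List[List[str]]) -> List[str]:
    """Individual-style: keep every result from every extractor."""
    return [r for results in result_lists for r in results]


def and_collation(result_lists: List[List[str]]) -> List[str]:
    """AND-style: return results only if every extractor matched at least once."""
    if result_lists and all(result_lists):
        return individual(result_lists)
    return []


def split_at_matches(text: str, pattern: str) -> List[str]:
    """Split-style: cut the input at each match.

    Assumption: the matched text starts the segment that follows it.
    """
    starts = [m.start() for m in re.finditer(pattern, text)]
    if not starts:
        return [text]
    # Keep any text before the first match as its own leading segment.
    bounds = ([0] if starts[0] > 0 else []) + starts + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]


both_matched = and_collation([["a"], ["b", "c"]])
one_missing = and_collation([["a"], []])
segments = split_at_matches("Item 1: pen Item 2: ink", r"Item \d+:")
```

With the same input lists, `individual` would still return `["a"]` when the second extractor finds nothing, whereas `and_collation` returns an empty list.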

Used By

Notification