Grooper Help - Version 25.0
25.0.0017

Individual

Collation Provider Grooper.Extract

Combines the results from all extractors into a single result set.

Remarks

This collation provider acts as an aggregator: it produces a single result set containing every result returned by every extractor.

One of the primary uses for this collation provider is to capture data values that appear in multiple presentation formats. For example, in a Data Type designed to capture dates, each extractor might match a different date/time format (e.g. '01/01/2000', 'January 1, 2000', '01-JAN-2000'). In a Data Type designed to match the label for an Invoice Date field, each extractor might match a different variation of the field label.
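As a rough illustration of the aggregation behavior, the Python sketch below models each extractor as a regular expression for one date format and collects every match from every extractor into one list. This is not Grooper's API: the names `DATE_EXTRACTORS` and `individual_collate` are hypothetical, and real Grooper results carry more information (location, confidence) than a plain string.

```python
import re

# Hypothetical extractors: one regular expression per date format.
DATE_EXTRACTORS = [
    re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),  # e.g. 01/01/2000
    re.compile(r"\b(?:January|February|March|April|May|June|July|"
               r"August|September|October|November|December) \d{1,2}, \d{4}\b"),  # e.g. January 1, 2000
    re.compile(r"\b\d{2}-[A-Z]{3}-\d{4}\b"),  # e.g. 01-JAN-2000
]


def individual_collate(text, extractors):
    """Return every match from every extractor as a single result list.

    Each result is (matched value, index of the extractor that found it),
    ordered by position in the text. Nothing is discarded.
    """
    results = []
    for index, extractor in enumerate(extractors):
        for match in extractor.finditer(text):
            results.append((match.start(), match.group(), index))
    results.sort()
    return [(value, index) for _, value, index in results]


text = "Signed 01/01/2000. Renewed January 1, 2001. Expires 01-JAN-2002."
for value, source in individual_collate(text, DATE_EXTRACTORS):
    print(f"{value!r} from extractor {source}")
```

Each of the three dates is found by a different extractor, and all three end up in the same result set.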

Another common use for this collation provider is to attempt extraction of the same value using multiple extraction methods. For example, consider a structured form where the Effective Date appears to the right of the text label "Effective Date:" and just below the section heading "Contract Information". Extractor A would look for a date to the right of "Effective Date:", while Extractor B would look for a date just below "Contract Information". If Extractor A fails for some reason, Extractor B may still succeed. When both succeed, the Deduplication property on the Data Type object can be configured to select a single result as the winner.
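The fallback scenario can be sketched in Python with the form modeled as a list of text lines. This is only an assumption-laden model: `extractor_a`, `extractor_b`, the fixed confidence scores, and `pick_winner` are hypothetical, and `pick_winner` is a simple stand-in for the Data Type's Deduplication setting (keep the highest-confidence result), not a description of how Grooper's Deduplication actually ranks results.

```python
import re

DATE = re.compile(r"\d{2}/\d{2}/\d{4}")
LABEL = "Effective Date:"


def extractor_a(lines):
    """Hypothetical Extractor A: a date to the right of the "Effective Date:" label."""
    results = []
    for line in lines:
        start = line.find(LABEL)
        if start >= 0:
            match = DATE.search(line, start + len(LABEL))
            if match:
                results.append({"value": match.group(), "source": "A", "confidence": 0.9})
    return results


def extractor_b(lines):
    """Hypothetical Extractor B: a date on the line just below "Contract Information"."""
    results = []
    for i, line in enumerate(lines[:-1]):
        if "Contract Information" in line:
            match = DATE.search(lines[i + 1])
            if match:
                results.append({"value": match.group(), "source": "B", "confidence": 0.7})
    return results


def collate_individual(lines, extractors):
    """Aggregate: concatenate the results of every extractor."""
    return [result for extractor in extractors for result in extractor(lines)]


def pick_winner(results):
    """Stand-in for deduplication: keep the single highest-confidence result."""
    return max(results, key=lambda r: r["confidence"]) if results else None


clean = ["Contract Information", "Effective Date: 03/15/2024"]
damaged = ["Contract Information", "Effective Dat: 03/15/2024"]  # label misread by OCR

for form in (clean, damaged):
    results = collate_individual(form, [extractor_a, extractor_b])
    print(len(results), pick_winner(results))
```

On the clean form both extractors succeed and the sketch keeps Extractor A's result; on the damaged form Extractor A finds nothing, but Extractor B still returns the date.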

Other known uses include the following:

  • When extracting features for Lexical classification, it is useful for aggregating multiple feature types into a single list of features.
  • When creating classification rules for a Document Type, it is useful as an "OR" operator, where each extractor defines a different rule.
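The "OR" behavior in classification rules follows from aggregation: if any extractor returns a result, the combined result set is non-empty. The Python sketch below shows that reasoning under stated assumptions. The rule patterns, `INVOICE_RULES`, and `matches_rule` are hypothetical illustrations, not Grooper configuration.

```python
import re

# Hypothetical rule extractors for an "Invoice" Document Type.
# Each pattern is a separate rule; a match from any one of them is enough.
INVOICE_RULES = [
    re.compile(r"\bINVOICE\b", re.IGNORECASE),
    re.compile(r"\bInvoice (?:No|Number)\b"),
    re.compile(r"\bAmount Due\b"),
]


def collate_individual(text, extractors):
    """Aggregate every match from every extractor into one result list."""
    return [m.group() for extractor in extractors for m in extractor.finditer(text)]


def matches_rule(text, extractors):
    """Acts like OR: the rule passes when the aggregated result set is non-empty."""
    return len(collate_individual(text, extractors)) > 0


print(matches_rule("Amount Due: $120.00", INVOICE_RULES))  # True
print(matches_rule("Purchase Order 4471", INVOICE_RULES))  # False
```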

Properties

Name | Type | Description

Used By

Notification