Grooper Help - Version 25.0
25.0.0017 2,127

Data Type

Extractor Node Grooper.Extract

Recognizes and extracts complex data values or structures from document text using one or more extractors and configurable collation logic.

Remarks

The Data Type object is a flexible data extraction tool in Grooper, designed to identify and capture information that cannot be matched by a single Value Extractor. It enables the recognition of both simple and highly complex data patterns, such as dates in multiple formats, address blocks, table rows, or other structured entities.

Overview

Data Types are used to aggregate results from multiple extractors, combining their outputs according to configurable collation rules. This allows for the extraction of data that may appear in various forms or layouts within a document.

Extractor Configuration

You can define extractors for a Data Type in several ways:

  • Local Extractor: Assign a single extractor using the 'Extractor' property for simple cases.
  • Direct Children: Add Value Readers, Field Classes, or other Data Types as child nodes.
  • Referenced Extractors: Use the 'Referenced Extractors' property to include external extractors.

Extractors run in a fixed order: the local extractor first, then child extractors, then referenced extractors.
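The execution order can be pictured with a small conceptual sketch. This is plain Python, not Grooper's API: `regex_extractor` and `run_extractors` are hypothetical names, and each "extractor" is reduced to a regular expression for illustration. It only shows that results arrive in local, children, referenced order regardless of where the matches sit in the text.

```python
import re
from typing import Callable, Iterable, List, Optional

# Hypothetical stand-in for an extractor: a function from text to matched values.
Extractor = Callable[[str], List[str]]


def regex_extractor(pattern: str) -> Extractor:
    """Build a toy extractor that returns every match of a regular expression."""
    compiled = re.compile(pattern)

    def extract(text: str) -> List[str]:
        return [m.group(0) for m in compiled.finditer(text)]

    return extract


def run_extractors(
    text: str,
    local: Optional[Extractor] = None,
    children: Iterable[Extractor] = (),
    referenced: Iterable[Extractor] = (),
) -> List[str]:
    """Run extractors in the documented order: local, children, referenced."""
    ordered: List[Extractor] = []
    if local is not None:
        ordered.append(local)
    ordered.extend(children)
    ordered.extend(referenced)

    results: List[str] = []
    for extractor in ordered:
        results.extend(extractor(text))
    return results


text = "REF-300 PO-200 INV-100"
results = run_extractors(
    text,
    local=regex_extractor(r"INV-\d+"),
    children=[regex_extractor(r"PO-\d+")],
    referenced=[regex_extractor(r"REF-\d+")],
)
print(results)  # ['INV-100', 'PO-200', 'REF-300']
```

The output follows extractor order rather than document order, which is the only behavior this sketch is meant to show.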

Collation and Output

The 'Collation' property determines how results from all extractors are merged. Collation providers can:

  • Return all results individually (default).
  • Combine results into structured outputs, such as key-value pairs, arrays, or table rows.
  • Enforce spatial or logical relationships between extracted values.

Choose the collation provider that matches your data extraction scenario.
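As a rough illustration of how the collation choice changes the shape of the output, the sketch below contrasts an "individual" style result with an "array" style result. It is plain Python, not Grooper's API: the function names and regular expressions are assumptions, and sorting matches by position is a simplification made here, not a documented behavior.

```python
import re
from typing import List, Tuple

# Assumed patterns, one per date format mentioned in this topic.
DATE_PATTERNS = [
    r"\b\d{2}/\d{2}/\d{4}\b",           # 01/01/2000
    r"\b[A-Z][a-z]+ \d{1,2}, \d{4}\b",  # January 1, 2000
    r"\b\d{2}-[A-Z]{3}-\d{4}\b",        # 01-JAN-2000
]

Hit = Tuple[int, str]  # (start offset, matched text)


def collect_hits(text: str, patterns: List[str]) -> List[Hit]:
    """Run every pattern and pool the matches, as several extractors would."""
    hits: List[Hit] = []
    for pattern in patterns:
        hits.extend((m.start(), m.group(0)) for m in re.finditer(pattern, text))
    return hits


def collate_individual(hits: List[Hit]) -> List[str]:
    """One output per match: every hit is returned as its own result."""
    return [value for _, value in sorted(hits)]


def collate_array(hits: List[Hit]) -> List[List[str]]:
    """One output for all matches: the hits are grouped into a single array."""
    values = [value for _, value in sorted(hits)]
    return [values] if values else []


text = "Issued 01/01/2000, renewed January 1, 2001, expires 01-JAN-2002."
hits = collect_hits(text, DATE_PATTERNS)

print(collate_individual(hits))  # three separate results
print(collate_array(hits))       # one result holding all three values
```

The same matches feed both functions; only the grouping differs, which is the decision the 'Collation' property represents.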

Filtering and Post-Processing

Data Types support additional configuration for refining results:

  • Input Filter: Restrict extraction to a subset of the document.
  • Exclusion Extractor: Remove unwanted results that overlap with exclusion matches.
  • Subtraction Extractor: Remove specific content from output values.
  • Lookup: Validate or correct extracted values using a vocabulary list.
  • Result Filter and Result Options: Further filter and process output instances.
  • Post Processing: Apply custom logic to each result after extraction.
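Two of the options above, exclusion and subtraction, are easy to confuse, so here is a minimal conceptual sketch of the difference. It is plain Python with hypothetical helper names, not Grooper's implementation: exclusion drops a whole result when it overlaps an exclusion match, while subtraction keeps the result and strips content from its value.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, value)


def find_spans(pattern: str, text: str) -> List[Span]:
    """Return every regex match together with its character offsets."""
    return [(m.start(), m.end(), m.group(0)) for m in re.finditer(pattern, text)]


def overlaps(a: Span, b: Span) -> bool:
    """True when two half-open character ranges share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def apply_exclusion(results: List[Span], exclusions: List[Span]) -> List[Span]:
    """Drop any result that overlaps an exclusion match."""
    return [r for r in results if not any(overlaps(r, e) for e in exclusions)]


def apply_subtraction(results: List[Span], pattern: str) -> List[Span]:
    """Keep every result but remove the matching content from its value."""
    return [(start, end, re.sub(pattern, "", value)) for start, end, value in results]


text = "Phone: 555-1234 Fax: 555-9999"
numbers = find_spans(r"\d{3}-\d{4}", text)

# Exclusion: the fax number overlaps the "Fax: ..." match, so it is removed.
kept = apply_exclusion(numbers, find_spans(r"Fax: \d{3}-\d{4}", text))
print([value for _, _, value in kept])     # ['555-1234']

# Subtraction: the result survives, but the hyphen is stripped from its value.
cleaned = apply_subtraction(kept, r"-")
print([value for _, _, value in cleaned])  # ['5551234']
```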

Use Cases

The following examples illustrate common scenarios for Data Types, each using a different collation method:

  1. Capturing a date value in multiple formats
    Use the Individual collation provider to return the results from several extractors, each matching a different date format (e.g., 01/01/2000, January 1, 2000, 01-JAN-2000).

    • How it works:
      • Configure multiple Value Extractors, each targeting a specific date format using regular expressions or parsing logic.
      • The Individual collation method returns all matches as separate results, regardless of which extractor found them.
      • This approach ensures that all valid date representations are captured, even if they appear in different formats within the same document.
    • When to use:
      • When documents may contain the same data element in multiple possible formats, and you want to capture every occurrence.
  2. Capturing arrays of repeated values
    Use the Array collation provider to collect multiple occurrences of a repeated value, such as a list of invoice numbers or serial numbers, into a single array output.

    • How it works:
      • Define an extractor that matches the repeated value (e.g., serial number).
      • The Array collation method groups all matches into an array, which can be mapped to an array-type Data Field.
      • This is useful for capturing lists of items, such as all part numbers on a packing slip or all email addresses in a correspondence.
    • When to use:
      • When you need to return a collection of similar values as a single array result, rather than as individual outputs.
  3. Recognizing key-value pairs
    Use the Key-Value Pair collation provider to pair extracted keys (such as field labels) with their corresponding values, enabling structured extraction of form fields or labeled data.

    • How it works:
      • Configure one extractor to find keys (labels) and another to find values.
      • The Key-Value Pair collation method associates each key with its nearest value, producing structured pairs (e.g., "Name: John Smith").
      • This is ideal for extracting data from forms, statements, or any document where information is presented as labeled fields.
    • When to use:
      • When extracting structured data from forms, tables, or documents with consistent label-value formatting.
  4. Recognizing an address block with ordered fields
    Use the Ordered Array collation provider to extract multi-line address blocks, where each line or field (e.g., Name, Street, City, State, Zip) is captured by a separate extractor and combined in a specific order.

    • How it works:
      • Create extractors for each address component (e.g., one for Name, one for Street, etc.).
      • The Ordered Array collation method assembles the results in the defined order, ensuring the output matches the expected address structure.
      • This approach is robust to variations in address formatting, as each field is matched independently but output as a single, ordered block.
    • When to use:
      • When extracting structured, multi-line data where the order of fields is important, such as mailing addresses or contact blocks.
  5. Capturing a complex pattern with multiple parts
    Use the Pattern-Based collation provider to recognize data elements that consist of multiple, possibly optional, parts, such as a policy number with optional prefixes and suffixes, or a product code with embedded metadata.

    • How it works:
      • Define extractors for each part of the pattern (e.g., prefix, core value, suffix).
      • The Pattern-Based collation method coordinates these extractors, matching only when the required pattern (including optional parts) is satisfied.
      • This enables extraction of values that cannot be matched by a single regular expression or extractor, especially when the pattern is variable or context-dependent.
    • When to use:
      • When extracting data elements that have a complex, multi-part structure, or when optional/variable components must be recognized as part of a whole.
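The key-value and ordered-array scenarios above can be sketched conceptually as follows. This is plain Python, not Grooper's API, and both functions are simplified assumptions: keys are paired with the nearest value that follows them in the text, and ordered parts must appear one after another in the defined sequence. The sample patterns are hypothetical.

```python
import re
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start, end, value)


def find_spans(pattern: str, text: str) -> List[Span]:
    """Return every regex match together with its character offsets."""
    return [(m.start(), m.end(), m.group(0)) for m in re.finditer(pattern, text)]


def collate_key_value(keys: List[Span], values: List[Span]) -> List[Tuple[str, str]]:
    """Pair each key with the closest unused value that starts after it."""
    pairs: List[Tuple[str, str]] = []
    used = set()
    for _, key_end, key in keys:
        candidates = [
            (value_start - key_end, index)
            for index, (value_start, _, _) in enumerate(values)
            if value_start >= key_end and index not in used
        ]
        if not candidates:
            continue  # a key with no following value produces no pair
        _, best = min(candidates)
        used.add(best)
        pairs.append((key.rstrip(":"), values[best][2]))
    return pairs


def collate_ordered(text: str, patterns: List[str]) -> Optional[List[str]]:
    """Match each pattern in sequence; fail if any part is missing or out of order."""
    position = 0
    parts: List[str] = []
    for pattern in patterns:
        match = re.compile(pattern).search(text, position)
        if match is None:
            return None
        parts.append(match.group(0))
        position = match.end()
    return parts


form = "Name: John Smith\nAccount: 12345"
pairs = collate_key_value(
    find_spans(r"(?:Name|Account):", form),  # key extractor (labels)
    find_spans(r"(?<=: )[^\n]+", form),      # value extractor (text after a label)
)
print(pairs)  # [('Name', 'John Smith'), ('Account', '12345')]

address = "John Smith\n123 Main St\nSpringfield, IL 62704"
address_patterns = [
    r"[A-Z][a-z]+ [A-Z][a-z]+",       # name
    r"\d+ [A-Za-z ]+",                # street
    r"[A-Z][a-z]+, [A-Z]{2} \d{5}",   # city, state, zip
]
print(collate_ordered(address, address_patterns))
# ['John Smith', '123 Main St', 'Springfield, IL 62704']
```

In both cases each part is still matched by its own extractor; the collation step only decides which matches belong together and in what order.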

Usage Guidance

  • Use Data Types to model data elements that require multiple extraction strategies or complex validation.
  • Configure extractors and collation to match the structure and variability of your target data.
  • Leverage filtering and post-processing options to ensure high-quality, relevant results.
  • Reference Data Types from higher-level objects to integrate them into your extraction workflows.

For more information, see the documentation for Value Extractors, Field Classes, Collation Providers, and related extraction objects.

Properties

Name Type Description
General
Options
Output
Info

Design Tabs

General: View or edit properties of a node.
Reports: View reports for a node.
Scripting: Create, debug, modify, and compile scripts for scriptable nodes.
Tester: Test an Extractor Node on documents in a test batch.
Advanced: View or edit advanced details about a node.

Context Menu Commands

Command Shortcut Description
Convert To Value Reader Converts this Data Type to a Value Reader with equivalent functionality.

Child Types

See Also

Used By
