Data Type

Inherits From: Extractor Node
Namespace: Grooper.Extract

Recognizes and extracts complex data values or structures from document text using one or more extractors and configurable collation logic.

Remarks

The Data Type object is a flexible data extraction tool in Grooper, designed to identify and capture information that cannot be matched by a single Value Extractor. It enables the recognition of both simple and highly complex data patterns, such as dates in multiple formats, address blocks, table rows, or other structured entities.

Overview

Data Types are used to aggregate results from multiple extractors, combining their outputs according to configurable collation rules. This allows for the extraction of data that may appear in various forms or layouts within a document.

Use Data Types when you need to extract information that is too complex for a single Value Extractor, such as multi-format values or multi-field structures.
Data Types can be added as children of a Project or the "Local Resources" folder of a Content Type.
They are referenced by other objects (such as Field Classes or Value Readers) to participate in extraction workflows.

Extractor Configuration

You can define extractors for a Data Type in several ways:

Local Extractor: Assign a single extractor using the 'Extractor' property for simple cases.
Direct Children: Add Value Readers, Field Classes, or other Data Types as child nodes.
Referenced Extractors: Use the 'Referenced Extractors' property to include external extractors.

Extractors are executed in a specific order: local, children, then referenced.

Collation and Output

The 'Collation' property determines how results from all extractors are merged. Collation providers can:

Return all results individually (default).
Combine results into structured outputs, such as key-value pairs, arrays, or table rows.
Enforce spatial or logical relationships between extracted values.

Choose the collation provider that matches your data extraction scenario.

Filtering and Post-Processing

Data Types support additional configuration for refining results:

Input Filter: Restrict extraction to a subset of the document.
Exclusion Extractor: Remove unwanted results that overlap with exclusion matches.
Subtraction Extractor: Remove specific content from output values.
Lookup: Validate or correct extracted values using a vocabulary list.
Result Filter and Result Options: Further filter and process output instances.
Post Processing: Apply custom logic to each result after extraction.

Use Cases

The following examples illustrate common scenarios for Data Types, each using a different collation method:

Capturing a date value in multiple formats
Use the Individual collation provider to merge results from several extractors, each matching a different date format (e.g., 01/01/2000, January 1, 2000, 01-JAN-2000).
- How it works:
  - Configure multiple Value Extractors, each targeting a specific date format using regular expressions or parsing logic.
  - The Individual collation method returns all matches as separate results, regardless of which extractor found them.
  - This approach ensures that all valid date representations are captured, even if they appear in different formats within the same document.
- When to use:
  - When documents may contain the same data element in multiple possible formats, and you want to capture every occurrence.
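
The multi-format date scenario above can be sketched in Python. This is only an illustration under stated assumptions: the three regular expressions and the `collate_individual` helper are hypothetical stand-ins for the configured extractors and the Individual collation step, not Grooper's implementation.

```python
import re

# Hypothetical patterns, one per date format mentioned in the text.
DATE_PATTERNS = [
    r"\b\d{2}/\d{2}/\d{4}\b",            # 01/01/2000
    r"\b[A-Z][a-z]+ \d{1,2}, \d{4}\b",   # January 1, 2000
    r"\b\d{2}-[A-Z]{3}-\d{4}\b",         # 01-JAN-2000
]

def collate_individual(text, patterns):
    """Run every pattern and return all matches as separate (index, value) results."""
    results = []
    for pattern in patterns:
        for match in re.finditer(pattern, text):
            results.append((match.start(), match.group()))
    # Sort by position so results read in document order.
    return sorted(results)

sample = "Issued 01/01/2000, revised January 1, 2000, expires 01-JAN-2001."
dates = collate_individual(sample, DATE_PATTERNS)
print([value for _, value in dates])
```

Each match is returned on its own, regardless of which pattern produced it, which mirrors the behavior described for the Individual method.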

Capturing arrays of repeated values
Use the Array collation provider to collect multiple occurrences of a repeated value, such as a list of invoice numbers or serial numbers, into a single array output.
- How it works:
  - Define an extractor that matches the repeated value (e.g., serial number).
  - The Array collation method groups all matches into an array, which can be mapped to an array-type Data Field.
  - This is useful for capturing lists of items, such as all part numbers on a packing slip or all email addresses in a correspondence.
- When to use:
  - When you need to return a collection of similar values as a single array result, rather than as individual outputs.

Recognizing key-value pairs
Use the Key-Value Pair collation provider to pair extracted keys (such as field labels) with their corresponding values, enabling structured extraction of form fields or labeled data.
- How it works:
  - Configure one extractor to find keys (labels) and another to find values.
  - The Key-Value Pair collation method associates each key with its nearest value, producing structured pairs (e.g., "Name: John Smith").
  - This is ideal for extracting data from forms, statements, or any document where information is presented as labeled fields.
- When to use:
  - When extracting structured data from forms, tables, or documents with consistent label-value formatting.
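
As a rough sketch of the pairing idea, the Python below associates each key with the nearest value that follows it. It assumes distance is measured in characters of text flow, which is a simplification; the `pair_keys_with_values` helper and both patterns are hypothetical and do not represent how Grooper evaluates layout.

```python
import re

def pair_keys_with_values(text, key_pattern, value_pattern):
    """Pair each key with the closest value that starts at or after the key's end."""
    keys = [(m.end(), m.group()) for m in re.finditer(key_pattern, text)]
    values = [(m.start(), m.group()) for m in re.finditer(value_pattern, text)]
    pairs = []
    for key_end, key in keys:
        # Candidate values are those located after the key, ranked by distance.
        following = [(start - key_end, value) for start, value in values if start >= key_end]
        if following:
            pairs.append((key, min(following)[1]))
    return pairs

form = "Name: John Smith\nPhone: 555-0100\n"
pairs = pair_keys_with_values(form, r"Name:|Phone:", r"(?<=: )[^\n]+")
print(pairs)
```

A key with no value after it is simply left out of the output in this sketch.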

Recognizing an address block with ordered fields
Use the Ordered Array collation provider to extract multi-line address blocks, where each line or field (e.g., Name, Street, City, State, Zip) is captured by a separate extractor and combined in a specific order.
- How it works:
  - Create extractors for each address component (e.g., one for Name, one for Street, etc.).
  - The Ordered Array collation method assembles the results in the defined order, ensuring the output matches the expected address structure.
  - This approach is robust to variations in address formatting, as each field is matched independently but output as a single, ordered block.
- When to use:
  - When extracting structured, multi-line data where the order of fields is important, such as mailing addresses or contact blocks.
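
The ordered assembly described above can be illustrated with a short Python sketch. It assumes one regular expression per address component and a simple rule that each component must be found after the previous one; `ordered_array` and the patterns are hypothetical names chosen for this example.

```python
import re

# Hypothetical patterns for name, street, and city/state/zip lines.
ADDRESS_PATTERNS = [
    r"^[A-Z][a-z]+ [A-Z][a-z]+$",
    r"^\d+ .+$",
    r"^[A-Za-z ]+, [A-Z]{2} \d{5}$",
]

def ordered_array(text, patterns):
    """Return one match per pattern, each located after the previous one, or None."""
    position = 0
    parts = []
    for pattern in patterns:
        match = re.compile(pattern, re.MULTILINE).search(text, position)
        if match is None:
            return None  # A required component is missing or out of order.
        parts.append(match.group())
        position = match.end()
    return parts

block = ordered_array("John Smith\n123 Main St\nSpringfield, IL 62704\n", ADDRESS_PATTERNS)
print(block)
```

If the components appear in a different order, for example the street line before the name, the sketch returns `None` rather than a partial block.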

Capturing a complex pattern with multiple parts
Use the Pattern-Based collation provider to recognize data elements that consist of multiple, possibly optional, parts, such as a policy number with optional prefixes and suffixes, or a product code with embedded metadata.
- How it works:
  - Define extractors for each part of the pattern (e.g., prefix, core value, suffix).
  - The Pattern-Based collation method coordinates these extractors, matching only when the required pattern (including optional parts) is satisfied.
  - This enables extraction of values that cannot be matched by a single regular expression or extractor, especially when the pattern is variable or context-dependent.
- When to use:
  - When extracting data elements that have a complex, multi-part structure, or when optional/variable components must be recognized as part of a whole.
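
One way to picture this coordination is shown below. The sketch assumes each part's results are tagged with a one-letter code (P for prefix, C for core value, S for suffix) and that a regular expression over the sequence of codes selects which results belong together. The tagging scheme and the `pattern_based` helper are assumptions made for illustration only.

```python
import re

def pattern_based(results, sequence_pattern):
    """results: list of (index, code, value) tuples; code names the extractor that matched."""
    ordered = sorted(results)  # document order
    codes = "".join(code for _, code, _ in ordered)
    selected = []
    for match in re.finditer(sequence_pattern, codes):
        # Each character in the code string corresponds to one result.
        values = [value for _, _, value in ordered[match.start():match.end()]]
        selected.append("".join(values))
    return selected

hits = [
    (0, "P", "HO-"), (3, "C", "123456"), (9, "S", "-A"),
    (20, "C", "777777"), (26, "S", "-Z"),
]
# Optional prefix, required core value, optional suffix.
policies = pattern_based(hits, r"P?CS?")
print(policies)
```

The second policy number has no prefix and is still matched because the prefix is optional in the sequence pattern.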

Usage Guidance

Use Data Types to model data elements that require multiple extraction strategies or complex validation.
Configure extractors and collation to match the structure and variability of your target data.
Leverage filtering and post-processing options to ensure high-quality, relevant results.
Reference Data Types from higher-level objects to integrate them into your extraction workflows.

For more information, see the documentation for Value Extractors, Field Classes, Collation Providers, and related extraction objects.

Properties

Name Type Description

General

Local Extractor

Value Extractor

Specifies a single local extractor to be used as the primary extraction method for this Data Type.

Can be one of the following types:

Value	Description
Reference	Delegates extraction to another configured extractor, enabling reuse and centralization of extraction logic.
AI Column Extractor	Extracts structured content from documents with two-column layouts.
AI Schema Extractor	Extracts structured data from documents using a large language model (LLM) guided by a user-defined JSON schema.
Ask AI	Executes a completion using a large language model (LLM) and returns one hit for each choice in the response.
Detect Signature	Detects a signature within a specified region of a document page by measuring the percentage of the area that is filled.
Entity Recognition	Identifies and categorizes entities such as people, organizations, locations, and quantities in unstructured text.
Field Match	Matches the value stored in a previously-extracted field or table column.
Find Barcode	Searches for barcode values in document Layout Data previously detected during image processing.
Highlight Zone	Defines a region of a document to be visually highlighted, without extracting any data values.
Key Phrase Extraction	Identifies key concepts and topics in text using Azure AI Language key phrase extraction.
Label Match	Matches a list of one or more label values, using matching options defined by a Labeling Behavior.
Labeled OMR	Reads a group of one or more checkboxes located nearby text labels.
Labeled Value	Extracts a field presented as a label-value pair within a document, associating labels and values based on their spatial relationship.
List Match	Extracts values from document text that match any entry in a list of search terms.
Ordered OMR	Reads one or more checkboxes with a consistent order of appearance inside a rectangular region.
Pattern Match	Extracts values from document text that match a specified regular expression pattern.
Pii Entity Recognition	Identifies, categorizes, and redacts sensitive information (PII) in unstructured text using Azure AI Language Services.
Query HTML	Extracts values from an HTML document using a CSS or XPath selector.
Query XML	Extracts values from XML documents using XPath queries, enabling structured data extraction from XML content in Grooper.
Read Barcode	Extracts barcode values from document images using configurable barcode recognition.
Read Metadata	Reads a metadata value from a document by accessing a property on an attachment or content link.
Read Zone	Extracts text content from a specified rectangular region (zone) of a document.
Select Page	Selects and outputs the full content of one or more pages from a document, based on page number and/or content criteria.
Word Match	Extracts individual words and multi-word phrases (N-grams) from document text for use in classification, data extraction, and normalization.
Zonal OMR	Reads one or more checkboxes using manually-configured zones.

Overview

The 'Extractor' property allows you to assign a single Value Extractor to this Data Type. This is ideal for simple extraction scenarios where only one extraction method is needed.

Execution Order and Combined Extractors

If 'Extractor' is set and you also configure 'Extractors' (referenced extractors) or add child extractors, all configured extractors are executed in the following order:

The local extractor (this property)
Any direct child extractors
Any referenced extractors from the 'Extractors' property

The results from all extractors are merged according to the selected 'Collation' method.

This allows you to combine multiple extraction strategies within a single Data Type, and to control the order in which their results are considered and merged.
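
The execution order and merge step can be sketched as follows. This is a minimal model, not Grooper's implementation: an extractor is treated as a plain function that returns a list of strings, and `pattern`, `individual`, and `run_extractors` are hypothetical names.

```python
import re
from typing import Callable, List, Sequence

# Hypothetical model: an extractor is any function from text to a list of matches.
Extractor = Callable[[str], List[str]]

def pattern(regex: str) -> Extractor:
    """Build a simple regex-based extractor."""
    return lambda text: re.findall(regex, text)

def individual(result_sets: Sequence[List[str]]) -> List[str]:
    """Stand-in collation step that returns every result separately."""
    return [value for results in result_sets for value in results]

def run_extractors(text, local, children, referenced, collate=individual):
    # Order of execution: local extractor, then children, then referenced extractors.
    ordered = ([local] if local is not None else []) + list(children) + list(referenced)
    result_sets = [extract(text) for extract in ordered]
    return collate(result_sets)

values = run_extractors(
    "C3 B2 A1",
    local=pattern(r"A\d"),
    children=[pattern(r"B\d")],
    referenced=[pattern(r"C\d")],
)
print(values)  # ['A1', 'B2', 'C3']
```

In this sketch the result sets reach the collation step in extractor order (local, child, referenced), not in the order the values appear in the text.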

Configuration

Select the desired Value Extractor type from the drop-down menu.
Configure its properties (such as pattern, region, or barcode type) as needed for your extraction scenario.
If you later need multiple extractors, you can keep this property set and also add child extractors, reference additional extractors, or both.

Advanced Usage

The local extractor is best for straightforward cases, such as extracting a single value with a regular expression, barcode reader, or zonal region.
If you need to combine multiple extraction strategies (for example, to capture values in different formats or locations), use the 'Extractors' property or add child extractors.
The local extractor can be used in conjunction with input, exclusion, or subtraction extractors for additional filtering and refinement.

Best Practices

Use the local extractor for the most common or reliable extraction method.
For maintainability, prefer referenced or child extractors when logic becomes complex or needs to be reused.

Referenced Extractors

Extractor Node[]

Specifies a list of external extractors to be included in this Data Type's extraction process.

Collation

Collation Provider

Defines how results from all extractors are merged and transformed into the final output for this Data Type.

Can be one of the following types:

Value	Description
Split	Splits the input at each match found by an extractor.
AND	Collation provider that returns results only when each extractor produces at least one match.
Array	Collation provider that matches and returns arrays (lists) of values arranged in a specific geometric or flow order.
Combine	Combines instances from child extractors based on the grouping specified in the Group By property.
Key-Value List	Matches cases where a key and a list of 1 or more values occur on the document in a specific layout.
Key-Value Pair	Matches cases where a key-value pair occurs on the document in a specific layout.
Multi-Column	Outputs a single instance in which the document has been reformatted to reflect the flow of a multi-column document.
Individual	Combines the results from all extractors into a single result set.
Ordered Array	Finds sequences of values where one result is present for each extractor, in the order in which they appear.
Pattern-Based	Uses a regular expression to select a sequence of child extractor results.

Overview

The 'Collation' property determines the method used to combine results from all configured extractors. The selected Collation Provider controls how individual matches are grouped, ordered, or structured in the output.

Collation Methods

Individual: Returns all results as separate values, regardless of which extractor found them.
Array: Groups repeated values into arrays, suitable for lists or collections.
Key-Value Pair: Pairs extracted keys (labels) with their corresponding values, ideal for forms or labeled data.
Ordered Array: Combines results from multiple extractors in a specific order, such as address blocks.
Pattern-Based: Matches complex, multi-part patterns by coordinating multiple extractors.
Other providers may support geometric, tabular, or region-based grouping.

Execution and Output

The collation provider is applied after all extractors have run and produced their individual result sets.
The provider merges, groups, or structures these results according to its logic, producing the final output for the Data Type.
The choice of collation directly affects the structure of the extracted data (e.g., single values, arrays, key-value pairs, or complex objects).

Configuration

Select the collation provider that matches your data extraction scenario from the drop-down menu.
Configure any additional properties specific to the chosen provider (such as grouping rules, ordering, or pattern definitions).
Collation is essential for scenarios where multiple extractors are used, or where the output must be structured in a particular way.

Advanced Usage

Use collation to enforce relationships between extracted values, such as spatial proximity, order, or pattern conformance.
For table extraction, use a collation provider that supports row and column grouping.
For address or block extraction, use ordered or geometric collation to assemble multi-line or multi-field results.

Best Practices

Choose the simplest collation method that meets your needs.
Test extraction results with different collation providers to ensure correct grouping and output structure.
Document the rationale for your collation choice, especially in complex scenarios.

Description

String

Specifies a description for the item.

Options

Culture Filter

Culture Data[]

Specifies an optional list of cultures that this Data Type supports for extraction.

Input Filter

Value Extractor

Specifies an optional extractor that restricts the scope of data extraction to a subset of the input.

Can be one of the following types:

Value	Description
Reference	Delegates extraction to another configured extractor, enabling reuse and centralization of extraction logic.
AI Column Extractor	Extracts structured content from documents with two-column layouts.
AI Schema Extractor	Extracts structured data from documents using a large language model (LLM) guided by a user-defined JSON schema.
Ask AI	Executes a completion using a large language model (LLM) and returns one hit for each choice in the response.
Detect Signature	Detects a signature within a specified region of a document page by measuring the percentage of the area that is filled.
Entity Recognition	Identifies and categorizes entities such as people, organizations, locations, and quantities in unstructured text.
Field Match	Matches the value stored in a previously-extracted field or table column.
Find Barcode	Searches for barcode values in document Layout Data previously detected during image processing.
Highlight Zone	Defines a region of a document to be visually highlighted, without extracting any data values.
Key Phrase Extraction	Identifies key concepts and topics in text using Azure AI Language key phrase extraction.
Label Match	Matches a list of one or more label values, using matching options defined by a Labeling Behavior.
Labeled OMR	Reads a group of one or more checkboxes located nearby text labels.
Labeled Value	Extracts a field presented as a label-value pair within a document, associating labels and values based on their spatial relationship.
List Match	Extracts values from document text that match any entry in a list of search terms.
Ordered OMR	Reads one or more checkboxes with a consistent order of appearance inside a rectangular region.
Pattern Match	Extracts values from document text that match a specified regular expression pattern.
Pii Entity Recognition	Identifies, categorizes, and redacts sensitive information (PII) in unstructured text using Azure AI Language Services.
Query HTML	Extracts values from an HTML document using a CSS or XPath selector.
Query XML	Extracts values from XML documents using XPath queries, enabling structured data extraction from XML content in Grooper.
Read Barcode	Extracts barcode values from document images using configurable barcode recognition.
Read Metadata	Reads a metadata value from a document by accessing a property on an attachment or content link.
Read Zone	Extracts text content from a specified rectangular region (zone) of a document.
Select Page	Selects and outputs the full content of one or more pages from a document, based on page number and/or content criteria.
Word Match	Extracts individual words and multi-word phrases (N-grams) from document text for use in classification, data extraction, and normalization.
Zonal OMR	Reads one or more checkboxes using manually-configured zones.

Overview

The 'Input Filter' property allows you to define a Value Extractor that limits where extraction occurs within the input document. Instead of running extraction logic against the entire document, extraction is performed only within the regions or segments matched by the input filter.

This is especially useful for:

Focusing extraction on specific pages, sections, or regions.
Improving performance by reducing the amount of content processed.
Simplifying extraction logic by isolating relevant content.

How It Works

When an input filter is specified, it is executed first against the input document.
For each match produced by the input filter, the main extraction logic (local, child, and referenced extractors) is executed only within that matched region.
Output instance indexes are automatically adjusted to reflect their position in the original document.
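
The steps above can be sketched in Python. The sketch assumes both the input filter and the main extractor are regular expressions, and `extract_with_input_filter` is a hypothetical helper; it shows how indexes found inside a filtered region are shifted back to positions in the original text.

```python
import re

def extract_with_input_filter(text, filter_pattern, value_pattern):
    """Run value_pattern only inside regions matched by filter_pattern."""
    results = []
    for region in re.finditer(filter_pattern, text):
        for hit in re.finditer(value_pattern, region.group()):
            # Shift the index so it refers to the original text, not the region.
            results.append((region.start() + hit.start(), hit.group()))
    return results

# Two pages separated by a form feed character.
doc = "Invoice 1001\nTotal 50\fInvoice 2002\nTotal 75"
last_page_hits = extract_with_input_filter(doc, r"[^\f]+$", r"\d{4}")
for index, value in last_page_hits:
    print(index, value, doc[index:index + len(value)] == value)
```

Only the four-digit number on the last page is returned, and its index points at the correct position in the full document rather than in the filtered region.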

Configuration

Select a Value Extractor (such as a pattern match or region extractor) to use as the input filter.
Configure its properties to match the desired region, page, or section.
If no input filter is specified, extraction runs against the entire input.

Advanced Usage

Use input filters to restrict extraction to:
- The first or last page of a document.
- The top N lines of each page.
- Specific labeled sections (e.g., "PERSONAL INFO").
- Custom regions defined by regular expressions or zonal extractors.
Input filters can be combined with other extractors for layered filtering and extraction.

Sample Regular Expressions

Here are some examples of regular expressions that might be used with Pattern Match to create input filters for common scenarios:

Restrict to the first page of the document: ^[^\f]+
Restrict to the last page of the document: [^\f]+$
Restrict to the first 5 lines of the document: ^([^\r\n]+\r\n){5}
Restrict to the top 3 lines of each page: (^|\f)([^\r\n]+\r\n){3}
Restrict to the "PERSONAL INFO" section: \r\nPERSONAL INFO[^\f]+\r\nEDUCATION
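
To see what each sample expression selects, the Python below runs them against a small two-page text. Python's `re` module is used here only as a stand-in for the product's regular expression engine, which is an assumption; engine differences may apply, and the sample text is invented for the demonstration.

```python
import re

# Invented two-page sample; pages are separated by a form feed (\f).
page1 = "Line 1\r\nLine 2\r\nLine 3\r\nLine 4\r\nLine 5\r\nLine 6\r\n"
page2 = "Cover\r\nPERSONAL INFO\r\nAge 30\r\nEDUCATION\r\nBS\r\n"
doc = page1 + "\f" + page2

first_page = re.search(r"^[^\f]+", doc).group()
last_page = re.search(r"[^\f]+$", doc).group()
first_five_lines = re.search(r"^([^\r\n]+\r\n){5}", doc).group()
top_three_per_page = [m.group() for m in re.finditer(r"(^|\f)([^\r\n]+\r\n){3}", doc)]
personal_info = re.search(r"\r\nPERSONAL INFO[^\f]+\r\nEDUCATION", doc).group()

print(repr(first_five_lines))
print(len(top_three_per_page))
print(repr(personal_info))
```

Note that the per-page expression includes the leading form feed in matches after the first page, and the line-based expressions only count non-empty lines.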

Best Practices

Use input filters to reduce noise and improve extraction accuracy.
Test input filters thoroughly to ensure they match the intended regions.
Document the purpose and logic of each input filter for maintainability.

Exclusion Extractor

Value Extractor

Specifies an optional extractor to filter out undesirable results from the output set.

Can be one of the following types:

Value	Description
Reference	Delegates extraction to another configured extractor, enabling reuse and centralization of extraction logic.
AI Column Extractor	Extracts structured content from documents with two-column layouts.
AI Schema Extractor	Extracts structured data from documents using a large language model (LLM) guided by a user-defined JSON schema.
Ask AI	Executes a completion using a large language model (LLM) and returns one hit for each choice in the response.
Detect Signature	Detects a signature within a specified region of a document page by measuring the percentage of the area that is filled.
Entity Recognition	Identifies and categorizes entities such as people, organizations, locations, and quantities in unstructured text.
Field Match	Matches the value stored in a previously-extracted field or table column.
Find Barcode	Searches for barcode values in document Layout Data previously detected during image processing.
Highlight Zone	Defines a region of a document to be visually highlighted, without extracting any data values.
Key Phrase Extraction	Identifies key concepts and topics in text using Azure AI Language key phrase extraction.
Label Match	Matches a list of one or more label values, using matching options defined by a Labeling Behavior.
Labeled OMR	Reads a group of one or more checkboxes located nearby text labels.
Labeled Value	Extracts a field presented as a label-value pair within a document, associating labels and values based on their spatial relationship.
List Match	Extracts values from document text that match any entry in a list of search terms.
Ordered OMR	Reads one or more checkboxes with a consistent order of appearance inside a rectangular region.
Pattern Match	Extracts values from document text that match a specified regular expression pattern.
Pii Entity Recognition	Identifies, categorizes, and redacts sensitive information (PII) in unstructured text using Azure AI Language Services.
Query HTML	Extracts values from an HTML document using a CSS or XPath selector.
Query XML	Extracts values from XML documents using XPath queries, enabling structured data extraction from XML content in Grooper.
Read Barcode	Extracts barcode values from document images using configurable barcode recognition.
Read Metadata	Reads a metadata value from a document by accessing a property on an attachment or content link.
Read Zone	Extracts text content from a specified rectangular region (zone) of a document.
Select Page	Selects and outputs the full content of one or more pages from a document, based on page number and/or content criteria.
Word Match	Extracts individual words and multi-word phrases (N-grams) from document text for use in classification, data extraction, and normalization.
Zonal OMR	Reads one or more checkboxes using manually-configured zones.

Overview

The 'Exclusion Extractor' property allows you to define a Value Extractor that identifies regions or values to be excluded from the final extraction results. Any result that overlaps with a match from the exclusion extractor will be discarded.

This is especially useful for:

Removing false positives that occur in known irrelevant regions (such as headers, footers, or watermarks).
Excluding values that match certain patterns or appear in specific zones.

How It Works

The exclusion extractor is executed after the main extraction logic.
Each output instance is checked for overlap with any exclusion match.
If an overlap is found, the result is removed from the output.
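
The overlap check can be sketched as follows. The example assumes results and exclusion matches are character ranges in the text flow and treats any shared character as an overlap; `apply_exclusion` and its helpers are hypothetical names, not part of the product.

```python
import re

def spans(pattern, text):
    """Return (start, end, value) for every match of pattern."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]

def overlaps(a_start, a_end, b_start, b_end):
    """True when two half-open character ranges share at least one character."""
    return a_start < b_end and b_start < a_end

def apply_exclusion(text, value_pattern, exclusion_pattern):
    results = spans(value_pattern, text)       # main extraction runs first
    excluded = spans(exclusion_pattern, text)  # exclusion matches are found afterwards
    kept = []
    for start, end, value in results:
        if not any(overlaps(start, end, x_start, x_end) for x_start, x_end, _ in excluded):
            kept.append(value)
    return kept

page = "Page 12 of 40\nAmount due: 250\nPage 13 of 40\n"
amounts = apply_exclusion(page, r"\d+", r"Page \d+ of \d+")
print(amounts)  # ['250']
```

The numbers inside the page footers are dropped because they fall within an exclusion match, while the amount is kept.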

Configuration

Select a Value Extractor (such as a pattern match, region, or list match) to use as the exclusion extractor.
Configure its properties to match the regions or values you wish to exclude.
If no exclusion extractor is specified, no exclusion filtering is performed.

Advanced Usage

Use exclusion extractors to filter out repeated headers, footers, or other non-data content.
Combine with input filters and subtraction extractors for layered filtering.
Exclusion logic is especially helpful in documents with recurring noise or boilerplate text.

Best Practices

Test exclusion extractors thoroughly to avoid removing valid results.
Document the exclusion logic for future maintainers.

Subtraction Extractor

Value Extractor

Specifies an optional extractor to remove specific content from output values.

Can be one of the following types:

Value	Description
Reference	Delegates extraction to another configured extractor, enabling reuse and centralization of extraction logic.
AI Column Extractor	Extracts structured content from documents with two-column layouts.
AI Schema Extractor	Extracts structured data from documents using a large language model (LLM) guided by a user-defined JSON schema.
Ask AI	Executes a completion using a large language model (LLM) and returns one hit for each choice in the response.
Detect Signature	Detects a signature within a specified region of a document page by measuring the percentage of the area that is filled.
Entity Recognition	Identifies and categorizes entities such as people, organizations, locations, and quantities in unstructured text.
Field Match	Matches the value stored in a previously-extracted field or table column.
Find Barcode	Searches for barcode values in document Layout Data previously detected during image processing.
Highlight Zone	Defines a region of a document to be visually highlighted, without extracting any data values.
Key Phrase Extraction	Identifies key concepts and topics in text using Azure AI Language key phrase extraction.
Label Match	Matches a list of one or more label values, using matching options defined by a Labeling Behavior.
Labeled OMR	Reads a group of one or more checkboxes located nearby text labels.
Labeled Value	Extracts a field presented as a label-value pair within a document, associating labels and values based on their spatial relationship.
List Match	Extracts values from document text that match any entry in a list of search terms.
Ordered OMR	Reads one or more checkboxes with a consistent order of appearance inside a rectangular region.
Pattern Match	Extracts values from document text that match a specified regular expression pattern.
Pii Entity Recognition	Identifies, categorizes, and redacts sensitive information (PII) in unstructured text using Azure AI Language Services.
Query HTML	Extracts values from an HTML document using a CSS or XPath selector.
Query XML	Extracts values from XML documents using XPath queries, enabling structured data extraction from XML content in Grooper.
Read Barcode	Extracts barcode values from document images using configurable barcode recognition.
Read Metadata	Reads a metadata value from a document by accessing a property on an attachment or content link.
Read Zone	Extracts text content from a specified rectangular region (zone) of a document.
Select Page	Selects and outputs the full content of one or more pages from a document, based on page number and/or content criteria.
Word Match	Extracts individual words and multi-word phrases (N-grams) from document text for use in classification, data extraction, and normalization.
Zonal OMR	Reads one or more checkboxes using manually-configured zones.

Overview

The 'Subtraction Extractor' property allows you to define a Value Extractor that identifies content to be removed from each output value after extraction. This is useful for cleaning up results by stripping unwanted substrings, such as labels, prefixes, suffixes, or formatting artifacts.

How It Works

After extraction, the subtraction extractor is executed against each output value.
Any content matching the subtraction extractor is removed from the value.
If the resulting value is empty or contains only whitespace, the entire result is discarded.
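
A minimal sketch of this behavior, assuming the subtraction extractor is a regular expression applied to each output value. The `apply_subtraction` helper and the label pattern are hypothetical.

```python
import re

def apply_subtraction(values, subtraction_pattern):
    """Remove matched content from each value and drop values left empty or blank."""
    cleaned = []
    for value in values:
        remainder = re.sub(subtraction_pattern, "", value)
        if remainder.strip():
            cleaned.append(remainder)
    return cleaned

# Hypothetical pattern that strips a leading label such as "Name: ".
cleaned = apply_subtraction(["Name: John Smith", "Name: ", "Total:"], r"^\w+:\s*")
print(cleaned)  # ['John Smith']
```

The second and third inputs contain nothing but a label, so they are discarded once the label is removed.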

Configuration

Select a Value Extractor (typically a pattern match or region extractor) to use as the subtraction extractor.
Configure its properties to match the content you wish to remove (e.g., trailing punctuation, labels, or formatting).
The subtraction extractor must match a contiguous sequence of characters within the text flow.

Constraints

The subtraction extractor cannot use Collation Provider methods that combine instances geometrically.
It is intended for simple substring removal, not for complex region-based or multi-part extraction.

Advanced Usage

Use subtraction extractors to remove labels (e.g., "Name: John Smith" → "John Smith").
Clean up OCR artifacts or unwanted formatting from extracted values.
Combine with exclusion extractors for comprehensive result filtering.

Best Practices

Ensure the subtraction extractor is specific enough to avoid removing valid data.
Document the purpose and logic of each subtraction extractor for maintainability.

Lookup

Value Lookup

Defines a lookup operation to filter, validate, or correct captured values using a vocabulary list.

Output

Deduplicate By

DedupMode

Specifies the mode used to deduplicate overlapping Data Instance results.

Can be one of the following values:

Name	Value	Description
Disabled	0	No deduplication of overlapping results is performed. All Data Instances are included in the output, even if they overlap or are redundant. Use this mode when you want to retain every result, regardless of overlap.
Area	1	The result occupying the largest geometric region wins. In case of a tie, the result with the highest confidence is chosen. This mode compares the 'Location' area of each Data Instance. Larger area wins. If areas are equal, the instance with higher 'Confidence' is selected. Useful when longer or larger matches are more desirable, such as when one result fully contains another.
Length	2	The result matching the longest span wins. In case of a tie, the result with the highest confidence is chosen. This mode compares the 'Length' property of each Data Instance. Longer length wins. If lengths are equal, the instance with higher 'Confidence' is selected. Use this when the number of characters matched is the primary criterion for deduplication.
Confidence	3	The result with the highest confidence wins. In case of a tie, the result with the greatest length is chosen. This mode compares the 'Confidence' property of each Data Instance. Higher confidence wins. If confidence is equal, the instance with greater 'Length' is selected. Use this when extractor reliability or preference is expressed through confidence values.
Count	4	The result matching the most characters wins. In case of a tie, the result with the highest confidence is chosen. This mode compares the number of characters matched, using OCR data if available, or the value's length otherwise. More characters matched wins. If counts are equal, the instance with higher 'Confidence' is selected. Use this when the total number of matched characters (not just span length) is the most important factor.

Deduplication ensures that only a single Data Instance is retained when multiple results overlap in the document content. This is especially useful when multiple extractors or extraction techniques may produce redundant or overlapping results.
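
The Length and Confidence modes can be sketched in Python as shown below. The sketch assumes overlap is judged by character range and that winners are chosen greedily from best to worst; the `Instance` class and `deduplicate` function are hypothetical and only model the tie-breaking rules from the table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instance:
    value: str
    index: int         # start position in the document text
    confidence: float

    @property
    def length(self) -> int:
        return len(self.value)

def ranges_overlap(a: Instance, b: Instance) -> bool:
    return a.index < b.index + b.length and b.index < a.index + a.length

# Primary criterion first, tie-breaker second, as described for each mode.
SORT_KEYS = {
    "Length": lambda i: (i.length, i.confidence),
    "Confidence": lambda i: (i.confidence, i.length),
}

def deduplicate(instances, mode):
    kept = []
    for candidate in sorted(instances, key=SORT_KEYS[mode], reverse=True):
        # Keep a candidate only if it does not overlap an already chosen winner.
        if not any(ranges_overlap(candidate, winner) for winner in kept):
            kept.append(candidate)
    return sorted(kept, key=lambda i: i.index)

long_hit = Instance("MINERAL OWNERSHIP REPORT", index=0, confidence=0.8)
short_hit = Instance("OWNERSHIP REPORT", index=8, confidence=0.9)

by_length = deduplicate([long_hit, short_hit], "Length")
by_confidence = deduplicate([long_hit, short_hit], "Confidence")
print([i.value for i in by_length], [i.value for i in by_confidence])
```

With these sample values, the Length mode keeps the longer match while the Confidence mode keeps the shorter one, because it was given the higher confidence.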

Common Deduplication Scenarios

Redundant Extraction:
Multiple extractors may target the same value for redundancy. Deduplication ensures only one result is included in the output.
Preference by Confidence:
When some extractors are preferred, configure them to output higher confidence. Deduplication by confidence will favor these results.
Self-Containing Values:
When one result is a substring of another (e.g., "OWNERSHIP REPORT" vs. "MINERAL OWNERSHIP REPORT"), deduplication by length or area retains the longer, more specific match ("MINERAL OWNERSHIP REPORT").
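The self-containing case can be illustrated with a small deduplication pass. This is a sketch under stated assumptions: instances are plain dictionaries, overlap is detected by character range, and a greedy longest-first pass is used. The source does not describe the actual algorithm, and the `ranges_overlap` and `dedupe_by_length` helpers are hypothetical.

```python
def ranges_overlap(a, b):
    """True when two spans [index, index + length) share at least one character."""
    return (a["index"] < b["index"] + b["length"]
            and b["index"] < a["index"] + a["length"])


def dedupe_by_length(instances):
    """Greedy pass (an assumption): visit the longest instances first and
    keep one only if it overlaps nothing already kept. Confidence breaks ties."""
    kept = []
    ordered = sorted(instances,
                     key=lambda i: (i["length"], i["confidence"]),
                     reverse=True)
    for inst in ordered:
        if not any(ranges_overlap(inst, k) for k in kept):
            kept.append(inst)
    # Return the survivors in document order.
    return sorted(kept, key=lambda i: i["index"])


text = "MINERAL OWNERSHIP REPORT for Tract 7"
hits = [
    {"value": "OWNERSHIP REPORT", "index": text.index("OWNERSHIP REPORT"),
     "length": 16, "confidence": 0.9},
    {"value": "MINERAL OWNERSHIP REPORT", "index": 0,
     "length": 24, "confidence": 0.9},
    {"value": "Tract 7", "index": text.index("Tract 7"),
     "length": 7, "confidence": 0.9},
]
result = [h["value"] for h in dedupe_by_length(hits)]
```

Here the shorter "OWNERSHIP REPORT" hit is dropped because it falls inside the longer match, while "Tract 7" survives because it overlaps nothing.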

Usage

Set the deduplication mode using the 'Deduplication Mode' property. When enabled (not 'Disabled'), the 'Compare By' property is also exposed, allowing further control over how duplicates are detected.

Compare Mode

CompareMode

Specifies the method used to determine whether two Data Instances are considered duplicates during deduplication.

Can be one of the following values:

Name	Value	Description
Ordinal	0	Items are considered duplicates if they have one or more printable characters in common. Only the character content of the Data Instances is compared; geometric position and character range are not considered.
Geometric	1	Items are considered duplicates if they occur on the same page and their bounding rectangles ('Location') intersect. Character content and range are not considered.
Ranged	2	Items are considered duplicates if their character ranges, defined by the 'Index' and 'Length' properties, overlap. Geometric position is not considered unless range data is unavailable.

The comparison mode controls how overlap between Data Instances is detected, and therefore which results are subject to deduplication. It is configured with the 'Compare By' property, which is available when deduplication is enabled.
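The 'Geometric' and 'Ranged' tests can be sketched as simple predicates. This is an illustration only: the `Rect` and `Instance` classes and both function names are hypothetical, and treating rectangles that merely touch at an edge as non-intersecting is an assumption. 'Ordinal' is left out because the description does not say how shared characters are identified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    right: float
    bottom: float


@dataclass(frozen=True)
class Instance:
    # Hypothetical stand-in for a Data Instance; not Grooper's class.
    page: int
    location: Rect
    index: int
    length: int


def geometric_duplicates(a, b):
    """Same page and intersecting bounding rectangles.
    Assumption: rectangles that only touch at an edge do not intersect."""
    if a.page != b.page:
        return False
    ra, rb = a.location, b.location
    return (ra.left < rb.right and rb.left < ra.right
            and ra.top < rb.bottom and rb.top < ra.bottom)


def ranged_duplicates(a, b):
    """Character ranges [index, index + length) overlap."""
    return a.index < b.index + b.length and b.index < a.index + a.length


a = Instance(page=1, location=Rect(0, 0, 100, 20), index=0, length=10)
b = Instance(page=1, location=Rect(50, 10, 150, 30), index=40, length=10)
c = Instance(page=2, location=Rect(0, 0, 100, 20), index=5, length=10)
```

In this sample, `a` and `b` are duplicates geometrically but not by range, while `a` and `c` are duplicates by range but not geometrically because they sit on different pages. The same pair of results can therefore be kept or collapsed depending on the 'Compare By' setting.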

Result Filter

Defines rules for filtering the result set produced by extraction operations.

Result Options

Result Set Options

Configures post-processing options for a set of extracted results, enabling value normalization, confidence adjustment, sorting, filtering, and other result set controls.

Post Processing

Result Processor

Specifies an optional post-processing operation to be applied to each output instance.

Can be one of the following types:

Value	Description
OCR Reader	Extracts text from a region near each Data Type output instance using OCR or existing OCR results.
OMR Reader	Detects and reads optical marks (checkboxes) associated with each extractor result using OMR.
Place Zone	Places a zone (region) relative to each output instance for downstream extraction or processing.

Overview

The 'Post Processing' property lets you define a Result Processor that applies additional logic to each output instance after extraction, filtering, and lookup operations have completed. Post-processing can transform, normalize, or further validate extracted values before they are returned as final results.

How It Works

After all extractors, filters, and lookups have been applied, the post-processing logic is executed for each output instance.
The Result Processor can modify the value, confidence, geometry, or other properties of each result.
Post-processing can also be used to enforce formatting rules, perform calculations, or apply custom business logic.
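The ordering described above (post-processing runs last, once per output instance) can be sketched as follows. Everything here is hypothetical: the `Result` class, the `run_pipeline` function, and the whitespace-normalizing processor stand in for Grooper's own types, and the lookup stage is omitted for brevity.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Result:
    # Hypothetical stand-in for an output instance.
    value: str
    confidence: float


def run_pipeline(extracted, keep, post_process=None):
    """Apply the filter first, then run the optional processor on each survivor."""
    filtered = [r for r in extracted if keep(r)]
    if post_process is None:
        return filtered
    return [post_process(r) for r in filtered]


def normalize_whitespace(result):
    """Example processor: collapse runs of whitespace in the value."""
    return replace(result, value=" ".join(result.value.split()))


raw = [Result("  ACME   Corp ", 0.9), Result("x", 0.1)]
final = run_pipeline(raw,
                     keep=lambda r: r.confidence >= 0.5,
                     post_process=normalize_whitespace)
```

Because the processor only sees results that survived filtering, the low-confidence "x" result is never post-processed in this sketch.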

Info

Group Names

String

Displays the list of named groups that this Data Type will output based on the current extractor and collation configuration.

Extractor Count

String

Displays the total number of extractors (local, referenced, and child) represented by this Data Type.

Design Tabs

General	View or edit properties of a node.
Reports	View reports for a node.
Scripting	Create, debug, modify, and compile scripts for scriptable nodes.
Tester	Test an Extractor Node on documents in a test batch.
Advanced	View or edit advanced details about a node.

Context Menu Commands

Command	Shortcut	Description
Convert To Value Reader		Converts this Data Type to a Value Reader with equivalent functionality.

Child Types

Data Type
Field Class
Value Reader

Used By

Reference Render
