Grooper Help - Version 25.0
25.0.0024 2,166

Data Instance

Embedded Object Grooper.Core

A Data Instance represents a fragment of text content within a document.

Remarks

The Data Instance class is the foundational data object for Grooper’s extraction, validation, and data modeling system.
Data Instances are used to represent any segment of text content, from a single character to the entire content of a document.
They serve as both the input and output for all ESP™ extraction operations, and are the format in which document metadata is stored by the Extract activity.

Role in Grooper

Data Instances are created during the extraction process, typically when raw OCR data is loaded for a Batch Folder.
Loading that data produces a Document Instance, which represents the entire document’s content. When extraction is performed, this root instance is the source from which the Data Elements at the root of the Data Model extract. Extracted results are saved as children of the Document Instance, forming a hierarchy that mirrors the structure of the Data Model.

Data Instances are also used to store user-entered values, calculated results, and validated data. They are the primary mechanism for organizing, storing, and presenting extracted or user-supplied information throughout Grooper’s data processing pipeline.

Hierarchy and Structure

Each Data Instance may have child Data Instances, forming a tree that reflects the structure of the Data Model.
For example, a Document Instance may have children representing fields, sections, or tables, each of which may have further children for nested data elements. This hierarchy enables Grooper to represent complex, nested document schemas and to support advanced extraction, validation, and review scenarios.
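
The parent/child arrangement described above can be pictured with a small sketch. This is only an illustration in Python, not Grooper's API: the Instance class, its name, value, and children attributes, the walk helper, and the sample field names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Instance:
    """Hypothetical stand-in for a Data Instance; not Grooper's actual class."""
    name: str
    value: Optional[str] = None
    children: List["Instance"] = field(default_factory=list)


def walk(node: Instance, prefix: str = "") -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (path, value) for the node and every descendant, depth first."""
    path = f"{prefix}/{node.name}" if prefix else node.name
    yield path, node.value
    for child in node.children:
        yield from walk(child, path)


# A root instance with a field, a section, and a table holding one row.
document = Instance("Document", children=[
    Instance("InvoiceNumber", "INV-1001"),
    Instance("Vendor", children=[Instance("Name", "Acme Supply")]),
    Instance("LineItems", children=[
        Instance("Row 1", children=[
            Instance("Description", "Widget"),
            Instance("Amount", "12.50"),
        ]),
    ]),
])

paths = [path for path, _ in walk(document)]
for path, value in walk(document):
    print(path, "=", value)
```

Each printed path shows where an instance sits relative to the root, which is the same relationship the Data Model defines between its fields, sections, and tables.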

Extraction and Storage

  • Data Instances are created automatically during extraction, but can also be created or modified by user actions or custom code.
  • Each instance stores its extracted value, location (if available), confidence score, and other metadata.
  • The 'Value' property contains the extracted or assigned text. The 'Location' property (if set) indicates the region on the page from which the value was obtained.
  • Data Instances track their position within the parent instance using the 'Span' property, supporting precise mapping between extracted data and source content.
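
As a rough sketch of how the properties listed above relate to each other, the following Python models one extracted value and checks that its span points back at the matching text in its parent. The representation is assumed for illustration: the source does not state how 'Span' or 'Location' are stored, so the offset/length pair, the rectangle tuple, the 0.0 to 1.0 confidence range, and the Extracted and source_text names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Span:
    """Assumed form: character offset and length within the parent's text."""
    start: int
    length: int


@dataclass
class Extracted:
    """Hypothetical record; attribute names follow the properties named above."""
    value: str
    span: Span
    confidence: float  # assumed range 0.0 to 1.0
    # Assumed form (left, top, width, height); None when no location is set.
    location: Optional[Tuple[float, float, float, float]] = None


def source_text(parent_text: str, item: Extracted) -> str:
    """Return the slice of the parent's text that the span points at."""
    return parent_text[item.span.start:item.span.start + item.span.length]


parent_text = "Invoice No: INV-1001  Date: 2024-03-15"
invoice_no = Extracted(
    value="INV-1001",
    span=Span(start=parent_text.index("INV-1001"), length=len("INV-1001")),
    confidence=0.97,
)

# True when the stored value matches the text the span refers to.
matches_source = source_text(parent_text, invoice_no) == invoice_no.value
```

A user-entered or calculated value might carry no location at all, which is why the sketch leaves that attribute optional.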

Usage and Configuration

Data Instances are not typically created or configured directly by end users. Instead, they are managed by Grooper’s extraction engine and data model configuration. However, understanding their role is essential for advanced solution design, troubleshooting, and custom scripting scenarios.

  • When configuring a Data Model, each Data Field, Data Section, or Data Table will result in corresponding Data Instances during extraction.
  • Data Instances are visible in the Data Review UI, where users can view, edit, or validate extracted values.
  • Advanced users may interact with Data Instances via expressions, custom code, or API integrations to automate data manipulation or validation workflows.
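
To give a sense of what an automated validation pass over Data Instances could look like, here is a minimal Python sketch that walks a tree and lists the values whose confidence falls below a threshold. It is not Grooper's scripting API: the Node class, the needs_review function, the threshold, and the sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical instance with an optional value and confidence score."""
    name: str
    value: Optional[str] = None
    confidence: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def needs_review(node: Node, threshold: float) -> List[str]:
    """Collect names of valued instances whose confidence is below threshold."""
    flagged = []
    has_score = node.value is not None and node.confidence is not None
    if has_score and node.confidence < threshold:
        flagged.append(node.name)
    for child in node.children:
        flagged.extend(needs_review(child, threshold))
    return flagged


doc = Node("Document", children=[
    Node("InvoiceNumber", "INV-1001", 0.97),
    Node("Total", "1,2S0.00", 0.58),  # simulated OCR misread
    Node("Vendor", children=[Node("Name", "Acme Supply", 0.81)]),
])

flagged = needs_review(doc, threshold=0.85)
print(flagged)
```

Container instances without a value of their own are skipped, so only fields that actually hold extracted text are flagged.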

Integration with Other Grooper Features

  • Data Instances are used throughout Grooper for data validation, export, reporting, and workflow automation.
  • They support advanced features such as confidence scoring, geometric mapping, and multi-page extraction.
  • Data Instances are the primary data structure for transferring extracted values between activities, exporting to external systems, or presenting data for user review.
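
One way to picture handing extracted values to an external system is to flatten the instance hierarchy into nested dictionaries and serialize it. The Python below is a sketch under assumptions, not a Grooper export format: the tuple-based tree shape, the to_payload function, and the sample names are hypothetical, and repeated names (such as table rows) would need a list-based structure instead.

```python
import json
from typing import Any, List, Tuple, Union

# Hypothetical shape: (name, value) for a leaf, (name, [children]) for a container.
Node = Tuple[str, Union[str, List[Any]]]


def to_payload(node: Node) -> Any:
    """Convert a node's content into plain dicts and strings for serialization."""
    _, content = node
    if isinstance(content, list):
        return {child[0]: to_payload(child) for child in content}
    return content


tree: Node = ("Document", [
    ("InvoiceNumber", "INV-1001"),
    ("Vendor", [("Name", "Acme Supply"), ("City", "Tulsa")]),
])

payload = to_payload(tree)
exported = json.dumps(payload, sort_keys=True)
print(exported)
```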

Properties

Name    Type    Description
General
Document Reference

Derived Types

There are 10 implementations of Data Instance.

  • Checkbox Instance: Represents an instance of an OMR checkbox.
  • Document Instance: Represents the entire content of a document, and serves as the root of the Data Element Instance hierarchies generated by the Extract activity.
  • Field Instance: Represents the value associated with a Data Field object.
  • Labeled Instance: A Data Instance that represents a value associated with one or more labels or checkboxes.
  • Section Instance: Represents the extracted content associated with a Data Section.
  • Section Instance Collection: Represents the value of a multi-instance Data Section object.
  • Table Cell Instance: Represents the value for a Data Column in a Table Row Instance.
  • Table Header Instance: Represents the column or row headers of a Table Instance.
  • Table Instance: Represents an instance of a Data Table object on a document.
  • Table Row Instance: Represents a table row in a Table Instance.

See Also

Used By

Notification