Grooper Help - Version 25.0
25.0.0017 2,127

Label Set

Embedded Object Grooper.Extract

Defines the set of labels used to identify Data Elements on a Document Type.

Remarks

Overview

A Label Set is a mapping between Data Elements and their corresponding text labels on a specific Document Type. It is stored on the Document Type and contains the set of labels used on that Document Type to identify fields, columns, headers, footers, and more.
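
The mapping described above can be pictured as a small data structure. The following is only a conceptual sketch in Python, not Grooper's internal representation: the `LabelSet` class, its methods, and the sample names ("Acme Invoice", "Invoice #", and so on) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LabelSet:
    """Hypothetical model: the labels used on one Document Type."""
    document_type: str
    # Data Element name -> text labels that identify it on this Document Type
    labels: dict[str, list[str]] = field(default_factory=dict)

    def add_label(self, data_element: str, label: str) -> None:
        """Record one more label for a Data Element."""
        self.labels.setdefault(data_element, []).append(label)

    def labels_for(self, data_element: str) -> list[str]:
        """Return a copy of the labels for a Data Element (empty if none)."""
        return list(self.labels.get(data_element, []))


# Illustrative data only
acme = LabelSet("Acme Invoice")
acme.add_label("Invoice Number", "Invoice #")
acme.add_label("Invoice Number", "Inv. No.")
acme.add_label("Quantity", "Qty")

print(acme.labels_for("Invoice Number"))  # ['Invoice #', 'Inv. No.']
```

A Data Element may carry more than one label, which is why each entry holds a list rather than a single string.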


Label Set-Based Approach

The Label Set-based approach is a strategy for classification and data extraction on semi-structured document sets. The term "semi-structured" refers to document sets where the documents largely contain the same data, but in different layouts and using different labels.

The basic concept is to build a data model where Data Fields, Data Sections, and Data Tables use extraction methods that are "Label Set-aware". Label Set-aware extractors are designed to work with Label Sets and contain all the logic necessary to extract data, provided that a Label Set has been created for the Document Type being processed.
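
To illustrate the idea of a "Label Set-aware" extractor, here is a minimal Python sketch in which one piece of extraction logic serves two document types, with only the labels differing. It is an assumption-laden illustration, not how Grooper's extractors are implemented: the function `labeled_value`, the regular expression, and the sample documents are hypothetical.

```python
import re
from typing import Optional


def labeled_value(text: str, labels: list[str]) -> Optional[str]:
    """Return the text that follows the first matching label on its line."""
    for label in labels:
        # Label, optional colon, then the rest of the line as the value
        pattern = re.escape(label) + r"\s*:?\s*(.+)"
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            return match.group(1).strip()
    return None


# Hypothetical Label Sets: same Data Element, different labels per Document Type
label_sets = {
    "Acme Invoice": {"Invoice Number": ["Invoice #"]},
    "Globex Invoice": {"Invoice Number": ["Inv. No."]},
}

acme_text = "Invoice #: 10045\nDate: 2024-01-05"
globex_text = "Inv. No. G-778\nIssued 2024-02-11"

print(labeled_value(acme_text, label_sets["Acme Invoice"]["Invoice Number"]))      # 10045
print(labeled_value(globex_text, label_sets["Globex Invoice"]["Invoice Number"]))  # G-778
```

The extraction logic never changes between document types; only the Label Set passed to it does.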

In a Label Set-based approach, Label Sets are also used to classify documents. This is done by configuring the Content Model to use Labelset-Based classification.
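
The intuition behind classifying with Label Sets can be sketched as follows: a document is assigned to the Document Type whose labels it matches best. The scoring rule used here (fraction of labels found in the text) is an assumption made for illustration; this page does not describe how Labelset-Based classification actually scores documents, and the `classify` function and sample data are hypothetical.

```python
from typing import Optional


def classify(text: str, label_sets: dict[str, list[str]]) -> Optional[str]:
    """Pick the Document Type whose labels are best represented in the text."""
    lowered = text.lower()
    best_type, best_score = None, 0.0
    for doc_type, labels in label_sets.items():
        if not labels:
            continue
        # Assumed score: share of this type's labels that appear in the text
        hits = sum(1 for label in labels if label.lower() in lowered)
        score = hits / len(labels)
        if score > best_score:
            best_type, best_score = doc_type, score
    return best_type


# Illustrative Label Sets, flattened to one list of labels per Document Type
label_sets = {
    "Acme Invoice": ["Invoice #", "Bill To", "Amount Due"],
    "Globex Invoice": ["Inv. No.", "Customer", "Balance"],
}

page = "Inv. No. G-778\nCustomer: Jane Roe\nBalance: 120.00"
print(classify(page, label_sets))  # Globex Invoice
```

A document that matches no labels at all returns `None` in this sketch, which corresponds to an unclassified document.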


Onboarding New Document Types

Once the base Content Model and Data Model are configured, the only work required to onboard a new document type is the creation of a Label Set. In many cases, this takes only a few minutes, and a single designer can create hundreds of templates in a week.

Label Sets are edited on the Design Page using the Content Type - Labels tab. This tab is visible for any Content Type which defines or inherits a Labeling Behavior. Creating a Label Set involves defining one or more labels for each Data Element in the Data Model, primarily by selecting labels from a sample document.


Handling Outliers

When using a Label Set-based approach, there will be cases where the approach does not work for a particular Data Element. For example, a patient name may appear on a document with no label at all to identify it. This is called unlabeled data. To resolve a case like this, override the Value Extractor for the field and use an extraction method that isn't based on text labels.
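
The override pattern described above can be sketched in Python. This is a conceptual illustration only: the helper names (`by_label`, `by_pattern`, `extract_fields`), the regular expressions, and the sample form are hypothetical and do not represent Grooper's configuration model.

```python
import re
from typing import Callable, Optional

Extractor = Callable[[str], Optional[str]]


def by_label(label: str) -> Extractor:
    """Label-based extractor: value is the rest of the line after the label."""
    def extract(text: str) -> Optional[str]:
        m = re.search(re.escape(label) + r"\s*:?\s*(.+)", text)
        return m.group(1).strip() if m else None
    return extract


def by_pattern(pattern: str) -> Extractor:
    """Label-free extractor: value is group 1 of a regular expression."""
    def extract(text: str) -> Optional[str]:
        m = re.search(pattern, text, flags=re.MULTILINE)
        return m.group(1).strip() if m else None
    return extract


def extract_fields(text: str, defaults: dict[str, Extractor],
                   overrides: dict[str, Extractor]) -> dict[str, Optional[str]]:
    """Use the override for a field when one exists, else the default."""
    return {name: overrides.get(name, default)(text)
            for name, default in defaults.items()}


defaults = {"Patient Name": by_label("Patient Name"), "DOB": by_label("DOB")}
# On this form the name is unlabeled: a line holding exactly two capitalized words
overrides = {"Patient Name": by_pattern(r"^([A-Z][a-z]+ [A-Z][a-z]+)$")}

form = "Jane Roe\nDOB: 1980-04-12\nVisit notes follow"
print(extract_fields(form, defaults, overrides))
# {'Patient Name': 'Jane Roe', 'DOB': '1980-04-12'}
```

Only the outlier field is overridden; every other field continues to use its label-based extractor.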


Support for Label Sets

Label Sets are consumed by Labelset-Based classification and a variety of Label Set-aware extractors:

Element Type   Extractor               Description
Field          Labeled Value           Reads a labeled field value.
Field          Labeled OMR             Reads a labeled checkbox or checkbox group.
Table          Tabular Layout          Reads a table using header, column, and footer labels.
Table          Row Match               Uses table header and footer labels to delimit table content.
Table          Fluid Layout            Conditionally executes Row Match or Tabular Layout.
Section        Transaction Detection   Uses labels to identify repeating sections.
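
As one illustration of how labels can drive table extraction, the sketch below shows the idea of using a header label and a footer label to delimit table content. It demonstrates the delimiting concept only and is not the Row Match implementation; the function `rows_between` and the sample page are hypothetical.

```python
def rows_between(lines: list[str], header_label: str, footer_label: str) -> list[str]:
    """Return the non-blank lines strictly between the header and footer lines."""
    # First line containing the header label
    start = next((i for i, line in enumerate(lines) if header_label in line), None)
    if start is None:
        return []
    # First line after the header containing the footer label (or end of page)
    end = next((i for i in range(start + 1, len(lines)) if footer_label in lines[i]),
               len(lines))
    return [line for line in lines[start + 1:end] if line.strip()]


page = [
    "Invoice # 10045",
    "Description  Qty  Amount",
    "Widget  2  10.00",
    "Gadget  1  5.50",
    "Total  15.50",
    "Thank you",
]

print(rows_between(page, "Description", "Total"))
# ['Widget  2  10.00', 'Gadget  1  5.50']
```

If the footer label is missing, this sketch reads to the end of the page; if the header label is missing, it returns no rows.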

Best Practices and Usage Guidance

  • Use Label Sets to enable rapid onboarding of new document types by mapping document text to Data Elements.
  • Ensure each Document Type has a well-defined Label Set for optimal extraction and classification results.
  • For documents with inconsistent or missing labels, supplement Label Set-based extraction with custom extractors.
  • Regularly review and update Label Sets as document layouts or business requirements evolve.

Premium Feature

Label Set is a premium feature that requires separate licensing. To view your current licensing entitlements, select the Root on the Design page, then select the Licensing tab. Contact your BIS account representative to inquire about premium features.
