Grooper Help - Version 25.0
25.0.0017 2,127

Labelset-Based

Classify Method (Grooper.Extract)

Classifies documents by using Label Sets to determine the best-matching Document Type.

Remarks

The Labelset-Based classification method analyzes the input document by extracting and scoring label matches for every Document Type in scope. It selects the best match based on label presence and score, which makes it well suited to semi-structured documents whose field labels are consistent even when layouts vary.

This method requires that a Labeling Behavior is configured on the parent Content Model or Content Category, and that each Document Type defines a Label Set describing its expected field labels.

How It Works

  • For each Document Type in scope, the method loads its Label Set and attempts to match its labels against the document's text.
  • The best-matching Document Type is selected as the classification result, based on label coverage and scoring.
  • Classification rules defined on Document Types (such as positive/negative extractors and page count constraints) are enforced and can supplement labelset-based logic.
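
The steps above can be illustrated with a small Python sketch. It is not Grooper's implementation: the names `DocumentType`, `label_coverage`, `classify`, and `min_score` are hypothetical, label matching is reduced to a case-insensitive substring test, and the score is assumed to be plain label coverage (matched labels divided by total labels). Classification rules such as positive/negative extractors and page count constraints are left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DocumentType:
    """Hypothetical stand-in for a Document Type and its Label Set."""
    name: str
    labels: list[str] = field(default_factory=list)


def label_coverage(doc_type: DocumentType, text: str) -> float:
    """Fraction of the type's labels found in the text (case-insensitive)."""
    if not doc_type.labels:
        return 0.0
    haystack = text.lower()
    hits = sum(1 for label in doc_type.labels if label.lower() in haystack)
    return hits / len(doc_type.labels)


def classify(
    text: str,
    doc_types: list[DocumentType],
    min_score: float = 0.5,  # assumed cutoff, not a Grooper property
) -> Optional[str]:
    """Return the best-covered type's name, or None if none reaches min_score."""
    best_name, best_score = None, 0.0
    for doc_type in doc_types:
        score = label_coverage(doc_type, text)
        if score > best_score:
            best_name, best_score = doc_type.name, score
    return best_name if best_score >= min_score else None


# Two illustrative Document Types with made-up Label Sets.
invoice = DocumentType("Invoice", ["Invoice Number", "Bill To", "Amount Due"])
purchase_order = DocumentType("Purchase Order", ["PO Number", "Ship To", "Order Date"])

page_text = "INVOICE NUMBER: 1047\nBill To: Acme Corp\nAmount Due: $310.00"
result = classify(page_text, [invoice, purchase_order])
print(result)
```

In this sketch the sample text contains all three invoice labels and none of the purchase order labels, so the invoice type wins. A document matching no Label Set well enough is left unclassified (`None`) rather than forced into the nearest type.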

Usage Guidance

  • Use this method for document sets where field labels are reliable indicators of type, even if layouts differ.
  • Ensure each Document Type has a well-defined Label Set for optimal results.
  • Adjust the 'Prescan Threshold' property to optimize performance for large or complex document sets.
  • For outlier cases (such as unlabeled data), override extraction logic at the field level as needed.

For more information, see the documentation for Label Sets, Labeling Behavior, and Document Types.

Properties

Name    Type    Description

Used By

Notification