Grooper Help - Version 25.0
25.0.0017 2,127

Label Match

List Match Grooper.Extract

Matches a list of one or more label values, using matching options defined by a Labeling Behavior.

Remarks

The Label Match extractor is designed to identify field labels, headers, or entity names in document text by matching against a configurable list of terms. It is especially useful for scenarios where labels may appear in many spelling, formatting, or layout variations, and where consistent extraction logic is needed across multiple fields or document types.

How It Works

  • The extractor uses the 'Vocabulary' property to define the set of label terms to match. These can be entered directly or referenced from external lexicons.
  • Matching options such as fuzzy matching, vertical wrap, and constrained wrap are inherited from the associated Labeling Behavior, allowing centralized configuration and reuse.
  • When 'Translate' is enabled and 'Vocabulary' is configured as a lookup lexicon, matched values can be normalized or replaced with standardized forms.
  • The extractor supports detection of labels split across multiple lines (vertical wrap) or restricted to specific regions (constrained wrap), as configured in the Labeling Behavior.
  • Matching is case-sensitive by default, and text is normalized (preprocessed) before matching to improve accuracy.
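
The matching behavior described above can be illustrated with a minimal Python sketch. This is a conceptual illustration only, not Grooper's implementation: the function name `find_labels`, the `min_similarity` parameter, and the use of `difflib` similarity as the fuzzy measure are all assumptions made for the example.

```python
import re
from difflib import SequenceMatcher


def find_labels(text, vocabulary, min_similarity=1.0):
    """Return (term, matched_text) pairs for vocabulary terms found in text.

    Hypothetical helper: min_similarity=1.0 means exact (case-sensitive)
    matching; lower values tolerate small OCR or spelling differences.
    """
    # Simple preprocessing: keep alphanumeric tokens, drop punctuation.
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    hits = []
    for term in vocabulary:
        width = len(term.split())
        # Slide a window of the same word count as the term over the text.
        for i in range(len(tokens) - width + 1):
            candidate = " ".join(tokens[i:i + width])
            score = SequenceMatcher(None, term, candidate).ratio()
            if score >= min_similarity:
                hits.append((term, candidate))
    return hits


text = "Invoice Nurnber: 12345   Total Due: 99.00"
vocabulary = ["Invoice Number", "Total Due"]

exact = find_labels(text, vocabulary)                        # exact matching only
fuzzy = find_labels(text, vocabulary, min_similarity=0.85)   # tolerate OCR noise
```

With exact matching only "Total Due" is found; lowering the threshold also recovers the OCR-damaged "Invoice Nurnber" as a match for "Invoice Number".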

Configuration Guidance

  • Define all expected label variants in the 'Vocabulary' property, including alternate spellings, abbreviations, and formatting differences.
  • Use a Labeling Behavior to manage fuzzy matching, vertical wrap, and constrained wrap options centrally. This ensures consistent label extraction across all fields and extractors that reference the behavior.
  • Enable 'Translate' and configure a lookup lexicon to normalize matched labels to a single output value, improving consistency for downstream processing.
  • For documents with complex layouts, adjust vertical and constrained wrap settings to capture labels that span multiple lines or regions.

Usage Scenarios

  • Field Label Extraction:
    Extract field labels from forms, tables, or semi-structured documents, even when labels are wrapped across lines or appear with minor OCR errors.
  • Consistent Labeling Across Fields:
    Apply a single Labeling Behavior to multiple Label Match extractors to ensure consistent handling of fuzzy matching and wrapping options throughout a project.
  • Entity Name Normalization:
    Use translation to map multiple label variants (e.g., "International Business Machines", "IBM Corporation") to a single normalized value ("IBM").
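
The entity name normalization scenario can be sketched as a simple lookup. The dictionary below stands in for a lookup lexicon, and `extract_normalized` and its `translate` flag are hypothetical names chosen for illustration; they are not part of Grooper.

```python
# Stand-in for a lookup lexicon: each label variant maps to one output value.
LOOKUP = {
    "International Business Machines": "IBM",
    "IBM Corporation": "IBM",
    "IBM": "IBM",
}


def extract_normalized(text, lookup, translate=True):
    """Find label variants in text; return normalized or raw matched values."""
    results = []
    remaining = text
    # Try longer variants first so "IBM Corporation" wins over plain "IBM".
    for variant in sorted(lookup, key=len, reverse=True):
        if variant in remaining:
            results.append(lookup[variant] if translate else variant)
            # Blank out the matched span so shorter variants do not re-match it.
            remaining = remaining.replace(variant, " ")
    return results


normalized = extract_normalized("Vendor: IBM Corporation", LOOKUP)
raw = extract_normalized("Vendor: IBM Corporation", LOOKUP, translate=False)
```

With translation on, the output is the single standardized value; with it off, the matched text is returned as it appeared in the document.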

Advanced Features

  • Fuzzy Matching:
    Tolerates minor OCR or typographical errors, increasing recall in noisy or variable documents.
  • Vertical and Constrained Wrapping:
    Detects labels split across lines or restricted to specific regions, improving extraction in complex layouts.
  • Case Sensitivity and Preprocessing:
    Ensures accurate matching by respecting case and applying text normalization before extraction.
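
To make the idea of vertical wrap concrete, the following sketch treats a label as matched when its first words appear on one line and the remaining words appear on the next line, horizontally overlapping the first part. The `Word` structure, the overlap rule, and the function names are assumptions for illustration; Grooper's actual wrap logic is configured in the Labeling Behavior and may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    line: int      # zero-based line index on the page
    left: float    # horizontal extent of the word
    right: float


def runs(line_words, parts):
    """Yield (left, right) extents where consecutive words spell `parts`."""
    n = len(parts)
    for i in range(len(line_words) - n + 1):
        window = line_words[i:i + n]
        if [w.text for w in window] == parts:
            yield window[0].left, window[-1].right


def find_wrapped(words, label):
    """Return (top_line, bottom_line) pairs where label wraps onto two lines."""
    by_line = defaultdict(list)
    for w in sorted(words, key=lambda w: (w.line, w.left)):
        by_line[w.line].append(w)
    parts = label.split()
    hits = []
    # Try every way of splitting the label into a top part and a bottom part.
    for split in range(1, len(parts)):
        for line, line_words in by_line.items():
            below = by_line.get(line + 1, [])
            for top_l, top_r in runs(line_words, parts[:split]):
                for bot_l, bot_r in runs(below, parts[split:]):
                    # Require horizontal overlap so the parts are stacked.
                    if top_l < bot_r and bot_l < top_r:
                        hits.append((line, line + 1))
    return hits


words = [
    Word("Invoice", 0, 10, 60), Word("Date", 0, 200, 240),
    Word("Number", 1, 10, 62), Word("Issued", 1, 200, 250),
]
stacked = find_wrapped(words, "Invoice Number")
unrelated = find_wrapped(words, "Date Number")
```

Here "Invoice" sits directly above "Number", so the label is detected across the two lines, while "Date" and "Number" are in different columns and are not combined.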

Practical Tips

  • Regularly review and update the vocabulary to ensure all relevant label variants are included.
  • Test extraction with representative document samples to verify matching behavior and adjust settings as needed.
  • Use diagnostic logs to troubleshoot missed or incorrect matches, and refine vocabulary or behavior settings for optimal results.
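
One way to approach testing with representative samples is a small regression check that reports which expected labels were not found in each sample. The sketch below is independent of Grooper: `missed_labels`, `exact_matcher`, and the sample data are hypothetical, and the matcher is a plain substring test standing in for whatever extraction is being evaluated.

```python
VOCAB = ["Invoice Number", "Total Due"]


def exact_matcher(text):
    """Stand-in matcher: returns vocabulary terms found verbatim in text."""
    return [term for term in VOCAB if term in text]


def missed_labels(samples, matcher):
    """Map sample index to the sorted list of expected labels not found."""
    report = {}
    for i, (text, expected) in enumerate(samples):
        found = set(matcher(text))
        missing = sorted(set(expected) - found)
        if missing:
            report[i] = missing
    return report


samples = [
    ("Invoice Number: 1   Total Due: 2", ["Invoice Number", "Total Due"]),
    ("Invoice No. 7   Total Due: 3", ["Invoice Number", "Total Due"]),
]
report = missed_labels(samples, exact_matcher)
```

A non-empty report points at samples whose label variants (here, "Invoice No.") are candidates for adding to the vocabulary.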

Properties

Name    Type    Description
Matching
Options
Output

See Also

Used By
