Grooper Help - Version 25.0
25.0.0017 2,127

Fuzzy Lookup Options

Embedded Object Grooper.Extract

Configures fuzzy matching options for value lookups, enabling correction of minor errors and flexible matching against vocabularies.

Remarks

Overview

The Fuzzy Lookup Options class defines how fuzzy matching is performed when a value is not found by exact match in the vocabulary during a Value Lookup operation. Fuzzy lookup is a secondary search that can automatically resolve minor differences caused by OCR inaccuracy, spelling mistakes, or inflected word forms, improving data quality and extraction robustness.

Fuzzy matching uses edit distance algorithms (such as Levenshtein distance) to compare the input value to entries in the vocabulary, scoring each candidate by similarity. The closest match above the configured similarity threshold is selected, optionally using custom weightings to account for common OCR or typographical errors.
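As an illustration of similarity scoring, the following minimal sketch computes the Levenshtein distance between two strings and converts it to a score between 0 and 1. The normalization (one minus the distance divided by the longer string's length) is an assumption made for illustration; this page does not state the exact formula Grooper uses. The function names `levenshtein` and `similarity` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb, or keep a match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, lower as edits increase."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest
```

Under this assumed normalization, "1NVOICE" and "INVOICE" differ by one substitution across seven characters, giving a similarity of roughly 0.857.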

How Fuzzy Lookup Works

  1. If a value is not found in the vocabulary by exact match, fuzzy lookup is attempted (if enabled).
  2. Each candidate in the vocabulary is compared to the input value using edit distance.
  3. The best match above the 'MinimumSimilarity' threshold is selected as the replacement.
  4. Custom weightings (via the 'Weightings' property) can adjust the cost of specific character substitutions, insertions, or deletions.
  5. Optionally, fuzzy matching can be limited to the top N entries or to an alternate vocabulary to improve performance.
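The steps above can be sketched as follows. This is not Grooper's implementation: the function names, the `OCR_WEIGHTS` table, the 0.2 substitution cost, the default threshold of 0.8, and the similarity normalization are all assumptions chosen for illustration. The sketch also treats a score equal to the threshold as a match and interprets "top N entries" as the first N entries of the vocabulary, both of which are assumptions.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical weightings: reduced substitution cost for characters OCR often confuses.
OCR_WEIGHTS: Dict[Tuple[str, str], float] = {
    ("1", "I"): 0.2, ("I", "1"): 0.2,
    ("0", "O"): 0.2, ("O", "0"): 0.2,
    ("8", "B"): 0.2, ("B", "8"): 0.2,
}


def weighted_distance(a: str, b: str, weights: Dict[Tuple[str, str], float]) -> float:
    """Edit distance where listed substitutions cost less than the default of 1.0."""
    previous = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        current = [float(i)]
        for j, cb in enumerate(b, start=1):
            sub = 0.0 if ca == cb else weights.get((ca, cb), 1.0)
            current.append(min(
                previous[j] + 1.0,      # delete ca
                current[j - 1] + 1.0,   # insert cb
                previous[j - 1] + sub,  # substitute (possibly at reduced cost)
            ))
        previous = current
    return previous[-1]


def fuzzy_lookup(
    value: str,
    vocabulary: Iterable[str],
    minimum_similarity: float = 0.8,
    weights: Optional[Dict[Tuple[str, str], float]] = None,
    top_n: Optional[int] = None,
) -> Optional[str]:
    entries = list(vocabulary)
    # Step 1: an exact match wins and skips fuzzy matching entirely.
    if value in entries:
        return value
    # Step 5: optionally restrict the search to the first N entries.
    candidates = entries if top_n is None else entries[:top_n]
    best_entry, best_score = None, -1.0
    for entry in candidates:
        # Steps 2 and 4: score each candidate by (weighted) edit distance.
        # value != entry here, so at least one string is non-empty.
        longest = max(len(value), len(entry))
        score = 1.0 - weighted_distance(value, entry, weights or {}) / longest
        if score > best_score:
            best_entry, best_score = entry, score
    # Step 3: accept the best candidate only if it reaches the threshold.
    return best_entry if best_score >= minimum_similarity else None
```

With a threshold of 0.9, `fuzzy_lookup("1NVOICE", ["INVOICE", "RECEIPT"], 0.9)` returns `None` because the unweighted similarity is about 0.857, while passing `weights=OCR_WEIGHTS` lowers the cost of the "1" to "I" substitution and the lookup returns "INVOICE".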

Usage Scenarios

  • Correcting common OCR errors (e.g., "1NVOICE" → "INVOICE", "O" ↔ "0", "B" ↔ "8").
  • Handling misspellings or inflected forms in user-entered or scanned data.
  • Improving extraction accuracy in noisy or low-quality documents.

Best Practices

  • Test fuzzy lookup settings with representative data to confirm that the intended corrections are made without over-matching.
  • Regularly review and update weightings and alternate vocabularies as data quality or requirements change.
  • Use in combination with exclusions and translation options in Value Lookup for comprehensive value normalization.
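One way to test settings against representative data is to sweep the similarity threshold over a small labeled sample and measure how often the result is the expected one. The sketch below is a hypothetical harness, not a Grooper feature. It uses Python's `difflib.SequenceMatcher.ratio()` as a stand-in similarity score, which is not an edit distance and will not reproduce Grooper's scores; the helper names and sample data are invented for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple


def best_match(value: str, vocabulary: List[str], minimum_similarity: float) -> Optional[str]:
    """Return the highest-scoring entry, or None if it falls below the threshold."""
    # Stand-in score: SequenceMatcher.ratio(), not the product's similarity measure.
    scored = [(SequenceMatcher(None, value, entry).ratio(), entry) for entry in vocabulary]
    score, entry = max(scored)
    return entry if score >= minimum_similarity else None


def evaluate(
    samples: List[Tuple[str, Optional[str]]],
    vocabulary: List[str],
    thresholds: List[float],
) -> Dict[float, float]:
    """Fraction of samples whose result equals the expected value, per threshold.

    An expected value of None means the input should NOT be corrected,
    so a replacement counts as an over-match.
    """
    report = {}
    for threshold in thresholds:
        correct = sum(
            1 for value, expected in samples
            if best_match(value, vocabulary, threshold) == expected
        )
        report[threshold] = correct / len(samples)
    return report


# Invented sample: one OCR error that should be fixed, one word that should be left alone.
samples = [("1NVOICE", "INVOICE"), ("INVOLVE", None)]
report = evaluate(samples, ["INVOICE", "RECEIPT"], [0.7, 0.8, 0.9])
```

In this invented sample, a threshold that is too low replaces "INVOLVE" with "INVOICE" (over-matching), while one that is too high fails to correct "1NVOICE"; only the middle setting handles both cases.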

Properties

Name    Type    Description
