Grooper Help - Version 25.0
25.0.0017 2,127

FRX Options

Embedded Object Grooper.Core

Specifies fuzzy matching options for a regular expression.

Remarks

Unlike a normal regular expression, which finds only values that match the pattern exactly, a fuzzy regular expression (FRX) finds values that match the pattern to a specified degree of similarity, and automatically repairs the output value whenever possible.
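The idea of matching "to a degree of similarity" and then repairing the output can be illustrated with a small sketch. This is not Grooper's implementation: the function names (edit_distance, similarity, fuzzy_find), the use of Levenshtein edit distance, and the 0.75 threshold are all assumptions chosen for illustration. The sketch compares an extracted value against a list of expected values rather than against a regular expression.

```python
from typing import Optional, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def fuzzy_find(
    value: str,
    candidates: Sequence[str],
    min_similarity: float = 0.75,  # hypothetical threshold, not a Grooper default
) -> Optional[Tuple[str, float]]:
    """Return the closest candidate (the "repaired" value) and its score.

    Returns None when no candidate reaches the similarity threshold.
    """
    best: Optional[Tuple[str, float]] = None
    for candidate in candidates:
        score = similarity(value, candidate)
        if score >= min_similarity and (best is None or score > best[1]):
            best = (candidate, score)
    return best
```

With these assumptions, fuzzy_find("mi1es", ["miles", "kilometers"]) accepts the misread value at a similarity of 0.8 and returns the repaired value "miles", while an unrelated value such as "yards" falls below the threshold and yields None.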

When using FRX mode, there are a few limitations on regular expression syntax and some performance implications which need to be considered. These are outlined below.

Regular Expression Syntax

Fuzzy regular expressions support most of the syntax and features of standard regular expressions, with a handful of exceptions noted below. The following regular expression features are NOT supported in FuzzyRegEx mode:

FRX also supports an option that is unavailable in normal regular expressions: required mode. (?r) turns required mode on, and (?-r) turns it off. At the start of an FRX, required mode is always off. Once turned on, it stays on until it is turned off. This mechanism can be used, for example, to require the start of a new line. The syntax to accomplish this would be (?r)\n(?-r).

Performance Considerations

Processing an FRX takes considerably longer than processing a normal regular expression, particularly for complex patterns. Execution time is proportional to the perplexity of the regular expression, which measures the number of possible permutations in the pattern. For example:

  • A{1,2}B{1,2} has a perplexity of 2 * 2 = 4 (i.e. it could match AB, AAB, ABB, or AABB).
  • A{1,2}B{1,2}C{1,2} has a perplexity of 2 * 2 * 2 = 8.
  • A{1,5}B{1,5}C{1,5} has a perplexity of 5 * 5 * 5 = 125.
  • [0-9]{4} (miles|kilometers) has a perplexity of 1 * 2 = 2.
  • [0-9]{1,5} (miles|kilometers) has a perplexity of 5 * 2 = 10.
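The arithmetic in the examples above can be reproduced with a rough estimator. This is a minimal sketch, not Grooper's actual calculation: the function name estimate_perplexity is hypothetical, and the rules are inferred from the examples only (a bounded quantifier {m,n} contributes n - m + 1, a fixed quantifier {n} contributes 1, and an alternation group contributes its number of alternatives). It ignores nested groups, quantifiers such as *, + and ?, and open-ended ranges such as {2,}.

```python
import re

# Bounded quantifier such as {4} or {1,5}. Open-ended forms like {2,} are ignored.
QUANTIFIER = re.compile(r"\{(\d+)(?:,(\d+))?\}")
# Flat (non-nested) alternation group such as (miles|kilometers).
ALTERNATION = re.compile(r"\(([^()]*\|[^()]*)\)")


def estimate_perplexity(pattern: str) -> int:
    """Multiply the number of choices contributed by each variable element."""
    perplexity = 1
    for match in QUANTIFIER.finditer(pattern):
        low = int(match.group(1))
        # {n} has a single possible repeat count, so it contributes a factor of 1.
        high = int(match.group(2)) if match.group(2) else low
        perplexity *= high - low + 1
    for match in ALTERNATION.finditer(pattern):
        # One choice per alternative: the number of "|" separators plus one.
        perplexity *= match.group(1).count("|") + 1
    return perplexity


if __name__ == "__main__":
    for pattern in (
        "A{1,2}B{1,2}",
        "A{1,5}B{1,5}C{1,5}",
        "[0-9]{4} (miles|kilometers)",
        "[0-9]{1,5} (miles|kilometers)",
    ):
        print(pattern, estimate_perplexity(pattern))
```

An estimate like this can help compare candidate patterns before enabling FRX. For instance, narrowing A{1,5} to A{1,2} in each of three terms reduces the estimate from 125 to 8.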

There is a point at which perplexity gets so high that fuzzy matching is computationally impractical. As such, FRX is not suitable for every extraction task, and should be used with caution.

Properties

Name    Type    Description

See Also

Used By
