Grooper Help - Version 25.0
25.0.0017 2,127

Lexicon - Normalize

Lexicon Command (Grooper.Core)

Normalizes character data in the lexicon to the character set of the configured language.

Remarks

The Normalize command scans every entry in the selected Lexicon and attempts to convert characters that are not valid for the configured language into valid, visually equivalent characters (homoglyphs). This is especially useful for cleaning up imported data, resolving mismatches between visually similar but distinct Unicode characters, and ensuring consistent matching during extraction and validation.

When to Use

  • To correct entries that contain characters from the wrong script or language (e.g., Cyrillic vs. Latin homoglyphs).
  • When imported or user-entered data may contain mixed or invalid characters for the target language.
  • To improve extraction accuracy and prevent missed matches due to character set inconsistencies.
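
To illustrate why such entries cause missed matches, here is a small Python sketch, independent of Grooper. It compares two strings that render almost identically but differ in code points, then lists the letters that are not Latin. The helper name `non_latin_letters` and the sample strings are assumptions made for this example only.

```python
import unicodedata

latin = "paypal"
# Same word, but with U+0430 CYRILLIC SMALL LETTER A in place of Latin 'a'.
mixed = "p\u0430yp\u0430l"

def non_latin_letters(text):
    """Return (char, Unicode name) for each letter whose name does not start with LATIN."""
    return [
        (ch, unicodedata.name(ch))
        for ch in text
        if ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN")
    ]

# The strings look alike on screen, yet an exact comparison fails.
print(latin == mixed)
# Lists the two Cyrillic letters hiding in the "Latin" word.
print(non_latin_letters(mixed))
```

An exact-match lookup treats `latin` and `mixed` as different entries, which is the kind of inconsistency the Normalize command is meant to clean up.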

How It Works

  • The command examines each entry in the Lexicon for characters not valid in the configured language.
  • If an invalid character is found, the command attempts to convert it to a valid homoglyph.
  • If conversion is successful and a matching entry exists, frequency data is merged; otherwise, the entry is considered invalid.
  • Depending on the 'Save Invalid Entries' property, invalid entries are either deleted or saved to a new Lexicon for review.
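
The steps above can be sketched in Python as follows. This is not Grooper's implementation: the `normalize` function, the tiny `HOMOGLYPHS` table, the `VALID` character set, and the `save_invalid` flag (standing in for the 'Save Invalid Entries' property) are all assumptions chosen to mirror the description, with the lexicon modeled as a plain mapping of entry to frequency.

```python
from collections import Counter

# Hypothetical, deliberately tiny homoglyph table (Cyrillic -> Latin).
HOMOGLYPHS = {"\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0440": "p", "\u0441": "c"}
# Assumed character set for the configured language.
VALID = set("abcdefghijklmnopqrstuvwxyz")

def normalize(lexicon, save_invalid=True):
    """Return (cleaned entries, invalid entries or None) for an entry -> frequency mapping."""
    cleaned = Counter()
    candidates = []
    # Pass 1: entries that are already valid are kept unchanged.
    for entry, freq in lexicon.items():
        if all(ch in VALID for ch in entry):
            cleaned[entry] += freq
        else:
            candidates.append((entry, freq))

    invalid = {}
    # Pass 2: try to convert each remaining entry to its homoglyph form.
    for entry, freq in candidates:
        converted = "".join(HOMOGLYPHS.get(ch, ch) for ch in entry)
        converted_ok = all(ch in VALID for ch in converted)
        if converted_ok and converted in cleaned:
            # Conversion succeeded and a matching entry exists: merge frequencies.
            cleaned[converted] += freq
        else:
            # Otherwise the entry is considered invalid.
            invalid[entry] = freq

    # Invalid entries are either kept for review or discarded.
    return dict(cleaned), (invalid if save_invalid else None)

lexicon = {
    "cape": 5,
    "c\u0430pe": 2,              # Cyrillic 'а': converts to "cape", which exists
    "caf\u00e9": 1,              # 'é' has no mapping in the sample table
    "\u0441\u043e\u0440e": 3,    # converts to "cope", but no such entry exists
}
cleaned, invalid = normalize(lexicon)
print(cleaned)
print(invalid)
```

In this sketch the second entry is merged into "cape", while the last two are returned as invalid so they could be written to a separate lexicon for review. Passing `save_invalid=False` discards them instead.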

> Note: This command is only enabled when the Lexicon has a language configured.

For more information about language normalization and homoglyph handling, see the documentation for Lexicon.

Properties

Name    Type    Description

See Also

Notification