Grooper Help - Version 25.0
25.0.0017 2,127

Lexicon

Node Grooper.Core

Represents a dictionary that stores a list of keys or key-value pairs for use in extraction, validation, and normalization throughout Grooper.

Remarks

Overview

A Lexicon is a reusable resource that stores lists of words, phrases, field values, translations, weightings, or other information. Lexicons are used by Data Fields, Data Tables, and other components to provide lookup lists, enforce valid values, support language normalization, and enable advanced extraction scenarios.

Lexicons can be composed of:

  • Local entries: Defined in a text file, one entry per line. The format depends on the selected 'Lexicon Type'.
  • Included Lexicons: References to other Lexicons, allowing you to build composite or multi-language dictionaries.
  • Database queries: Populated dynamically from a database using a configurable SQL query and Data Connection.
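
As a rough mental model of how these three sources combine, the following Python sketch merges a Lexicon's local entries, its query results, and the entries of any included Lexicons. The `Lexicon` class, its field names, and `resolve_entries` are hypothetical names used for illustration and are not Grooper's API. Dropping duplicates and keeping first-seen order are also assumptions, not documented behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Lexicon:
    """Hypothetical stand-in for a Lexicon node (not Grooper's API)."""
    name: str
    local_entries: list[str] = field(default_factory=list)
    included: list["Lexicon"] = field(default_factory=list)
    # Rows that a database query would have returned (assumed already fetched).
    query_rows: list[str] = field(default_factory=list)


def resolve_entries(lexicon: Lexicon) -> list[str]:
    """Merge local, database, and included entries, keeping first-seen order."""
    seen: set[str] = set()
    merged: list[str] = []
    sources = [lexicon.local_entries, lexicon.query_rows]
    sources += [resolve_entries(child) for child in lexicon.included]
    for source in sources:
        for entry in source:
            if entry not in seen:  # assumption: duplicates are dropped
                seen.add(entry)
                merged.append(entry)
    return merged


states = Lexicon("US States", local_entries=["Oklahoma", "Texas"])
provinces = Lexicon("CA Provinces", local_entries=["Ontario"], query_rows=["Quebec"])
regions = Lexicon("Regions", local_entries=["Texas"], included=[states, provinces])
print(resolve_entries(regions))  # ['Texas', 'Oklahoma', 'Ontario', 'Quebec']
```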

Usage

  • Assign a Lexicon to the 'List Values' property of a Data Field to restrict or suggest valid values.
  • Use Lexicons to drive extraction logic, such as matching labels, codes, or entity names in documents.
  • Configure multi-language Lexicons by including other Lexicons, each representing a different language, and specifying the 'Language' property.
  • Populate Lexicons from external systems using the 'Connection' and 'Query' properties.
  • Use the 'Abbreviations' property to automatically generate alternate forms and permutations of entries.
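
To illustrate the idea behind a query-populated Lexicon, the sketch below runs a SQL query and turns each returned row into a key-value entry. It uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in for an external system. The `vendors` table, its columns, and the query text are hypothetical, and this is not how Grooper itself executes the 'Query' property.

```python
import sqlite3

# In-memory database standing in for whatever a Data Connection points at.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendors (code TEXT, name TEXT, active INTEGER)")
conn.executemany(
    "INSERT INTO vendors VALUES (?, ?, ?)",
    [("V001", "Acme Supply", 1), ("V002", "Globex", 0), ("V003", "Initech", 1)],
)

# Hypothetical query of the kind a 'Query' property might hold.
query = "SELECT code, name FROM vendors WHERE active = 1 ORDER BY code"

# Each returned row becomes one key-value entry.
entries = {code: name for code, name in conn.execute(query)}
conn.close()
print(entries)  # {'V001': 'Acme Supply', 'V003': 'Initech'}
```

Filtering in the query (here, `active = 1`) keeps stale values out of the resulting list without editing the Lexicon by hand.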

Best Practices

  • Organize Lexicons in a dedicated Project for reuse across multiple content types and processes.
  • Use descriptive names and documentation for each Lexicon to clarify its intended use and scope.
  • Avoid circular references when including other Lexicons.
  • For multi-language scenarios, create one Lexicon per language, then create a parent Lexicon with no language specified that includes each language-specific Lexicon.
  • Regularly review and update Lexicon content to ensure accuracy and completeness.
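
The advice about circular references can be checked mechanically. The sketch below assumes the include relationships have been written down as a plain mapping from each Lexicon name to the names it includes, then runs a depth-first search to find a cycle. The function name and the data layout are hypothetical, and the sketch does not read anything from Grooper.

```python
from typing import Optional


def find_include_cycle(includes: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one cycle of Lexicon names as a path, or None if there is none."""
    visiting: list[str] = []  # names on the current depth-first path
    done: set[str] = set()    # names already proven cycle-free

    def visit(name: str) -> Optional[list[str]]:
        if name in done:
            return None
        if name in visiting:
            # The path loops back to a name already on it: report the loop.
            return visiting[visiting.index(name):] + [name]
        visiting.append(name)
        for child in includes.get(name, []):
            cycle = visit(child)
            if cycle:
                return cycle
        visiting.pop()
        done.add(name)
        return None

    for name in includes:
        cycle = visit(name)
        if cycle:
            return cycle
    return None


ok = {"All Languages": ["English", "Spanish"], "English": [], "Spanish": []}
bad = {"A": ["B"], "B": ["C"], "C": ["A"]}
print(find_include_cycle(ok))   # None
print(find_include_cycle(bad))  # ['A', 'B', 'C', 'A']
```

The `ok` mapping mirrors the recommended multi-language layout: a parent that includes language-specific Lexicons, none of which include the parent.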

Properties

Name  Type  Description
General
Database Link

Design Tabs

General   Edit the properties and/or contents of a Lexicon.
Reports   View reports for a node.
Advanced  View or edit advanced details about a node.

Context Menu Commands

Command         Shortcut  Description
Intersect                 Creates a new Lexicon containing all entries that appear in both this Lexicon and a reference Lexicon.
Merge Training            Merges all training files into the main content of this Lexicon.
Normalize                 Normalizes character data in the Lexicon to the character set of the configured language.
Subtract                  Removes from this Lexicon all entries that appear in a reference Lexicon.
Truncate                  Truncates the Lexicon to the top N entries.
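
The Intersect, Subtract, and Truncate commands behave like set operations on entry lists. The sketch below models them on plain Python lists under stated assumptions: entries match exactly and case-sensitively, the original order is preserved, and "top N" means the first N entries in their current order. The source does not say how entries are ranked, so the real command may order them differently. The function names are hypothetical.

```python
def intersect(entries: list[str], reference: list[str]) -> list[str]:
    """Entries that appear in both lists, in the order of `entries`."""
    ref = set(reference)
    return [e for e in entries if e in ref]


def subtract(entries: list[str], reference: list[str]) -> list[str]:
    """Entries of `entries` that do not appear in `reference`."""
    ref = set(reference)
    return [e for e in entries if e not in ref]


def truncate(entries: list[str], n: int) -> list[str]:
    """First n entries (assumption: 'top' means current list order)."""
    return entries[:n]


this = ["invoice", "purchase order", "receipt", "statement"]
reference = ["receipt", "invoice", "credit memo"]

print(intersect(this, reference))  # ['invoice', 'receipt']
print(subtract(this, reference))   # ['purchase order', 'statement']
print(truncate(this, 2))           # ['invoice', 'purchase order']
```

Note the difference in effect: per the table above, Intersect creates a new Lexicon, while Subtract and Truncate modify this one. The functions here return new lists in every case for simplicity.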

See Also

Used By

Notification