Grooper Help - Version 25.0
25.0.0017 2,127

Detect Language (Legacy)

Code Activity Grooper.Activities

Detects the dominant language of each document or page, and marks it with the appropriate ISO culture code.

Remarks

This activity should run after Recognize and before Extract. When processing multi-language documents, many data extraction and classification techniques require correct detection of the document's primary language in order to work properly.

While some OCR Engines detect the language of the document, they do so with varying degrees of accuracy, and some don't detect the language at all. When importing electronic documents, there is no OCR engine to detect the language for us. Most multi-language document processing scenarios will require this activity.


This activity works by extracting all words from a document, and then cross-referencing a multi-language Lexicon to compute the percentage of words on the document which are valid for each language. The highest-scoring language wins, so long as the score was higher than the configured minimum.
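The scoring approach described above can be illustrated with a minimal Python sketch. This is not Grooper's implementation: the `LEXICON` contents, the `detect_language` function, and the `minimum_score` parameter are hypothetical names chosen for illustration, and the tokenization rule is an assumption.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical miniature multi-language lexicon, keyed by culture code.
# A real lexicon would hold many thousands of words per language.
LEXICON: Dict[str, Set[str]] = {
    "en-US": {"the", "and", "invoice", "total", "date", "of"},
    "es-ES": {"el", "la", "factura", "total", "fecha", "de", "y"},
    "de-DE": {"der", "die", "und", "rechnung", "datum", "gesamt"},
}


def detect_language(
    text: str,
    lexicon: Dict[str, Set[str]] = LEXICON,
    minimum_score: float = 0.5,
) -> Optional[str]:
    """Return the culture code with the highest share of valid words,
    or None if no language scores above minimum_score."""
    # Assumed tokenization: runs of letters, lowercased.
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return None

    # Score = fraction of the document's words that are valid in each language.
    scores = {
        culture: sum(word in vocabulary for word in words) / len(words)
        for culture, vocabulary in lexicon.items()
    }

    # The highest-scoring language wins, but only above the minimum.
    best = max(scores, key=scores.get)
    return best if scores[best] > minimum_score else None


spanish = detect_language("La factura de fecha y el total")
english = detect_language("The invoice total and the date")
unknown = detect_language("xyzzy plugh")
```

In this sketch a document whose best score does not exceed the minimum is left without a language (`None`) rather than being assigned a low-confidence guess.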

Properties

Name    Type    Description
General
Processing Options

See Also

Used By

Notification