Grooper Help - Version 25.0
25.0.0017 2,127

Recognize

Code Activity Grooper.Activities

Recognizes the internal content of a document or page, and saves the resulting data for use by subsequent activities.

Remarks

This activity detects and reads the presentation elements which convey information in a document, such as text segments, barcodes, lines, checkboxes, and other shapes. The resulting character data and layout information is saved on the Batch Folder or Batch Page object being processed, where it is available to subsequent activities which depend on recognition results.

Recognize should be executed in a Batch Process after any permanent image cleanup has been applied, and before any activities which depend on recognition output. (See further discussion below.) It can be executed at the page level or at the folder level.
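The ordering rule above can be expressed as a simple check. This is only an illustrative sketch in Python: the step names "Image Cleanup" and "Extract" are hypothetical placeholders, and the function is not part of Grooper or its API.

```python
from typing import List, Set


def recognize_is_well_placed(steps: List[str],
                             cleanup_steps: Set[str],
                             dependent_steps: Set[str]) -> bool:
    """Return True if 'Recognize' runs after every cleanup step
    and before every step that depends on recognition output.

    All step names are hypothetical labels chosen for this sketch.
    """
    if "Recognize" not in steps:
        return False
    pos = steps.index("Recognize")
    # Every permanent cleanup step must come before Recognize.
    cleanup_ok = all(i < pos for i, s in enumerate(steps) if s in cleanup_steps)
    # Every dependent step must come after Recognize.
    dependents_ok = all(i > pos for i, s in enumerate(steps) if s in dependent_steps)
    return cleanup_ok and dependents_ok


good = ["Image Cleanup", "Recognize", "Extract"]
bad = ["Recognize", "Image Cleanup", "Extract"]
print(recognize_is_well_placed(good, {"Image Cleanup"}, {"Extract"}))
print(recognize_is_well_placed(bad, {"Image Cleanup"}, {"Extract"}))
```

The second sequence fails the check because cleanup would alter the image after it had already been recognized.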

Page-Level Operation

When executed at the page level, the activity recognizes one page at a time. If the input page has PDF content (i.e., it was created by splitting a PDF document), the page is handled as a PDF page for the purposes of native text extraction. Otherwise, the page is processed as an image, and text extraction is purely OCR-based.
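The page-level branching described above can be summarized in a few lines. This is a minimal sketch in Python, not Grooper code: the `Page` class, its `has_pdf_content` field, and the returned labels are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical stand-in for a Batch Page object; not a Grooper type.
    has_pdf_content: bool


def extraction_mode(page: Page) -> str:
    """Pick how text would be obtained for a single page.

    Pages created by splitting a PDF are treated as PDF pages, which makes
    native text extraction possible; all other pages are treated as images
    and rely on OCR alone.
    """
    return "pdf" if page.has_pdf_content else "image-ocr"


print(extraction_mode(Page(has_pdf_content=True)))
print(extraction_mode(Page(has_pdf_content=False)))
```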

Page-level processing is the preferred operating mode in many cases, as it maximizes the benefits from parallel processing.

Folder-Level Operation

When executed at the folder level, the activity recognizes all pages of a multi-page document at once. The input folder must have a PDF or image-based document attached. To recognize other document formats, such as Microsoft Word or HTML, use the Render activity to generate a PDF version before executing this activity.

Due to the CPU-intensive nature of recognition, folder-level processing may be unsuitable for large documents. For example, a single task running recognition on a 1000-page document could take 20 minutes to complete. This can result in long-running tasks which appear hung and services which are difficult to start and stop. In such cases, split the document into pages (see Split Pages) and run recognition at the page level.
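To make the trade-off concrete, the example figure above (1000 pages in 20 minutes) works out to about 1.2 seconds per page. The Python sketch below estimates how long the same workload might take when split into pages and spread across parallel workers. It is a back-of-the-envelope model only: the worker count of 8 is an arbitrary assumption, and the model ignores scheduling overhead, the cost of splitting, and variation between pages.

```python
import math


def seconds_per_page(total_pages: int, total_minutes: float) -> float:
    """Average recognition time per page for a single folder-level task."""
    return total_minutes * 60 / total_pages


def estimated_wall_minutes(pages: int, sec_per_page: float, workers: int) -> float:
    """Idealized elapsed time when pages are divided evenly among workers.

    The busiest worker handles ceil(pages / workers) pages; overhead is ignored.
    """
    pages_on_busiest_worker = math.ceil(pages / workers)
    return pages_on_busiest_worker * sec_per_page / 60


rate = seconds_per_page(1000, 20)                 # figure taken from the example above
single = estimated_wall_minutes(1000, rate, 1)    # one folder-level task
parallel = estimated_wall_minutes(1000, rate, 8)  # 8 workers is an assumed value
print(f"{rate:.1f} s/page, {single:.1f} min on one task, {parallel:.1f} min on 8 workers")
```

Under these assumptions each page-level task finishes in roughly a second, so no single task runs long enough to appear hung.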

Activities Depending on Recognize

Any activity which accesses the internal content of a document will require recognition results in order to function properly. The following are specific examples of cases where other activities depend on the output from Recognize:

Properties

Name    Type    Description
General
PDF Options
Processing Options

See Also

Used By

Notification