Grooper Help - Version 25.0
25.0.0017 2,127

TextExtractMode

Grooper.Activities

Specifies how text should be extracted from PDF files.

Remarks

When set to Full or Simple, this mode reads native PDF text segments directly rather than through OCR. In applicable use cases, this delivers 100% accurate character extraction and avoids the uncertainty of OCR.

Note that the text extraction process operates against all text objects drawn on the page - whether they are actually visible or not. Unexpected results can occur when the input document contains text drawn transparently or behind other objects.

This setting is only applicable if the input is PDF content. When running at the folder level, this means that the input document must be a native PDF document, or must have a PDF version generated by the Render activity. When running at the page level, this means that the page object must have been created by splitting a PDF document.
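The applicability rules above can be sketched as a small check. This is a minimal illustration only: the `FolderInput` and `PageInput` classes, their field names, and the `native_text_applicable` function are hypothetical and are not part of Grooper's API.

```python
from dataclasses import dataclass


@dataclass
class FolderInput:
    """Hypothetical stand-in for a document processed at the folder level."""
    is_native_pdf: bool = False
    has_rendered_pdf: bool = False  # a PDF version generated by the Render activity


@dataclass
class PageInput:
    """Hypothetical stand-in for a page processed at the page level."""
    split_from_pdf: bool = False


def native_text_applicable(item) -> bool:
    """Return True when the input counts as PDF content under the rules above."""
    if isinstance(item, FolderInput):
        # Folder level: a native PDF, or a PDF version produced by Render.
        return item.is_native_pdf or item.has_rendered_pdf
    if isinstance(item, PageInput):
        # Page level: the page must have come from splitting a PDF document.
        return item.split_from_pdf
    raise TypeError(f"unsupported input type: {type(item).__name__}")
```

Under these assumptions, a scanned image document with no rendered PDF version would return False, so the setting would have no effect on it.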

Can be one of the following values:

Name      Value   Description
Full      0       Native text segments, annotations, and form fields will be extracted.
Simple    1       Only native text segments will be extracted.
None      2       No effort will be made to read native text segments. PDF pages will be treated as images and processed through OCR.
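
For readers who want to reason about the three values in code, the table can be mirrored as an enumeration. This is an illustrative sketch, not Grooper's own type: the Python class, its member spelling, and the `extracted_sources` helper are assumptions made for the example. Only the names, numeric values, and descriptions come from the table above.

```python
from __future__ import annotations

from enum import IntEnum


class TextExtractMode(IntEnum):
    """Illustrative mirror of the documented values (not Grooper's own type)."""
    FULL = 0    # native text segments, annotations, and form fields
    SIMPLE = 1  # native text segments only
    NONE = 2    # no native text; pages are treated as images and sent to OCR


def extracted_sources(mode: TextExtractMode) -> set[str]:
    """Return the kinds of PDF content the given mode reads natively."""
    if mode is TextExtractMode.FULL:
        return {"text segments", "annotations", "form fields"}
    if mode is TextExtractMode.SIMPLE:
        return {"text segments"}
    return set()  # NONE: nothing is read natively; OCR handles the page
```

For example, `extracted_sources(TextExtractMode(1))` looks up the Simple mode by its numeric value and reports that only text segments are read.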

Used By

Notification