Grooper Help - Version 25.0
25.0.0017 2,127

Tesseract OCR

OCR Engine Grooper.OCR

Provides an OCR engine for Grooper using the open-source Tesseract library.

Remarks

Overview

The Tesseract OCR engine integrates the Tesseract OCR library into Grooper, enabling extraction of text and layout information from images using a widely adopted, open-source solution.

Tesseract OCR is suitable for a broad range of document types and supports many languages and special fonts. It is typically selected as the OCR engine within an OCR Profile, and can be configured to optimize recognition for specific use cases.

Features

  • Supports orientation detection and script analysis.
  • Allows fine-tuning of space detection and character segmentation.
  • Integrates with Grooper's progress and diagnostic reporting.
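As a rough illustration of what orientation detection and script analysis produce, the sketch below parses the plain-text report that the Tesseract command-line tool prints in its orientation and script detection mode (`--psm 0`). It is not Grooper's API: the `parse_osd` helper and the sample values are assumptions made for this example, and how Grooper consumes this information internally is not documented here.

```python
def parse_osd(report: str) -> dict:
    """Parse the 'Key: value' lines of a Tesseract OSD report into a dict."""
    fields = {}
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that carry no 'Key: value' pair
            fields[key.strip()] = value.strip()
    return fields


# Text in the shape printed by `tesseract page.png stdout --psm 0`.
# The numbers are illustrative, not taken from a real scan.
SAMPLE_OSD = """Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 21.27
Script: Latin
Script confidence: 4.14
"""

osd = parse_osd(SAMPLE_OSD)
rotate_by = int(osd["Rotate"])  # degrees of rotation needed to upright the page
script = osd["Script"]          # dominant script detected on the page
```

A low confidence value usually means the page has too little text for a reliable guess, so it is worth checking both confidence fields before acting on the result.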

Best Practices

  • Install only the required language data files to optimize performance.
  • Use character whitelisting and junk filtering to improve accuracy on noisy or complex documents.
  • Test different segmentation modes for best results on varied layouts.
  • Refer to Tesseract's open-source documentation for advanced tuning and troubleshooting.
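Outside Grooper, the practices above can be tried directly against the Tesseract command-line tool before settling on OCR Profile settings. The following is a minimal sketch, assuming the `tesseract` executable is installed and on the PATH; `build_command` and `compare_psm_modes` are hypothetical helper names, and the mapping between these command-line options and Grooper's own property names is not confirmed by this page.

```python
import subprocess


def build_command(image_path, psm, lang="eng", whitelist=None):
    """Build a Tesseract CLI call for one page segmentation mode (PSM)."""
    cmd = ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]
    if whitelist:
        # Restrict recognition to the listed characters, e.g. digits only.
        cmd += ["-c", f"tessedit_char_whitelist={whitelist}"]
    return cmd


def compare_psm_modes(image_path, modes=(3, 4, 6, 11), **options):
    """Run the same image through several PSMs and collect the text output.

    3 = automatic page segmentation, 4 = single column,
    6 = single uniform block, 11 = sparse text.
    """
    results = {}
    for psm in modes:
        proc = subprocess.run(
            build_command(image_path, psm, **options),
            capture_output=True,
            text=True,
            check=False,  # keep going even if one mode fails on this image
        )
        results[psm] = proc.stdout
    return results


# Example command for a digits-only field read as a single block of text.
digits_cmd = build_command("invoice.png", 6, whitelist="0123456789")
```

Running `tesseract --list-langs` shows which language data files are installed. Whitelist support varies with the Tesseract version and engine mode, so results should be verified on representative documents.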

Licensing

Tesseract is free software, released under the Apache License v2.0.

Properties

Name | Type | Description
General
Character Segmentation
Junk Filtering

See Also

Used By
