Grooper Help - Version 25.0
25.0.0040 2,257

DI Analyze

Code Activity Grooper.Cloud

Analyzes document pages using Azure Document Intelligence to extract text, layout, style, and semantic elements.

Remarks

The DI Analyze activity uses Azure AI Document Intelligence to recognize text, layout, style, and semantic elements on the pages of a document and to generate data describing their structure and content. The generated data is saved for use in downstream OCR and data extraction steps.

Role and Usage

DI Analyze is configured as a step in a Batch Process to automate document analysis. It submits pages or attachments to Azure Document Intelligence, retrieves structured results, and stores them for further processing. Users can select the Azure model, features, and content format to match their document types and extraction needs.

  • Supports both page-level and folder-level analysis, including attachments (PDF, TIFF, JPEG).
  • Results are saved as JSON files per page, enabling review and integration with Grooper's extraction pipeline.
  • Orientation correction can be enabled to automatically rotate pages based on detected layout, improving accuracy.
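
As an illustration of what a saved per-page result might contain, the following minimal Python sketch summarizes an analyze result. It assumes the file follows the public Azure Document Intelligence result schema (a "pages" array with "pageNumber", "angle", "words", and "lines"); the exact file layout Grooper writes is not documented here, so the function name summarize_result and the sample data are hypothetical. The quarter-turn calculation only shows how a detected angle could be read; it is not a description of how 'Correct Orientation' is implemented.

```python
import json


def summarize_result(raw: str) -> list[dict]:
    """Return one summary dict per page of an analyze result (hypothetical helper)."""
    data = json.loads(raw)
    # The Azure REST response wraps the payload in "analyzeResult";
    # accept both the wrapped and the unwrapped form (assumption).
    result = data.get("analyzeResult", data)
    summaries = []
    for page in result.get("pages", []):
        angle = page.get("angle", 0.0)
        summaries.append({
            "page": page.get("pageNumber"),
            "words": len(page.get("words", [])),
            "lines": len(page.get("lines", [])),
            "angle": angle,
            # Nearest multiple of 90 degrees, normalized to the range 0-359.
            "nearest_quarter_turn": (round(angle / 90) * 90) % 360,
        })
    return summaries


# Made-up sample data shaped like an Azure analyze result.
sample = json.dumps({
    "analyzeResult": {
        "modelId": "prebuilt-layout",
        "pages": [{
            "pageNumber": 1,
            "angle": 89.7,
            "words": [{"content": "Invoice"}, {"content": "1001"}],
            "lines": [{"content": "Invoice 1001"}],
        }],
    }
})

for summary in summarize_result(sample):
    print(summary)
```

A summary like this can help when reviewing results: a page whose detected angle is close to 90, 180, or 270 degrees is a likely candidate for orientation correction.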

Configuration Guidance

  • Choose the Azure model and features appropriate for your documents (e.g., prebuilt-layout for general use).
  • Set the content format to match your document set and extraction requirements.
  • Enable 'Correct Orientation' to adjust page rotation based on layout analysis.
  • Use 'Overwrite' to control whether previous results are replaced.
  • Prefer analyzing attachments when the original file is a better source for extraction than the page image.
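
To make the model, feature, and content format choices above more concrete, the sketch below builds the kind of analyze request URL that the Azure Document Intelligence REST API accepts. It is an assumption that these settings map onto the service's model ID, "features", and "outputContentFormat" parameters in this way; how Grooper constructs its request internally is not described here. The function build_analyze_url, the example endpoint, and the default API version are illustrative only.

```python
from urllib.parse import urlencode


def build_analyze_url(endpoint: str,
                      model_id: str = "prebuilt-layout",
                      features: tuple[str, ...] = (),
                      content_format: str = "text",
                      api_version: str = "2024-11-30") -> str:
    """Build an Azure Document Intelligence analyze URL (hypothetical helper)."""
    # The Azure service accepts "text" or "markdown" as the output content format.
    if content_format not in ("text", "markdown"):
        raise ValueError(f"unsupported content format: {content_format}")
    query = {"api-version": api_version, "outputContentFormat": content_format}
    if features:
        # Optional add-on features are passed as a comma-separated list.
        query["features"] = ",".join(features)
    base = f"{endpoint.rstrip('/')}/documentintelligence/documentModels/{model_id}:analyze"
    # safe="," keeps the feature separators readable in the query string.
    return f"{base}?{urlencode(query, safe=',')}"


url = build_analyze_url(
    "https://example.cognitiveservices.azure.com/",  # placeholder endpoint
    features=("styleFont", "ocrHighResolution"),
    content_format="markdown",
)
print(url)
```

The sketch only builds the URL; it sends no request and needs no credentials. In practice the activity handles submission and retrieval itself, so this is useful mainly for understanding what each setting influences.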

Diagnostics

Diagnostic artifacts generated by this activity include:

  • JSON result files for each analyzed page.
  • Markdown files containing extracted content.
  • HTML files for visual review.
  • Diagnostic images for lines, words, and paragraphs.

These artifacts support troubleshooting, review, and validation of extraction results.

Properties

Name    Type    Description
Parameters
Options
Processing Options

Used By

Notification