Grooper Help - Version 25.0
25.0.0023 2,165

AI Column Extractor

Value Extractor Grooper.GPT.OpenAI.Extractors

Extracts structured content from documents with two-column layouts.

Remarks

Overview

The AI Column Extractor is designed to extract structured data from documents that use a two-column layout. It leverages AI-driven layout analysis and OCR results to identify and extract content from key regions of each page, including the header, left column, right column, and footer.

Key Features

  • Automatically detects and separates content into header, left column, right column, and footer regions.
  • Uses AI to determine layout boundaries based on visual structure and spatial patterns.
  • Integrates OCR results to extract text from each region.
  • Provides diagnostic tools to visually confirm extracted regions and troubleshoot extraction issues.
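
To make the four-region split concrete, here is a minimal Python sketch of how OCR words could be grouped once layout boundaries are known. It is not Grooper's implementation: the Word and Layout types, their fields, and assign_regions are hypothetical names invented for illustration, and coordinates are assumed to be in pixels with the origin at the top-left of the page.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # horizontal center of the word's bounding box (assumed)
    y: float  # vertical center of the word's bounding box (assumed)


@dataclass
class Layout:
    header_bottom: float  # y at which the header region ends
    footer_top: float     # y at which the footer region begins
    column_split: float   # x separating the left and right columns


def assign_regions(words, layout):
    """Group word texts into header, left, right, and footer by center point."""
    regions = {"header": [], "left": [], "right": [], "footer": []}
    for w in words:
        if w.y < layout.header_bottom:
            key = "header"
        elif w.y >= layout.footer_top:
            key = "footer"
        elif w.x < layout.column_split:
            key = "left"
        else:
            key = "right"
        regions[key].append(w.text)
    return regions
```

In this sketch the boundaries are supplied by the caller; in the extractor itself they are determined by the AI-driven layout analysis described above.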

When to Use

  • Ideal for documents with a consistent two-column format, such as academic transcripts, reports, or structured forms.
  • Not suitable for documents without a clear two-column layout; on such documents the extractor may return no results.
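
As a rough pre-check for whether a page is a good candidate, the sketch below looks for an empty vertical gutter near the middle of the page. This is a generic heuristic, not part of Grooper: looks_two_column, its parameters, and the 3% gutter threshold are assumptions chosen for illustration. The caller is expected to pass only body words, since full-width header or footer lines would hide the gutter.

```python
def looks_two_column(word_boxes, page_width, min_gutter=0.03):
    """Return True if an empty vertical gutter exists near the page middle.

    word_boxes: iterable of (left, right) x-extents of body words.
    min_gutter: minimum gutter width as a fraction of page width (assumed).
    """
    spans = sorted(word_boxes)
    if not spans:
        return False
    # Only accept gutters whose center falls in the middle third of the page.
    lo, hi = page_width / 3, 2 * page_width / 3
    covered_to = spans[0][1]  # rightmost x covered by the spans seen so far
    for left, right in spans[1:]:
        gap = left - covered_to
        if gap >= min_gutter * page_width:
            mid = covered_to + gap / 2
            if lo <= mid <= hi:
                return True
        covered_to = max(covered_to, right)
    return False
```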

Configuration Guidance

  • Ensure documents are scanned at good quality and follow a clear two-column structure.
  • Use the CustomInstructions property to provide document-type-specific extraction goals, such as identifying unique header, footer, or column content.

Diagnostics

  • Diagnostic logging is supported to help users understand and troubleshoot the extraction process.
  • When enabled, the extractor can generate visual annotations of bounding rectangles, a chat log of the AI interaction, and a schema file describing the expected layout.

Tips for Best Results

  • Preprocess documents so that text is legible and the columns are cleanly aligned before extraction.
  • Use diagnostic tools to verify the accuracy of extracted regions and data.
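
When reviewing region rectangles by eye, a few mechanical checks can also help. The following is a minimal sketch, assuming the rectangles have been transcribed by hand into (left, top, right, bottom) tuples; check_regions and this input format are hypothetical and do not reflect any Grooper diagnostic output format.

```python
from itertools import combinations


def check_regions(regions, page_width, page_height):
    """Return a list of problems found; an empty list means none were found.

    regions: dict mapping a region name to (left, top, right, bottom).
    """
    problems = []
    for name, (l, t, r, b) in regions.items():
        if l >= r or t >= b:
            problems.append(f"{name}: empty or inverted rectangle")
        if l < 0 or t < 0 or r > page_width or b > page_height:
            problems.append(f"{name}: extends outside the page")
    for (n1, a), (n2, c) in combinations(regions.items(), 2):
        # Two rectangles overlap when they intersect on both axes;
        # rectangles that merely share an edge do not count.
        if a[0] < c[2] and c[0] < a[2] and a[1] < c[3] and c[1] < a[3]:
            problems.append(f"{n1} overlaps {n2}")
    return problems
```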

Properties

Name    Type    Description

See Also

Used By
