Grooper Help - Version 25.0
25.0.0017 2,127

AI Column Extractor

Grooper.GPT.OpenAI.Extractors

Extracts structured content from documents with two-column layouts.

Remarks

Overview

The AIColumnExtractor extracts structured data from documents with a two-column layout. It combines AI-driven layout analysis with OCR results to identify the key regions of the page, then extracts their content in a fixed order: header, left column, right column, and footer.
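
The fixed region order can be illustrated with a minimal Python sketch. This is not Grooper's implementation: the Rect and Word types, the region names, and the sample coordinates are all hypothetical, and the region rectangles are assumed to be known already (in the extractor they come from the AI layout analysis).

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Hypothetical region rectangle in page coordinates."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x < self.right and self.top <= y < self.bottom


@dataclass
class Word:
    """Hypothetical OCR word, located by the center of its bounding box."""
    text: str
    x: float
    y: float


# Reading order described in the overview.
REGION_ORDER = ["header", "left_column", "right_column", "footer"]


def extract_in_order(words, regions):
    """Return (region name, text) pairs in the fixed reading order."""
    results = []
    for name in REGION_ORDER:
        rect = regions[name]
        inside = [w for w in words if rect.contains(w.x, w.y)]
        # Within a region, read top to bottom, then left to right.
        inside.sort(key=lambda w: (w.y, w.x))
        results.append((name, " ".join(w.text for w in inside)))
    return results


# Made-up page, 100 units wide and 100 units tall.
regions = {
    "header": Rect(0, 0, 100, 10),
    "left_column": Rect(0, 10, 50, 90),
    "right_column": Rect(50, 10, 100, 90),
    "footer": Rect(0, 90, 100, 100),
}
words = [
    Word("Page", 45, 95), Word("Fall", 60, 20), Word("Transcript", 50, 5),
    Word("Name:", 10, 20), Word("2024", 70, 20), Word("Ada", 25, 20),
    Word("1", 55, 95),
]

ordered = extract_in_order(words, regions)
for name, text in ordered:
    print(f"{name}: {text}")
```

The words are supplied in scrambled order on purpose: the output order depends only on which region each word falls in, which is why the left column is read in full before the right column.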

Key Features

  • Automatically detects and separates content into header, left column, right column, and footer regions.
  • Uses AI to determine layout boundaries based on visual structure.
  • Leverages OCR results to extract text from each region.
  • Provides diagnostic tools to visually confirm the extracted regions.

When to Use

  • Ideal for documents with a consistent two-column format, such as:
    • Academic transcripts
    • Reports
    • Structured forms
  • Not suitable for documents without a clear two-column layout, as the extractor may return no results.
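
One rough way to pre-screen pages before running the extractor is to look for an empty vertical band (a gutter) near the middle of the page. The Python sketch below is a simple heuristic of our own, not the method the extractor uses. It assumes you already have the horizontal extents of the OCR words in the page body (header and footer lines excluded); the function names and the 3% minimum gap are hypothetical choices.

```python
def find_gutter(word_spans, page_width, min_gap_ratio=0.03):
    """Return (start, end) of the widest empty band in the middle third
    of the page, or None if no band is wide enough.

    word_spans is a list of (left, right) x-extents of body words.
    """
    lo, hi = page_width / 3, 2 * page_width / 3
    # Keep only the parts of each word that fall inside the middle third.
    clipped = sorted(
        (max(left, lo), min(right, hi))
        for left, right in word_spans
        if right > lo and left < hi
    )
    best = None
    cursor = lo  # right edge of the ink seen so far
    # The (hi, hi) sentinel closes any gap that runs to the end of the band.
    for left, right in clipped + [(hi, hi)]:
        if left > cursor and (best is None or left - cursor > best[1] - best[0]):
            best = (cursor, left)
        cursor = max(cursor, right)
    if best is not None and best[1] - best[0] >= min_gap_ratio * page_width:
        return best
    return None


def looks_two_column(word_spans, page_width):
    """True if there is a central gutter with text on both sides of it."""
    gutter = find_gutter(word_spans, page_width)
    if gutter is None:
        return False
    start, end = gutter
    has_left = any(right <= start for _, right in word_spans)
    has_right = any(left >= end for left, _ in word_spans)
    return has_left and has_right


# Made-up x-extents on a page 100 units wide.
two_column_page = [(5, 45), (55, 95), (5, 40), (55, 90)]
single_column_page = [(5, 95), (5, 90)]

print(looks_two_column(two_column_page, 100))
print(looks_two_column(single_column_page, 100))
```

A page that fails this check is a candidate for a different extraction approach, although the heuristic can be fooled by wide tables or centered headings, so treat it as a screening aid only.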

Configuration Guidance

  • Ensure the document is scanned at good quality and follows a clear two-column structure.
  • Use the Instructions property to specify document-type-specific extraction goals.

Tips for Best Results

  • Preprocess the document to ensure clear formatting and alignment.
  • Use diagnostic tools to verify the accuracy of bounding rectangles and extracted data.

Diagnostics

  • The extractor supports diagnostic logging to help users understand and troubleshoot the extraction process.
  • When diagnostics are enabled, the following artifacts may be generated:
    • Bounding Rectangles: Visual annotations of the extracted regions.
    • Chat Log: A file named Chat Log.jsonl that captures the conversation with the AI model.
    • Schema File: A file named JSON Schema.json containing the schema provided to the AI model.
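
The two file artifacts can be inspected outside Grooper with a short script. The Python sketch below assumes they have been saved to a local folder; the folder location and the load_diagnostics helper are hypothetical. It also makes no assumption about the fields inside each chat log entry: it only relies on the .jsonl convention of one JSON value per line.

```python
import json
from pathlib import Path


def load_diagnostics(folder):
    """Load the chat log and schema from a folder of saved diagnostics.

    Returns (messages, schema). messages is a list with one parsed JSON
    value per non-empty line of "Chat Log.jsonl"; schema is the parsed
    content of "JSON Schema.json", or None if that file is missing.
    """
    folder = Path(folder)
    chat_path = folder / "Chat Log.jsonl"
    schema_path = folder / "JSON Schema.json"

    messages = []
    if chat_path.exists():
        # utf-8-sig reads plain UTF-8 and also skips a leading byte-order mark.
        with chat_path.open(encoding="utf-8-sig") as f:
            for line in f:
                line = line.strip()
                if line:  # ignore blank lines
                    messages.append(json.loads(line))

    schema = None
    if schema_path.exists():
        schema = json.loads(schema_path.read_text(encoding="utf-8-sig"))

    return messages, schema
```

For example, `messages, schema = load_diagnostics("path/to/diagnostics")` followed by `print(len(messages))` shows how many entries the chat log contains, which is a quick way to confirm that the model was actually called.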

Properties

Name    Type    Description
General
Prompt
Response

See Also

Used By

Notification