Grooper Help - Version 25.0
25.0.0017 2,127

Vertical Tab Marker

Embedded Object Grooper.Core

Detects large vertical whitespace gaps between lines and marks each one with a vertical tab character to represent vertical separation in the text.

Remarks

The Vertical Tab Marker is a text preprocessing component that identifies significant vertical gaps between lines in a document and replaces the standard line break (CR/LF) at each such gap with a vertical tab character (\v). This is useful for representing vertical structure within extracted text, such as section breaks, table row separation, or logical grouping.

Purpose

In many documents, a large vertical gap between lines indicates a new section, a table row, or a logical break. Standard line breaks do not distinguish between normal line wrapping and these larger separations, making it difficult for downstream extractors to interpret the document's structure.

The Vertical Tab Marker solves this by converting a line break to a vertical tab character when the vertical gap between the two lines meets or exceeds the configured threshold. This allows extractors and parsers to recognize and handle vertical structure more accurately.
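
As an illustration of how a downstream parser might use the marker, the following Python sketch splits already-marked text into blocks at each vertical tab and then into lines within each block. It is not Grooper code: the function name split_blocks and the sample text are hypothetical, and the sketch assumes only that the marked text contains \v at large gaps and ordinary line breaks elsewhere.

```python
def split_blocks(text: str) -> list[list[str]]:
    """Split marked text into blocks (at vertical tabs), then into lines.

    Hypothetical helper for illustration only; not part of Grooper.
    """
    # Split on "\v" first. Python's str.splitlines() also treats "\v" as a
    # line boundary, so calling it on the whole text would lose the
    # distinction between ordinary line breaks and marked gaps.
    return [block.splitlines() for block in text.split("\v")]


# Made-up sample: two wrapped lines, a large gap, then one more line.
marked = "Bill To:\r\nAcme Corp\vTotal Due: 100.00"
blocks = split_blocks(marked)
```

Each inner list holds the lines that were separated only by ordinary line breaks, so a block boundary in the result corresponds to a gap that was marked with a vertical tab.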

How It Works

  • The text is split into lines.
  • For each pair of adjacent lines, the vertical distance between the bottom of the previous line and the top of the current line is measured.
  • If this gap is greater than or equal to the 'Vertical Gap Threshold', the line break is replaced with a vertical tab character (\v).
  • Otherwise, the standard line break is preserved.

This approach enables downstream extraction logic to distinguish between normal line wrapping and significant vertical separations, improving the accuracy of data extraction from structured documents.
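
The steps above can be sketched in Python as follows. This is a minimal illustration of the described logic, not Grooper's implementation: the Line type, its top and bottom coordinates (measured downward from the top of the page, in the same unit as the threshold), and the function name mark_vertical_gaps are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """Hypothetical line record: text plus its vertical extent on the page."""
    text: str
    top: float     # distance from page top to the top of the line
    bottom: float  # distance from page top to the bottom of the line


def mark_vertical_gaps(lines: list[Line], threshold: float,
                       newline: str = "\r\n") -> str:
    """Join lines, using "\v" where the gap is at least the threshold."""
    if not lines:
        return ""
    parts = [lines[0].text]
    for prev, cur in zip(lines, lines[1:]):
        # Gap between the bottom of the previous line and the top of this one.
        gap = cur.top - prev.bottom
        # Greater than or equal to the threshold: mark with a vertical tab.
        # Otherwise keep the standard line break.
        parts.append("\v" if gap >= threshold else newline)
        parts.append(cur.text)
    return "".join(parts)


# Made-up coordinates, all in points.
sample = [
    Line("Invoice", top=0, bottom=10),
    Line("No. 42", top=12, bottom=22),   # gap of 2: normal line break
    Line("Total", top=40, bottom=50),    # gap of 18: meets the threshold
]
marked = mark_vertical_gaps(sample, threshold=18)
```

With a threshold of 18, the first break stays a CR/LF and the second becomes a vertical tab, because a gap equal to the threshold also qualifies.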

Configuration Guidance

  • Set the 'Vertical Gap Threshold' property to control the minimum vertical distance (in inches, centimeters, or points) that qualifies for vertical tab insertion.
  • Adjust this value to match the typical spacing used for section breaks or table rows in your documents.
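
Because the threshold can be expressed in inches, centimeters, or points, it can help to convert a candidate value to a single unit before comparing it with gaps measured in your documents. The Python sketch below shows only the standard unit arithmetic (72 points per inch, 2.54 centimeters per inch); the function name to_points and the unit keys are hypothetical, and nothing here reflects how Grooper stores or converts the property internally.

```python
# Standard conversions: 1 inch = 72 points, 1 inch = 2.54 centimeters.
POINTS_PER_UNIT = {
    "in": 72.0,
    "cm": 72.0 / 2.54,
    "pt": 1.0,
}


def to_points(value: float, unit: str) -> float:
    """Convert a length in inches, centimeters, or points to points."""
    if unit not in POINTS_PER_UNIT:
        raise ValueError(f"unsupported unit: {unit!r}")
    return value * POINTS_PER_UNIT[unit]


# A quarter-inch threshold expressed in points.
quarter_inch = to_points(0.25, "in")
```

For example, a quarter-inch threshold corresponds to 18 points, so two lines of 10-point text separated by 18 points or more of white space would qualify under that setting.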

Usage Notes

  • Vertical Tab Marker is typically used as part of a text preprocessing pipeline before data extraction.
  • Accurate detection of vertical structure depends on the threshold, especially in documents with variable line spacing: a value set too low marks ordinary line spacing as a break, while a value set too high misses genuine breaks.
  • For more information on related concepts, see Data Instance and Document Instance.

Properties

Name    Type    Description

Used By
