Grooper Help - Version 25.0
25.0.0024 2,166

Line Detection

IP Command Grooper.IP

Detects horizontal and vertical line structures in document images, generating Layout Data for downstream processing.

Remarks

The Line Detection command is designed to identify, analyze, and optionally remove horizontal and vertical lines from scanned document images. This is essential for processing forms, tables, and other structured documents where lines may interfere with data extraction or need to be preserved as layout features.

The command works by preprocessing the input image (including binarization and optional font dropout), then applying a combination of morphological and geometric analysis to detect line segments. It supports advanced features such as comb detection, speck removal, and dash sequence handling to address a wide range of real-world document layouts.

How Line Detection Works

  1. The image is binarized using the configured method to clearly separate lines from the background and other features.
  2. Optional font dropout and preprocessing steps are applied to remove text and enhance line visibility.
  3. The system scans for horizontal and vertical runs of black pixels, applying configurable thresholds for length, thickness, aspect ratio, and fill.
  4. Detected lines are further analyzed for gaps, noise, and connectivity, with support for combs, specks, and dashed lines.
  5. The result includes collections of horizontal and vertical lines, which can be used for removal, layout analysis, or further processing.
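
The binarization and run-scanning steps above can be illustrated with a minimal Python sketch. This is not Grooper's implementation: the function names, the fixed threshold of 128, and the list-of-lists image representation are assumptions chosen for illustration, and the sketch omits font dropout, gap handling, combs, specks, and dashed lines.

```python
from typing import List, Tuple

# (row, start_col, end_col_exclusive) for a horizontal run;
# (col, start_row, end_row_exclusive) for a vertical run.
Run = Tuple[int, int, int]


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Fixed-threshold binarization (assumed method): 1 = black, 0 = white."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def horizontal_runs(binary: List[List[int]], min_length: int) -> List[Run]:
    """Collect runs of consecutive black pixels at least min_length long."""
    runs: List[Run] = []
    for y, row in enumerate(binary):
        start = None
        # A trailing 0 acts as a sentinel so a run touching the edge is closed.
        for x, px in enumerate(list(row) + [0]):
            if px and start is None:
                start = x
            elif not px and start is not None:
                if x - start >= min_length:
                    runs.append((y, start, x))
                start = None
    return runs


def vertical_runs(binary: List[List[int]], min_length: int) -> List[Run]:
    """Scan columns by transposing the image and reusing the row scanner."""
    transposed = [list(col) for col in zip(*binary)]
    return horizontal_runs(transposed, min_length)


if __name__ == "__main__":
    # 5x6 white page with one black row and one black column.
    page = [[255] * 6 for _ in range(5)]
    for x in range(6):
        page[2][x] = 0
    for y in range(5):
        page[y][0] = 0
    bw = binarize(page)
    print(horizontal_runs(bw, 4))
    print(vertical_runs(bw, 4))
```

A real detector would additionally merge runs on adjacent rows or columns into a single line with a thickness, which is where the thickness, aspect ratio, and fill thresholds come into play.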

Layout Data

The output of Line Detection includes Layout Data in the form of collections of horizontal and vertical lines. These describe the position and orientation of each detected line.

Typical uses of this Layout Data include:

  • Table extraction, by identifying cell boundaries and grid structures.
  • Form field alignment, by locating boxes, underlines, or separators.
  • Visual overlays, for reviewing detected lines in diagnostic or UI contexts.
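
As an illustration of the table extraction use case, the following Python sketch derives cell rectangles from detected line positions. The `Line` class and `grid_cells` function are hypothetical and do not reflect Grooper's Layout Data types; the sketch also assumes a regular grid in which every line spans the whole table.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Line:
    """Hypothetical line record: endpoints in pixel coordinates."""
    x1: int
    y1: int
    x2: int
    y2: int


# (left, top, right, bottom)
Cell = Tuple[int, int, int, int]


def grid_cells(horizontal: List[Line], vertical: List[Line]) -> List[Cell]:
    """Pair adjacent line positions to form the cells of a regular grid."""
    ys = sorted({line.y1 for line in horizontal})  # row boundaries
    xs = sorted({line.x1 for line in vertical})    # column boundaries
    cells: List[Cell] = []
    for top, bottom in zip(ys, ys[1:]):
        for left, right in zip(xs, xs[1:]):
            cells.append((left, top, right, bottom))
    return cells


if __name__ == "__main__":
    h = [Line(0, y, 200, y) for y in (0, 50, 100)]
    v = [Line(x, 0, x, 100) for x in (0, 100, 200)]
    for cell in grid_cells(h, v):
        print(cell)
```

Tables with merged cells or partial rulings need extra logic to check which lines actually bound each candidate cell.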

Configuration and Tuning

  • Key properties such as 'Minimum Line Length', 'Maximum Line Thickness', 'Minimum Aspect Ratio', and 'Minimum Line Fill' allow you to tune detection for your document type.
  • Use comb detection and speck removal to handle forms with checkboxes, comb fields, or noisy backgrounds.
  • Adjust binarization and font dropout settings to optimize line visibility and minimize interference from text or graphics.
  • Review diagnostic images and logs to verify that lines are being detected and removed as intended, and adjust parameters as needed.
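
To make the interaction between these four properties concrete, here is a hedged Python sketch of how such thresholds might be applied to a candidate line. The class names, field names, units (pixels), and default values are illustrative assumptions, not Grooper's actual defaults or internal logic.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical candidate line measured in pixels."""
    length: int        # extent along the line direction
    thickness: int     # extent across the line direction
    black_pixels: int  # black pixels inside the length x thickness box


@dataclass
class Thresholds:
    """Illustrative values only; not product defaults."""
    min_length: int = 50
    max_thickness: int = 8
    min_aspect_ratio: float = 10.0
    min_fill: float = 0.8


def accept(c: Candidate, t: Thresholds) -> bool:
    """Return True when the candidate passes all four checks."""
    if c.length <= 0 or c.thickness <= 0:
        return False
    if c.length < t.min_length or c.thickness > t.max_thickness:
        return False
    aspect = c.length / c.thickness
    fill = c.black_pixels / (c.length * c.thickness)
    return aspect >= t.min_aspect_ratio and fill >= t.min_fill


if __name__ == "__main__":
    t = Thresholds()
    print(accept(Candidate(200, 2, 390), t))    # long, thin, well filled
    print(accept(Candidate(100, 4, 200), t))    # only half filled
```

In this sketch, lowering `min_fill` admits broken or faded lines at the cost of more false positives from text, while raising `min_aspect_ratio` rejects short, blocky shapes such as bold characters.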

Supported Pixel Formats

  • The command supports all basic pixel formats: Pixel8bppGrayscale, Pixel24bppBgr, and Pixel1bppIndexed.
  • Images are automatically converted as needed for processing.
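
The conversion itself is handled by the command, but a short Python sketch shows what reducing these formats to a common grayscale value can look like. The function names are hypothetical, and the Rec. 601 luma weights are an assumption; the documentation does not state which conversion Grooper uses.

```python
from typing import Sequence, Tuple

Bgr = Tuple[int, int, int]


def bgr_to_gray(pixel: Bgr) -> int:
    """Convert one 24bpp BGR pixel to 8-bit gray with Rec. 601 weights."""
    b, g, r = pixel  # byte order is blue, green, red
    # Integer arithmetic with rounding: 0.114*B + 0.587*G + 0.299*R
    return (114 * b + 587 * g + 299 * r + 500) // 1000


def indexed_to_gray(index: int, palette: Sequence[Bgr]) -> int:
    """Resolve a 1bpp palette index (0 or 1) to its gray value."""
    return bgr_to_gray(palette[index])


if __name__ == "__main__":
    print(bgr_to_gray((255, 255, 255)))
    print(bgr_to_gray((0, 0, 255)))
    # A typical two-entry palette: index 0 = black, index 1 = white.
    palette = [(0, 0, 0), (255, 255, 255)]
    print(indexed_to_gray(1, palette))
```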

Diagnostics

When run in diagnostic mode, Line Detection generates a variety of outputs to assist with configuration and troubleshooting:

  • Binarized images showing the effect of thresholding and preprocessing.
  • Preprocessed images illustrating the result of font dropout and other enhancements.
  • Dropout masks and trim masks visualizing the regions affected by line removal and trimming.
  • Log messages reporting the number and type of lines detected, as well as timing and processing details.

Use these diagnostics to fine-tune detection parameters and ensure that only the intended lines are being detected or removed.

Classification Features

  • This command does not generate classification features. Its primary output is the set of detected line objects, which can be used by downstream layout analysis or extraction logic.

Practical Guidance

  • Start with default settings and review diagnostic images to assess detection quality.
  • For forms with dense tables or fine lines, lower the minimum line length and thickness thresholds.
  • For noisy or degraded images, enable speck removal and adjust the maximum speck size.
  • Use comb detection to handle checkboxes or comb fields, and tune quiet zone and fill parameters to avoid false positives.
  • Always validate results visually and iteratively adjust parameters for best performance on your specific document set.

Properties

Name    Type    Description
General
Comb Removal
Image Preprocessing
Command Info

Derived Types

There is 1 implementation of Line Detection.

Line Removal: Removes horizontal and vertical lines from an image, typically to clean up the image prior to OCR.

See Also

Used By

Notification