Grooper Help - Version 25.0
25.0.0017

Line Removal

Line Detection Grooper.IP

Removes horizontal and vertical lines from an image, typically to clean it up prior to OCR.

Remarks

The Line Removal command detects and removes straight lines from document images, such as those found in forms, tables, or pre-printed backgrounds. Removing these lines can improve OCR accuracy and make extracted data cleaner by eliminating visual noise that may interfere with text recognition or downstream processing.

How Line Removal Works

  1. The input image is binarized using the settings defined in the 'Binarization Settings' property. Binarization separates lines from the background and is critical for accurate detection.
  2. Optional font dropout and preprocessing steps are applied to remove text and enhance line visibility, reducing the risk of text being mistaken for lines.
  3. The system scans for horizontal and vertical runs of black pixels, applying configurable thresholds for length, thickness, aspect ratio, and fill. Advanced features such as comb detection, speck removal, and dash sequence handling are used to address a wide range of real-world document layouts.
  4. Detected lines are further analyzed for gaps, noise, and connectivity. If advanced line detection is enabled, Hough transform-based analysis is also used to detect dashed, broken, or skewed lines.
  5. A mask is generated to cover the detected lines. The mask may also include regions around lines (trim distance) to ensure clean removal, especially where lines intersect with text or marks.
  6. The mask is applied to the original image using the selected dropout method (e.g., white fill, background fill, interpolation), removing the lines and preserving the rest of the content.
  7. The cleaned image is output. The command also outputs Layout Data describing the position and orientation of detected lines, which can be used for downstream processing.
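
The core of steps 1, 3, 5, and 6 can be sketched in a few lines of Python. This is a simplified illustration only, not Grooper's implementation: the function names, the fixed threshold, and the white-fill dropout are all assumptions, and the sketch omits font dropout, thickness and fill checks, comb detection, and Hough analysis.

```python
from typing import List, Tuple

Image = List[List[int]]      # grayscale rows: 0 = black, 255 = white
Ink = List[List[bool]]       # True where a pixel counts as ink


def binarize(image: Image, threshold: int = 128) -> Ink:
    """Step 1 (simplified): mark pixels darker than a fixed threshold as ink."""
    return [[px < threshold for px in row] for row in image]


def long_runs(ink: Ink, min_length: int) -> List[Tuple[int, int, int]]:
    """Step 3 (simplified): find (row, start, end_exclusive) ink runs >= min_length."""
    runs = []
    for y, row in enumerate(ink):
        start = None
        for x, is_ink in enumerate(list(row) + [False]):  # sentinel closes a trailing run
            if is_ink and start is None:
                start = x
            elif not is_ink and start is not None:
                if x - start >= min_length:
                    runs.append((y, start, x))
                start = None
    return runs


def build_mask(ink: Ink, min_length: int) -> Ink:
    """Step 5 (simplified): mask long horizontal and vertical runs."""
    height = len(ink)
    width = len(ink[0]) if ink else 0
    mask = [[False] * width for _ in range(height)]
    for y, x0, x1 in long_runs(ink, min_length):
        for x in range(x0, x1):
            mask[y][x] = True
    # Vertical runs are horizontal runs of the transposed image.
    transposed = [list(col) for col in zip(*ink)]
    for x, y0, y1 in long_runs(transposed, min_length):
        for y in range(y0, y1):
            mask[y][x] = True
    return mask


def remove_lines(image: Image, min_length: int = 5, fill: int = 255) -> Image:
    """Step 6 (simplified): paint masked pixels with a fill value (white here)."""
    mask = build_mask(binarize(image), min_length)
    return [
        [fill if mask[y][x] else px for x, px in enumerate(row)]
        for y, row in enumerate(image)
    ]
```

With `min_length=5`, a 7-pixel rule is removed while a 2-pixel stroke of text is left alone, which is the basic trade-off the length threshold controls.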

Configuration and Usage

  • Use the 'Dropout Method' property to control how masked regions are filled (e.g., with white, background color, or interpolation).
  • For advanced control, configure line detection properties in the base Line Detection class, such as minimum line length, thickness, aspect ratio, fill percentage, gap tolerance, and comb removal.
  • Enable advanced line detection (Hough transform) for documents with faint, broken, or skewed lines.
  • Adjust font dropout and preprocessing settings to optimize line visibility and minimize interference from text or graphics.
  • Review diagnostic images and logs to ensure that only the intended lines are being removed and that important content is preserved.
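
To illustrate how a gap tolerance and a minimum line length interact, the sketch below merges dash segments along one scan row and then filters by length. The `LineThresholds` fields and function names are hypothetical and do not correspond to Grooper's actual property names or detection logic.

```python
from dataclasses import dataclass
from typing import List, Tuple

Segment = Tuple[int, int]  # (start, end_exclusive) along one scan row


@dataclass
class LineThresholds:
    # Hypothetical settings for illustration; not Grooper property names.
    min_length: int = 40
    gap_tolerance: int = 3


def merge_segments(segments: List[Segment], gap_tolerance: int) -> List[Segment]:
    """Join neighbouring segments whose gap is at most gap_tolerance pixels."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= gap_tolerance:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def detect_lines(segments: List[Segment], thresholds: LineThresholds) -> List[Segment]:
    """Merge dashes first, then keep only segments that are long enough."""
    merged = merge_segments(segments, thresholds.gap_tolerance)
    return [s for s in merged if s[1] - s[0] >= thresholds.min_length]


# Four 10-pixel dashes separated by 2-pixel gaps, plus a short stray mark.
dashes = [(0, 10), (12, 22), (24, 34), (36, 46), (100, 105)]
print(detect_lines(dashes, LineThresholds(min_length=40, gap_tolerance=3)))
print(detect_lines(dashes, LineThresholds(min_length=40, gap_tolerance=1)))
```

In this sketch a tolerance of 3 joins the dashes into one 46-pixel line that passes the length check, while a tolerance of 1 leaves four short dashes that are all rejected. Raising the tolerance too far carries the opposite risk: unrelated marks may be joined into a false line.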

Supported Pixel Formats

All common pixel formats are supported, including Pixel8bppGrayscale, Pixel24bppBgr, and Pixel1bppIndexed. Images are automatically converted as needed for line detection and removal.
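
As a rough illustration of what such a conversion involves, the sketch below reduces 24-bit BGR pixels to grayscale and then to a bitonal image. The integer luma weights are the standard ITU-R BT.601 coefficients; the fixed threshold and the choice of 1 for ink are assumptions, and Grooper's own conversion may differ.

```python
from typing import List, Tuple

BgrPixel = Tuple[int, int, int]  # (blue, green, red), each 0-255


def bgr_to_gray(pixel: BgrPixel) -> int:
    """Weighted luma using BT.601 coefficients (0.114 B, 0.587 G, 0.299 R)."""
    b, g, r = pixel
    return (114 * b + 587 * g + 299 * r) // 1000


def to_bitonal(image: List[List[BgrPixel]], threshold: int = 128) -> List[List[int]]:
    """Return 1 for ink (dark) and 0 for background; the polarity is an assumption."""
    return [
        [1 if bgr_to_gray(px) < threshold else 0 for px in row]
        for row in image
    ]
```

Note that colored lines can fall on either side of a fixed threshold: in this sketch pure green stays as background while pure blue becomes ink, which is one reason binarization settings matter for colored forms.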

Diagnostics

When run in diagnostic mode, Line Removal generates a comprehensive set of diagnostic outputs to assist with configuration and troubleshooting:

  • Binarized: Shows the effect of thresholding and preprocessing, helping you verify that lines are clearly separated from the background.
  • Preprocessed: Illustrates the result of font dropout and other enhancements, showing how text and noise are removed before line detection.
  • Dropout Mask: Visualizes the regions of the image that will be removed, including detected lines and any trim distance applied around them.
  • Trim Mask: (if applicable) Shows the effect of the trim distance setting, helping you balance thorough line removal with preservation of intersecting features.
  • Log Messages: Report the number and type of lines detected, as well as timing and processing details for each step.

Use these diagnostics to fine-tune detection parameters, validate that only the intended lines are being removed, and ensure that important content is preserved. Always review diagnostic output to avoid unintentional data loss.
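
One way to reason about the trim distance is as a dilation of the line mask, with the cost measured as ink that is removed only because of the added margin. The sketch below takes that view; the square dilation and the function names are assumptions for illustration, not a description of how Grooper builds its Trim Mask.

```python
from typing import List

Mask = List[List[bool]]


def dilate(mask: Mask, distance: int) -> Mask:
    """Grow every True pixel into a square of radius `distance`, clipped at the edges."""
    height = len(mask)
    width = len(mask[0]) if mask else 0
    grown = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            for yy in range(max(0, y - distance), min(height, y + distance + 1)):
                for xx in range(max(0, x - distance), min(width, x + distance + 1)):
                    grown[yy][xx] = True
    return grown


def collateral_ink(ink: Mask, line_mask: Mask, trim_distance: int) -> int:
    """Count ink pixels that fall inside the trim margin but not on a detected line."""
    dropout = dilate(line_mask, trim_distance)
    return sum(
        1
        for y, row in enumerate(ink)
        for x, is_ink in enumerate(row)
        if is_ink and dropout[y][x] and not line_mask[y][x]
    )
```

Comparing `collateral_ink` at a few trim distances gives a quick numeric counterpart to inspecting the mask images by eye: a count that jumps as the distance grows suggests text or marks sitting close to the lines.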

Notes

  • Overly broad settings may remove important content, such as underlines or table borders needed for data extraction.
  • Line Removal does not generate classification features directly, but the results can impact downstream extraction and analysis.
  • Start with default settings and iteratively adjust parameters, reviewing diagnostic images at each step for best results on your specific document set.

Properties

Name    Type    Description
General
Comb Removal
Image Preprocessing
Command Info

See Also

Used By

Notification