Grooper Help - Version 25.0

Extract Features

IP Command Grooper.IP

Divides an image into an NxN matrix of cells and generates a vector of N*N features.

Remarks

The Extract Features command is the primary feature generator for Visual classification in Grooper. It divides an image into a grid of cells and computes the selected features for each cell, producing a feature vector that supports rapid and accurate document classification without requiring full-text OCR. The approach is efficient and hardware-accelerated, and it can classify pages in real time (often in less than 100 ms).

The command supports a variety of feature types, including intensity, gradient angle, center of gravity (COG) angle and magnitude, entropy, and color space channel averages. By adjusting the number of cells and the region of interest, you can tune the granularity and focus of the feature extraction to match your document types.

How It Works

  1. The image is optionally cropped to a region of interest (ROI).
  2. The image is optionally filtered to emphasize major elements.
  3. The ROI is divided into a grid of cells (CellsX by CellsY).
  4. For each cell, the selected feature types are computed.
  5. The resulting feature vector is used for classification, clustering, or other machine learning tasks.
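
The steps above can be sketched in a few lines of plain Python. This is an illustrative approximation only, not Grooper's implementation: the function names (crop, cell_bounds, intensity_features) and the (left, top, width, height) ROI layout are hypothetical, the optional filtering step is omitted, and the image is assumed to be a grayscale list of rows with at least one pixel per cell.

```python
def crop(image, roi):
    """Crop a list-of-rows image to roi = (left, top, width, height)."""
    left, top, width, height = roi
    return [row[left:left + width] for row in image[top:top + height]]


def cell_bounds(size, cells):
    """Split `size` pixels into `cells` nearly equal (start, end) spans."""
    return [(i * size // cells, (i + 1) * size // cells) for i in range(cells)]


def intensity_features(image, cells_x, cells_y, roi=None):
    """Return the mean intensity of each cell, in row-major order."""
    if roi is not None:
        image = crop(image, roi)          # step 1: optional ROI crop
    height, width = len(image), len(image[0])
    features = []
    for y0, y1 in cell_bounds(height, cells_y):      # step 3: grid rows
        for x0, x1 in cell_bounds(width, cells_x):   # step 3: grid columns
            pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            features.append(sum(pixels) / len(pixels))  # step 4: per-cell feature
    return features


# 4x4 grayscale image: dark left half, bright right half.
image = [[0, 0, 255, 255] for _ in range(4)]
print(intensity_features(image, cells_x=2, cells_y=2))
# -> [0.0, 255.0, 0.0, 255.0]
```

The returned list is the feature vector from step 5; a classifier would compare it against vectors computed from training pages.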

Supported Pixel Formats

  • All common pixel formats are supported as input. Images are automatically converted as needed for feature extraction.

Diagnostics

When diagnostic mode is enabled, Extract Features generates diagnostic images for each feature type, as well as a text file listing all computed features. This helps you visualize how features are extracted and tune your configuration.

Feature Types

  • Intensity: Average brightness per cell.
  • Gradient Angle: Dominant edge direction per cell.
  • COG Angle/Magnitude: Direction and distance from cell center to center of gravity of dark pixels.
  • Entropy: Local complexity or "busyness" per cell.
  • Color Space Channel: Average value for a selected color channel.
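
To make two of these feature types concrete, the sketch below computes entropy and COG angle/magnitude for a single cell. The exact formulas Grooper uses are not documented here, so this is an assumption-laden illustration: entropy is taken to be the Shannon entropy of the gray-level histogram, "dark" is taken to mean below a hypothetical threshold of 128, and the angle is measured in image coordinates (y grows downward). The function names are invented for the example.

```python
import math
from collections import Counter


def cell_entropy(pixels):
    """Shannon entropy (in bits) of the gray levels in a flat pixel list."""
    total = len(pixels)
    counts = Counter(pixels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def cell_cog(cell, threshold=128):
    """Return (angle in degrees, magnitude in pixels) from the cell center
    to the center of gravity of pixels darker than `threshold` (assumed)."""
    height, width = len(cell), len(cell[0])
    dark = [(x, y) for y in range(height) for x in range(width)
            if cell[y][x] < threshold]
    if not dark:
        return 0.0, 0.0  # no dark pixels: report no offset
    cog_x = sum(x for x, _ in dark) / len(dark)
    cog_y = sum(y for _, y in dark) / len(dark)
    dx = cog_x - (width - 1) / 2
    dy = cog_y - (height - 1) / 2
    return math.degrees(math.atan2(dy, dx)), math.hypot(dx, dy)


# A 3x3 cell with a dark vertical stroke along its right edge.
cell = [[255, 255, 0],
        [255, 255, 0],
        [255, 255, 0]]
flat = [p for row in cell for p in row]
print(cell_entropy(flat))   # higher values mean a "busier" cell
print(cell_cog(cell))       # -> (0.0, 1.0): COG lies one pixel right of center
```

A blank cell has zero entropy and zero COG magnitude, which is why these features tend to separate text-heavy regions from white space.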

Configuration Guidance

  • Use a higher number of cells for more detailed classification, or fewer cells for faster processing.
  • Restrict the ROI to focus on stable regions (e.g., headers on invoices).
  • Enable filtering to emphasize document structure and suppress noise.
  • Select feature types based on the characteristics of your documents and classification goals.
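
The trade-off between detail and speed can be estimated from the size of the feature vector. The helper below is a hypothetical back-of-the-envelope calculation, assuming each selected feature type contributes one value per cell; with a single feature type it reproduces the N*N figure given in the command summary.

```python
def feature_vector_length(cells_x, cells_y, feature_types):
    """Estimated vector length: one value per cell per feature type (assumed)."""
    return cells_x * cells_y * len(feature_types)


# Doubling the grid resolution quadruples the number of features.
for n in (4, 8, 16):
    print(n, feature_vector_length(n, n, ["intensity"]))
# -> 4 16
# -> 8 64
# -> 16 256
```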

Properties

Name    Type    Description
General
Image Filtering
Command Info

See Also

Used By

Notification