Grooper Help - Version 25.0
25.0.0017 2,127

Result Set Options

Embedded Object Grooper.Extract

Configures post-processing options for a set of extracted results, enabling value normalization, confidence adjustment, sorting, filtering, and other result set controls.

Remarks

The Result Set Options class provides a flexible set of controls for shaping the output of data extraction and classification activities in Grooper. It allows you to define how individual results are adjusted, how the overall result set is filtered or ordered, and how output values are normalized for downstream use.

Overview

Use this class to:

  • Normalize extracted values (e.g., unify variations, apply case or word transformations).
  • Adjust or override confidence scores for results.
  • Remove unwanted content from output values.
  • Sort, filter, or deduplicate the result set.
  • Enforce minimum/maximum hit counts or truncate the number of results.

These options are commonly configured on Data Fields, Data Types, or other extraction elements to ensure that the output meets business requirements and is ready for validation, export, or further processing.

Key Scenarios

  • Value Normalization:
    Use 'Value Override', 'Case Normalization', or 'Word Normalization' to standardize output values, making them consistent for classification or export. For example, you can map multiple document title variations to a single normalized feature, or apply stemming to treat inflected word forms as equivalent.

  • Confidence Adjustment:
    The 'Confidence Override' and 'Confidence Multiplier' properties allow you to boost or reduce the confidence of results, influencing their precedence in classification or extraction. You can also factor in character-level OCR confidence for more granular control.

  • Result Set Shaping:
    Use 'Hit Count Range' to require that the number of results falls within a minimum and maximum, 'Distinct' to remove duplicates, 'Sort Order' to control result ordering, and 'Truncate At' to limit the number of returned results.

  • Content Removal:
    The 'Subtractor' property lets you specify a Value Extractor to remove unwanted content (such as extraneous text or formatting) from each output value.
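
The per-result adjustments described above can be pictured with a small Python sketch. It is illustrative only and does not use the Grooper API: the `Result` class, the `adjust_result` function, and all parameter names are hypothetical. A regular expression stands in for the Subtractor's Value Extractor, and the 0.0 to 1.0 confidence scale, the clamping of the multiplied confidence, and the order of the steps are assumptions.

```python
import re
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Result:
    """Hypothetical stand-in for one extracted result (not a Grooper type)."""
    value: str
    confidence: float  # assumed to be on a 0.0 to 1.0 scale


def adjust_result(
    result: Result,
    value_override: Optional[str] = None,
    uppercase: bool = False,
    subtract_pattern: Optional[str] = None,
    confidence_override: Optional[float] = None,
    confidence_multiplier: float = 1.0,
) -> Result:
    """Apply illustrative per-result adjustments and return a new Result."""
    value = result.value

    # Content removal: strip whatever the (assumed) subtractor pattern matches.
    if subtract_pattern is not None:
        value = re.sub(subtract_pattern, "", value).strip()

    # Case normalization: this sketch only supports upper-casing.
    if uppercase:
        value = value.upper()

    # Value override: replace the value entirely with a fixed string.
    if value_override is not None:
        value = value_override

    # Confidence: an override wins; otherwise scale and clamp to [0, 1].
    if confidence_override is not None:
        confidence = confidence_override
    else:
        confidence = min(1.0, max(0.0, result.confidence * confidence_multiplier))

    return replace(result, value=value, confidence=confidence)


# Usage: remove a label, upper-case the value, and halve the confidence.
raw = Result(value="Invoice No: inv-1042", confidence=0.8)
adjusted = adjust_result(
    raw,
    uppercase=True,
    subtract_pattern=r"(?i)invoice no:\s*",
    confidence_multiplier=0.5,
)
print(adjusted)  # Result(value='INV-1042', confidence=0.4)
```

In this sketch a value override takes precedence over the other value adjustments, and a confidence override takes precedence over the multiplier. Whether the product resolves such combinations the same way is not stated here, so check the documentation for each property.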

Processing Flow

When applied, the options in this class are processed in a defined sequence:

  1. Individual Value Adjustments:
    Each result is normalized, cleaned, or transformed according to the configured options.
  2. Result Set Operations:
    The set of results is filtered, sorted, deduplicated, or truncated as specified.
  3. Type Enforcement:
    If a specific value type is required, results are converted and validated accordingly.

This ensures that the final output is both clean and conforms to the requirements of downstream consumers.
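
The three stages can be sketched as a small Python pipeline. This is a conceptual model, not Grooper code: `Result`, `process_result_set`, and every parameter name are hypothetical. The order of operations inside stage 2, returning an empty set when the hit count is out of range, and dropping values that fail type conversion are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Result:
    """Hypothetical stand-in for one extracted result (not a Grooper type)."""
    value: str
    confidence: float


def process_result_set(
    results: List[Result],
    normalize: Callable[[str], str] = str.strip,
    distinct: bool = False,
    sort_by_confidence: bool = False,
    hit_count_range: Optional[Tuple[int, int]] = None,
    truncate_at: Optional[int] = None,
    convert: Optional[Callable[[str], object]] = None,
) -> list:
    # Stage 1: individual value adjustments, applied to every result.
    staged = [Result(normalize(r.value), r.confidence) for r in results]

    # Stage 2: result set operations (the order used here is an assumption).
    if distinct:
        seen = set()
        unique = []
        for r in staged:
            if r.value not in seen:  # keep the first occurrence of each value
                seen.add(r.value)
                unique.append(r)
        staged = unique
    if sort_by_confidence:
        staged = sorted(staged, key=lambda r: r.confidence, reverse=True)
    if hit_count_range is not None:
        low, high = hit_count_range
        if not (low <= len(staged) <= high):
            return []  # assumption: an out-of-range set yields no results
    if truncate_at is not None:
        staged = staged[:truncate_at]

    # Stage 3: type enforcement, converting each value if a type is required.
    if convert is None:
        return [r.value for r in staged]
    typed = []
    for r in staged:
        try:
            typed.append(convert(r.value))
        except ValueError:
            continue  # assumption: values that fail conversion are dropped
    return typed


# Usage: trim, deduplicate, sort by confidence, keep up to 3, require integers.
hits = [Result(" 42 ", 0.6), Result("42", 0.9), Result("n/a", 0.7), Result("7", 0.8)]
print(process_result_set(hits, distinct=True, sort_by_confidence=True,
                         truncate_at=3, convert=int))  # [7, 42]
```

Running the per-value stage first matters in this model: the two "42" hits only count as duplicates after whitespace is trimmed, so deduplicating before normalizing would keep both.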

Usage Guidance

  • Configure only the options relevant to your scenario; defaults are designed to minimize unnecessary processing.
  • For best results, test your configuration with representative data to ensure that normalization, filtering, and confidence adjustments behave as expected.
  • For more details on individual options, see the documentation for each property.

Properties

Name    Type    Description
Output Value
Other Adjustments
Result Set

See Also

Used By
