Grooper Help - Version 25.0
25.0.0017 2,127

IdfModes

Grooper.Core

Specifies how the inverse document frequency (IDF) of features should be calculated for classification models.

Remarks

The 'IdfModes' enumeration controls the method used to compute inverse document frequency (IDF) for features during document classification. IDF measures how unique or common a feature is across all Document Types, and is a key component in weighting features for similarity calculations.

Overview

  • Inverse Document Frequency (IDF):
    Reduces the weight of features that are common across many Document Types, and increases the weight of features that are rare or distinctive.
  • The selected mode determines whether standard or smoothed IDF is used, which can affect the handling of rare or ubiquitous features.
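As a rough illustration of the idea above, the sketch below counts how many Document Types contain each feature and applies the textbook formula log(N / df). The sample data, the function names, and the formula itself are assumptions for illustration; this page does not state the exact calculation Grooper performs.

```python
import math

# Hypothetical training data: each Document Type maps to the set of
# features (for example, words) observed in its samples.
doc_types = {
    "Invoice": {"total", "invoice", "date"},
    "Receipt": {"total", "receipt", "date"},
    "Contract": {"agreement", "party", "date"},
}

def document_frequency(feature, types):
    """Count how many Document Types contain the feature."""
    return sum(1 for feats in types.values() if feature in feats)

def idf(feature, types):
    """Textbook IDF, log(N / df); assumes the feature occurs in at least one type."""
    n = len(types)
    df = document_frequency(feature, types)
    return math.log(n / df)

# "date" appears in every type, "total" in two, "invoice" in only one.
weights = {f: idf(f, doc_types) for f in ("date", "total", "invoice")}
for feature, weight in weights.items():
    print(f"{feature}: {weight:.3f}")
```

In this toy data the feature shared by every type receives the lowest weight and the feature unique to one type receives the highest, which is the behavior the overview describes.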

Available Modes

  • Normal:
    Uses the standard IDF calculation, which may assign very high weights to features that appear in only one Document Type.
  • Smooth:
    Adds smoothing to the IDF calculation, preventing extreme weights for rare features and improving stability when the number of Document Types is small.
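The difference between the two modes can be sketched with two common textbook formulas. The smoothed variant shown here is the add-one form, log((1 + N) / (1 + df)) + 1; it is an assumption chosen for illustration, and this page does not confirm which formula Grooper uses for either mode. The function names are hypothetical.

```python
import math

def idf_normal(n_types, df):
    """Standard IDF (assumed form): log(N / df)."""
    return math.log(n_types / df)

def idf_smooth(n_types, df):
    """Add-one smoothed IDF (assumed form): log((1 + N) / (1 + df)) + 1."""
    return math.log((1 + n_types) / (1 + df)) + 1.0

n = 50  # hypothetical number of Document Types

# Compare a feature found in one type with a feature found in half of them.
spread_normal = idf_normal(n, 1) / idf_normal(n, 25)
spread_smooth = idf_smooth(n, 1) / idf_smooth(n, 25)

print(f"normal: rare/common weight ratio = {spread_normal:.2f}")
print(f"smooth: rare/common weight ratio = {spread_smooth:.2f}")
```

Under these assumed formulas the smoothed weights sit closer together, so a feature seen in a single Document Type dominates less than it does with the standard calculation.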

Practical Guidance

  • Use 'Normal' for most scenarios where the number of Document Types is moderate to large and rare features should be highly weighted.
  • Use 'Smooth' when you want to avoid extreme weighting for features that appear in only one or very few Document Types, or when working with small sets of types.
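To make the guidance for small sets concrete, the sketch below selects a calculation by mode and tabulates the weights for a set of only three Document Types. The `IdfMode` class is a hypothetical Python stand-in, not Grooper's API, and both formulas are assumed textbook forms rather than the product's documented calculation.

```python
import math
from enum import Enum

class IdfMode(Enum):
    """Hypothetical stand-in for the two modes described on this page."""
    NORMAL = "Normal"
    SMOOTH = "Smooth"

def idf(mode, n_types, df):
    """Return an IDF weight for a feature found in df of n_types Document Types."""
    if mode is IdfMode.NORMAL:
        return math.log(n_types / df)                    # assumed standard form
    return math.log((1 + n_types) / (1 + df)) + 1.0      # assumed add-one smoothing

n_types = 3  # a small set of Document Types

# Weights for features found in 1, 2, and all 3 types, per mode.
table = {
    mode.value: [idf(mode, n_types, df) for df in (1, 2, 3)]
    for mode in IdfMode
}

for name, weights in table.items():
    print(name, [f"{w:.3f}" for w in weights])
```

With only three types, the assumed standard formula gives a weight of zero to any feature present in all of them, while the smoothed form keeps every weight positive and the values closer together.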

For more information, see the documentation for Lexical, Document Type, and 'Document Frequency Mode'.

Can be one of the following values:

Name    Value    Description

Used By

Notification