Grooper Help - Version 25.0
25.0.0017 2,127

TfModes

Grooper.Core

Specifies how the term frequency (TF) of features should be calculated for classification models.

Remarks

The 'TfModes' enumeration controls the method used to compute term frequency (TF) for features during document classification. The choice of mode affects how feature occurrences are weighted, which in turn influences similarity calculations between documents and trained Document Types.

Overview

  • Term Frequency (TF): Measures how often a feature (such as a word or token) appears in a document, relative to the document's size or other features.
  • The selected mode determines how TF is normalized or scaled, impacting sensitivity to document length, feature repetition, and feature prominence.

Available Modes

  • Normal:
    Normalizes feature counts by the total number of features in the document, making TF independent of document size.
  • Logarithmic:
    Applies a logarithmic scale to feature counts, reducing the impact of very frequent features. Provided for backward compatibility.
  • Augmented:
    Normalizes feature counts by the count of the most frequent feature in the document, allowing document size to play a greater role in classification. Includes a damping factor controlled by the 'Frequency Scaling' property.
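
Example

The three modes can be compared with a minimal Python sketch. It uses the common textbook forms of each weighting and is not Grooper's implementation: the exact formulas, the function names, and the 'scaling' parameter (a hypothetical stand-in for the 'Frequency Scaling' property) are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_normal(counts):
    """Count divided by the total number of feature occurrences."""
    if not counts:
        return {}
    total = sum(counts.values())
    return {feature: n / total for feature, n in counts.items()}


def tf_logarithmic(counts):
    """Log-scaled count, 1 + ln(count). Assumed textbook form."""
    return {feature: 1.0 + math.log(n) for feature, n in counts.items()}


def tf_augmented(counts, scaling=0.5):
    """Count divided by the largest count in the document, then damped.

    `scaling` is a hypothetical stand-in for 'Frequency Scaling';
    how the real property maps onto the formula is an assumption.
    """
    if not counts:
        return {}
    peak = max(counts.values())
    return {
        feature: scaling + (1.0 - scaling) * n / peak
        for feature, n in counts.items()
    }


# A toy document: invoice appears 3 times, total 2 times, due once.
tokens = "invoice total invoice due invoice total".split()
counts = Counter(tokens)

normal = tf_normal(counts)
logarithmic = tf_logarithmic(counts)
augmented = tf_augmented(counts, scaling=0.5)

# Repeating the whole document leaves the 'Normal' weights unchanged.
doubled = tf_normal(Counter(tokens * 2))
```

In this sketch the 'Normal' weights sum to 1 and are unchanged when the document is repeated in full, the logarithmic weights grow slowly as a feature repeats, and the augmented weights lie between 'scaling' and 1, with the most frequent feature always at 1.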

Practical Guidance

  • Use 'Normal' for most scenarios where document length should not affect classification.
  • Use 'Logarithmic' for legacy models or when you want to reduce the influence of highly repetitive features.
  • Use 'Augmented' when you want longer documents or repeated features to have more influence, or when tuning with the 'Frequency Scaling' property.

Can be one of the following values:

Name    Value    Description

Used By
