Grooper Help - Version 25.0
25.0.0024

Field Class

Extractor Node Grooper.Extract

A trainable binary classifier for locating specific information on a document using contextual features.

Remarks

The Field Class is a supervised machine learning object in Grooper, designed to identify the correct instance of a value among multiple candidates on a document by analyzing the surrounding context. Field Classes are ideal for documents where the same type of value (such as a date, name, or clause) may appear in multiple places, and the correct instance must be selected based on nearby words, phrases, or other features.

Overview

  • Field Classes are one of the three primary data extraction objects in Grooper, alongside Value Readers and Data Types.
  • They are created as children of a Project or the "Local Resources" folder of a Content Type.
  • Field Classes do not extract values on their own; they must be referenced by another object to participate in extraction.

Configuration

  • Field Classes use two Value Extractors:
    • The 'Value Extractor' finds all possible candidate values (e.g., all dates on a page).
    • The 'Feature Extractor' finds contextual features (e.g., words or phrases) near each candidate.
  • Additional properties control the context scope, feature zones, and classifier tuning.
  • The context can be defined by flow (text order), zones (spatial regions), or proximity (nearest features).
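
The two-extractor idea can be illustrated outside Grooper with a short Python sketch. This is not Grooper's API: the regular expressions, the function names (find_candidates, features_before, candidates_with_context), and the three-word window are all assumptions chosen for illustration. One pattern plays the role of the 'Value Extractor' by finding every date-like candidate, and a second plays the role of the 'Feature Extractor' by collecting the words that precede each candidate in text order (a flow-style context).

```python
import re

# Hypothetical stand-ins for the two extractors; these are not Grooper objects.
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")  # plays the 'Value Extractor'
WORD_PATTERN = re.compile(r"[A-Za-z]+")                  # plays the 'Feature Extractor'


def find_candidates(text):
    """Return (value, start, end) for every date-like match in the text."""
    return [(m.group(), m.start(), m.end()) for m in DATE_PATTERN.finditer(text)]


def features_before(text, start, max_words=3):
    """Return up to max_words lowercase words preceding position start."""
    words = WORD_PATTERN.findall(text[:start])
    return [w.lower() for w in words[-max_words:]]


def candidates_with_context(text, max_words=3):
    """Pair each candidate value with the features found just before it."""
    return [
        (value, features_before(text, start, max_words))
        for value, start, _end in find_candidates(text)
    ]


sample = "Invoice Date: 01/15/2024. Payment Due Date: 02/14/2024."
for value, features in candidates_with_context(sample):
    print(value, features)
```

Both dates are returned as candidates; only the surrounding words differ, and that difference is what a classifier can learn from.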

Training & Usage

  • After configuring extractors, use the Extractor Node - Tester tab to run extraction and view candidates and features.
  • Select the correct value in the results list and use the thumbs-up (positive) or thumbs-down (negative) buttons to train the classifier.
  • The classifier uses TF-IDF weighting to learn which features are most predictive of the correct value.
  • On future documents, the Field Class will score candidates based on how closely their context matches the training data.
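
To make the scoring idea concrete, the sketch below shows one generic way to apply TF-IDF weighting to positive and negative training examples. It is an assumption-laden illustration, not Grooper's implementation: the smoothed IDF formula, the centroid comparison by cosine similarity (a Rocchio-style approach), and every function name here are hypothetical. Features that appear in every example, such as "date", receive a low weight, while features that separate positive from negative examples, such as "invoice" or "due", receive a higher one.

```python
import math
from collections import Counter


def idf_weights(examples):
    """Smoothed inverse document frequency over all training feature lists."""
    n = len(examples)
    df = Counter()
    for features in examples:
        df.update(set(features))
    return {f: math.log((1 + n) / (1 + count)) + 1.0 for f, count in df.items()}


def tfidf_vector(features, idf):
    """TF-IDF vector for one feature list; features unseen in training are ignored."""
    tf = Counter(features)
    return {f: tf[f] * idf[f] for f in tf if f in idf}


def centroid(examples, idf):
    """Sum of the TF-IDF vectors of a set of training examples."""
    total = Counter()
    for features in examples:
        total.update(tfidf_vector(features, idf))
    return dict(total)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[f] * b.get(f, 0.0) for f in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score(features, positive, negative, idf):
    """Higher when the context resembles positive training more than negative."""
    vec = tfidf_vector(features, idf)
    return cosine(vec, positive) - cosine(vec, negative)


# Hypothetical training data: context words around thumbs-up and thumbs-down values.
positives = [["invoice", "date"], ["invoice", "date", "issued"]]
negatives = [["payment", "due", "date"], ["due", "date"]]
idf = idf_weights(positives + negatives)
pos_centroid = centroid(positives, idf)
neg_centroid = centroid(negatives, idf)

candidates = {
    "01/15/2024": ["invoice", "date"],
    "02/14/2024": ["payment", "due", "date"],
}
scores = {v: score(f, pos_centroid, neg_centroid, idf) for v, f in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

In this toy data the candidate preceded by "invoice date" scores above zero and the one preceded by "payment due date" scores below zero, so the first is selected.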

Best Practices

  • Use Field Classes when the correct value cannot be determined by position alone, such as in contracts, legal documents, or unstructured text.
  • Provide diverse positive and negative training examples to improve accuracy.
  • Adjust context scope and feature extraction settings to best capture the distinguishing context for your use case.

Properties

Name Type Description
General
Context Scope Options
Classifier Tuning
Output

Design Tabs

General View or edit properties of a node.
Reports View reports for a node.
Tester Test an Extractor Node on documents in a test batch.
Weightings View the classification weightings associated with this Field Class.
Advanced View or edit advanced details about a node.

Context Menu Commands

Command Shortcut Description
Purge Training Deletes all training data from this Field Class.

Child Types

See Also

Used By
