Grooper Help - Version 25.0
25.0.0017 2,127

Document Type

Content Type Grooper.Core

Represents a distinct type of document within a Content Model, such as an invoice, contract, or letter.

Remarks

Overview

A Document Type is a child of a Content Model or Content Category and serves as the primary configuration point for one specific class of document. It defines the classification, separation, and data extraction rules for that class, enabling automated processing and organization in Grooper. Each Document Type can define its own Data Model while inheriting shared settings and data elements from its parent objects.

Classification and Assignment

Documents are assigned a Document Type through a process called classification. This can be performed automatically using the Classify or Separate activities, or manually by a user via the Batch Folder - Assign Document Type command. Classification determines which extraction rules and data elements apply to each document.

  • Automated classification uses extractors and scoring logic to select the best-matching Document Type.
  • Manual assignment allows users to override or specify the type directly.
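
The selection logic described above can be pictured with a small conceptual sketch. This is not Grooper's API: the function name `choose_document_type`, the `scores` mapping, and the `minimum_score` threshold are all hypothetical, chosen only to illustrate "pick the best-scoring type, unless a user assigned one manually".

```python
from typing import Optional


def choose_document_type(
    scores: dict[str, float],
    minimum_score: float = 0.0,
    manual_assignment: Optional[str] = None,
) -> Optional[str]:
    """Conceptual model only; names and threshold are assumptions, not Grooper's API."""
    # A manual assignment always wins over automated scoring.
    if manual_assignment is not None:
        return manual_assignment
    # With no candidate scores, the document stays unclassified.
    if not scores:
        return None
    # Otherwise select the Document Type with the highest score.
    best = max(scores, key=lambda name: scores[name])
    # Reject the winner if it falls below the (hypothetical) minimum score.
    return best if scores[best] >= minimum_score else None


auto_result = choose_document_type({"Invoice": 0.9, "Contract": 0.4})
manual_result = choose_document_type({"Invoice": 0.9}, manual_assignment="Letter")
```

In this sketch `auto_result` is the highest-scoring name and `manual_result` is the user's choice, mirroring the two bullets above.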

Classification Training

If the parent Content Model uses a training-based classification method, Document Types can be trained using sample documents. Training is performed via the Classification Tester or by using the Train As and Train From commands. During training, child objects such as Form Type, Page Type, and Training Page are created to store the learned features for each Document Type.

  • Training enables the system to recognize complex or variable document layouts.
  • The 'Allow Training' property controls whether a Document Type can be trained.

Data Modeling and Inheritance

Each Document Type may define a local Data Model to specify the data elements (fields, sections, tables) to extract from documents of that type. In addition, all data elements defined on parent objects in the Content Model hierarchy are inherited, allowing for shared configuration and easy reuse.

  • The total set of data elements includes both inherited and local definitions.
  • Property overrides can be used to customize inherited data elements for a specific Document Type.
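
As a rough illustration of how inherited and local data elements combine, the following sketch models the hierarchy as plain Python objects. The `Node` class, its fields, and the `required` property are hypothetical stand-ins, not Grooper types, and the merge order (inherited first, then local, then overrides) is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a Content Model, Content Category, or Document Type."""
    name: str
    parent: Optional["Node"] = None
    # Local data elements: element name -> property dictionary.
    elements: dict[str, dict] = field(default_factory=dict)
    # Property overrides applied to elements visible at this node.
    overrides: dict[str, dict] = field(default_factory=dict)

    def effective_elements(self) -> dict[str, dict]:
        # Start from everything inherited from the parent chain.
        inherited = self.parent.effective_elements() if self.parent else {}
        merged = {name: dict(props) for name, props in inherited.items()}
        # Add the elements defined locally on this node.
        for name, props in self.elements.items():
            merged[name] = dict(props)
        # Apply overrides without touching the parent's own definitions.
        for name, props in self.overrides.items():
            if name in merged:
                merged[name].update(props)
        return merged


model = Node("Model", elements={"Date": {"required": False}})
invoice = Node(
    "Invoice",
    parent=model,
    elements={"Total": {"required": True}},
    overrides={"Date": {"required": True}},
)
effective = invoice.effective_elements()
```

Here `effective` contains both `Date` (inherited, with its overridden property) and `Total` (local), while the parent's definition of `Date` is left unchanged.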

Separation and Pagination

Document Types control how documents are separated during batch processing, using the 'Pagination' property and related settings. Options include structured, unstructured, fixed-length, and extended pagination, each supporting different document scenarios.

  • Additional properties such as 'Trigger On Any Page', 'Combine Contiguous', and 'Prioritize EPI' provide fine-grained control over separation behavior.
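
This page does not define the individual pagination modes, so the following is only a conceptual sketch under one stated assumption: that fixed-length pagination means every document of the type has the same known page count. The function `split_fixed_length` is hypothetical and is not part of Grooper.

```python
def split_fixed_length(pages: list[str], length: int) -> list[list[str]]:
    """Group a flat page list into documents of `length` pages each (assumed behavior)."""
    if length < 1:
        raise ValueError("length must be at least 1")
    # Step through the pages in strides of `length`; the final group may be shorter.
    return [pages[i:i + length] for i in range(0, len(pages), length)]


documents = split_fixed_length(["p1", "p2", "p3", "p4", "p5"], 2)
```

With five pages and a length of two, the sketch yields two full documents and one trailing single-page group. How Grooper itself handles a short final group is not stated here.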

Usage Guidance

  • Use Document Types to model each distinct class of document in your solution.
  • Configure classification and separation settings to match the characteristics of your documents.
  • Define or inherit a Data Model to extract the required data elements.
  • Leverage training to improve classification accuracy for complex or variable documents.
  • Use property overrides to adapt shared data elements for specialized document types.

For more information, see the documentation for the Content Model, Content Category, and Data Model objects and for the Classify and Separate activities.

Properties

Name    Type    Description
General
Classification
Separation
Appearance

Design Tabs

General: View or edit properties of a node.
Documents: View a list of documents classified as this content type or one of its descendants.
Reports: View reports for a node.
Training Samples: View a list of training documents for this content type and its descendants.
Labels: Edit Label Sets for this content type or its descendants.
Overrides: Override Data Element property values for this content type.
Weightings: View the classification weightings associated with this content type.
Advanced: View or edit advanced details about a node.

Child Types

See Also

Used By

Notification