Grooper Help - Version 25.0
25.0.0017 2,127

HTML Document

HTML Document Base Grooper.Messaging

Represents an HTML file attachment type for Batch Folders, enabling HTML-centric processing, parsing, and integration in Grooper.

Remarks

The HTMLDocument class provides support for HTML files (the .htm and .html extensions and the text/html MIME type) within Batch Folders. It lets Grooper recognize, process, and manipulate HTML files as attachments, supporting downstream data extraction, conversion, and export workflows.

Overview

  • HTMLDocument is automatically selected for files with the .htm or .html extension or text/html MIME type.
  • Integrates with Batch Folders to enable document-centric workflows, including classification, extraction, and export.
  • Parses HTML content using the HtmlAgilityPack library for robust HTML handling.
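The selection rule in the first bullet (extension or MIME type) can be sketched as follows. This is only an illustration in Python using the standard library; the function name `is_html_attachment` and its parameters are hypothetical and are not part of Grooper's API, and Grooper's actual matching logic may differ.

```python
import mimetypes
from pathlib import Path
from typing import Optional

# Values taken from the text above: .htm / .html extensions, text/html MIME type.
HTML_EXTENSIONS = {".htm", ".html"}
HTML_MIME_TYPE = "text/html"


def is_html_attachment(filename: str, mime_type: Optional[str] = None) -> bool:
    """Hypothetical helper: True if the extension or MIME type marks the file as HTML."""
    if Path(filename).suffix.lower() in HTML_EXTENSIONS:
        return True
    if mime_type is None:
        # Fall back to a guess based on the file name (assumption, not Grooper behavior).
        mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        return False
    # Ignore parameters such as "; charset=utf-8".
    return mime_type.split(";", 1)[0].strip().lower() == HTML_MIME_TYPE
```

For example, `is_html_attachment("body.bin", "text/html; charset=utf-8")` matches on the MIME type even though the extension does not.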

HTML File Handling

  • Loads and parses HTML files as attachments to Batch Folders.
  • Provides access to the parsed HTML DOM for downstream processing and extraction.
  • Handles malformed or non-standard HTML gracefully.
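To make "parsed DOM" and "graceful handling of malformed HTML" concrete, here is a minimal sketch of a tolerant tree builder. Grooper itself uses HtmlAgilityPack, a .NET library; this example substitutes Python's standard `html.parser` module purely for illustration, and the names `Node`, `TreeBuilder`, and `parse_html` are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """A very small DOM-like element: tag, attributes, and child nodes or strings."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []

    def text(self):
        """Concatenate all text beneath this node."""
        return "".join(c if isinstance(c, str) else c.text() for c in self.children)

    def find_all(self, tag):
        """Return all descendant elements with the given tag, in document order."""
        found = []
        for child in self.children:
            if isinstance(child, Node):
                if child.tag == tag:
                    found.append(child)
                found.extend(child.find_all(tag))
        return found


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element; this implicitly
        # closes unclosed children. Stray end tags with no match are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def parse_html(markup):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root
```

Given input with an unclosed `<b>` and a stray `</span>`, such as `<div><p>Invoice <b>123</p></span><img src='a.png'>Total: 5</div>`, the builder still produces a usable tree instead of raising an error.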

Available Commands

  • ConvertToText: Converts the HTML document to either plain text or markdown format.
    • Set UseMarkdown to true to produce a markdown (.md) file, or false for plain text (.txt).
    • The converted content replaces the current attachment, and the file extension and MIME type are updated accordingly.
  • ConditionHTML: Applies attribute and structure normalization rules to the HTML document.
    • Use it to standardize, tag, or clean up HTML for downstream extraction, analytics, or rendering scenarios.
    • Configuration options allow specifying attribute rules, element selectors, and normalization behaviors.
  • ConvertToPdf: Converts the HTML document to a PDF file.
    • Preserves layout and structure for archival or distribution.
    • The resulting PDF either replaces the current attachment or is added as a new attachment to the Batch Folder.
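The behavior described for ConvertToText (one flag choosing plain text or markdown, with the extension and MIME type following the choice) can be sketched as below. This is not Grooper's implementation: the function `convert_to_text`, its `use_markdown` parameter, the small set of tags handled, and the `text/plain` and `text/markdown` MIME types are all assumptions made for the example, written in Python with the standard library only.

```python
import re
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
BLOCK_TAGS = HEADINGS | {"p", "div", "br", "li", "ul", "ol", "tr"}
SKIP_TAGS = {"script", "style"}


class _TextConverter(HTMLParser):
    def __init__(self, use_markdown):
        super().__init__(convert_charrefs=True)
        self.use_markdown = use_markdown
        self.parts = []
        self.skip_depth = 0   # > 0 while inside <script> or <style>
        self.hrefs = []       # targets of currently open <a> elements

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
            return
        if tag in BLOCK_TAGS:
            self.parts.append("\n")
        if not self.use_markdown:
            return
        if tag in HEADINGS:
            self.parts.append("#" * int(tag[1]) + " ")
        elif tag == "li":
            self.parts.append("- ")
        elif tag in ("b", "strong"):
            self.parts.append("**")
        elif tag in ("i", "em"):
            self.parts.append("*")
        elif tag == "a":
            self.hrefs.append(dict(attrs).get("href"))
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.use_markdown:
            if tag in ("b", "strong"):
                self.parts.append("**")
            elif tag in ("i", "em"):
                self.parts.append("*")
            elif tag == "a" and self.hrefs:
                self.parts.append("](%s)" % (self.hrefs.pop() or ""))
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(re.sub(r"\s+", " ", data))


def convert_to_text(markup, use_markdown=False):
    """Return (content, extension, mime_type) for the converted document."""
    converter = _TextConverter(use_markdown)
    converter.feed(markup)
    converter.close()
    lines = [line.strip() for line in "".join(converter.parts).split("\n")]
    blocks = [line for line in lines if line]
    if use_markdown:
        # Blank lines keep markdown paragraphs separate.
        return "\n\n".join(blocks), ".md", "text/markdown"
    return "\n".join(blocks), ".txt", "text/plain"
```

Returning the extension and MIME type together with the content mirrors the note above that the attachment's metadata changes along with its content.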

Usage Notes

  • Use HTMLDocument to enable HTML-aware processing in Grooper, including conversion, extraction, and export.
  • Combine with Content Type and Data Model configuration to extract structured data from HTML files.
  • Suitable for processing emails, web pages, and other HTML-based content in batch workflows.

For more information, see the documentation for Batch Folder, Content Type, and data extraction from HTML in Grooper.

Properties

Name           Type    Description
Error Message  String  Error message.

Context Menu Commands

Command          Shortcut  Description
Condition HTML             Performs cleanup and normalization of HTML documents.
Convert to PDF             Converts an HTML document to PDF format.
Convert To Text            Converts the HTML document to plain text or markdown.