Grooper Help - Version 25.0
25.0.0017 2,127

HTTP Import

Import Provider Grooper.Messaging

Imports content from HTTP servers, allowing ingestion of individual web pages or entire websites.

Remarks

Use HTTP Import to bring web-based content into Grooper, either by importing specific pages or by crawling entire sites.

How it works

  • Sources: Define one or more HTTPResource objects, each representing a starting URL (site or page) to import.
  • Each HTTPResource can specify:
    • The root URL or specific pages to include.
    • A set of HyperlinkSelector objects to control which links are followed and how deep the crawl goes.
    • Scope enforcement and inclusion/exclusion rules for URLs.
  • WaitTime: Optionally set a delay between fetches to avoid overloading servers.
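The crawl behavior described above can be illustrated with a minimal, generic sketch. This is not Grooper's implementation or API: the names crawl, LinkParser, fetch, max_depth and wait_time are hypothetical, and the scope rule (a URL must start with the starting URL) is an assumption chosen for simplicity. The fetch function is supplied by the caller, so the sketch can be tried against in-memory pages without network access.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_depth=1, wait_time=0.0):
    """Breadth-first crawl starting at start_url (hypothetical helper).

    fetch:      caller-supplied function mapping a URL to HTML text, or None.
    max_depth:  how many link hops away from start_url may be followed.
    wait_time:  pause in seconds between fetches (assumed analogue of WaitTime).
    Returns a dict mapping each imported URL to its HTML text.
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    pages = {}
    fetch_count = 0
    while queue:
        url, depth = queue.popleft()
        if fetch_count > 0 and wait_time > 0:
            time.sleep(wait_time)  # be polite to the server
        fetch_count += 1
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        if depth >= max_depth:
            continue  # depth limit reached: import the page, follow no links
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            target, _fragment = urldefrag(urljoin(url, href))
            # Scope enforcement: stay under the starting URL.
            if target.startswith(start_url) and target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return pages
```

For a real fetch, a caller could pass a small wrapper around urllib.request.urlopen; for experimentation, a dictionary's get method over a few sample pages is enough.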

Typical Use Cases

  • Importing a set of web pages for archival or processing.
  • Crawling a website to ingest all or part of its content, following only certain links or patterns.
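Following only certain links or patterns usually comes down to a per-URL decision. The sketch below shows one way such a rule could be expressed. It does not describe how HyperlinkSelector works internally: the function should_follow and its include and exclude parameters are hypothetical, and the use of shell-style patterns matched against the URL path is an assumption.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def should_follow(url, root, include=("*",), exclude=()):
    """Return True when url is in scope and passes the pattern rules.

    Scope:   same host as root, and a path that starts with root's path.
    include: shell-style patterns; at least one must match the URL path.
    exclude: shell-style patterns; any match rejects the URL, even if an
             include pattern also matches.
    """
    target, base = urlsplit(url), urlsplit(root)
    if target.netloc != base.netloc or not target.path.startswith(base.path):
        return False
    if any(fnmatchcase(target.path, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(target.path, pattern) for pattern in include)
```

Letting exclusions win over inclusions keeps the rules easy to reason about: a broad include such as "*.html" can be combined with a narrow exclude such as "*/archive/*" without worrying about rule order.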

Notes

  • Use the configuration options in HTTPResource and HyperlinkSelector to control which content is imported and how links are followed.
  • HTTP Import supports both one-time imports and scheduled/batch operations.

Properties

Name    Type    Description
General
Processing Options

Used By

Notification