Grooper Help - Version 25.0
25.0.0024

HTTP Import

Import Provider Grooper.Messaging

Imports content from HTTP servers, allowing ingestion of individual web pages or entire websites.

Remarks

Use HTTP Import to bring web-based content into Grooper, either by importing specific pages or by crawling entire sites.

How it works

  • Sources: Define one or more HTTPResource objects, each representing a starting URL (site or page) to import.
  • Each HTTPResource can specify:
    • The root URL or specific pages to include.
    • A set of HyperlinkSelector objects to control which links are followed and how deep the crawl goes.
    • Scope enforcement and inclusion/exclusion rules for URLs.
  • WaitTime: Optionally set a delay between fetches to avoid overloading servers.
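
The crawl behavior described above can be illustrated with a short conceptual sketch. This is not Grooper code or the Grooper API: it is a generic Python illustration, and every name in it (`crawl`, `in_scope`, `LinkCollector`, `fetch`, `max_depth`, `wait_time`, `include`, `exclude`) is hypothetical. It assumes a breadth-first crawl in which scope means "the URL starts with the root URL", inclusion and exclusion rules are regular expressions, and the delay is applied before every fetch after the first. How Grooper actually implements these options may differ.

```python
import re
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def in_scope(url, root, include=None, exclude=None):
    """Return True if url is under root and passes the optional patterns.

    Assumption: scope is a simple prefix match, and include/exclude
    are regular expressions searched anywhere in the URL.
    """
    if not url.startswith(root):
        return False
    if exclude and re.search(exclude, url):
        return False
    if include and not re.search(include, url):
        return False
    return True


def crawl(root, fetch, max_depth=1, wait_time=0.0, include=None, exclude=None):
    """Breadth-first crawl starting at root.

    fetch is a caller-supplied function: url -> HTML string, or None on
    failure. Returns a dict mapping each imported URL to its HTML.
    """
    pages = {}
    seen = {root}
    queue = deque([(root, 0)])
    fetch_count = 0
    while queue:
        url, depth = queue.popleft()
        # Pause between fetches (not before the first) to avoid
        # overloading the server.
        if fetch_count > 0 and wait_time > 0:
            time.sleep(wait_time)
        fetch_count += 1
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        # Stop following links once the depth limit is reached.
        if depth >= max_depth:
            continue
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link, _fragment = urldefrag(urljoin(url, href))
            if link not in seen and in_scope(link, root, include, exclude):
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

In this sketch, `fetch` is passed in rather than hard-coded so the crawl logic can be tried against an in-memory dictionary of pages (for example, `crawl(root, site.get)`) before pointing it at a real server. A `max_depth` of 0 imports only the starting page, which corresponds to importing a specific page rather than crawling a site.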

Typical Use Cases

  • Importing a set of web pages for archival or processing.
  • Crawling a website to ingest all or part of its content, following only certain links or patterns.

Notes

  • Use the configuration options in HTTPResource and HyperlinkSelector to control which content is imported and how links are followed.
  • Supports both one-time imports and scheduled/batch operations.

Properties

Name    Type    Description
General
Processing Options

See Also

Used By

Notification