Grooper Help - Version 25.0
25.0.0026

VLM Analyze

Code Activity Grooper.GPT

Analyzes pages or folders using a Visual Language Model (VLM) and saves structured JSON analysis for use in downstream data extraction operations.

Remarks

Use VLM Analyze to run a multimodal VLM against a single page, a folder of pages, or an attached document. The activity sends image content and the configured JSON schema and instructions to the embedded Data Generator, receives structured JSON back, and saves that JSON to the processed item so other Grooper components (including quoting methods) can consume it.

How it works

  • VLMAnalyze prepares one or more image content streams (single page image, a merged PDF/TIFF of folder pages, or the attached file) and passes them, along with JSONSchema and Instructions, to the configured Data Generator. The generator returns a JSON object containing the VLM analysis results.
  • The result is written to the processed item using the name returned by the FileName property (by default VLMAnalyze_{Model}.json). When run on a Batch Page the JSON is saved to that page; when run on a Batch Folder the file is saved to the folder level.
  • The Overwrite flag controls whether an existing file entry with the same name is replaced.
  • If JSONSchema is provided it is added to diagnostics as "Schema.json" to aid troubleshooting and reproducibility.
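
The naming and overwrite behavior above can be sketched as follows. This is a minimal illustration, not Grooper's implementation; the helper names (`default_file_name`, `save_analysis`) and the dictionary standing in for the item's file entries are assumptions for the example:

```python
from typing import Optional

def default_file_name(model: str, custom_name: Optional[str] = None) -> str:
    """Return the custom name if one is set, else the default VLMAnalyze_{Model}.json."""
    return custom_name if custom_name else f"VLMAnalyze_{model}.json"

def save_analysis(files: dict, name: str, json_text: str, overwrite: bool) -> bool:
    """Write the analysis JSON to the item's file entries, honoring the Overwrite flag."""
    if name in files and not overwrite:
        return False          # existing entry kept; nothing is written
    files[name] = json_text   # new entry created, or existing one replaced
    return True
```

Because the default name embeds the model, switching models produces a differently named file; use a stable Custom File Name if downstream quoting should not depend on the model choice.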

Integration with quoting and LLM-based extraction

  • The output file produced by VLMAnalyze is a plain JSON file stored on the target Batch Object. The JSON File quoting method (the JSONFile QuotingMethod class) can read that file by setting its Filename property to the VLMAnalyze output file name, and can then inject selected JSON into prompts sent to other LLM operations.
  • Typical pattern:
    1. Run VLMAnalyze to produce a JSON analysis file (per-page or per-folder as appropriate).
    2. Configure a downstream Data Generator or quoting workflow to include the VLMAnalyze JSON by using the JSON File quoting method and pointing Filename to the saved file.
    3. Use the ContentSelector (a JSONPath expression) on the JSON File quoting method to select the exact JSON node(s) to inject into the prompt. The RemovalSelector may be used to strip sensitive or irrelevant nodes before quoting.
    4. SelectedContentKind controls how the selected content is wrapped for the chat message (for example, as system/contextual text).
  • This architecture enables a two-step multimodal extraction flow: first a VLM analyzes visual content and produces a structured JSON representation; second, that JSON is quoted into an LLM prompt to drive schema-driven extraction or higher-level reasoning.
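
The quoting step in this two-step flow can be sketched as below. The dotted-path `select_node` is a simplified stand-in for the ContentSelector (which accepts full JSONPath expressions), and the sample analysis document is hypothetical:

```python
import json

def select_node(doc, selector):
    """Follow a dotted path like 'invoice.lines' into a parsed JSON object."""
    node = doc
    for key in selector.split("."):
        node = node[key]
    return node

def quote_as_message(doc, selector, kind="system"):
    """Wrap the selected JSON as a chat message (an analogue of SelectedContentKind)."""
    return {"role": kind, "content": json.dumps(select_node(doc, selector))}

# Hypothetical JSON analysis saved by VLM Analyze:
analysis = {"invoice": {"number": "INV-001",
                        "lines": [{"desc": "Widget", "qty": 2}]}}
message = quote_as_message(analysis, "invoice.lines")
```

Selecting only `invoice.lines` (rather than the whole document) keeps the downstream prompt small and focused, which is the main point of the ContentSelector.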

Configuration guidance

  • Provide a JSONSchema that matches the structure you expect from the VLM; doing so improves mapping and validation in downstream steps and is included in diagnostics as Schema.json.
  • Choose Resolution to balance quality and throughput; the activity will resize images to the configured resolution before sending them to the VLM when necessary.
  • Use a stable CustomFileName or rely on the default (VLMAnalyze_{Model}.json) so quoting methods can reliably reference the file.
  • If you want per-page injection for quoting, ensure VLMAnalyze is run on pages (or configured to save per-page files) and set the quoting method's FromPages accordingly.
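
As an illustration of the first point, a JSONSchema for an invoice-style analysis might look like the following. The field names here are purely hypothetical; the shape should match whatever structure you expect the VLM to return:

```python
import json

# Hypothetical schema constraining the VLM's output shape. When supplied via
# the JSONSchema property, this text is also echoed to diagnostics as Schema.json.
invoice_schema = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "lines": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "desc": {"type": "string"},
                    "qty": {"type": "integer"},
                },
            },
        },
    },
    "required": ["vendor", "invoice_number", "total"],
}

schema_json = json.dumps(invoice_schema, indent=2)
```

A schema like this makes downstream mapping predictable: quoting methods and extractors can rely on `lines` being an array and `total` being numeric.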

Diagnostics and artifacts

VLMAnalyze and the embedded Data Generator produce artifacts useful for troubleshooting:

  • The Data Generator writes its normal diagnostics (for example, chat logs and raw LLM response data such as "Chat Log.jsonl" and "Response Data.json") to the extraction diagnostics.
  • VLMAnalyze adds the provided JSONSchema as Schema.json when JSONSchema is non-empty.
  • Include diagnostics when reporting mismatches between the VLM output and downstream extraction results to aid investigation.

Practical notes and caveats

  • The JSON File quoting method expects a JSON file stored on the same Batch Object you quote from; when using folder-level JSON make sure quoting runs in the same folder context (or use FromPages to quote page-level files).
  • Use clear JSONPath expressions in ContentSelector to avoid injecting large irrelevant structures into prompts; prefer selecting only the nodes needed by the downstream extractor.
  • Consider using RemovalSelector to redact or remove private data from the quoted JSON before it is sent to another LLM call.
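
A RemovalSelector-style redaction pass can be sketched as below. The dotted path is again a simplified stand-in for a full JSONPath expression, and the sample data is hypothetical:

```python
def remove_node(doc: dict, selector: str) -> dict:
    """Delete the node at a dotted path in place, ignoring paths that don't exist."""
    keys = selector.split(".")
    node = doc
    for key in keys[:-1]:
        node = node.get(key, {})
        if not isinstance(node, dict):
            return doc  # path runs through a non-object; nothing to remove
    node.pop(keys[-1], None)
    return doc

# Hypothetical analysis containing a field that should not reach the next LLM call:
analysis = {"customer": {"name": "Acme", "ssn": "000-00-0000"}, "total": 42.0}
redacted = remove_node(analysis, "customer.ssn")
```

Removing sensitive nodes before quoting means the redaction happens once, at the boundary, rather than relying on every downstream prompt to ignore the data.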

This class enables robust multimodal workflows by decoupling visual analysis (VLMAnalyze) from conversational or schema-driven LLM extraction steps (Data Generator + quoting). By saving structured JSON on batch objects and using the JSON File quoting method to inject only the necessary nodes, you can build repeatable, auditable, and modular generative extraction pipelines.

Properties

  Name              Type    Description
  General
  Options
  Custom File Name  String  An optional custom file name for saving analysis data.
  File Name         String  The name of the file where the analysis data will be saved.
  Processing Options

Used By

Notification