Grooper Help - Version 25.0
25.0.0017 2,127

Text Analysis

Value Extractor Grooper.GPT.Azure

Abstract base for extractors that leverage Azure AI Language Services to analyze document text and extract entities, key phrases, or PII.

Remarks

Text Analysis provides a foundation for Grooper extractors that connect to Azure AI Language Services, enabling advanced natural language processing (NLP) on document content. These extractors are used to identify named entities, key phrases, or personally identifiable information (PII) in unstructured or semi-structured text.

Overview

'Text Analysis' extractors operate by sending document text to Azure's cloud-based NLP engine. The service analyzes the content and returns structured results, such as recognized entities, detected key phrases, or flagged PII. This enables Grooper to enrich extracted data, support compliance workflows, and automate content understanding.
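As a rough illustration of what "sending document text" can look like, the sketch below builds a request body in the shape used by the public Azure AI Language analyze-text REST API. It does not show Grooper's internal implementation, which this topic does not document. The `KINDS` mapping and the `build_request` helper are hypothetical names introduced only for this example, and no network call is made.

```python
import json

# Hypothetical mapping from the three analysis types named in this topic to
# the "kind" values accepted by the Azure analyze-text REST API.
KINDS = {
    "entities": "EntityRecognition",
    "key_phrases": "KeyPhraseExtraction",
    "pii": "PiiEntityRecognition",
}


def build_request(kind, texts, language="en"):
    """Build an analyze-text request body for a list of text chunks.

    Illustrative helper only; not part of Grooper or any Azure SDK.
    """
    return {
        "kind": KINDS[kind],
        "parameters": {"modelVersion": "latest"},
        "analysisInput": {
            "documents": [
                # Each document needs an id that is unique within the request.
                {"id": str(i + 1), "language": language, "text": t}
                for i, t in enumerate(texts)
            ]
        },
    }


body = build_request("entities", ["Contoso opened an office in Tulsa."])
print(json.dumps(body, indent=2))
```

In a real call, a body like this would be posted to the Azure resource's endpoint with the API key supplied in a request header. In Grooper, the resource name and key come from the Text Analysis Option on the repository root.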

Key Features

  • Cloud-Based NLP:
    Integrates with Azure AI Language for high-accuracy entity recognition, key phrase extraction, and PII detection.
  • Category Filtering:
    Allows you to restrict output to specific entity types using the 'Include Category' property.
  • Result Filtering and Post-Processing:
    Use 'Result Filter' and 'Result Set Options' to refine, deduplicate, and order results.
  • Batch and Synchronous Modes:
    Automatically batches requests to comply with Azure API limits and supports both synchronous and asynchronous operation.
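The batching behavior described above can be pictured with a minimal sketch. It assumes two limits, a maximum number of characters per document and a maximum number of documents per request. The values used here (5,120 characters and 5 documents) are illustrative assumptions and should be checked against the current Azure service limits. `chunk_text` and `make_batches` are hypothetical helpers, not Grooper functions.

```python
def chunk_text(text, max_chars):
    """Split text into pieces of at most max_chars characters.

    Breaks at the last space inside each window when one exists, so words
    are not cut in half unless a single word exceeds max_chars.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            cut = text.rfind(" ", start, end)
            if cut > start:
                end = cut
        chunks.append(text[start:end])
        start = end
    return chunks


def make_batches(chunks, max_docs):
    """Group chunks into lists of at most max_docs items, one list per request."""
    return [chunks[i:i + max_docs] for i in range(0, len(chunks), max_docs)]


# Assumed limits for illustration only.
MAX_CHARS_PER_DOC = 5120
MAX_DOCS_PER_REQUEST = 5

page_text = "word " * 6000  # 30,000 characters of sample text
chunks = chunk_text(page_text, MAX_CHARS_PER_DOC)
batches = make_batches(chunks, MAX_DOCS_PER_REQUEST)
print(len(chunks), "chunks in", len(batches), "requests")
```

Because the chunks concatenate back to the original text, character offsets returned for a chunk can be mapped back to the full document by adding the combined length of the preceding chunks.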

Usage Examples

  • Entity Recognition:
    Extract named entities from document text, such as organizations, locations, or products.
  • PII Detection:
    Identify and redact sensitive information, including names, addresses, or identification numbers.
  • Key Phrase Extraction:
    Summarize document content by extracting key phrases for indexing or search.
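To make the PII use case concrete, the following sketch redacts text from a list of detected entities. It assumes each entity carries `category`, `offset`, and `length` fields, as in Azure AI Language PII results, and that offsets are expressed in Unicode code points so they line up with Python string indices. The `redact` function and the sample data are hypothetical and do not represent Grooper's own redaction logic.

```python
def redact(text, entities):
    """Replace each detected span with a bracketed category label.

    Spans are processed from right to left so that replacing one span
    does not shift the offsets of spans that come earlier in the text.
    """
    out = text
    for e in sorted(entities, key=lambda e: e["offset"], reverse=True):
        start = e["offset"]
        end = start + e["length"]
        out = out[:start] + "[" + e["category"] + "]" + out[end:]
    return out


# Sample data written by hand for this example, not real service output.
text = "Call Jane Doe at 555-0100."
entities = [
    {"text": "Jane Doe", "category": "Person", "offset": 5, "length": 8},
    {"text": "555-0100", "category": "PhoneNumber", "offset": 17, "length": 8},
]

print(redact(text, entities))  # prints: Call [Person] at [PhoneNumber].
```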

Configuration Guidance

  • A Text Analysis Option must be configured on the repository root, providing the Azure resource name and API key.
  • Use 'Include Category' to focus extraction on relevant entity types and reduce noise.
  • Adjust 'Result Filter' and 'Result Set Options' to control which results are included and how they are processed.
  • Test your configuration with representative document samples to ensure desired extraction behavior.
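The effect of category filtering and result post-processing can be sketched conceptually as follows. This is only an approximation of the idea behind 'Include Category', 'Result Filter', and 'Result Set Options'. The exact behavior of those properties is not described in this topic, and `filter_entities`, its `min_confidence` parameter, and the sample data are assumptions made for the example.

```python
def filter_entities(entities, include_categories=None, min_confidence=0.0):
    """Keep entities in the chosen categories, drop duplicates, order by position."""
    kept = []
    seen = set()
    for e in entities:
        # Category filter: an empty or missing set means "keep every category".
        if include_categories and e["category"] not in include_categories:
            continue
        # Confidence filter: entities without a score are treated as certain.
        if e.get("confidenceScore", 1.0) < min_confidence:
            continue
        # Deduplicate on category plus case-insensitive text.
        key = (e["category"], e["text"].casefold())
        if key in seen:
            continue
        seen.add(key)
        kept.append(e)
    return sorted(kept, key=lambda e: e["offset"])


# Sample data written by hand for this example, not real service output.
sample = [
    {"text": "Tulsa", "category": "Location", "offset": 30, "confidenceScore": 0.98},
    {"text": "Contoso", "category": "Organization", "offset": 0, "confidenceScore": 0.95},
    {"text": "contoso", "category": "Organization", "offset": 50, "confidenceScore": 0.91},
    {"text": "three", "category": "Quantity", "offset": 40, "confidenceScore": 0.80},
]

results = filter_entities(sample, include_categories={"Organization", "Location"})
print([e["text"] for e in results])  # prints: ['Contoso', 'Tulsa']
```

Running a check like this against representative samples is one way to confirm that a category list removes noise without dropping values you need.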

Diagnostics

When diagnostic logging is enabled, 'Text Analysis' may produce artifacts such as:

  • Request and response JSON files for each Azure API call.
  • Timing logs for request and response cycles.
  • Text files containing the analyzed document content.
  • Summaries of extracted entities, key phrases, or PII.

For more information, see Azure AI Language Service Overview.

Properties

Name   Type   Description

Derived Types

There are 3 implementations of Text Analysis.

  • Entity Recognition:
    Identifies and categorizes entities such as people, organizations, locations, and quantities in unstructured text.
  • Key Phrase Extraction:
    Identifies key concepts and topics in text using Azure AI Language key phrase extraction.
  • Pii Entity Recognition:
    Identifies, categorizes, and redacts sensitive information (PII) in unstructured text using Azure AI Language Services.

See Also

Used By

Notification