Grooper Help - Version 25.0
25.0.0017 2,127

Entity Recognition

Text Analysis Grooper.GPT.Azure

Identifies and categorizes entities such as people, organizations, locations, and quantities in unstructured text.

Remarks

Overview

'Entity Recognition' is a value extractor in Grooper that connects to Azure AI Language Services to analyze document text and extract named entities. It relies on the service's machine learning models to recognize and classify entities, which makes it well suited to document enrichment, compliance, and data mining scenarios.

This extractor is typically used when you need to identify structured information (such as names, locations, or organizations) within free-form or unstructured text.

How It Works

'Entity Recognition' sends document text to Azure's Named Entity Recognition (NER) engine. The service returns structured results, identifying and categorizing entities found in the text. You can filter which entity types are included in the output using the 'Include Category' property.
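
As a rough illustration of the exchange described above, the following Python sketch builds a request body and flattens a response into one record per entity. It is not Grooper's implementation. It assumes the payload follows the publicly documented 'EntityRecognition' task shape of the Azure AI Language analyze-text operation; the helper names (build_ner_request, parse_entities) and the sample response are hypothetical, and no network call is made.

```python
import json


def build_ner_request(text, language="en", doc_id="1"):
    """Build a request body in the assumed shape of an Azure AI Language
    'EntityRecognition' task (hypothetical helper, not a Grooper API)."""
    return {
        "kind": "EntityRecognition",
        "parameters": {"modelVersion": "latest"},
        "analysisInput": {
            "documents": [{"id": doc_id, "language": language, "text": text}]
        },
    }


def parse_entities(response):
    """Flatten a response into a list of dicts, one per recognized entity."""
    entities = []
    for doc in response.get("results", {}).get("documents", []):
        for ent in doc.get("entities", []):
            entities.append({
                "text": ent["text"],
                "category": ent["category"],
                "offset": ent["offset"],
                "length": ent["length"],
                "confidence": ent.get("confidenceScore"),
            })
    return entities


SOURCE_TEXT = "Jane Doe joined Contoso in Seattle."

# Hand-written sample response for illustration only; not real service output.
SAMPLE_RESPONSE = {
    "kind": "EntityRecognitionResults",
    "results": {
        "documents": [{
            "id": "1",
            "entities": [
                {"text": "Jane Doe", "category": "Person",
                 "offset": 0, "length": 8, "confidenceScore": 0.99},
                {"text": "Contoso", "category": "Organization",
                 "offset": 16, "length": 7, "confidenceScore": 0.95},
                {"text": "Seattle", "category": "Location",
                 "offset": 27, "length": 7, "confidenceScore": 0.97},
            ],
            "warnings": [],
        }],
        "errors": [],
    },
}

request_body = build_ner_request(SOURCE_TEXT)
entities = parse_entities(SAMPLE_RESPONSE)

print(json.dumps(request_body, indent=2))
for ent in entities:
    print(ent["category"], "->", ent["text"])
```

Each entity carries an offset and length into the submitted text, so a result can be mapped back to its position in the document.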

Typical Use Cases

  • Extracting person and organization names for indexing or compliance.
  • Identifying geographic locations in contracts, correspondence, or reports.
  • Mining documents for quantities, dates, or other structured data.

Supported Entity Types

The following entity types can be recognized (use the 'Include Category' property to filter as needed):

  • Address: Physical or mailing address.
  • Age: Age values (e.g., "35 years old").
  • Airport: Airport names or codes.
  • Area: Measurements of area (e.g., "500 sq ft").
  • City: City names.
  • ComputingProduct: Technology or computing product names.
  • Continent: Names of continents.
  • CountryRegion: Country or region names.
  • CulturalEvent: Named cultural events.
  • Currency: Currency names or symbols.
  • Date: Specific dates.
  • DateRange: Ranges of dates.
  • DateTime: Date and time values.
  • DateTimeRange: Ranges of date and time.
  • Dimension: Dimensional measurements.
  • Duration: Time durations (e.g., "2 hours").
  • Email: Email addresses.
  • Event: Named events.
  • GPE: Geo-political entities (e.g., countries, cities).
  • Geological: Geological terms or entities.
  • Height: Height measurements.
  • IP: IP addresses.
  • Information: General information entities.
  • Length: Length measurements.
  • Location: General locations.
  • NaturalEvent: Natural events (e.g., "earthquake").
  • Number: Numeric values.
  • NumberRange: Ranges of numbers.
  • Numeric: Numeric entities.
  • Ordinal: Ordinal numbers (e.g., "first", "2nd").
  • Organization: Organization names.
  • OrganizationMedical: Medical organizations.
  • OrganizationSports: Sports organizations.
  • OrganizationStockExchange: Stock exchanges.
  • Percentage: Percentage values.
  • Person: Person names.
  • PersonType: Types of persons (e.g., "doctor").
  • PhoneNumber: Telephone numbers.
  • Product: Product names.
  • SetTemporal: Temporal sets.
  • Skill: Skills or competencies.
  • Speed: Speed measurements.
  • SportsEvent: Sports events.
  • State: State or province names.
  • Structural: Structural entities.
  • Temperature: Temperature values.
  • Temporal: Temporal expressions.
  • Time: Time values.
  • TimeRange: Ranges of time.
  • URL: Web URLs.
  • Volume: Volume measurements.
  • Weight: Weight measurements.
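
The effect of the 'Include Category' property can be pictured as an allow-list applied to the returned entities. The sketch below is illustrative only: filter_by_category and the sample data are hypothetical, and Grooper's actual filtering may differ, for example in how it treats an empty selection or category name casing.

```python
def filter_by_category(entities, include_categories):
    """Keep only entities whose category is in the allow-list.

    An empty allow-list is treated here as 'no filtering'; that is an
    assumption made for this sketch, not documented Grooper behavior.
    """
    if not include_categories:
        return list(entities)
    allowed = set(include_categories)
    return [ent for ent in entities if ent["category"] in allowed]


# Hypothetical entities, as they might look after parsing a response.
sample = [
    {"text": "Jane Doe", "category": "Person"},
    {"text": "Contoso", "category": "Organization"},
    {"text": "Seattle", "category": "City"},
    {"text": "35 years old", "category": "Age"},
]

selected = filter_by_category(sample, ["Person", "Organization"])
print([ent["text"] for ent in selected])  # ['Jane Doe', 'Contoso']
```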

Configuration Guidance

  • Ensure a Text Analysis Option is configured on the repository root with a valid Azure resource name and API key.
  • Use the 'Include Category' property to restrict output to specific entity types (e.g., Person, Organization, Location).
  • Adjust result filtering and post-processing options to refine and deduplicate results as needed.
  • Test your configuration with representative document samples to ensure desired extraction behavior.
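
To make the deduplication step above concrete, here is a minimal Python sketch of one possible strategy: collapse repeated mentions of the same text and category, keeping the highest-confidence occurrence. The deduplicate function, the case-insensitive matching rule, and the sample values are assumptions for illustration; they do not describe Grooper's built-in post-processing options.

```python
def deduplicate(entities):
    """Collapse repeated (text, category) pairs, keeping the occurrence
    with the highest confidence. Matching ignores case and outer spaces."""
    best = {}
    for ent in entities:
        key = (ent["text"].strip().casefold(), ent["category"])
        current = best.get(key)
        if current is None or ent["confidence"] > current["confidence"]:
            best[key] = ent
    return list(best.values())


# Hypothetical extraction results containing a repeated organization.
results = [
    {"text": "Contoso", "category": "Organization", "confidence": 0.91},
    {"text": "Jane Doe", "category": "Person", "confidence": 0.98},
    {"text": "contoso", "category": "Organization", "confidence": 0.97},
]

unique = deduplicate(results)
for ent in unique:
    print(ent["category"], ent["text"], ent["confidence"])
```

Whether "Contoso" and "contoso" should count as the same entity depends on the documents involved, which is one reason to test against representative samples.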

Diagnostics

When diagnostic logging is enabled, 'Entity Recognition' may generate artifacts such as:

  • Request and response JSON files for each Azure API call.
  • Timing logs for request and response cycles.
  • Text files containing the analyzed document content.
  • Summaries of extracted entities and their categories.
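
When reviewing a saved response, a per-category tally is a quick sanity check. The sketch below counts entities by category from an in-memory response. It assumes the response follows the Azure 'EntityRecognition' result shape; the file names and layout of Grooper's diagnostic artifacts are not specified here, so loading from disk is left out, and summarize_categories is a hypothetical helper.

```python
from collections import Counter


def summarize_categories(response):
    """Count recognized entities per category across all documents."""
    counts = Counter()
    for doc in response.get("results", {}).get("documents", []):
        for ent in doc.get("entities", []):
            counts[ent["category"]] += 1
    return counts


# Hand-written sample for illustration only; not real service output.
sample_response = {
    "results": {
        "documents": [{
            "id": "1",
            "entities": [
                {"text": "Jane Doe", "category": "Person"},
                {"text": "John Smith", "category": "Person"},
                {"text": "Contoso", "category": "Organization"},
            ],
        }]
    }
}

summary = summarize_categories(sample_response)
for category, count in summary.most_common():
    print(f"{category}: {count}")
```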

Notes

  • Requires a configured Text Analysis Option on the repository root with a valid Azure resource name and API key.

See Also

Used By

Notification