Grooper Help - Version 25.0
25.0.0017 2,127

PII Entity Recognition

Text Analysis Grooper.GPT.Azure

Identifies, categorizes, and redacts sensitive information (PII) in unstructured text using Azure AI Language Services.

Remarks

Overview

'PII Entity Recognition' is a value extractor in Grooper that connects to Azure AI Language Services to analyze document text and extract personally identifiable information (PII). It relies on the service's machine learning models to recognize and classify sensitive entities, which supports compliance, privacy, and data protection scenarios.

This extractor is typically used when you need to detect and optionally redact PII within free-form or unstructured text, such as correspondence, forms, or reports.

How It Works

'PII Entity Recognition' sends document text to Azure's PII detection engine. The service returns structured results, identifying and categorizing a wide range of sensitive information types, including phone numbers, email addresses, government IDs, financial data, and more. You can filter which PII types are included in the output using the 'Include Category' property.
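As a rough illustration of the exchange described above, the sketch below builds a request body in the shape used by the Azure AI Language analyze-text REST API for PII detection. It is not Grooper's internal implementation. The function name `build_pii_request` and the mapping of 'Include Category' onto the `piiCategories` parameter are assumptions made for this example, and no network call is made.

```python
import json


def build_pii_request(text, categories=None, language="en", doc_id="1"):
    """Build a JSON body for an Azure AI Language PII detection request.

    `categories` is a hypothetical stand-in for the 'Include Category'
    property: when given, only those PII categories are requested.
    """
    parameters = {"modelVersion": "latest"}
    if categories:
        # Restrict the response to the listed PII categories.
        parameters["piiCategories"] = list(categories)
    return {
        "kind": "PiiEntityRecognition",
        "parameters": parameters,
        "analysisInput": {
            "documents": [{"id": doc_id, "language": language, "text": text}]
        },
    }


body = build_pii_request("Call me at 555-0100.", ["PhoneNumber", "Email"])
print(json.dumps(body, indent=2))
```

Omitting `categories` leaves `piiCategories` out of the body, so the service applies its default set of categories for the document language.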

Typical Use Cases

  • Redacting Sensitive Data:
    Identify and redact phone numbers, email addresses, or government IDs from documents before sharing or archiving.
  • Compliance Auditing:
    Detect the presence of PII in large document sets for privacy compliance or risk assessment.
  • Automated Privacy Workflows:
    Support automated redaction, flagging, or routing of documents containing PII.
  • Custom PII Filtering:
    Use the 'Include Category' property to focus on only the PII types relevant to your use case.
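
The redaction use case can be sketched in a few lines. The example assumes each detected entity carries an `offset` and `length`, as entities in the Azure PII response do, and that offsets count Unicode code points so they line up with Python string indexes. The `redact` helper is hypothetical and does not reflect how Grooper performs redaction.

```python
def redact(text, entities, mask="*"):
    """Replace each entity span with mask characters, keeping text length."""
    chars = list(text)
    for ent in entities:
        start = ent["offset"]
        # Clamp the end so a bad length cannot grow the text.
        end = min(start + ent["length"], len(chars))
        chars[start:end] = mask * (end - start)
    return "".join(chars)


text = "Email jane@example.com or call 555-0100."
# Hand-built entities for illustration; a real run would take these
# from the service response.
entities = [
    {"category": "Email", "offset": text.index("jane@example.com"),
     "length": len("jane@example.com")},
    {"category": "PhoneNumber", "offset": text.index("555-0100"),
     "length": len("555-0100")},
]
print(redact(text, entities))
```

Masking in place keeps every other character at its original position, which matters if the redacted text is later mapped back onto page coordinates.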

Supported PII Entity Types

The Azure PII Entity Recognition service detects a wide range of sensitive information. Below are key entity types commonly used:

Entity Type                              Description
ABARoutingNumber                         US bank routing number used for financial transactions.
Address                                  Physical or mailing address information.
Age                                      Age values, typically associated with individuals.
AzureDocumentDBAuthKey                   Azure Cosmos DB authentication key.
AzureIAASDatabaseConnectionAndSQLString  Azure IaaS database connection string, including SQL credentials.
AzureIoTConnectionString                 Azure IoT Hub connection string.
AzureStorageAccountGeneric               Generic Azure Storage account identifier.
AzureStorageAccountKey                   Azure Storage account access key.
CreditCardNumber                         Credit card numbers (all major brands).
Date                                     Dates, including birthdates and other sensitive date values.
DrugEnforcementAgencyNumber              US DEA registration number for controlled substances.
Email                                    Email addresses.
IPAddress                                IPv4 and IPv6 addresses.
InternationalBankingAccountNumber        International bank account number (IBAN).
Organization                             Organization names.
Person                                   Person names.
PhoneNumber                              Telephone numbers (international formats supported).
URL                                      Web addresses (URLs).
USBankAccountNumber                      US bank account numbers.
USDriversLicenseNumber                   US driver’s license numbers.
USIndividualTaxpayerIdentification       US ITIN (taxpayer identification number).
USSocialSecurityNumber                   US Social Security Number (SSN).
USUKPassportNumber                       US or UK passport numbers.

Other supported categories include:

  • National Identifiers:
    Identity card numbers, tax IDs, social security numbers, and passport numbers for many countries.
  • Financial Data:
    Bank account numbers, SWIFT codes, debit card numbers, and other financial identifiers.
  • Healthcare Identifiers:
    Health insurance numbers, medical account numbers, and personal health IDs.
  • Cloud & Connection Credentials:
    Azure service keys, connection strings, and authentication credentials.
  • Geolocation & Miscellaneous:
    GPS coordinates, organization names, and other miscellaneous identifiers.

For a complete list of supported entity types, see the contents of PiiEntityTypes.txt.

Configuration Guidance

  • Ensure a Text Analysis Option is configured on the repository root with a valid Azure resource name and API key.
  • Use the 'Include Category' property to restrict output to specific PII types (e.g., phone numbers, government IDs, email addresses).
  • Adjust result filtering and post-processing options to refine and deduplicate results as needed.
  • Test your configuration with representative document samples to ensure desired extraction and redaction behavior.
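
To make the idea of refining and deduplicating results concrete, here is a minimal sketch of one possible post-processing pass. It assumes entities shaped like those in the Azure PII response (`text`, `category`, `confidenceScore`). The `refine` function and the 0.8 threshold are illustrative choices, not Grooper's actual result filtering options.

```python
def refine(entities, min_confidence=0.8):
    """Drop low-confidence hits and collapse repeated (category, text) pairs."""
    seen = set()
    kept = []
    for ent in entities:
        if ent["confidenceScore"] < min_confidence:
            continue  # below the illustrative threshold
        # Treat values that differ only by case or outer whitespace as equal.
        key = (ent["category"], ent["text"].strip().lower())
        if key in seen:
            continue
        seen.add(key)
        kept.append(ent)
    return kept


hits = [
    {"text": "jane@example.com", "category": "Email", "confidenceScore": 0.95},
    {"text": "Jane@Example.com", "category": "Email", "confidenceScore": 0.91},
    {"text": "42", "category": "Age", "confidenceScore": 0.40},
]
print(refine(hits))
```

Running the same pass over representative samples with different thresholds is one way to see how strict a configuration needs to be before it starts missing real PII.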

Diagnostics

When diagnostic logging is enabled, 'PII Entity Recognition' may generate artifacts such as:

  • Request and response JSON files for each Azure API call.
  • Timing logs for request and response cycles.
  • Text files containing the analyzed document content.
  • Summaries of extracted PII entities and their categories.
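
When reviewing a saved response file, a per-category count is often enough to confirm what was detected. The sketch below assumes the JSON follows the Azure PII response layout (`results.documents[].entities[]`). The `summarize` helper and the sample data are hypothetical, and the exact contents of Grooper's diagnostic files may differ.

```python
from collections import Counter


def summarize(response):
    """Count detected entities per category across all documents."""
    counts = Counter()
    for doc in response.get("results", {}).get("documents", []):
        for ent in doc.get("entities", []):
            counts[ent["category"]] += 1
    return dict(counts)


# Hand-written sample in the assumed response shape.
sample_response = {
    "kind": "PiiEntityRecognitionResults",
    "results": {
        "documents": [
            {
                "id": "1",
                "entities": [
                    {"text": "jane@example.com", "category": "Email"},
                    {"text": "555-0100", "category": "PhoneNumber"},
                    {"text": "555-0199", "category": "PhoneNumber"},
                ],
            }
        ]
    },
}
print(summarize(sample_response))
```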

Notes

  • Requires a configured Text Analysis Option on the repository root with a valid Azure resource name and API key.

See Also

Used By

Notification