Grooper Help - Version 25.0
25.0.0017 2,127

Simple

Section Extract Method Grooper.Extract

Identifies section instances by matching contiguous segments of text using a configured extractor.

Remarks

The Simple section extraction method extracts sections from a document where each section's content is contiguous and can be identified by pattern matching. It is ideal when both the start and end of each section can be reliably located with a regular expression or other Value Extractor.

For each match found by the extractor, the text spanned by the match, from its starting position to its ending position, becomes one output section instance.
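The match-to-section mapping described above can be sketched outside of Grooper with an ordinary regular expression. This is only an illustration in Python, not Grooper's implementation: the SectionInstance type, the simple_sections function, and the BEGIN/END sample text are all hypothetical names invented for this sketch.

```python
import re
from typing import List, NamedTuple


class SectionInstance(NamedTuple):
    """Hypothetical stand-in for one output section instance."""
    start: int  # offset of the first character of the match
    end: int    # offset one past the last character of the match
    text: str   # the contiguous text covered by the match


def simple_sections(text: str, pattern: str) -> List[SectionInstance]:
    # re.DOTALL lets "." cross line breaks so one match can span several lines.
    regex = re.compile(pattern, re.DOTALL)
    # Every match yields exactly one section: its full contiguous span.
    return [
        SectionInstance(m.start(), m.end(), m.group(0))
        for m in regex.finditer(text)
    ]


# Made-up sample text with two contiguous sections.
doc = "intro\r\nBEGIN a\r\nb END\r\nfiller\r\nBEGIN c END\r\n"
sections = simple_sections(doc, r"BEGIN.*?END")

for s in sections:
    print(s.start, s.end, repr(s.text))
```

Text that falls outside every match ("intro" and "filler" here) does not belong to any section.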

Example Scenario 1: Employee Records

Suppose you have a document containing multiple employee records, each beginning with an EMPLOYEE: label and ending with the value that follows a DEPARTMENT: label. You can configure the Simple method to extract each record using a regular expression pattern.

Sample document:
ACTIVE EMPLOYEE REPORT

EMPLOYEE: Doe, Jane SSN: 000-00-0000 PHONE: (000) 000-0000
ADDRESS: 1234 S Main CITY: Anytown STATE: AA ZIP: 00000
DOB: 00/00/0000 EMAIL: janedoe@acme.com DEPARTMENT: IT

EMPLOYEE: Doe, John SSN: 000-00-0000 PHONE: (000) 000-0000
ADDRESS: 2233 N Elm CITY: Anytown STATE: AA ZIP: 00000
DOB: 00/00/0000 EMAIL: johndoe@acme.com DEPARTMENT: AP

EMPLOYEE: Smith, Kim SSN: 000-00-0000 PHONE: (000) 000-0000
ADDRESS: 8888 W 24th CITY: Anytown STATE: AA ZIP: 00000
DOB: 00/00/0000 EMAIL: kimsmith@acme.com DEPARTMENT: Admin

Sample regular expression:
EMPLOYEE:.*?DEPARTMENT:[^\r]+

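This pattern matches each employee record as a single section instance. The lazy .*? runs from EMPLOYEE: to the first DEPARTMENT: label that follows it, and [^\r]+ then consumes the rest of that line. Because each record spans three lines, the pattern only works if . is allowed to match line-break characters (single-line mode in most regex engines), and [^\r]+ assumes each line ends with a carriage return.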
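To check the pattern against the sample document, it can be run in any regex engine. The sketch below uses Python's re module rather than Grooper, and makes two assumptions that are not stated in the sample: lines end with \r\n, and the engine is told to let . match line breaks (re.DOTALL in Python).

```python
import re

# Sample document from above, assuming CRLF line endings.
document = "\r\n".join([
    "ACTIVE EMPLOYEE REPORT",
    "",
    "EMPLOYEE: Doe, Jane SSN: 000-00-0000 PHONE: (000) 000-0000",
    "ADDRESS: 1234 S Main CITY: Anytown STATE: AA ZIP: 00000",
    "DOB: 00/00/0000 EMAIL: janedoe@acme.com DEPARTMENT: IT",
    "",
    "EMPLOYEE: Doe, John SSN: 000-00-0000 PHONE: (000) 000-0000",
    "ADDRESS: 2233 N Elm CITY: Anytown STATE: AA ZIP: 00000",
    "DOB: 00/00/0000 EMAIL: johndoe@acme.com DEPARTMENT: AP",
    "",
    "EMPLOYEE: Smith, Kim SSN: 000-00-0000 PHONE: (000) 000-0000",
    "ADDRESS: 8888 W 24th CITY: Anytown STATE: AA ZIP: 00000",
    "DOB: 00/00/0000 EMAIL: kimsmith@acme.com DEPARTMENT: Admin",
]) + "\r\n"

# re.DOTALL is required so that ".*?" can span the three lines of a record.
pattern = re.compile(r"EMPLOYEE:.*?DEPARTMENT:[^\r]+", re.DOTALL)

# Each match corresponds to one section instance.
records = [m.group(0) for m in pattern.finditer(document)]

for number, record in enumerate(records, start=1):
    print(f"--- record {number} ---")
    print(record.replace("\r\n", "\n"))
```

If the text used bare \n line endings instead, [^\r]+ would run past the end of the record, so the character class would need to exclude \n as well.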

Example Scenario 2: Borrower Information Section

Another example is extracting a single section between two headings, such as "BORROWER INFORMATION" and "LENDER INFORMATION":

Sample document:
APPRAISAL REPORT

PROPERTY INFORMATION
PARCEL ID: 37239534 SQ FT: 1,652 BED: 3 BATH: 2.5
ADDRESS: 1234 S Main CITY: Anytown STATE: AA ZIP: 00000
GARAGE: 2 Car POOL: N Central Heat/Air: Y

BORROWER INFORMATION
EMPLOYEE: Doe, John SSN: 000-00-0000 PHONE: (000) 000-0000
ADDRESS: 2233 N Elm CITY: Anytown STATE: AA ZIP: 00000
DOB: 00/00/0000 EMAIL: johndoe@acme.com DEPARTMENT: AP
EMPLOYEE: Doe, Jane SSN: 000-00-0000 PHONE: (000) 000-0000
ADDRESS: 1234 S Main CITY: Anytown STATE: AA ZIP: 00000
DOB: 00/00/0000 EMAIL: janedoe@acme.com DEPARTMENT: IT
LENDER INFORMATION
LENDER NAME: First Bank and Trust, 1 N Main Street, Anytown, AA 00000
...

Sample regular expression:
\r\nBORROWER INFORMATION\r\n.*?\r\nLENDER INFORMATION\r\n
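As a quick sanity check, the heading-to-heading pattern can be tried against the sample document in a general-purpose regex engine. The sketch below uses Python's re module rather than Grooper; it assumes \r\n line endings and single-line matching (re.DOTALL), and it leaves out the elided remainder of the sample document.

```python
import re

# Sample document from above, assuming CRLF line endings.
# The elided tail of the document ("...") is not reproduced here.
document = "\r\n".join([
    "APPRAISAL REPORT",
    "",
    "PROPERTY INFORMATION",
    "PARCEL ID: 37239534 SQ FT: 1,652 BED: 3 BATH: 2.5",
    "ADDRESS: 1234 S Main CITY: Anytown STATE: AA ZIP: 00000",
    "GARAGE: 2 Car POOL: N Central Heat/Air: Y",
    "",
    "BORROWER INFORMATION",
    "EMPLOYEE: Doe, John SSN: 000-00-0000 PHONE: (000) 000-0000",
    "ADDRESS: 2233 N Elm CITY: Anytown STATE: AA ZIP: 00000",
    "DOB: 00/00/0000 EMAIL: johndoe@acme.com DEPARTMENT: AP",
    "EMPLOYEE: Doe, Jane SSN: 000-00-0000 PHONE: (000) 000-0000",
    "ADDRESS: 1234 S Main CITY: Anytown STATE: AA ZIP: 00000",
    "DOB: 00/00/0000 EMAIL: janedoe@acme.com DEPARTMENT: IT",
    "LENDER INFORMATION",
    "LENDER NAME: First Bank and Trust, 1 N Main Street, Anytown, AA 00000",
]) + "\r\n"

# The \r\n anchors force each heading to occupy a whole line of its own.
pattern = re.compile(
    r"\r\nBORROWER INFORMATION\r\n.*?\r\nLENDER INFORMATION\r\n",
    re.DOTALL,
)

matches = pattern.findall(document)
section = matches[0]
print(section.strip())
```

The single match contains both headings and both borrower records, but none of the property or lender details.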

Configuration Guidance

  • Set the 'Extractor' property to a Value Extractor (such as Pattern Match) that matches the full span of each section.
  • Use the 'Output Leaf Groups' property to control whether only the leaf groups (deepest matches) are output as section instances.

This method is best suited for documents with well-defined, contiguous sections and should be avoided for highly variable or fragmented content.
