Configure Extractor Window

Configures an Extractor Definition.

How to Launch

Can be launched only from the Create Extractor Window.

User Interface

Toolbar

  • Browse Allows the user to load an Extractor Definition file.

  • Reset Discards the changes made to the current definition since the last save.

  • Save Changes Saves the changes to the current definition file.

Files

Allows the user to select sample files for the configuration. The selected sample file becomes the Current Document and is displayed in the Document Viewer.

  • Browse Button Allows the user to select the folder containing the sample files.

  • OCR All Button Applies OCR to all sample documents.

Document Viewer

Displays the Current Document. A PDF or image document is displayed in the Image Document Viewer. If OCR has been applied to the document, the OCR text is also displayed in a Text Document Viewer in another tab. A text document is displayed in the Text Document Viewer.

Extractor Fields Configurator

Allows the user to configure the extractor definition.

Toolbar

  • Apply Button Extracts the Current Document using the definition.

  • Apply All Button Extracts all the documents using the definition.

Extraction results are shown in the Extraction Results Pane.

The configuration UI depends on the type of extractor used.

Regex Extractor

Uses Regular Expressions to identify the data to extract, with PCRE as the regex engine. Users can apply the following regex options:

  • IgnoreCase

  • SingleLine

  • MultiLine

  • Unicode

  • Global

  • Sticky

For more information about Regular Expressions, and to learn, reference, and test them, visit the following websites:

Skill Extractor

Configuration is not available for Skill Extractor.

Form Extractor

Uses keywords to match a key among the key-value pairs identified in the document, then extracts the data using the matched key.

  • Add Button Adds a new Keyword.

  • Included Toggle Sets whether the keyword is included in or excluded from matching.

  • Delete Button Deletes the Keyword.

  • Edit Distance The maximum Levenshtein Edit Distance allowed between a word in the key and the provided keyword for the word to be considered a match. If the value is 0, the match must be exact.

  • Keyword Options

    The following options are available for modifying the matching:

    • Match Any Only one of the words needs to match.

    • Match Case Uses case-sensitive comparison while matching.
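The keyword settings above can be illustrated with a minimal Python sketch. It is not the product's implementation. The function names (`levenshtein`, `keyword_matches`) and parameters are hypothetical, and it assumes that the key and the keyword are split into words on whitespace, that Match Any applies to the words of a multi-word keyword, and that matching is case-insensitive unless Match Case is on.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def keyword_matches(key, keyword, max_distance=0, match_any=False, match_case=False):
    """Hypothetical check of one keyword against one key from the document."""
    if not match_case:
        # Assumption: without Match Case, comparison ignores letter case.
        key, keyword = key.lower(), keyword.lower()
    key_words = key.split()
    # One hit per keyword word: is some word of the key within the edit distance?
    hits = [
        any(levenshtein(kw, word) <= max_distance for word in key_words)
        for kw in keyword.split()
    ]
    if not hits:
        return False
    # Assumption: Match Any needs one keyword word to hit; otherwise all must hit.
    return any(hits) if match_any else all(hits)
```

With these assumptions, a key misread by OCR as "Invoice Dat" matches the keyword "date" only when Edit Distance is at least 1, and the two-word keyword "invoice date" matches the key "Due Date" only when Match Any is on.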

Extraction Results

Shows the extraction results for the Current Document or for all the documents.

  • Document Name of the document

  • Field (many) Lists the extracted value of the field.

How to configure the Extractor

Regex Extractor

Regex Extractor uses a Regular Expression to extract the data. It applies the regex provided in the field to the document and takes the match as the extracted value.

By default, Regex Extractor takes only the first match as the extracted value. If the Global option is set on the field, the value of the field is set to all the matches, separated by commas.
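The difference between the default behavior and the Global option can be sketched as follows. This is only an illustration: it uses Python's built-in `re` module rather than PCRE, the function name `extract_regex_field` and the sample text are hypothetical, and joining with a bare comma (no space) is an assumption about the exact separator.

```python
import re


def extract_regex_field(pattern, text, global_match=False, flags=0):
    """Return the first match, or all matches joined by commas if global_match is set."""
    regex = re.compile(pattern, flags)
    if global_match:
        # Global: collect every non-overlapping match, in document order.
        matches = [m.group(0) for m in regex.finditer(text)]
        return ",".join(matches) if matches else None
    # Default: only the first match becomes the field value.
    first = regex.search(text)
    return first.group(0) if first else None


# Hypothetical sample document text.
document = "Invoice INV-1001 replaces INV-0999 and INV-0998."
first_only = extract_regex_field(r"INV-\d+", document)
all_matches = extract_regex_field(r"INV-\d+", document, global_match=True)
```

Here `first_only` holds only the first invoice number, while `all_matches` holds all three joined into a single string. An option such as IgnoreCase would correspond to passing `flags=re.IGNORECASE` in this sketch.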

See Regex Extractor for a more detailed overview.

Skill Extractor

Skill Extractor applies the document model to the document, then sets the value of each Field according to the Skill Field mapped in the definition.

See Skill Extractor for a more detailed overview.

Form Extractor

Form Extractor works by finding all key-value pairs in the document. For each field, the extractor then tries to match the field with the keys in the document. A field is considered a match with a key if:

  • All Included keywords are matched

  • All Excluded keywords are not matched

To match a field with the key "Date" but not "Date of Birth", include the keyword "date" and exclude the keyword "birth".

Keys are matched in the order in which they appear in the document. Once a key is matched with a field, matching stops for that field, and all other keys are ignored for it. The extractor takes the value of the matched key, applies post-processing, and sets the result as the value of the field. No value is set if no key is matched.
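The matching rules above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the extractor's actual code: the function names and sample key-value pairs are hypothetical, keywords are compared as exact, case-insensitive words (Edit Distance 0), and `str.strip` stands in for whatever post-processing the product applies.

```python
def key_matches_field(key, included, excluded):
    """A key matches if every included keyword is present and no excluded keyword is."""
    key_words = key.lower().split()
    has_all_included = all(kw.lower() in key_words for kw in included)
    has_no_excluded = not any(kw.lower() in key_words for kw in excluded)
    return has_all_included and has_no_excluded


def extract_form_field(pairs, included, excluded, post_process=str.strip):
    """Return the post-processed value of the first matching key, or None."""
    for key, value in pairs:  # pairs are in document order
        if key_matches_field(key, included, excluded):
            return post_process(value)  # first match wins; later keys are ignored
    return None  # no key matched, so no value is set


# Hypothetical key-value pairs, in the order they appear in the document.
pairs = [
    ("Date of Birth", " 1990-04-12 "),
    ("Date", " 2024-01-31 "),
    ("Issue Date", "2024-02-01"),
]
invoice_date = extract_form_field(pairs, included=["date"], excluded=["birth"])
```

In this example "Date of Birth" is skipped because it contains the excluded keyword, "Date" is the first key that satisfies both rules, and "Issue Date" is never considered because matching has already stopped for the field.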

See Form Extractor for a more detailed overview.
