Configure Extractor Window
Configures an Extractor Definition
Can be launched only from the Create Extractor Window.
Browse Allows the user to load an Extractor Definition File.
Reset Discards the changes made to the current definition since the last save.
Save Changes Saves the changes to the current definition file.
Allows the user to select sample files for the configuration. The selected sample file becomes the Current Document and is displayed in the document viewer.
Browse Button Allows the user to select the sample folder containing the sample files.
OCR All Button Applies OCR to all sample documents.
Displays the Current Document. If the current document is a PDF or image document, it is displayed in the Image Document Viewer. If OCR has been applied to the document, the OCR text is displayed in a Text Document Viewer in another tab. If the document is a text document, it is displayed in the Text Document Viewer.
Allows the user to configure the extractor definition.
Apply Button Extracts the Current Document using the definition.
Apply All Button Extracts all the documents using the definition.
Extraction results are shown in the Extraction Results Pane.
The configuration UI depends on the type of extractor used.
Uses regular expressions to identify the data to extract, with PCRE as the regex engine. Users can apply the following regex options:
IgnoreCase
SingleLine
MultiLine
Unicode
Global
Sticky
Configuration is not available for the Skill Extractor.
Uses keywords to match a key among the key-value pairs identified in the document, then extracts the data using the matched key.
Add Button Adds a new Keyword.
Included Toggle Sets whether the keyword is included in or excluded from matching.
Delete Button Deletes the Keyword.
Edit Distance
The maximum Levenshtein edit distance allowed between a word in the key and the provided keyword for the word to be considered a match. If the value is 0, the match must be exact.
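As an illustration of how an edit-distance threshold of this kind behaves, the following Python sketch computes the Levenshtein distance with the standard dynamic-programming method and compares it against a maximum. The names `levenshtein` and `word_matches` are hypothetical and not part of the product. Case handling, which the Match Case option would govern, is not modeled here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def word_matches(word: str, keyword: str, max_distance: int) -> bool:
    """Hypothetical helper: True if word is within max_distance edits of keyword."""
    return levenshtein(word, keyword) <= max_distance
```

With a maximum of 0, `word_matches("dat", "date", 0)` is false because one insertion is required. Raising the maximum to 1 accepts it, which is how a small OCR error in a key could still be tolerated.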
Keyword Options
The following options are available for modifying the matching:
Match Any
Only one of the words needs to match.
Match Case
Matching is case-sensitive.
Shows the extraction results for the Current Document or for all the documents.
Document Name of the document.
Field (many) Lists the extracted value of each field.
The Regex Extractor uses a regular expression to extract the data. It applies the regex provided in the field to the document and takes the match as the extracted value.
By default, the Regex Extractor takes only the first match as the extracted value. If the Global option is set on the field, the field value is set to all the matches, separated by commas.
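The difference between the default behavior and the Global option can be sketched in Python as follows. This is only an approximation: it uses Python's built-in `re` module rather than PCRE, the function name `extract` and its parameters are hypothetical, and joining with a bare comma (no space) is an assumption about the exact separator.

```python
import re
from typing import Optional


def extract(pattern: str, text: str, global_option: bool = False,
            flags: int = 0) -> Optional[str]:
    """Return the first match, or all matches joined by commas if global_option is set."""
    matches = [m.group(0) for m in re.finditer(pattern, text, flags)]
    if not matches:
        return None  # assumption: no value is set when nothing matches
    if global_option:
        return ",".join(matches)  # assumed separator: a bare comma
    return matches[0]
```

For the text `"Invoice 1001 and 1002"` and the pattern `\d{4}`, the default call returns `"1001"`, while `global_option=True` returns `"1001,1002"`.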
See Regex Extractor for a more detailed overview.
The Skill Extractor applies the document model to the document, then sets the value of each field according to the Skill Field mapped in the definition.
See Skill Extractor for a more detailed overview.
The Form Extractor works by finding all key-value pairs in the document. Then, for each field, the extractor tries to match the field with the keys in the document. A field is considered to match a key if:
All Included keywords are matched
All Excluded keywords are not matched
For example, to match a field with the key Date but not Date of Birth, include the keyword date and exclude the keyword birth.
Keys in the document are matched in the order they appear in the document. Once a key is matched with a field, matching stops for that field and all remaining keys are ignored for it. The extractor takes the value of the matched key, applies post-processing, and sets the result as the value of the field. No value is set if no key is matched.
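The matching rules above can be sketched in Python as follows. This is a simplified illustration, not the product's implementation: the names `key_matches` and `extract_field` are hypothetical, keywords are compared to whole words case-insensitively with no edit distance, and `strip()` merely stands in for the unspecified post-processing step.

```python
from typing import List, Optional, Tuple


def key_matches(key: str, included: List[str], excluded: List[str]) -> bool:
    """True if every included keyword and no excluded keyword appears in the key."""
    words = key.lower().split()
    all_included = all(k.lower() in words for k in included)
    none_excluded = not any(k.lower() in words for k in excluded)
    return all_included and none_excluded


def extract_field(pairs: List[Tuple[str, str]], included: List[str],
                  excluded: List[str]) -> Optional[str]:
    """Scan key-value pairs in document order and return the first matching value."""
    for key, value in pairs:
        if key_matches(key, included, excluded):
            return value.strip()  # placeholder for post-processing
    return None  # no key matched, so no value is set


# Key-value pairs in the order they appear in a hypothetical document.
pairs = [("Date of Birth", "01/02/1990"), ("Date", " 03/04/2024 ")]
print(extract_field(pairs, included=["date"], excluded=["birth"]))
```

Here the first key contains the excluded keyword birth, so it is skipped and the value of the second key is used. Without the exclusion, the first key would match and the search would stop there.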
See Form Extractor for a more detailed overview.