Skill Extractor

Document Extraction using Machine Learning models

Extraction using the Skill Extractor is the easiest and most accurate way to extract data from documents, provided a suitable skill model is available. Skills are Machine Learning models that identify and extract key-value data and tables from documents. Document AI provides pretrained skills for Invoices, Receipts, Business Cards and ID Documents out of the box. If the pretrained skills are not suitable for the purpose, users can train their own skills using Document AI. With the Skill Extractor, users only need to map their required fields to the fields available in the skill.

Creating a Document Extraction Workflow using Skill Extractor

A Document Extraction Workflow that uses the Skill Extractor is created in two stages:

  1. Creating a Skill Extractor definition that specifies the field names and their types. A skill is selected, and the skill fields are mapped to the definition fields.

  2. Creating a Document AI client with Document AI credentials in the Studio Workflow and running the Run Extractor activity with the created definition.

Creating the Extractor Definition

Creating a Document Extraction workflow begins by creating the Document Extractor Definition file. The definition file contains information about the type of extraction, the fields that need to be extracted and how each field is identified and extracted.

Launching the Create Extractor Window

The Extractor Definition is created by using the Create Extractor Window. The Create Extractor Window can be launched from the Home section in the Robot Studio Ribbon menu.

In Robot Studio, click Home -> Document AI -> Create Document Extractor.

Creating the Skill Extractor Definition

Users can either load and edit an existing definition or create a new one. To create a Skill Extractor definition:

Click on Create -> Skill.

In the Configure Client section, set the Document AI Endpoint and API Key, then click Apply.

In the Configure Fields section, load the available skills by clicking the Refresh button, then select the required skill from the dropdown. For invoice extraction, select the pretrained invoice model skill.

Add the required fields by clicking the Add button. Let's add two fields, InvoiceNumber and InvoiceDate, with output formats Text and Date respectively, and assign them the invoice model fields InvoiceId and InvoiceDate.
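Conceptually, each field in the definition pairs a name and an output format with a skill field. The Python sketch below models that mapping for the two example fields. `FIELD_MAP`, `convert` and `map_fields` are hypothetical names that are not part of Document AI. The sketch also assumes that the skill returns raw string values with dates in ISO 8601 (`YYYY-MM-DD`) form.

```python
from datetime import date

# Hypothetical model of the two key-value fields from the example:
# definition field name -> (skill field name, output format).
FIELD_MAP = {
    "InvoiceNumber": ("InvoiceId", "Text"),
    "InvoiceDate": ("InvoiceDate", "Date"),
}


def convert(value: str, output_format: str):
    """Convert a raw string value to the requested output format."""
    if output_format == "Date":
        # Assumption: the skill returns dates as ISO 8601 strings (YYYY-MM-DD).
        return date.fromisoformat(value)
    return value  # "Text" values are kept as-is


def map_fields(skill_output: dict) -> dict:
    """Build the definition's fields from a skill's raw key-value output."""
    result = {}
    for field_name, (skill_field, output_format) in FIELD_MAP.items():
        raw = skill_output.get(skill_field)
        # A field the skill did not return is reported as None.
        result[field_name] = None if raw is None else convert(raw, output_format)
    return result


# Made-up skill output for illustration only; VendorName is ignored because it is not mapped.
sample = {"InvoiceId": "INV-1001", "InvoiceDate": "2024-03-15", "VendorName": "Contoso"}
print(map_fields(sample))
# {'InvoiceNumber': 'INV-1001', 'InvoiceDate': datetime.date(2024, 3, 15)}
```

Skill fields that are not mapped are simply ignored, which mirrors the idea that only the fields assigned in the definition are extracted.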

Click the Add Table button in the Add New Field dropdown to add a table, and name the table LineItems.

Add the columns by clicking the Add Column button. Let's add two columns, Description and Quantity, both with output format Text, and assign them the invoice model fields Items-Description and Items-Quantity.
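Table columns follow the same mapping idea, applied to each row. This sketch hypothetically assumes that a skill field name such as `Items-Description` refers to the `Description` sub-field of an `Items` array in the skill's output. `LINE_ITEM_COLUMNS`, `as_text` and `map_table` are illustrative names only.

```python
# Hypothetical column mapping for the LineItems table:
# definition column name -> skill field name.
LINE_ITEM_COLUMNS = {
    "Description": "Items-Description",
    "Quantity": "Items-Quantity",
}


def as_text(value) -> str:
    """Text output format: keep values as strings; a missing value becomes ''."""
    return "" if value is None else str(value)


def map_table(skill_output: dict, columns: dict) -> list[dict]:
    """Build LineItems rows from the skill's item array."""
    # Assumption: "Items-Description" names the sub-field "Description"
    # of an array called "Items" in the skill output.
    paths = {column: path.split("-", 1) for column, path in columns.items()}
    # Assumption: all columns of one table come from the same array.
    array_name = next(iter(paths.values()))[0]
    return [
        {column: as_text(item.get(sub_field)) for column, (_, sub_field) in paths.items()}
        for item in skill_output.get(array_name, [])
    ]


# Made-up skill output for illustration only.
sample = {
    "Items": [
        {"Description": "Widget", "Quantity": 4},
        {"Description": "Bolt", "Quantity": 10},
    ]
}
print(map_table(sample, LINE_ITEM_COLUMNS))
# [{'Description': 'Widget', 'Quantity': '4'}, {'Description': 'Bolt', 'Quantity': '10'}]
```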

Configuration is not required for the Skill Extractor.

Click the Save button to save the definition as a definition file.

Creating the Document Extraction Workflow

Three activities are required to create the Extraction Workflow: Create Document AI Client, Preprocess Document and Run Extractor. This example also uses the Show Validation Window activity to view the extracted data.

Configuring Create Document AI Client Activity

Add the Create Document AI Client activity to the Workflow. Click the Configure button to launch the Create Document AI Client window. In the Client Authorization section, provide the Document AI Endpoint and API Key. In the Available Services section, set the provider to Visualyze and set Skill as the Extraction Type.

Save the configuration.

Configuring Preprocess Document Activity

Add the Preprocess Document activity to the Workflow. Assign the DocAIClient variable from the Create Document AI Client activity to the Document AI Client property, and set the Input File property to the path of a PDF or image file.

Configuring Run Extractor Activity

Add the Run Extractor activity to the Workflow. Assign the DocAIClient variable to the Document AI Client property, the ProcessedDocument variable from the Preprocess Document activity to the Processed Document property, and the path of the saved Extractor Definition file to the Extractor Definition property.
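The three activities pass their outputs along as variables. DocAIClient feeds both Preprocess Document and Run Extractor, and ProcessedDocument feeds Run Extractor. The Python sketch below models only that data flow. Every class and function in it is a hypothetical stand-in, not the Robot Studio or Document AI API. `run_extractor` is a placeholder that performs no real extraction, and the endpoint, key and paths are placeholders too.

```python
from dataclasses import dataclass, field
from pathlib import Path


# Hypothetical stand-ins for the workflow variables. These are NOT the
# Robot Studio / Document AI API; they only model how the values flow.
@dataclass
class DocAIClient:
    endpoint: str
    api_key: str
    provider: str = "Visualyze"
    extraction_type: str = "Skill"


@dataclass
class ProcessedDocument:
    client: DocAIClient
    input_file: Path


@dataclass
class ExtractionResult:
    definition_file: Path
    source_file: Path
    fields: dict = field(default_factory=dict)


def create_document_ai_client(endpoint: str, api_key: str) -> DocAIClient:
    """Create Document AI Client: holds the credentials and service settings."""
    return DocAIClient(endpoint, api_key)


def preprocess_document(client: DocAIClient, input_file: str) -> ProcessedDocument:
    """Preprocess Document: consumes DocAIClient, produces ProcessedDocument."""
    return ProcessedDocument(client, Path(input_file))


def run_extractor(client: DocAIClient, processed: ProcessedDocument,
                  definition_file: str) -> ExtractionResult:
    """Run Extractor: consumes DocAIClient, ProcessedDocument and the definition path."""
    # Placeholder: a real run would send the document to the selected skill
    # and fill `fields` using the definition's field mapping.
    return ExtractionResult(Path(definition_file), processed.input_file)


# Wiring the three activities in workflow order (all values are placeholders).
client = create_document_ai_client("https://<your-endpoint>", "<your-api-key>")
processed = preprocess_document(client, "invoice.pdf")
result = run_extractor(client, processed, "invoice-definition")
print(result.source_file)  # invoice.pdf
```

In Robot Studio, this wiring is done through the activity properties described above rather than in code.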

Configuring Show Validation Window Activity

Finally, add the Show Validation Window activity and assign the ExtractionResult variable from the Run Extractor activity to its Extraction Result property. Run the workflow. The extraction is applied to the selected file, and the results are displayed in the Validation Window.
