Skill Extractor
Document Extraction using Machine Learning models
Extraction with the Skill Extractor is the easiest and most accurate way to extract data from documents, provided a suitable skill model is available. Skills are Machine Learning models that identify and extract key-value data and tables from documents. Document AI provides pretrained skills for Invoice, Receipt, Business Card and ID Documents out of the box. If the pretrained skills are not suitable, users can train their own skills using Document AI. To use the Skill Extractor, the user only needs to map the required fields to the fields available in the skill.
A Document Extraction Workflow with the Skill Extractor is created in two stages:
Create a skill extractor definition that specifies the field names and their types. A skill is selected and the skill fields are mapped to the definition fields.
Create a Document AI client with the Document AI credentials in the Studio Workflow, then run the Run Extractor activity with the created definition.
Creating a Document Extraction workflow begins by creating the Document Extractor Definition file. The definition file contains information about the type of extraction, the fields that need to be extracted and how each field is identified and extracted.
The Extractor Definition is created by using the Create Extractor Window. The Create Extractor Window can be launched from the Home section in the Robot Studio Ribbon menu.
In Robot Studio, click Home -> Document AI -> Create Document Extractor.
Users can either load and edit an existing definition or create a new one. To create a Skill Extractor, click Create -> Skill.
In the Configure Client section, set the Document AI Endpoint and API Key, then click Apply.
In the Configure Fields section, load the available skills by clicking the Refresh button. Select the required skill from the dropdown. For invoice extraction, select the pretrained invoice model skill.
Add the required fields by clicking the Add button. Let’s add two fields, InvoiceNumber and InvoiceDate, with output formats Text and Date respectively. Assign the invoice model fields InvoiceId and InvoiceDate to them.
Click the Add Table button in the Add New Field dropdown to add a table. Name the table LineItems.
Add the columns by clicking the Add Column button. Let’s add two columns, Description and Quantity, both with output format Text. Assign the invoice model fields Items-Description and Items-Quantity to them.
Configuration is not required for the skill extractor.
Click the Save button to save the definition as a definition file.
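The mapping configured above can be pictured as a small data structure. The following Python sketch is illustrative only: the on-disk format of the definition file is not described here, so the class names, the `apply_mapping` helper and the shape of the raw skill output (including the `Items` key for table rows) are all assumptions. Only the field names, skill field names and output formats come from the example above.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FieldMapping:
    name: str           # field name in the extractor definition
    output_format: str  # output format chosen in the window, e.g. "Text" or "Date"
    skill_field: str    # field name exposed by the selected skill


@dataclass
class TableMapping:
    name: str
    columns: list[FieldMapping] = field(default_factory=list)


# Mirrors the fields and table configured in the example above.
FIELDS = [
    FieldMapping("InvoiceNumber", "Text", "InvoiceId"),
    FieldMapping("InvoiceDate", "Date", "InvoiceDate"),
]
LINE_ITEMS = TableMapping(
    "LineItems",
    [
        FieldMapping("Description", "Text", "Items-Description"),
        FieldMapping("Quantity", "Text", "Items-Quantity"),
    ],
)


def apply_mapping(skill_output: dict[str, Any]) -> dict[str, Any]:
    """Rename skill fields to definition fields; absent fields become None.

    Assumes (hypothetically) that table rows arrive under an "Items" key.
    """
    result = {f.name: skill_output.get(f.skill_field) for f in FIELDS}
    rows = skill_output.get("Items", [])
    result[LINE_ITEMS.name] = [
        {c.name: row.get(c.skill_field) for c in LINE_ITEMS.columns}
        for row in rows
    ]
    return result


if __name__ == "__main__":
    # Made-up skill output, used only to exercise the mapping.
    sample = {
        "InvoiceId": "INV-001",
        "InvoiceDate": "2024-01-15",
        "Items": [{"Items-Description": "Widget", "Items-Quantity": "2"}],
    }
    print(apply_mapping(sample))
```

The point of the sketch is that the definition only renames and types what the skill already extracts, which is why no further identification rules are needed for a skill extractor.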
Three activities are required to create the Extraction Workflow: Create Document AI Client, Preprocess Document and Run Extractor. This example also uses the Show Validation Window activity to view the extracted data.
Add the Create Document AI Client activity to the Workflow. Click the Configure button to launch the Create Document AI Client Window. In the Client Authorization section, provide the Document AI Endpoint and API Key. In the Available Services section, set the provider to Visualyze. Set Skill as the Extraction Type. Save the configuration.
Add the Preprocess Document activity to the Workflow. Assign the DocAIClient variable from the Create Document AI Client activity to the Document AI Client property. Set the Input File property to the path of a PDF or image file.
Add the Run Extractor activity to the Workflow. Assign the DocAIClient variable to the Document AI Client property. Assign the ProcessedDocument variable from the Preprocess Document activity to the Processed Document property. Set the Extractor Definition property to the path of the created Extractor Definition file.
Finally, add the Show Validation Window activity and assign the ExtractionResult variable from the Run Extractor activity to its Extraction Result property. Run the workflow. The extraction is applied to the selected file and the results are displayed in the Validation Window.
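The way the variables pass between the activities can be sketched in Python. This is not the product's API: the activities are configured in the Studio designer, and every class and function below is a hypothetical stand-in that only records its inputs. The endpoint, API key and file paths are placeholders. The sketch shows only the wiring: the client feeds both Preprocess Document and Run Extractor, and the extraction result is what the validation window receives.

```python
from dataclasses import dataclass, field


@dataclass
class DocAIClient:
    """Stand-in for the DocAIClient variable."""
    endpoint: str
    api_key: str
    provider: str = "Visualyze"      # provider set in Available Services
    extraction_type: str = "Skill"   # Extraction Type set in the client window


@dataclass
class ProcessedDocument:
    """Stand-in for the ProcessedDocument variable."""
    input_file: str


@dataclass
class ExtractionResult:
    """Stand-in for the ExtractionResult variable."""
    input_file: str
    definition_path: str
    fields: dict = field(default_factory=dict)  # left empty by this stub


def create_document_ai_client(endpoint: str, api_key: str) -> DocAIClient:
    return DocAIClient(endpoint, api_key)


def preprocess_document(client: DocAIClient, input_file: str) -> ProcessedDocument:
    # The real activity takes the client as a property; this stub only records the file.
    return ProcessedDocument(input_file)


def run_extractor(
    client: DocAIClient, processed: ProcessedDocument, definition_path: str
) -> ExtractionResult:
    if client.extraction_type != "Skill":
        raise ValueError("client must be configured for Skill extraction")
    return ExtractionResult(processed.input_file, definition_path)


def run_workflow(
    endpoint: str, api_key: str, input_file: str, definition_path: str
) -> ExtractionResult:
    client = create_document_ai_client(endpoint, api_key)
    processed = preprocess_document(client, input_file)
    # The returned value is what Show Validation Window would display.
    return run_extractor(client, processed, definition_path)


if __name__ == "__main__":
    print(run_workflow("https://docai.example.invalid", "API_KEY",
                       "invoice.pdf", "invoice_definition"))
```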