Document Extractor

Extracts data from documents

Document Extractor is a tool for extracting required information from image or PDF documents in a structured form. The output of document extraction is called the extraction result. It contains key-value pairs, tables, and text from the document. The Document Extractor extracts only the data that the user specifies, so the user must first create an Extractor Definition.

The Extractor Definition is created in the Create Extractor Window and further configured in the Configure Extractor Window. There are three types of Extractors.

The Extractor Definition is passed to the Extraction Workflow through the Run Extractor Activity. The activity creates the Extraction Result object, which contains all the fields defined in the definition and their extracted values. The Extraction Result object is a Dictionary in which the user-defined fields are the keys and the extracted value for each field is the corresponding value.
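
As an illustration only, the Extraction Result can be pictured as a dictionary keyed by the user-defined field names. The sketch below uses Python purely for illustration; the field names, the values, and the get_field helper are hypothetical and are not part of the product's API.

```python
# Hypothetical extraction result: keys are the user-defined field names,
# values are the values extracted for those fields.
extraction_result = {
    "InvoiceNumber": "INV-0042",
    "InvoiceDate": "2023-01-15",
    "Total": "199.90",
}


def get_field(result, field_name, default=None):
    """Return the extracted value for a field, or a default if it is absent."""
    return result.get(field_name, default)


# Walk over every field and its extracted value.
for name, value in extraction_result.items():
    print(f"{name}: {value}")
```

Looking up a field by name this way mirrors how a later step in a Workflow would read a single extracted value from the result.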

The data in the Extraction Result can be validated by the user before it is used. This can be done in three ways:

  1. Using the Show Validation Window Activity, where the user validates the data after each extraction.

  2. Creating a Task, assigning it to one or more users, and waiting until a user validates the Task.

  3. Creating a Task without waiting, then running another bot or Workflow that uses the extracted data after the Task has been validated by the users.

Creating a Document Extraction Workflow

Document extraction is done in two steps:

  1. Creating an Extractor Definition to specify the field names and their types. This also configures how the data for each field is extracted.

  2. Creating a Document AI client in the Workflow using your credentials and running the Run Extractor Activity.
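
The relationship between the two steps can be sketched as follows. This is a conceptual model in Python, not the product's API: the FieldDefinition class, the type names, and the missing_fields helper are all assumptions made for illustration, and a hand-written dictionary stands in for the output of the Run Extractor Activity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDefinition:
    name: str        # user-defined field name (hypothetical)
    field_type: str  # assumed type label, e.g. "text" or "number"


# Step 1 (conceptually): an extractor definition lists the fields and types.
extractor_definition = [
    FieldDefinition("InvoiceNumber", "text"),
    FieldDefinition("Total", "number"),
]


def missing_fields(definition, extraction_result):
    """Return the names of defined fields that are absent from the result."""
    return [f.name for f in definition if f.name not in extraction_result]


# Step 2 (conceptually): running the extractor yields a result dictionary.
# A hand-written sample stands in for the real extraction output here.
sample_result = {"InvoiceNumber": "INV-0042"}
print(missing_fields(extractor_definition, sample_result))
```

A check like this reflects the idea that the result is expected to contain every field named in the definition.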

Follow the links below for a complete example of creating a Document Extraction workflow.
