Visualyze Documentation

Regex Extractor

Document Extraction using Regular Expressions

Last updated 2 years ago

Regex Extractor uses regular expressions (regex) to identify and extract the required fields from documents. It is useful when a suitable skill model is available but does not provide the required field. However, it requires the user to be familiar with regular expressions. Regex Extractor uses the PCRE regex engine. The user is free to choose how to model the regex for extracting a field, but the common practice is to match the pattern of the value alone, or of both the key and the value, and then take only the value part.
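
The two common practices mentioned above can be sketched in plain Python. This is only an illustration: the sample text and both patterns are hypothetical, and Python's `re` module is used here for convenience even though Regex Extractor itself uses the PCRE engine, whose syntax differs in some details.

```python
import re

# Hypothetical OCR text from an invoice; not taken from the Visualyze docs.
sample_text = "Invoice No: 1042\nDate: 12 March 2021\nTotal: 1,250.75\n"

# Practice 1: match the key ("Total:") together with the value,
# and capture only the value part in group 1.
total_pattern = re.compile(r"Total:\s*([\d,]+\.\d{2})")

# Practice 2: match the value alone by its shape
# (day, month name, four-digit year).
date_pattern = re.compile(r"\b\d{1,2} [A-Za-z]+ \d{4}\b")

total_match = total_pattern.search(sample_text)
date_match = date_pattern.search(sample_text)

# Keep only the value, or None when the field is not found.
total_value = total_match.group(1) if total_match else None
date_value = date_match.group(0) if date_match else None

print(total_value)
print(date_value)
```

Matching the key together with the value is usually safer when the document contains several values of the same shape, because the key anchors the match to the intended field.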

Creating a Document Extraction Workflow using Regex Extractor

A document extraction workflow with Regex Extractor is created in two steps.

  1. Create a regex extractor definition that specifies the field names and their types. The regular expressions that identify and extract each field are configured here.

  2. Create a Document AI client with Document AI credentials in the Studio workflow and run the activity.

The following example walks through creating a document extraction workflow using a Regex Extractor.

Creating the Extractor Definition.

Launching the Create Extractor Window

Creating a Document Extraction workflow begins by creating the Document Extractor Definition file. The definition file contains information about the type of extraction, the fields that need to be extracted and how each field is identified and extracted.

The Extractor Definition is created by using the Create Extractor Window, which is launched from the Add Item Window. To open it, click the Add Item button in the project pane.

In the Add Item Window, click AI Skills -> Document AI -> New Document Extractor.

Creating the definition

Users can either load and edit an existing definition or create a new one. To create a definition, select an Extractor Type from the available types.

Click on Create -> Regex.

Launch the Configure Extractor Window by clicking the Launch Extractor Trainer button.

In the Configure Client section, set the Document AI Endpoint and API Key, then click Apply.

In the Configure Fields section, add new fields by clicking the Add button. In this example, add two fields, Total and Date, with the output formats Number and Date respectively. The field definition differs for each type of extractor.

Configuring the Sample Files

In the Files section, click the Browse button to select the folder containing the sample files.

Preprocessing the document

Check the Enable Text Preprocessing checkbox to include text preprocessing for the document.

In the Text Preprocessing section, click Apply OCR. This creates a Text View tab that contains the OCR text of the selected document.

Add preprocessors to remove or replace unwanted lines, text, or characters in the document. The extraction is applied to the preprocessed document. This example adds a Remove Lines preprocessor.

Click the Add dropdown button in the Text Preprocessing section and select Line -> Remove Lines. This adds a Remove Lines preprocessor to the preprocessors list. Select the With line numbers option. In this example, the line with line number 1 is removed. Make sure the preprocessor's checkbox is checked so that it is selected for preprocessing, then click the Apply button. This applies all selected preprocessors to the currently selected document.

Applying the preprocessing creates a Preprocessed tab that shows the text after preprocessing.

Configuring the Extractor

In the Extractors section, add the regex that identifies and extracts each field. Select the Ignore case regex option for the Date field to make the matching case-insensitive. Click Apply to test the regex on the current document. Click Apply All to apply the extraction to all documents.

The Results section shows the result of the extraction.

Click Save Changes to save the definition in a definition file.

Creating the Workflow

Three activities are required to create the extraction workflow: Create Document AI Client, Preprocess Document, and Run Extractor. This example also uses the Show Validation Window activity to view the extracted data.

Configuring Create Document AI Client activity

Add the Create Document AI Client activity to the workflow. Click the Configure button to launch the Create Document AI Client Window. In the Client Authorization section, provide the Document AI Endpoint and API Key. In the Available Services section, set the provider to Visualyze. Select the Extraction Type according to the created extractor definition. Because the created extractor is a Regex Extractor, it needs text extraction, so set the Extraction Type to Text.

Save the configuration.

Configuring Preprocess Document activity

Add the Preprocess Document activity to the workflow. Assign the DocAIClient variable from the Create Document AI Client activity to the Document AI Client property. Set the Input File property to the path of a PDF or image file.

Configuring Run Extractor activity

Add the Run Extractor activity to the workflow. Assign the DocAIClient variable to the Document AI Client property. Assign the ProcessedDocument variable from the Preprocess Document activity to the Processed Document property. Assign the path of the created Extractor Definition file to the Extractor Definition property.

Configuring Show Validation Window Activity

Finally, add the Show Validation Window activity and assign the ExtractionResult variable from the Run Extractor activity to the Extraction Result property. Run the workflow. The extraction is applied to the selected file and the results are displayed in the Validation Window.
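
The data flow of this walkthrough (removing line 1 during preprocessing, extracting Total and Date with regexes, ignoring case for Date, and converting the values to Number and Date) can be mirrored in plain Python to show what each stage does. This is a rough analogy under stated assumptions, not the Visualyze API: `remove_lines`, `run_extractor`, the `FIELDS` table, the sample text, and the date format are all hypothetical, and Python's `re` stands in for the PCRE engine.

```python
import re
from datetime import datetime


def remove_lines(text, line_numbers):
    """Drop the given 1-based line numbers (stand-in for Remove Lines)."""
    kept = [
        line
        for number, line in enumerate(text.splitlines(), start=1)
        if number not in line_numbers
    ]
    return "\n".join(kept)


# Hypothetical field definitions: name -> (regex, flags, converter).
# The converter plays the role of the field's output format.
FIELDS = {
    "Total": (
        r"Total:\s*([\d,]+\.\d{2})",
        0,
        lambda s: float(s.replace(",", "")),  # Number
    ),
    "Date": (
        r"date:\s*(\d{2}/\d{2}/\d{4})",
        re.IGNORECASE,  # analogous to the Ignore case option
        lambda s: datetime.strptime(s, "%d/%m/%Y").date(),  # Date
    ),
}


def run_extractor(text, fields):
    """Apply each field's regex and convert the captured value."""
    result = {}
    for name, (pattern, flags, convert) in fields.items():
        match = re.search(pattern, text, flags)
        result[name] = convert(match.group(1)) if match else None
    return result


# Hypothetical OCR text; line 1 is a banner that should not be extracted from.
ocr_text = "SCANNED COPY\nDATE: 05/03/2021\nTotal: 1,250.75"

processed_document = remove_lines(ocr_text, {1})
extraction_result = run_extractor(processed_document, FIELDS)

# Stand-in for viewing the result in a validation window.
for field, value in extraction_result.items():
    print(f"{field}: {value}")
```

Without the case-insensitive flag, the lowercase `date:` key in the pattern would not match the uppercase `DATE:` in the sample text, and the Date field would be returned as empty.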