# LLM Extractor

The LLM Extractor extracts data from documents when no suitable skill model is available for the Skill Extractor. It is particularly useful when you cannot make any assumptions about the document structure. The LLM (Large Language Model) uses user-defined fields and keywords to locate specific information: you map each required field to a keyword, and the LLM uses that mapping to identify and extract the value. You can also provide hint text to help the LLM identify a field more accurately.

## Creating a Document Extraction Workflow using LLM Extractor

A Document Extraction Workflow with the LLM Extractor is created in two stages:

1. Creating an LLM extractor definition to specify the field names, keywords, and types.
2. Creating a Document AI client with Document AI credentials in the Studio Workflow and running the Invoke LLM activity with the created definition.

### Creating the Extractor Definition

Creating a Document Extraction workflow begins by creating the Document Extractor Definition file. The definition file contains information about the type of extraction, the fields that need to be extracted, and how each field is identified and extracted.

#### Launching the Create Extractor Window

1. Open Robot Studio.
2. Click on the **Home** tab in the Ribbon menu.
3. Navigate to **Document AI** and select **Create Document Extractor**.

   <figure><img src="/files/2YQWxQxfoSMZTefMVTbS" alt=""><figcaption></figcaption></figure>

#### Creating the LLM Extractor Definition

1. In the [**Create Extractor Window**](/getting-started/rpa-studio/document-ai/document-extractor/create-extractor-window.md), click on **Create** and select **LLM**.
2. Add the required fields by clicking the **Add** button. For example, let's add two fields: `InvoiceNumber` and `InvoiceDate` with output formats `Text` and `Date`, respectively.
3. Provide the respective keywords for each field, such as `Invoice Number` and `Invoice Date`.

   <figure><img src="/files/4eo4c0gERvwBFj096Xya" alt=""><figcaption></figcaption></figure>
4. To extract tabular data, click the **Add Table** button in the **Add New Field** dropdown and give it a name, e.g., `LineItems`.
5. Add columns to the table by clicking the **Add Column** button. For example, add two columns, `Description` and `Quantity`, both with output format `Text`. Assign the keywords `Item Description` and `Item Quantity` to the columns, respectively.

   <figure><img src="/files/md7DjJg3D7n1h3YZlzEA" alt=""><figcaption></figcaption></figure>
6. No configuration is required for the LLM extractor.
7. Click the **Save** button to save the definition as a definition file.
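
As a way to picture what the definition built in the steps above contains, here is a minimal Python sketch. It is an illustration only: the class names (`FieldDef`, `TableDef`, `LlmExtractorDefinition`) and the `missing_keywords` helper are hypothetical, and the actual definition file is produced by the Create Extractor Window in a format this page does not describe.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical in-memory model of an LLM extractor definition.
# This is NOT the on-disk format of the definition file saved by Studio.


@dataclass
class FieldDef:
    name: str           # field or column name, e.g. "InvoiceNumber"
    keyword: str        # keyword mapped to the field, e.g. "Invoice Number"
    output_format: str  # "Text" or "Date" in this walkthrough
    hint: str = ""      # optional hint text


@dataclass
class TableDef:
    name: str
    columns: List[FieldDef] = field(default_factory=list)


@dataclass
class LlmExtractorDefinition:
    fields: List[FieldDef] = field(default_factory=list)
    tables: List[TableDef] = field(default_factory=list)

    def missing_keywords(self) -> List[str]:
        """Return the names of fields or table columns with no keyword."""
        missing = [f.name for f in self.fields if not f.keyword.strip()]
        for table in self.tables:
            missing += [
                f"{table.name}.{col.name}"
                for col in table.columns
                if not col.keyword.strip()
            ]
        return missing


# The example definition from the steps above.
definition = LlmExtractorDefinition(
    fields=[
        FieldDef("InvoiceNumber", "Invoice Number", "Text"),
        FieldDef("InvoiceDate", "Invoice Date", "Date"),
    ],
    tables=[
        TableDef(
            "LineItems",
            [
                FieldDef("Description", "Item Description", "Text"),
                FieldDef("Quantity", "Item Quantity", "Text"),
            ],
        )
    ],
)
```

Calling `definition.missing_keywords()` on this example returns an empty list, since every field and column has a keyword assigned.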

### Creating the Document Extraction Workflow

To create the Extraction Workflow with LLM, you need four activities: two [**Create Document AI Client**](/rpa-studio/document-ai/create-document-ai-client.md) activities, one [**Preprocess Document**](/rpa-studio/document-ai/tasks/preprocess-document.md) activity, and one [**Invoke LLM**](/rpa-studio/document-ai/text/invoke-llm.md) activity. One Document AI client is used for text extraction and the other for the LLM. Optionally, the [**Show Validation Window**](/rpa-studio/document-ai/tasks/show-validation-window.md) activity can be added to view the extracted data.

<figure><img src="/files/CDb1DE2TgG5tzyqLKIch" alt=""><figcaption></figcaption></figure>
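
The data flow between these activities can be sketched in Python as follows. This is only a model of the wiring described in this guide: every function and class here is a hypothetical stand-in written for illustration, not a Visualyze API, and the endpoint, API key, and file paths are placeholders. The variable names `text_client` and `llm_client` are also assumptions, chosen to keep the two clients apart.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical stand-ins for the Studio activities. They only mirror
# which output feeds which input; they do not call any real service.


@dataclass
class DocAIClient:
    endpoint: str
    api_key: str
    extraction_type: str  # "Text" or "LLM"


@dataclass
class ProcessedDocument:
    source_path: str
    text: str


def create_document_ai_client(endpoint: str, api_key: str,
                              extraction_type: str) -> DocAIClient:
    """Stand-in for the Create Document AI Client activity."""
    return DocAIClient(endpoint, api_key, extraction_type)


def preprocess_document(client: DocAIClient,
                        input_file: str) -> ProcessedDocument:
    """Stand-in for Preprocess Document (OCR), wired to the text client."""
    if client.extraction_type != "Text":
        raise ValueError("expected the client configured for Text")
    return ProcessedDocument(input_file, text="<OCR text placeholder>")


def invoke_llm(client: DocAIClient, processed: ProcessedDocument,
               definition_path: str) -> Dict[str, str]:
    """Stand-in for Invoke LLM, wired to the LLM client."""
    if client.extraction_type != "LLM":
        raise ValueError("expected the client configured for LLM")
    # A real run would return the extraction result; this only echoes inputs.
    return {"definition": definition_path, "source": processed.source_path}


# Placeholder credentials and paths.
text_client = create_document_ai_client("https://example.invalid", "API_KEY", "Text")
llm_client = create_document_ai_client("https://example.invalid", "API_KEY", "LLM")

processed = preprocess_document(text_client, "invoice.pdf")
result = invoke_llm(llm_client, processed, "path/to/definition-file")
```

The type checks inside the stand-ins are there to make the intended pairing visible: the text-extraction client goes to Preprocess Document, and the LLM client goes to Invoke LLM.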

#### Configuring Create Document AI Client Activity for Text Extraction

1. Add the **Create Document AI Client** activity to the Workflow.
2. Click on the **Configure** button to launch the [**Create Document AI Client Window**](/getting-started/rpa-studio/editor-windows/create-document-ai-client-window.md).

   <figure><img src="/files/0uD2jfQVjYvivNktzXVK" alt=""><figcaption></figcaption></figure>
3. In the **Client Authorization** section, provide the Document AI Endpoint and API Key.
4. In the **Available Services** section, set the provider as **Visualyze**.
5. Set **Text** as the Extraction Type.

   <figure><img src="/files/BigClbncKsGKyP3ZH7UB" alt=""><figcaption></figcaption></figure>
6. Save the configuration.

#### Configuring Create Document AI Client Activity for LLM

1. Add another **Create Document AI Client** activity to the Workflow.
2. Click on the **Configure** button to launch the **Create Document AI Client Window**.

   <figure><img src="/files/vd1AFJN1AQaZa3mUsSJZ" alt=""><figcaption></figcaption></figure>
3. In the **Client Authorization** section, provide the Document AI Endpoint and API Key.
4. In the **Available Services** section, set the provider as **Visualyze**.
5. Set **LLM** as the Extraction Type.

   <figure><img src="/files/pFQRCvfM4P6JR6qDfMmG" alt=""><figcaption></figcaption></figure>
6. Save the configuration.

#### Configuring Preprocess Document Activity

1. Add the **Preprocess Document** activity to the Workflow.
2. Assign the **DocAIClient** variable from the **Create Document AI Client** activity configured for text extraction to the **Document AI Client** property.
3. Set the **Input File** property to the path of a PDF or image file. This activity applies OCR to the document.

   <figure><img src="/files/EpCOX0QLUHDjPxKNQmSw" alt=""><figcaption></figcaption></figure>

#### Configuring Invoke LLM Activity

1. Add the **Invoke LLM** activity to the Workflow.
2. In the **Document AI Client** property, assign the **DocAIClient** variable from the **Create Document AI Client** activity configured for LLM.
3. In the **Processed Document** property, assign the **ProcessedDocument** variable from the **Preprocess Document** activity.
4. In the **Extractor Definition** property, assign the path to the created LLM Extractor Definition file.

   <figure><img src="/files/vL9NmY1xyP9EGtV37jlM" alt=""><figcaption></figcaption></figure>

#### Configuring Show Validation Window Activity

1. Finally, add the **Show Validation Window** activity.
2. Assign the **ExtractionResult** variable from the **Invoke LLM** activity to the **Extraction Result** property.

   <figure><img src="/files/aLk6sv2uIo4egKdkKcRZ" alt=""><figcaption></figcaption></figure>
3. Run the workflow.
4. The extraction is applied to the selected file, and the results are displayed in the Validation Window.

This is a simple example of creating the extraction workflow. Use this guide as a reference to extract information from documents using the LLM Extractor, and customize it according to your specific requirements.

