# Text Preprocessing

Text preprocessing removes unwanted characters, words, or lines from a document before extraction or classification runs. The step is optional, but adding text preprocessors can improve the accuracy of the task. Text preprocessors can be added only to the [Regex Extractor](/getting-started/rpa-studio/document-ai/document-extractor/types/regex-extractor.md) and the [Keyword Classifier](/getting-started/rpa-studio/document-ai/document-classifier/types/keyword-classifier.md).

{% hint style="info" %}
Preprocessors are applied in the order in which they appear in the preprocessors list. If there are two preprocessors, the second receives the output of the first.
{% endhint %}
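
The ordering rule can be pictured as plain function chaining. The Python sketch below is illustrative only: `apply_preprocessors` and the two sample steps are hypothetical names chosen for this example and are not part of the product.

```python
from typing import Callable, List

# Hypothetical stand-in for a preprocessor: takes text, returns text.
Preprocessor = Callable[[str], str]


def apply_preprocessors(text: str, preprocessors: List[Preprocessor]) -> str:
    """Run each preprocessor in list order, feeding each one the previous output."""
    for preprocess in preprocessors:
        text = preprocess(text)
    return text


def drop_dollar_signs(text: str) -> str:
    # Character-level step: delete every "$".
    return text.replace("$", "")


def drop_lines_starting_with_total(text: str) -> str:
    # Line-level step: delete every line that starts with "Total".
    kept = [line for line in text.split("\n") if not line.startswith("Total")]
    return "\n".join(kept)


document = "Item 5\n$Total 10"

# "$" is removed first, so the second line now starts with "Total" and is dropped.
print(apply_preprocessors(document, [drop_dollar_signs, drop_lines_starting_with_total]))

# Reversed order: "$Total 10" does not start with "Total", so the line survives.
print(apply_preprocessors(document, [drop_lines_starting_with_total, drop_dollar_signs]))
```

The two calls return different text for the same input, which is why the position of each preprocessor in the list matters.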

**Line**

1. **Remove Lines**\
   Removes the specified lines.
   * **Contains**\
     Removes the lines that contain the specified words.
   * **Starts With**\
     Removes the lines that start with the specified word.
   * **Ends With**\
     Removes the lines that end with the specified word.
   * **With line numbers**\
     Removes the lines at the specified line numbers. For example, to remove the first, third, and ninth lines, specify the input as 1,3,9.
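
The four line-removal modes can be approximated in a few lines of Python. This is a rough sketch under stated assumptions, not the product's implementation: the function names are hypothetical, matching is assumed to be case-sensitive, and line numbers are assumed to be 1-based and comma-separated.

```python
from typing import Iterable, List


def remove_lines_containing(lines: List[str], words: Iterable[str]) -> List[str]:
    # Drop a line if it contains any of the given words (assumed case-sensitive).
    words = list(words)
    return [line for line in lines if not any(word in line for word in words)]


def remove_lines_starting_with(lines: List[str], word: str) -> List[str]:
    return [line for line in lines if not line.startswith(word)]


def remove_lines_ending_with(lines: List[str], word: str) -> List[str]:
    return [line for line in lines if not line.endswith(word)]


def remove_lines_by_number(lines: List[str], spec: str) -> List[str]:
    # Assumption: `spec` holds 1-based line numbers separated by commas.
    numbers = {int(part) for part in spec.split(",") if part.strip()}
    return [line for number, line in enumerate(lines, start=1) if number not in numbers]


lines = [f"line {n}" for n in range(1, 11)]

# Removes the first, third, and ninth lines.
print(remove_lines_by_number(lines, "1,3,9"))
```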

**Text**

1. **Replace Text**\
   Replaces the text or the text matched by the regular expression with the specified text.
2. **Remove Text**\
   Removes the text or the text matched by the regular expression.
3. **Remove from list**\
   Removes the words listed in the selected text file.
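
The text operations map closely onto Python's standard `re` module. The sketch below is hypothetical: the function names, the `is_regex` flag, and the whole-word, case-sensitive matching in `remove_from_list` are assumptions made for illustration, and the word list is passed in directly rather than read from a file.

```python
import re
from typing import Iterable


def replace_text(text: str, pattern: str, replacement: str, is_regex: bool = False) -> str:
    # Regex mode uses re.sub; literal mode uses plain string replacement.
    if is_regex:
        return re.sub(pattern, replacement, text)
    return text.replace(pattern, replacement)


def remove_text(text: str, pattern: str, is_regex: bool = False) -> str:
    # Removing is the same as replacing with an empty string.
    return replace_text(text, pattern, "", is_regex)


def remove_from_list(text: str, words: Iterable[str]) -> str:
    # Assumption: each entry is removed only where it appears as a whole word.
    words = [word for word in words if word]
    if not words:
        return text
    pattern = r"\b(?:" + "|".join(re.escape(word) for word in words) + r")\b"
    return re.sub(pattern, "", text)


print(replace_text("Invoice No: 123", r"\d+", "<NUM>", is_regex=True))
print(remove_text("Total: USD 40", "USD "))
print(remove_from_list("the cat and the dog", ["the", "and"]).split())
```

Note that removing words in this sketch leaves the surrounding whitespace in place, so a later step may still need to normalize spacing.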

**Character**

1. **Remove Character**\
   Removes all the specified characters from the document text. For example, to remove all ‘$’, ‘5’, and ‘#’ characters, specify the input as ‘$#5’.
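
Character removal treats the input as a set of individual characters rather than as a substring. A minimal Python sketch of that behavior, using a hypothetical `remove_characters` helper, could look like this:

```python
def remove_characters(text: str, characters: str) -> str:
    # Each character in `characters` is deleted wherever it occurs;
    # the order of the characters in the input does not matter.
    return text.translate(str.maketrans("", "", characters))


print(remove_characters("Total: $1,500 #A5", "$#5"))
```

In this sketch the ‘5’ is removed everywhere, including inside numbers, so `1,500` becomes `1,00`. Digits are therefore best added to the input only when that effect is intended.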

