Version: 2.28

Extractors

Name	Description
LLMDocumentContentExtractor	Extracts textual content from image-based documents using a vision-enabled Large Language Model (LLM).
LLMMetadataExtractor	Extracts metadata from documents by prompting a Large Language Model (LLM) to generate it.
NamedEntityExtractor	Extracts predefined entities from text and writes them to each document's meta field.
PresidioEntityExtractor	Detects PII in Documents and stores entities as structured metadata, without modifying the text. Powered by Microsoft Presidio.
RegexTextExtractor	Extracts text from chat messages or strings using a regular expression pattern.
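
To make the idea behind regex-based extraction concrete, here is a minimal sketch that uses only Python's standard `re` module. It is not the RegexTextExtractor API: the function name `extract_with_regex`, the return convention (first capture group if the pattern defines one, otherwise the whole match), and the sample `<answer>` tags are assumptions made for illustration.

```python
import re
from typing import Optional


def extract_with_regex(text: str, pattern: str) -> Optional[str]:
    """Hypothetical helper, not part of any library.

    Returns the first capture group if the pattern defines one,
    otherwise the whole match. Returns None when nothing matches.
    """
    match = re.search(pattern, text)
    if match is None:
        return None
    # match.groups() is an empty tuple when the pattern has no capture groups.
    return match.group(1) if match.groups() else match.group(0)


# Example: pull a tagged answer out of a model reply (sample text is made up).
reply = "Sure. <answer>42</answer> Let me know if you need more."
extracted = extract_with_regex(reply, r"<answer>(.*?)</answer>")
print(extracted)
```

The component itself may handle chat messages, multiple matches, and missing matches differently, so check its reference page before relying on any of these conventions.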