Version: 2.28

Extractors

| Name | Description |
| --- | --- |
| LLMDocumentContentExtractor | Extracts textual content from image-based documents using a vision-enabled Large Language Model (LLM). |
| LLMMetadataExtractor | Extracts metadata from documents using an LLM. The metadata is generated by the LLM in response to a prompt you provide. |
| NamedEntityExtractor | Extracts predefined entities from a piece of text and writes them into the documents' meta field. |
| PresidioEntityExtractor | Detects PII in Documents and stores the detected entities as structured metadata, without modifying the text. Powered by Microsoft Presidio. |
| RegexTextExtractor | Extracts text from chat messages or strings using a regular expression pattern. |
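
To make the idea behind regex-based extraction concrete, here is a minimal sketch that uses only Python's standard `re` module. It is not the `RegexTextExtractor` API: the function name `extract_with_regex`, the sample string, and the pattern are hypothetical and chosen purely for illustration.

```python
import re
from typing import Optional


def extract_with_regex(text: str, pattern: str) -> Optional[str]:
    """Hypothetical helper (not part of any library).

    Returns the first capture group of the first match, the whole match
    if the pattern defines no capture group, or None if nothing matches.
    """
    match = re.search(pattern, text)
    if match is None:
        return None
    # match.groups() is an empty tuple when the pattern has no capture groups.
    return match.group(1) if match.groups() else match.group(0)


# Illustrative input: an LLM reply that wraps a URL in a tag-like marker.
reply = 'Here is the ticket: <issue url="https://example.com/issues/42">'
print(extract_with_regex(reply, r'<issue url="(.+?)">'))
```

The non-greedy `(.+?)` stops the capture at the first closing `">`, which avoids swallowing later text when the input contains more than one tag.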