Version: 2.25-unstable

External Integrations

External integrations that extract data from files in different formats and convert it into Haystack's unified document format.

| Name | Description |
| --- | --- |
| Azure Document Intelligence | Convert PDF, PPTX, DOCX, HTML, and other document formats into Haystack documents using document analysis that includes layout detection, table extraction, and structured content recognition. |
| Docling | Parse PDF, DOCX, HTML, and other document formats into a rich, standardized representation (including layout and tables), which it can then export to Markdown, JSON, and other formats. |
| PaddleOCR | Use PaddleOCR’s text-recognition and document-parsing capabilities. |
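Each integration listed here turns files into Haystack documents. The minimal Python sketch below illustrates that file-in, documents-out contract under stated assumptions: `Document` and `PlainTextConverter` are hypothetical stand-ins, not Haystack classes, and the only convention borrowed from Haystack 2.x converters is that results come back under a `documents` key. It reads plain text only; the integrations above add layout, table, and OCR analysis on top of this basic shape.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Union


@dataclass
class Document:
    """Hypothetical stand-in for a Haystack Document: text content plus metadata."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


class PlainTextConverter:
    """Hypothetical converter illustrating the file -> Document contract."""

    def run(self, sources: List[Union[str, Path]]) -> Dict[str, List[Document]]:
        documents = []
        for source in sources:
            path = Path(source)
            # Read the raw file as UTF-8 text; a real integration would also
            # analyze layout, tables, or run OCR here.
            text = path.read_text(encoding="utf-8")
            # Keep the source path in metadata so documents stay traceable.
            documents.append(Document(content=text, meta={"file_path": str(path)}))
        # Return results under a "documents" key so downstream steps get a uniform output.
        return {"documents": documents}
```

For example, `PlainTextConverter().run(sources=["notes.txt"])["documents"]` would yield one `Document` per input file, regardless of which converter produced it.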