External Integrations
External integrations that extract data from files in different formats and convert it into the unified document format.
| Name | Description |
|---|---|
| Azure Document Intelligence | Convert PDF, PPTX, DOCX, HTML, and other document formats into Haystack documents through advanced document analysis including layout detection, table extraction, and structured content recognition. |
| Docling | Parse PDF, DOCX, HTML, and other document formats into a rich, standardized representation (layout, tables, and so on), which can then be exported to Markdown, JSON, and other formats. |
| PaddleOCR | Convert files into Haystack documents using PaddleOCR’s text-recognition and document-parsing capabilities. |
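
The idea these integrations share, reading files in different formats and producing one common document shape, can be illustrated with a small standard-library sketch. Everything in it is hypothetical and for illustration only: the `Document` dataclass is a stand-in rather than the Haystack class, and `convert`, `PARSERS`, `parse_txt`, and `parse_md` are invented names that do not belong to any integration listed above.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict


@dataclass
class Document:
    """Hypothetical stand-in for a unified document (not the Haystack class)."""
    content: str
    meta: Dict[str, str] = field(default_factory=dict)


def parse_txt(raw: bytes) -> str:
    # Plain text needs no processing beyond decoding.
    return raw.decode("utf-8")


def parse_md(raw: bytes) -> str:
    # Toy Markdown handling: drop leading '#' heading markers on each line.
    lines = [line.lstrip("#").strip() for line in raw.decode("utf-8").splitlines()]
    return "\n".join(lines)


# Hypothetical registry mapping a file extension to its parser.
PARSERS: Dict[str, Callable[[bytes], str]] = {
    ".txt": parse_txt,
    ".md": parse_md,
}


def convert(name: str, raw: bytes) -> Document:
    """Pick a parser by file extension and wrap the result in a Document."""
    suffix = Path(name).suffix.lower()
    parser = PARSERS.get(suffix)
    if parser is None:
        raise ValueError(f"unsupported format: {suffix}")
    return Document(content=parser(raw), meta={"source": name, "format": suffix})


if __name__ == "__main__":
    doc = convert("notes.md", b"# Title\nbody")
    print(doc.content)
    print(doc.meta)
```

Whatever the input format, downstream code in this sketch only ever sees `Document` objects, which is the role the unified document format plays for the integrations in the table.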