Converters
Use Converters to extract data from files in different formats and cast it into the unified document format. Converters are available for PDFs, images, DOCX files, and more.
Converter | Description |
---|---|
AzureOCRDocumentConverter | Converts PDF (both searchable and image-only), JPEG, PNG, BMP, TIFF, DOCX, XLSX, PPTX, and HTML to documents. |
CSVToDocument | Converts CSV files to documents. |
DocumentToImageContent | Extracts visual data from image or PDF file-based documents and converts them into ImageContent objects. |
DOCXToDocument | Converts DOCX files to documents. |
HTMLToDocument | Converts HTML files to documents. |
ImageFileToDocument | Converts image file references into empty Document objects with associated metadata. |
ImageFileToImageContent | Reads local image files and converts them into ImageContent objects. |
JSONConverter | Converts JSON files to text documents. |
MarkdownToDocument | Converts markdown files to documents. |
MSGToDocument | Converts Microsoft Outlook .msg files to documents. |
MultiFileConverter | Converts CSV, DOCX, HTML, JSON, MD, PPTX, PDF, TXT, and XLSX files to documents. |
OpenAPIServiceToFunctions | Transforms OpenAPI service specifications into a format compatible with OpenAI's function calling mechanism. |
OutputAdapter | Helps the output of one component fit into the input of another. |
PDFMinerToDocument | Converts complex PDF files to documents using pdfminer arguments. |
PDFToImageContent | Reads local PDF files and converts them into ImageContent objects. |
PPTXToDocument | Converts PPTX files to documents. |
PyPDFToDocument | Converts PDF files to documents. |
TikaDocumentConverter | Converts various file types to documents using Apache Tika. |
TextFileToDocument | Converts text files to documents. |
UnstructuredFileConverter | Converts text files and directories to a document. |
XLSXToDocument | Converts Excel files into documents. |
Related Links
See the parameter details in our API reference.