Converters

Use Converters to extract data from files in different formats and cast it into the unified document format. Converters are available for PDFs, images, DOCX files, and more.

| Converter | Description |
| --- | --- |
| AzureOCRDocumentConverter | Converts PDF (both searchable and image-only), JPEG, PNG, BMP, TIFF, DOCX, XLSX, PPTX, and HTML to documents. |
| CSVToDocument | Converts CSV files to documents. |
| DocumentToImageContent | Extracts visual data from image or PDF file-based documents and converts them into ImageContent objects. |
| DOCXToDocument | Converts DOCX files to documents. |
| HTMLToDocument | Converts HTML files to documents. |
| ImageFileToDocument | Converts image file references into empty Document objects with associated metadata. |
| ImageFileToImageContent | Reads local image files and converts them into ImageContent objects. |
| JSONConverter | Converts JSON files to text documents. |
| MarkdownToDocument | Converts Markdown files to documents. |
| MSGToDocument | Converts Microsoft Outlook .msg files to documents. |
| MultiFileConverter | Converts CSV, DOCX, HTML, JSON, MD, PPTX, PDF, TXT, and XLSX files to documents. |
| OpenAPIServiceToFunctions | Transforms OpenAPI service specifications into a format compatible with OpenAI's function calling mechanism. |
| OutputAdapter | Helps the output of one component fit into the input of another. |
| PDFMinerToDocument | Converts complex PDF files to documents using pdfminer arguments. |
| PDFToImageContent | Reads local PDF files and converts them into ImageContent objects. |
| PPTXToDocument | Converts PPTX files to documents. |
| PyPDFToDocument | Converts PDF files to documents. |
| TikaDocumentConverter | Converts various file types to documents using Apache Tika. |
| TextFileToDocument | Converts text files to documents. |
| UnstructuredFileConverter | Converts text files and directories to a document. |
| XLSXToDocument | Converts Excel files into documents. |
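
To make the idea of casting different file formats into one unified document format concrete, here is a minimal standard-library sketch. It is not the library's implementation and does not use its API: `SketchDocument`, `convert_text`, `convert_csv`, `convert_files`, and `demo` are hypothetical names invented for illustration, and the choice to return a dictionary with a `documents` key is an assumption made for this example. It only shows the general pattern of routing each file by extension to a format-specific converter, in the spirit of a multi-file converter.

```python
import csv
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Callable, Dict, List


@dataclass
class SketchDocument:
    """Hypothetical stand-in for a unified document: text plus metadata."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def convert_text(path: Path) -> SketchDocument:
    # Plain text: the file body becomes the document content unchanged.
    return SketchDocument(path.read_text(encoding="utf-8"), {"file_path": str(path)})


def convert_csv(path: Path) -> SketchDocument:
    # CSV: parse the rows, then re-join the cells so the content is plain text.
    with path.open(newline="", encoding="utf-8") as handle:
        rows = [", ".join(row) for row in csv.reader(handle)]
    return SketchDocument("\n".join(rows), {"file_path": str(path), "rows": len(rows)})


# Map each supported extension to the function that handles it (assumed set).
CONVERTERS: Dict[str, Callable[[Path], SketchDocument]] = {
    ".txt": convert_text,
    ".md": convert_text,
    ".csv": convert_csv,
}


def convert_files(sources: List[Path]) -> Dict[str, List[SketchDocument]]:
    """Route each file by extension and collect the resulting documents."""
    documents = []
    for source in sources:
        converter = CONVERTERS.get(source.suffix.lower())
        if converter is None:
            continue  # unsupported formats are skipped in this sketch
        documents.append(converter(source))
    return {"documents": documents}


def demo() -> List[SketchDocument]:
    """Convert two temporary files and skip one with an unsupported extension."""
    with tempfile.TemporaryDirectory() as tmp:
        note = Path(tmp) / "note.txt"
        note.write_text("hello converters", encoding="utf-8")
        table = Path(tmp) / "table.csv"
        table.write_text("name,format\nreport,pdf\n", encoding="utf-8")
        return convert_files([note, table, Path(tmp) / "skip.bin"])["documents"]
```

Calling `demo()` returns one document per supported file, each carrying its text in `content` and its origin in `meta`. Because every format ends up in the same shape, downstream steps do not need to know which file type a document came from.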

Related Links

See the parameter details in our API reference: