
Converters

Use Converters to extract data from files in different formats and cast it into the unified document format. Converters are available for PDFs, images, DOCX files, and more.

| Converter | Description |
| --- | --- |
| AzureOCRDocumentConverter | Converts PDF (both searchable and image-only), JPEG, PNG, BMP, TIFF, DOCX, XLSX, PPTX, and HTML files to documents. |
| HTMLToDocument | Converts HTML files to documents. |
| MarkdownToDocument | Converts Markdown files to documents. |
| OpenAPIServiceToFunctions | Transforms OpenAPI service specifications into a format compatible with OpenAI's function calling mechanism. |
| OutputAdapter | Adapts the output of one component so that it fits the input of another. |
| PDFMinerToDocument | Converts complex PDF files to documents using pdfminer with configurable arguments. |
| PyPDFToDocument | Converts PDF files to documents. |
| TikaDocumentConverter | Converts various file types to documents using Apache Tika. |
| TextFileToDocument | Converts text files to documents. |
| UnstructuredFileConverter | Converts text files and directories to documents. |
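The converters above all follow the same basic idea: read a file, extract its text, and wrap that text together with metadata in a common document shape. The following is a minimal plain-Python sketch of that idea so it runs without any library installed. `Document`, `SimpleTextConverter`, and the `file_path` metadata key are hypothetical stand-ins for illustration, not the library's actual classes or API. The `run()` method returning a dictionary with a `documents` key is only loosely modeled on a component-style interface.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Union


@dataclass
class Document:
    """Hypothetical stand-in for a unified document: text content plus metadata."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


class SimpleTextConverter:
    """Hypothetical converter: reads plain-text files and wraps each one in a Document."""

    def __init__(self, encoding: str = "utf-8") -> None:
        self.encoding = encoding

    def run(self, sources: List[Union[str, Path]]) -> Dict[str, List[Document]]:
        documents = []
        for source in sources:
            path = Path(source)
            text = path.read_text(encoding=self.encoding)
            # Record where the text came from so later steps can trace it back.
            documents.append(Document(content=text, meta={"file_path": str(path)}))
        # Return outputs keyed by name, in the style of a pipeline component.
        return {"documents": documents}


# Usage: write a sample file to a temporary directory and convert it.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "notes.txt"
    sample.write_text("Converters turn files into documents.", encoding="utf-8")
    result = SimpleTextConverter().run(sources=[sample])

print(result["documents"][0].content)
```

A real converter for PDFs, HTML, or images replaces the `read_text()` call with format-specific extraction (for example OCR or HTML parsing), but the output shape stays the same. That shared shape is what lets downstream components handle every format uniformly.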

Related Links

See the parameter details in our API reference: