

Use this component to convert text files and directories into Documents.

Position in a Pipeline: Before PreProcessors, or right at the beginning of an indexing Pipeline
Mandatory Inputs: "paths": a union of lists of paths
Outputs: "documents": a list of Documents


UnstructuredFileConverter converts files and directories into Documents using the Unstructured API.

Unstructured provides a series of tools to do ETL for LLMs. The UnstructuredFileConverter calls the Unstructured API that extracts text and other information from a vast range of file formats.
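To make the idea of passing both files and directories more concrete, here is a minimal sketch of how a mixed list of paths could be flattened into individual files before conversion. The helper name expand_paths is hypothetical, and the recursive directory walk is an assumption made for illustration; this is not the converter's actual implementation, which may treat directories differently.

```python
from pathlib import Path


def expand_paths(paths):
    """Flatten a mixed list of file and directory paths into a sorted list of files.

    Hypothetical helper for illustration only; not part of Haystack or Unstructured.
    """
    files = set()
    for raw in paths:
        path = Path(raw)
        if path.is_dir():
            # Assumption: walk the directory recursively and keep regular files only.
            files.update(child for child in path.rglob("*") if child.is_file())
        elif path.is_file():
            files.add(path)
        # Paths that do not exist are silently skipped in this sketch.
    return sorted(files)
```

Calling expand_paths(["a/file/path.pdf", "a/directory/path"]) would return the PDF (if it exists) together with every file found under the directory, with duplicates removed.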


If you plan to use the hosted version of the Unstructured API, set the Unstructured API key as an environment variable UNSTRUCTURED_API_KEY:

export UNSTRUCTURED_API_KEY=your_api_key
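Before creating the converter, it can help to confirm that the variable is actually visible to your Python process. The following is a small sketch using only the standard library; the function name get_unstructured_api_key is hypothetical and not part of Haystack or Unstructured.

```python
import os


def get_unstructured_api_key(env=None):
    """Return the value of UNSTRUCTURED_API_KEY, or raise if it is missing or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    key = env.get("UNSTRUCTURED_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "UNSTRUCTURED_API_KEY is not set; export it before using the hosted Unstructured API."
        )
    return key
```

Failing early with a clear message is usually easier to debug than an authentication error returned later by the API.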

On its own

import os
from haystack_integrations.components.converters.unstructured import UnstructuredFileConverter

converter = UnstructuredFileConverter()
documents = converter.run(paths=["a/file/path.pdf", "a/directory/path"])["documents"]

In a Pipeline

import os
from haystack import Pipeline
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.converters.unstructured import UnstructuredFileConverter

document_store = InMemoryDocumentStore()

indexing = Pipeline()
indexing.add_component("converter", UnstructuredFileConverter())
indexing.add_component("writer", DocumentWriter(document_store))
indexing.connect("converter", "writer")
indexing.run({"converter": {"paths": ["a/file/path.pdf", "a/directory/path"]}})

Related Links

Check out the API reference in the GitHub repo or in our docs: