Routers are a group of components that route queries or documents to the components best suited to handle them.
Module haystack_experimental.components.routers.document_type_router
DocumentTypeRouter
Categorizes documents by MIME types based on their metadata.
DocumentTypeRouter is used to dynamically route documents within a pipeline based on their MIME types. It supports exact MIME type matches and regex patterns.
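The exact-or-regex matching can be sketched with Python's `re` module. This is a minimal sketch, not the component's own code: the helper `match_mime_type` is hypothetical, and it assumes each entry in `mime_types` is treated as a regex that must match the entire MIME type (`re.fullmatch`), so plain entries such as "text/plain" behave as exact matches.

```python
import re
from typing import List, Optional


def match_mime_type(mime_type: str, patterns: List[str]) -> Optional[str]:
    """Return the first pattern that fully matches mime_type, or None.

    Hypothetical helper. Assumption: full-string regex matching, so
    "image/jpeg" does not match "image/jpeg-extra".
    """
    for pattern in patterns:
        if re.fullmatch(pattern, mime_type):
            return pattern
    return None


patterns = ["text/plain", "audio/x-wav", r"image/.*"]
print(match_mime_type("image/jpeg", patterns))       # image/.*
print(match_mime_type("text/plain", patterns))       # text/plain
print(match_mime_type("application/pdf", patterns))  # None
```

Note that a literal entry containing a dot (for example "application/vnd.ms-excel") still matches itself, because an unescaped `.` matches any character, including a literal dot.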
MIME types can be extracted directly from document metadata or inferred from file paths using standard or user-supplied MIME type mappings.
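A minimal sketch of that lookup order, assuming metadata is a plain dict and that an explicit MIME type field takes precedence over the file path (consistent with the `file_path_meta_field` argument description). The helper `resolve_mime_type` is hypothetical; only `mimetypes.guess_type` is a real standard-library call.

```python
import mimetypes
from typing import Any, Dict, Optional


def resolve_mime_type(
    meta: Dict[str, Any],
    mime_type_meta_field: Optional[str] = None,
    file_path_meta_field: Optional[str] = None,
) -> Optional[str]:
    # 1. Prefer an explicit MIME type stored in the metadata.
    if mime_type_meta_field and meta.get(mime_type_meta_field):
        return meta[mime_type_meta_field]
    # 2. Otherwise guess the MIME type from the file path's extension.
    if file_path_meta_field and meta.get(file_path_meta_field):
        mime_type, _encoding = mimetypes.guess_type(str(meta[file_path_meta_field]))
        return mime_type
    # 3. Nothing to go on: the caller can treat the document as unclassified.
    return None


print(resolve_mime_type({"file_path": "example.txt"}, "mime_type", "file_path"))
print(resolve_mime_type({"mime_type": "application/pdf"}, "mime_type", "file_path"))
```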
Usage example
from haystack_experimental.components.routers import DocumentTypeRouter
from haystack.dataclasses import Document

docs = [
    Document(content="Example text", meta={"file_path": "example.txt"}),
    Document(content="Another document", meta={"mime_type": "application/pdf"}),
    Document(content="Unknown type"),
]

router = DocumentTypeRouter(
    mime_type_meta_field="mime_type",
    file_path_meta_field="file_path",
    mime_types=["text/plain", "application/pdf"],
)

result = router.run(documents=docs)
print(result)
Expected output:
{
    "text/plain": [Document(...)],
    "application/pdf": [Document(...)],
    "unclassified": [Document(...)]
}
DocumentTypeRouter.__init__
def __init__(*,
             mime_type_meta_field: Optional[str] = None,
             file_path_meta_field: Optional[str] = None,
             mime_types: List[str],
             additional_mimetypes: Optional[Dict[str, str]] = None) -> None
Initialize the DocumentTypeRouter component.
Arguments:
mime_type_meta_field
: Optional name of the metadata field that holds the MIME type.
file_path_meta_field
: Optional name of the metadata field that holds the file path. Used to infer the MIME type if mime_type_meta_field is not provided or is missing in a document.
mime_types
: A list of MIME types or regex patterns used to classify the input documents (for example: ["text/plain", "audio/x-wav", "image/jpeg"]).
additional_mimetypes
: Optional dictionary mapping MIME types to file extensions, used to enhance or override the standard mimetypes module. Useful when working with uncommon or custom file types. For example: {"application/vnd.custom-type": ".custom"}.
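A sketch of how such a mapping can be registered with the standard `mimetypes` module, assuming the router does something equivalent to calling `mimetypes.add_type` for each entry. The helper `register_additional_mimetypes` is hypothetical. Because `add_type` updates the module's global tables, the mapping affects every later `mimetypes.guess_type` call in the process, not just this router.

```python
import mimetypes
from typing import Dict


def register_additional_mimetypes(additional_mimetypes: Dict[str, str]) -> None:
    # Each entry maps a MIME type to a file extension, including the leading dot.
    for mime_type, extension in additional_mimetypes.items():
        # Global side effect: visible to all subsequent guess_type calls.
        mimetypes.add_type(mime_type, extension)


register_additional_mimetypes({"application/vnd.custom-type": ".custom"})
print(mimetypes.guess_type("report.custom")[0])  # application/vnd.custom-type
```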
Raises:
ValueError
: If mime_types is empty, or if neither mime_type_meta_field nor file_path_meta_field is provided.
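The two error conditions can be expressed as a small validation step. This is a hedged sketch: the function `validate_router_config` is hypothetical, the error messages are illustrative, and it assumes "not provided" means the argument is `None`.

```python
from typing import List, Optional


def validate_router_config(
    mime_types: List[str],
    mime_type_meta_field: Optional[str] = None,
    file_path_meta_field: Optional[str] = None,
) -> None:
    # At least one MIME type or regex pattern is needed to define an output.
    if not mime_types:
        raise ValueError("mime_types must contain at least one MIME type or pattern.")
    # Without either field there is no way to determine a document's MIME type.
    if mime_type_meta_field is None and file_path_meta_field is None:
        raise ValueError("Provide mime_type_meta_field, file_path_meta_field, or both.")


validate_router_config(["text/plain"], file_path_meta_field="file_path")  # passes
try:
    validate_router_config(["text/plain"])
except ValueError as err:
    print(err)
```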
DocumentTypeRouter.run
def run(documents: List[Document]) -> Dict[str, List[Document]]
Categorize input documents into groups based on their MIME type.
MIME types can either be taken directly from document metadata or derived from file paths using the standard Python mimetypes module and any custom mappings supplied through additional_mimetypes.
Arguments:
documents
: A list of documents to be categorized.
Returns:
A dictionary where the keys are MIME types (or "unclassified") and the values are lists of documents.
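The whole categorization step can be sketched end to end with the standard library. This is not the component's implementation: `route_documents` is hypothetical, plain dicts stand in for `haystack.dataclasses.Document`, and two behaviors are assumptions: patterns are matched with `re.fullmatch`, and output keys are the configured `mime_types` entries rather than the resolved MIME strings.

```python
import mimetypes
import re
from collections import defaultdict
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for Document: only the "meta" dict is inspected.
Doc = Dict[str, Any]


def route_documents(
    documents: List[Doc],
    mime_types: List[str],
    mime_type_meta_field: Optional[str] = "mime_type",
    file_path_meta_field: Optional[str] = "file_path",
) -> Dict[str, List[Doc]]:
    output: Dict[str, List[Doc]] = defaultdict(list)
    for doc in documents:
        meta = doc.get("meta", {})
        # Explicit MIME type first, then a guess from the file path.
        mime_type = meta.get(mime_type_meta_field) if mime_type_meta_field else None
        if not mime_type and file_path_meta_field and meta.get(file_path_meta_field):
            mime_type, _ = mimetypes.guess_type(str(meta[file_path_meta_field]))
        # First configured entry that fully matches, else "unclassified".
        key = next((p for p in mime_types if mime_type and re.fullmatch(p, mime_type)), None)
        output[key or "unclassified"].append(doc)
    return dict(output)


docs = [
    {"content": "Example text", "meta": {"file_path": "example.txt"}},
    {"content": "Another document", "meta": {"mime_type": "application/pdf"}},
    {"content": "Unknown type", "meta": {}},
]
result = route_documents(docs, ["text/plain", "application/pdf"])
print({key: len(group) for key, group in result.items()})
# {'text/plain': 1, 'application/pdf': 1, 'unclassified': 1}
```

This mirrors the usage example at the top of the section, so the same three documents end up in the same three groups.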