Module haystack_experimental.components.routers.document_type_router

DocumentTypeRouter

Categorizes documents by MIME types based on their metadata.

DocumentTypeRouter is used to dynamically route documents within a pipeline based on their MIME types. It supports exact MIME type matches and regex patterns.

MIME types can be extracted directly from document metadata or inferred from file paths using standard or user-supplied MIME type mappings.
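The resolution order described above can be sketched as follows. This is a minimal sketch, assuming that a MIME type stored in metadata takes precedence over the file path and that inference uses the standard-library `mimetypes` module. Documents are modelled as plain `meta` dictionaries, and `resolve_mime_type` is a hypothetical helper, not part of the library's API.

```python
import mimetypes
from typing import Optional


def resolve_mime_type(
    meta: dict,
    mime_type_meta_field: Optional[str] = "mime_type",
    file_path_meta_field: Optional[str] = "file_path",
) -> Optional[str]:
    """Hypothetical helper: return a MIME type for one document's metadata, or None."""
    # Prefer an explicit MIME type stored in the metadata (assumed precedence).
    if mime_type_meta_field and meta.get(mime_type_meta_field):
        return meta[mime_type_meta_field]
    # Otherwise, infer it from the extension of the stored file path.
    if file_path_meta_field and meta.get(file_path_meta_field):
        mime_type, _encoding = mimetypes.guess_type(meta[file_path_meta_field])
        return mime_type
    # Neither source is available, so the type is unknown.
    return None


print(resolve_mime_type({"file_path": "example.txt"}))
print(resolve_mime_type({"mime_type": "application/pdf"}))
print(resolve_mime_type({}))
```

A document whose type resolves to `None` under this sketch would end up in the "unclassified" group.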

Usage example

from haystack_experimental.components.routers import DocumentTypeRouter
from haystack.dataclasses import Document

docs = [
    Document(content="Example text", meta={"file_path": "example.txt"}),
    Document(content="Another document", meta={"mime_type": "application/pdf"}),
    Document(content="Unknown type")
]

router = DocumentTypeRouter(
    mime_type_meta_field="mime_type",
    file_path_meta_field="file_path",
    mime_types=["text/plain", "application/pdf"]
)

result = router.run(documents=docs)
print(result)

Expected output:

{
    "text/plain": [Document(...)],
    "application/pdf": [Document(...)],
    "unclassified": [Document(...)]
}

DocumentTypeRouter.__init__

def __init__(*,
             mime_type_meta_field: Optional[str] = None,
             file_path_meta_field: Optional[str] = None,
             mime_types: List[str],
             additional_mimetypes: Optional[Dict[str, str]] = None) -> None

Initialize the DocumentTypeRouter component.

Arguments:

mime_type_meta_field: Optional name of the metadata field that holds the MIME type.
file_path_meta_field: Optional name of the metadata field that holds the file path. Used to infer the MIME type when mime_type_meta_field is not provided or the field is missing from a document.
mime_types: A list of MIME types or regex patterns used to classify the input documents, for example ["text/plain", "audio/x-wav", "image/jpeg"].
additional_mimetypes: Optional dictionary mapping MIME types to file extensions to enhance or override the standard mimetypes module. Useful when working with uncommon or custom file types. For example: {"application/vnd.custom-type": ".custom"}.
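The sketch below illustrates how mime_types and additional_mimetypes might interact. It assumes that each mime_types entry is treated as a regular expression matched against the whole MIME type, and that additional_mimetypes is registered on a `mimetypes.MimeTypes` instance; the component's actual internals may differ. The helpers `build_guesser` and `match_pattern` are hypothetical.

```python
import mimetypes
import re
from typing import Dict, List, Optional


def build_guesser(additional_mimetypes: Optional[Dict[str, str]] = None) -> mimetypes.MimeTypes:
    """Hypothetical helper: a MimeTypes instance extended with custom mappings."""
    guesser = mimetypes.MimeTypes()  # starts from the standard mappings
    for mime_type, extension in (additional_mimetypes or {}).items():
        # add_type maps an extension such as ".custom" to the given MIME type.
        guesser.add_type(mime_type, extension)
    return guesser


def match_pattern(mime_type: str, patterns: List[str]) -> Optional[str]:
    """Hypothetical helper: return the first pattern that fully matches mime_type."""
    for pattern in patterns:
        # Exact entries such as "text/plain" also work as regexes.
        if re.fullmatch(pattern, mime_type):
            return pattern
    return None


guesser = build_guesser({"application/vnd.custom-type": ".custom"})
mime_type, _ = guesser.guess_type("report.custom")
print(mime_type)
print(match_pattern(mime_type, ["text/plain", r"application/vnd\..*"]))
```

Under the regex assumption, an exact type containing regex metacharacters, such as the `+` in `image/svg+xml`, would need escaping (for example with `re.escape`) to match literally.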

Raises:

ValueError: If mime_types is empty, or if neither mime_type_meta_field nor file_path_meta_field is provided.
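A minimal sketch of the two documented error conditions, assuming "not provided" means the argument is left at its default of None. `validate_router_args` is a hypothetical name used only for illustration, and the error messages are not the component's own.

```python
from typing import List, Optional


def validate_router_args(
    mime_types: List[str],
    mime_type_meta_field: Optional[str] = None,
    file_path_meta_field: Optional[str] = None,
) -> None:
    """Hypothetical helper mirroring the documented ValueError conditions."""
    # An empty list leaves nothing to route documents into.
    if not mime_types:
        raise ValueError("The list of MIME types cannot be empty.")
    # Without either field there is no way to determine a document's MIME type.
    if mime_type_meta_field is None and file_path_meta_field is None:
        raise ValueError(
            "At least one of mime_type_meta_field or file_path_meta_field must be provided."
        )


validate_router_args(["text/plain"], file_path_meta_field="file_path")  # valid
try:
    validate_router_args(["text/plain"])  # neither field given
except ValueError as err:
    print(err)
```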

DocumentTypeRouter.run

def run(documents: List[Document]) -> Dict[str, List[Document]]

Categorize input documents into groups based on their MIME type.

MIME types are either read directly from document metadata or derived from file paths using the standard Python mimetypes module, extended with any custom mappings passed as additional_mimetypes.

Arguments:

documents: A list of documents to be categorized.

Returns:

A dictionary where the keys are MIME types (or "unclassified") and the values are lists of documents.
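Putting these steps together, the categorization can be approximated as below. This is an illustrative sketch, not the component's source. It assumes that documents are plain dictionaries with an optional `meta` key, that patterns are matched with `re.fullmatch`, that the first matching entry wins, and that each document is grouped under the mime_types entry it matched (identical to the MIME type for exact entries). `route_by_mime_type` is a hypothetical name.

```python
import mimetypes
import re
from collections import defaultdict
from typing import Dict, List, Optional


def route_by_mime_type(
    documents: List[dict],
    mime_types: List[str],
    mime_type_meta_field: Optional[str] = "mime_type",
    file_path_meta_field: Optional[str] = "file_path",
) -> Dict[str, List[dict]]:
    """Hypothetical sketch: group documents by the first mime_types entry they match."""
    patterns = [re.compile(p) for p in mime_types]
    groups: Dict[str, List[dict]] = defaultdict(list)
    for doc in documents:
        meta = doc.get("meta", {})
        # Prefer the MIME type from metadata, then fall back to the file path.
        mime_type = meta.get(mime_type_meta_field) if mime_type_meta_field else None
        if not mime_type and file_path_meta_field and meta.get(file_path_meta_field):
            mime_type, _ = mimetypes.guess_type(meta[file_path_meta_field])
        # Unknown types, or types matching no entry, are unclassified.
        key = "unclassified"
        if mime_type:
            for raw, pattern in zip(mime_types, patterns):
                if pattern.fullmatch(mime_type):
                    key = raw
                    break
        groups[key].append(doc)
    # Only categories that received at least one document appear in this sketch.
    return dict(groups)


docs = [
    {"content": "Example text", "meta": {"file_path": "example.txt"}},
    {"content": "Another document", "meta": {"mime_type": "application/pdf"}},
    {"content": "Unknown type"},
]
result = route_by_mime_type(docs, mime_types=["text/plain", "application/pdf"])
print({key: [d["content"] for d in group] for key, group in result.items()})
```

For the three documents from the usage example, this sketch places one document each under text/plain, application/pdf and unclassified, matching the expected output shown there.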
