API Reference

Routers

Routers are a group of components that route queries or Documents to the components best suited to handle them.

Module haystack_experimental.components.routers.document_length_router

DocumentLengthRouter

Categorizes documents based on the length of the content field and routes them to the appropriate output.

A common use case for DocumentLengthRouter is handling documents obtained from PDFs that contain non-text content, such as scanned pages or images. This component can detect empty or low-content documents and route them to components that perform OCR, generate captions, or compute image embeddings.

Usage example

from haystack_experimental.components.routers import DocumentLengthRouter
from haystack.dataclasses import Document

docs = [
    Document(content="Short"),
    Document(content="Long document "*20),
]

router = DocumentLengthRouter(threshold=10)

result = router.run(documents=docs)
print(result)

# {
#     "short_documents": [Document(content="Short", ...)],
#     "long_documents": [Document(content="Long document ...", ...)],
# }

DocumentLengthRouter.__init__

def __init__(*, threshold: int = 10) -> None

Initialize the DocumentLengthRouter component.

Arguments:

  • threshold: The threshold for the number of characters in the document content field. Documents where content is None or whose character count is less than or equal to the threshold will be routed to the short_documents output. Otherwise, they will be routed to the long_documents output. To route only documents with None content to short_documents, set the threshold to a negative number.
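The threshold rule described above can be illustrated without Haystack installed. The following is a minimal sketch, not the component's actual implementation: `Doc` is a hypothetical stand-in for `haystack.dataclasses.Document`, and `route_by_length` is an illustrative helper that applies the documented comparison (content is None, or length less than or equal to the threshold).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for haystack.dataclasses.Document."""
    content: Optional[str] = None


def route_by_length(documents: List[Doc], threshold: int = 10) -> Dict[str, List[Doc]]:
    """Illustrative helper: split documents using the documented threshold rule."""
    short: List[Doc] = []
    long_: List[Doc] = []
    for doc in documents:
        # None content always counts as short; otherwise compare the length.
        if doc.content is None or len(doc.content) <= threshold:
            short.append(doc)
        else:
            long_.append(doc)
    return {"short_documents": short, "long_documents": long_}


docs = [Doc(None), Doc(""), Doc("0123456789"), Doc("0123456789!")]

# With threshold=10, a 10-character document is still "short" (<= is inclusive).
default = route_by_length(docs, threshold=10)

# With a negative threshold, only the None-content document is "short",
# because even an empty string has length 0, which is greater than -1.
negative = route_by_length(docs, threshold=-1)
```

Note that under this rule an empty string and None are treated differently once the threshold is negative, which is what makes the negative setting useful for isolating documents that have no content at all.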

DocumentLengthRouter.run

@component.output_types(
    short_documents=List[Document],
    long_documents=List[Document],
)
def run(documents: List[Document]) -> Dict[str, List[Document]]

Categorize input documents into groups based on the length of the content field.

Arguments:

  • documents: A list of documents to be categorized.

Returns:

A dictionary with the following keys:

  • short_documents: A list of documents where content is None or the length of content is less than or equal to the threshold.
  • long_documents: A list of documents where the length of content is greater than the threshold.

Module haystack_experimental.components.routers.document_type_router

DocumentTypeRouter

Categorizes documents by MIME types based on their metadata.

DocumentTypeRouter is used to dynamically route documents within a pipeline based on their MIME types. It supports exact MIME type matches and regex patterns.

MIME types can be extracted directly from document metadata or inferred from file paths using standard or user-supplied MIME type mappings.

Usage example

from haystack_experimental.components.routers import DocumentTypeRouter
from haystack.dataclasses import Document

docs = [
    Document(content="Example text", meta={"file_path": "example.txt"}),
    Document(content="Another document", meta={"mime_type": "application/pdf"}),
    Document(content="Unknown type")
]

router = DocumentTypeRouter(
    mime_type_meta_field="mime_type",
    file_path_meta_field="file_path",
    mime_types=["text/plain", "application/pdf"]
)

result = router.run(documents=docs)
print(result)

Expected output:

{
    "text/plain": [Document(...)],
    "application/pdf": [Document(...)],
    "unclassified": [Document(...)]
}

DocumentTypeRouter.__init__

def __init__(*,
             mime_type_meta_field: Optional[str] = None,
             file_path_meta_field: Optional[str] = None,
             mime_types: List[str],
             additional_mimetypes: Optional[Dict[str, str]] = None) -> None

Initialize the DocumentTypeRouter component.

Arguments:

  • mime_type_meta_field: Optional name of the metadata field that holds the MIME type.
  • file_path_meta_field: Optional name of the metadata field that holds the file path. Used to infer the MIME type when mime_type_meta_field is not set or when that field is missing from a document.
  • mime_types: A list of MIME types or regex patterns used to classify the input documents (for example, ["text/plain", "audio/x-wav", "image/jpeg"]).
  • additional_mimetypes: Optional dictionary mapping MIME types to file extensions, used to extend or override the mappings of the standard mimetypes module. Useful when working with uncommon or custom file types. For example: {"application/vnd.custom-type": ".custom"}.

Raises:

  • ValueError: If mime_types is empty or if neither mime_type_meta_field nor file_path_meta_field is provided.
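To see what a mapping such as additional_mimetypes does to path-based inference, the sketch below uses only the standard library. It assumes the extra mappings are registered through `add_type`, which is the standard mechanism for teaching the mimetypes module a new extension; how DocumentTypeRouter applies them internally is not shown here. The file name `report.custom` is made up for illustration.

```python
import mimetypes

# A stand-alone registry, so the example does not alter the global mimetypes state.
registry = mimetypes.MimeTypes()

# Same shape as the additional_mimetypes argument: MIME type -> file extension.
additional_mimetypes = {"application/vnd.custom-type": ".custom"}

# Before registration, the custom extension cannot be resolved.
before, _ = registry.guess_type("report.custom")

# Register each extra mapping with the registry.
for mime_type, extension in additional_mimetypes.items():
    registry.add_type(mime_type, extension)

# After registration, the custom extension resolves to the custom MIME type.
after, _ = registry.guess_type("report.custom")

# Standard extensions are unaffected.
known, _ = registry.guess_type("example.txt")

print(before, after, known)
```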

DocumentTypeRouter.run

def run(documents: List[Document]) -> Dict[str, List[Document]]

Categorize input documents into groups based on their MIME type.

MIME types can either be directly available in document metadata or derived from file paths using the standard Python mimetypes module and custom mappings.

Arguments:

  • documents: A list of documents to be categorized.

Returns:

A dictionary where the keys are MIME types (or "unclassified") and the values are lists of documents.
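The classification described above can be approximated in plain Python. This is a sketch under stated assumptions rather than the component's code: documents are represented by their metadata dictionaries only, `classify` is a hypothetical helper, patterns are assumed to require a full match against the MIME type, and the output key for a regex match is assumed to be the pattern string itself.

```python
import mimetypes
import re
from collections import defaultdict
from typing import Any, Dict, List, Optional


def classify(
    metas: List[Dict[str, Any]],
    mime_types: List[str],
    mime_type_meta_field: Optional[str] = None,
    file_path_meta_field: Optional[str] = None,
) -> Dict[str, List[Dict[str, Any]]]:
    """Illustrative helper: group metadata dicts by the first matching pattern."""
    patterns = [re.compile(p) for p in mime_types]
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for meta in metas:
        # Prefer an explicit MIME type stored in the metadata.
        mime = meta.get(mime_type_meta_field) if mime_type_meta_field else None
        # Otherwise fall back to guessing from the file path.
        if mime is None and file_path_meta_field:
            path = meta.get(file_path_meta_field)
            if path:
                mime = mimetypes.guess_type(path)[0]
        key = "unclassified"
        if mime:
            for pattern in patterns:
                if pattern.fullmatch(mime):  # assumption: full match required
                    key = pattern.pattern
                    break
        groups[key].append(meta)
    return dict(groups)


metas = [
    {"file_path": "example.txt"},
    {"mime_type": "application/pdf"},
    {},
    {"file_path": "photo.png"},
]
result = classify(
    metas,
    mime_types=["text/plain", "application/pdf", r"image/.*"],
    mime_type_meta_field="mime_type",
    file_path_meta_field="file_path",
)
```

In this sketch the document with no usable metadata lands under "unclassified", and the PNG path is grouped under the `image/.*` pattern rather than under its concrete MIME type.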