
MetadataRouter

Use this component to route documents to different output connections based on the content of their metadata fields.

Name: MetadataRouter
Folder Path: /routers/
Position in a Pipeline: After components that classify Documents, such as DocumentLanguageClassifier.
Inputs (name: type):
"documents": List of Document objects
Outputs (name: type):
"unmatched": List of Document objects that do not match any rule
"name_of_the_rule": List of Document objects matching a custom rule. There is one output for each rule you define. For example, a rule named type_article produces the output:
"type_article": List of Documents

Overview

MetadataRouter routes Documents to different outputs based on the Documents' metadata. You initialize it with rules defining the names of the outputs and filters to match Documents to one of the connections. The filters follow the same syntax as filters in DocumentStores. If a Document matches multiple filters, it is sent to multiple outputs. Documents that do not match any rule go to an output connection named unmatched.

In pipelines, this component is most useful after a Classifier (such as the DocumentLanguageClassifier) that adds the classification results to the Documents' metadata.

This component has no default rules. If you don't define any rules when initializing the component, it routes all Documents to the unmatched output.
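
The routing behavior described in this overview can be illustrated with a small pure-Python sketch. It is not Haystack's implementation: the `route` function, the dict-based documents, and the lambda rules are hypothetical stand-ins chosen so the example runs without any dependencies. Haystack itself works with Document objects and filter dictionaries.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-ins: a document is a plain dict with a "meta" mapping,
# and a rule is a predicate over that mapping.
Doc = Dict[str, Any]
Rule = Callable[[Dict[str, Any]], bool]


def route(documents: List[Doc], rules: Dict[str, Rule]) -> Dict[str, List[Doc]]:
    """Send each document to every output whose rule it matches."""
    outputs: Dict[str, List[Doc]] = {name: [] for name in rules}
    outputs["unmatched"] = []
    for doc in documents:
        matched = False
        for name, rule in rules.items():
            if rule(doc["meta"]):
                outputs[name].append(doc)
                matched = True
        # A document that matched no rule goes to the "unmatched" output.
        if not matched:
            outputs["unmatched"].append(doc)
    return outputs


docs = [
    {"content": "A", "meta": {"language": "en", "type": "article"}},
    {"content": "B", "meta": {"language": "de", "type": "article"}},
    {"content": "C", "meta": {"language": "fr", "type": "note"}},
]
rules = {
    "en": lambda meta: meta.get("language") == "en",
    "type_article": lambda meta: meta.get("type") == "article",
}

routed = route(docs, rules)
print([d["content"] for d in routed["en"]])            # ['A']
print([d["content"] for d in routed["type_article"]])  # ['A', 'B']
print([d["content"] for d in routed["unmatched"]])     # ['C']

# With no rules at all, every document lands in "unmatched".
print(len(route(docs, {})["unmatched"]))               # 3
```

Document A matches both rules and therefore appears in two outputs, while document C matches neither and ends up in unmatched.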

Usage

On its own

Below is an example that uses the MetadataRouter to route Documents based on their metadata. We initialize the router with a rule that sends all Documents with language set to en in their metadata to an output connection called en. Documents that don't match this rule go to an output connection named unmatched.

from haystack import Document
from haystack.components.routers import MetadataRouter

docs = [
    Document(content="Paris is the capital of France.", meta={"language": "en"}),
    Document(content="Berlin ist die Hauptstadt von Deutschland.", meta={"language": "de"}),
]
router = MetadataRouter(rules={"en": {"field": "meta.language", "operator": "==", "value": "en"}})
router.run(documents=docs)

In a Pipeline

Below is an example of an indexing pipeline that converts text files to Documents and uses the DocumentLanguageClassifier to detect the language of the text and add it to the Documents' metadata. It then uses the MetadataRouter to forward only English language Documents to the DocumentWriter. Documents of other languages will not be added to the DocumentStore.

from haystack import Pipeline
from haystack.components.converters import TextFileToDocument
from haystack.components.classifiers import DocumentLanguageClassifier
from haystack.components.routers import MetadataRouter
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()
p = Pipeline()
p.add_component(instance=TextFileToDocument(), name="text_file_converter")
p.add_component(instance=DocumentLanguageClassifier(), name="language_classifier")
p.add_component(
    instance=MetadataRouter(rules={"en": {"field": "meta.language", "operator": "==", "value": "en"}}), name="router"
)
p.add_component(instance=DocumentWriter(document_store=document_store), name="writer")
p.connect("text_file_converter.documents", "language_classifier.documents")
p.connect("language_classifier.documents", "router.documents")
p.connect("router.en", "writer.documents")
p.run({"text_file_converter": {"sources": ["english-file-will-be-added.txt", "german-file-will-not-be-added.txt"]}})

Related Links

See the parameter details in our API reference: