MetadataRouter
Use this component to route documents to different output connections based on the content of their metadata fields.
Most common position in a pipeline | After components that classify documents, such as DocumentLanguageClassifier |
Mandatory init variables | "rules": A dictionary with metadata routing rules (see our API Reference for examples) |
Mandatory run variables | "documents": A list of documents |
Output variables | "unmatched": A list of documents that match no rule. "name_of_the_rule": A list of documents that match the rule with that name. There is one such output for each rule you define. |
API reference | Routers |
GitHub link | https://github.com/deepset-ai/haystack/blob/main/haystack/components/routers/metadata_router.py |
Overview
MetadataRouter routes documents to different outputs based on the documents' metadata. You initialize it with rules that define the name of each output and the filter a document must match to be sent to it. The filters follow the same syntax as filters in Document Stores. If a document matches multiple filters, it is sent to multiple outputs. Documents that do not match any rule go to an output connection named unmatched.
In pipelines, this component is most useful after a Classifier (such as the DocumentLanguageClassifier) that adds the classification results to the documents' metadata.
This component has no default rules. If you don't define any rules when initializing the component, it routes all documents to the unmatched output.
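The routing behavior described in this overview can be illustrated with a small pure-Python sketch. It is not Haystack's implementation: plain dictionaries stand in for documents, the helpers `matches` and `route_by_metadata` are hypothetical names, and only a subset of the filter syntax (`==`, `!=`, `in`, `AND`, `OR`) is handled.

```python
# Illustrative sketch only: plain dicts stand in for Haystack Documents, and
# matches / route_by_metadata are hypothetical helpers, not library functions.
import operator

COMPARISONS = {
    "==": operator.eq,
    "!=": operator.ne,
    "in": lambda value, options: value in options,
}


def matches(doc, rule):
    """Return True if doc satisfies a comparison or an AND/OR of conditions."""
    op = rule["operator"]
    if op == "AND":
        return all(matches(doc, cond) for cond in rule["conditions"])
    if op == "OR":
        return any(matches(doc, cond) for cond in rule["conditions"])
    # "meta.language" -> look up "language" in the document's metadata.
    # Simplification: a missing field is treated as the value None.
    key = rule["field"].split(".", 1)[-1]
    return COMPARISONS[op](doc["meta"].get(key), rule["value"])


def route_by_metadata(documents, rules):
    """Send each document to every rule it matches, else to 'unmatched'."""
    outputs = {name: [] for name in rules}
    outputs["unmatched"] = []
    for doc in documents:
        matched = False
        for name, rule in rules.items():
            if matches(doc, rule):
                outputs[name].append(doc)
                matched = True
        if not matched:
            outputs["unmatched"].append(doc)
    return outputs


docs = [
    {"content": "Hello", "meta": {"language": "en", "topic": "news"}},
    {"content": "Hallo", "meta": {"language": "de", "topic": "news"}},
    {"content": "Bonjour", "meta": {"language": "fr", "topic": "sports"}},
]
rules = {
    "en": {"field": "meta.language", "operator": "==", "value": "en"},
    "news": {"field": "meta.topic", "operator": "==", "value": "news"},
}
routed = route_by_metadata(docs, rules)
print({name: [d["content"] for d in group] for name, group in routed.items()})
# {'en': ['Hello'], 'news': ['Hello', 'Hallo'], 'unmatched': ['Bonjour']}
```

In this sketch, "Hello" appears in both the en and news outputs because it matches both rules, and calling `route_by_metadata(docs, {})` with no rules places every document in unmatched.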
Usage
On its own
Below is an example that uses the MetadataRouter to route documents based on their metadata. We initialize the router with a rule that passes all documents with language set to en in their metadata to an output connection called en. Documents that don't match this rule go to an output connection named unmatched.
from haystack import Document
from haystack.components.routers import MetadataRouter
docs = [
    Document(content="Paris is the capital of France.", meta={"language": "en"}),
    Document(content="Berlin ist die Hauptstadt von Deutschland.", meta={"language": "de"}),
]
# Documents whose meta.language equals "en" go to the "en" output; the rest go to "unmatched".
router = MetadataRouter(rules={"en": {"field": "meta.language", "operator": "==", "value": "en"}})
print(router.run(documents=docs))
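Because each rule becomes its own output, the rules dictionary can also be built programmatically when several values need routing. The following is a minimal sketch under assumptions: the `languages` list is a hypothetical example, and the rule shape is copied from the example above.

```python
# Hypothetical list of language codes to route; adjust it to your data.
languages = ["en", "de", "fr"]

# One rule per language: the key becomes the output name, the value is the filter.
rules = {
    lang: {"field": "meta.language", "operator": "==", "value": lang}
    for lang in languages
}

print(sorted(rules))

# The dictionary can then be passed to the router as in the example above:
# router = MetadataRouter(rules=rules)
```

With rules like these, a router would expose one output per language in addition to unmatched.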
In a pipeline
Below is an example of an indexing pipeline that converts text files to documents and uses the DocumentLanguageClassifier to detect the language of the text and add it to the documents' metadata. It then uses the MetadataRouter to forward only English-language documents to the DocumentWriter. Documents in other languages are not added to the DocumentStore.
from haystack import Pipeline
from haystack.components.converters import TextFileToDocument
from haystack.components.classifiers import DocumentLanguageClassifier
from haystack.components.routers import MetadataRouter
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
document_store = InMemoryDocumentStore()
p = Pipeline()
p.add_component(instance=TextFileToDocument(), name="text_file_converter")
p.add_component(instance=DocumentLanguageClassifier(), name="language_classifier")
p.add_component(
    instance=MetadataRouter(rules={"en": {"field": "meta.language", "operator": "==", "value": "en"}}),
    name="router",
)
p.add_component(instance=DocumentWriter(document_store=document_store), name="writer")
p.connect("text_file_converter.documents", "language_classifier.documents")
p.connect("language_classifier.documents", "router.documents")
p.connect("router.en", "writer.documents")
p.run({"text_file_converter": {"sources": ["english-file-will-be-added.txt", "german-file-will-not-be-added.txt"]}})