
RouteDocuments

This Node routes Documents to different branches of your pipeline based on their content_type or a metadata field. This page explains when it makes sense to use RouteDocuments and how to do it.

RouteDocuments is a decision node. By default, it routes Documents of content_type text and table to different branches of your pipeline. You can also route on a metadata field instead of on content_type. To do this, set split_by to the name of the metadata field and metadata_values to the list of values to route on. This node is handy if you have different types of data, for example, tables and text: you can use it to route each document type to a Reader trained on it.

Position in a Pipeline: Between Retriever and Reader
Input: Documents
Output: Documents
Classes: RouteDocuments

Usage

To initialize RouteDocuments so that it routes documents based on their content type (text vs. table), run:

route_documents = RouteDocuments()

To initialize RouteDocuments so that it routes Documents based on a metadata field, run:

route_documents = RouteDocuments(
    split_by="language",
    metadata_values=["de", "en", "es"]
)
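
To make the two routing modes concrete, here is a minimal plain-Python sketch of the idea. It is not Haystack's implementation: route_by_field is a hypothetical helper, Documents are modeled as simple dictionaries, and dropping Documents that match none of the listed values is an assumption of this sketch. It only illustrates how each value maps to one numbered output edge.

```python
from typing import Any, Dict, List, Sequence

Doc = Dict[str, Any]


def route_by_field(
    documents: List[Doc],
    split_by: str = "content_type",
    values: Sequence[str] = ("text", "table"),
) -> Dict[str, List[Doc]]:
    """Hypothetical helper: bucket documents into output_1, output_2, ... by field value."""
    # One output edge per value, in the order the values are listed.
    outputs: Dict[str, List[Doc]] = {f"output_{i}": [] for i in range(1, len(values) + 1)}
    edge_for_value = {value: f"output_{i}" for i, value in enumerate(values, start=1)}

    for doc in documents:
        if split_by == "content_type":
            value = doc["content_type"]
        else:
            # Any other split_by is treated as the name of a metadata field.
            value = doc.get("meta", {}).get(split_by)
        edge = edge_for_value.get(value)
        if edge is not None:  # Assumption of this sketch: unmatched documents are skipped.
            outputs[edge].append(doc)
    return outputs


# Toy documents, made up for illustration.
docs = [
    {"content": "Berlin ist die Hauptstadt.", "content_type": "text", "meta": {"language": "de"}},
    {"content": "Paris is in France.", "content_type": "text", "meta": {"language": "en"}},
    {"content": [["city", "country"], ["Madrid", "Spain"]], "content_type": "table", "meta": {"language": "es"}},
]

by_type = route_by_field(docs)
by_language = route_by_field(docs, split_by="language", values=["de", "en", "es"])
```

With these toy documents, by_type puts the two text Documents on output_1 and the table on output_2, while by_language yields three edges, one per listed language.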

To use RouteDocuments in a pipeline, run:

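# Define the Retriever.
# This assumes document_store is an already initialized DocumentStore that contains your text and table Documents:
from haystack.nodes.retriever import EmbeddingRetriever

retriever = EmbeddingRetriever(document_store=document_store, embedding_model="deepset/all-mpnet-base-v2-table")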

# Define a table reader and a text reader. RouteDocuments will route relevant documents to the corresponding reader.
# In a question answering pipeline, it makes sense to use RouteDocuments with JoinAnswers:

from haystack.nodes import FARMReader, TableReader, RouteDocuments, JoinAnswers

text_reader = FARMReader("deepset/roberta-base-squad2")
# To get meaningful scores from the TableReader, use "deepset/tapas-large-nq-hn-reader" or
# "deepset/tapas-large-nq-reader" as the TableReader model. The disadvantage of these models is, however,
# that they cannot aggregate over multiple table cells.
table_reader = TableReader("deepset/tapas-large-nq-hn-reader")
route_documents = RouteDocuments()
join_answers = JoinAnswers()

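# Combine your nodes into a pipeline. With the default settings, RouteDocuments sends
# text Documents to output_1 and table Documents to output_2: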

from haystack import Pipeline

text_table_qa_pipeline = Pipeline()
text_table_qa_pipeline.add_node(component=retriever, name="EmbeddingRetriever", inputs=["Query"])
text_table_qa_pipeline.add_node(component=route_documents, name="RouteDocuments", inputs=["EmbeddingRetriever"])
text_table_qa_pipeline.add_node(component=text_reader, name="TextReader", inputs=["RouteDocuments.output_1"])
text_table_qa_pipeline.add_node(component=table_reader, name="TableReader", inputs=["RouteDocuments.output_2"])
text_table_qa_pipeline.add_node(component=join_answers, name="JoinAnswers", inputs=["TextReader", "TableReader"])
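
The note above about meaningful TableReader scores matters because the answers from both branches end up in one list. The following plain-Python sketch shows that idea under stated assumptions: join_answers_sketch is a hypothetical function, not the JoinAnswers node itself, answers are modeled as dictionaries, and the scores are made-up illustrative values. It simply concatenates the branch results and sorts them by score.

```python
from typing import Any, Dict, List, Optional

Answer = Dict[str, Any]


def join_answers_sketch(branches: List[List[Answer]], top_k: Optional[int] = None) -> List[Answer]:
    """Hypothetical helper: concatenate answers from all branches and rank them by score."""
    merged = [answer for branch in branches for answer in branch]
    # Highest score first; this is only fair if both readers produce comparable scores.
    merged.sort(key=lambda answer: answer["score"], reverse=True)
    return merged if top_k is None else merged[:top_k]


# Illustrative values only, not real model output.
text_answers = [
    {"answer": "Berlin", "score": 0.91, "source": "TextReader"},
    {"answer": "Bonn", "score": 0.12, "source": "TextReader"},
]
table_answers = [
    {"answer": "3.6 million", "score": 0.78, "source": "TableReader"},
]

top_answers = join_answers_sketch([text_answers, table_answers], top_k=2)
```

If one reader's scores were systematically higher or lower than the other's, a ranking like this would favor one branch regardless of answer quality.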

Related Links