
JoinDocuments

This node receives Documents from multiple nodes and joins them back together. This allows for the merging of two separate pipeline branches.

Position in a Pipeline: Generally used in cases where two separate branches have two different Retrievers whose results need to be amalgamated
Input: Documents
Output: Documents
Classes: JoinDocuments

Usage

  • To initialize the Node, run:
from haystack.nodes import JoinDocuments

join_documents = JoinDocuments(
    join_mode="concatenate",
    top_k_join=10
)
  • To use JoinDocuments in a Pipeline, run the following code. Here the outputs of the BM25Retriever and the DensePassageRetriever are combined by JoinDocuments.
from haystack.pipelines import Pipeline
from haystack.nodes import JoinDocuments, TransformersQueryClassifier, BM25Retriever, DensePassageRetriever

# Here's how you initialize the JoinDocuments node. Note that before running the Pipeline,
# you need to initialize all the other nodes as well.
join_documents = JoinDocuments(
    join_mode="concatenate",
    top_k_join=10
)

pipe = Pipeline()
pipe.add_node(component=TransformersQueryClassifier(), name="TransformersQueryClassifier", inputs=["Query"])
pipe.add_node(component=bm25_retriever, name="BM25Retriever", inputs=["TransformersQueryClassifier.output_1"])
pipe.add_node(component=dpr_retriever, name="DensePassageRetriever", inputs=["TransformersQueryClassifier.output_2"])
# The inputs must match the names given to the retriever nodes above.
pipe.add_node(component=join_documents, name="JoinDocuments",
              inputs=["BM25Retriever", "DensePassageRetriever"])
res = pipe.run(query="What did Einstein work on?")
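
To build intuition for what join_mode="concatenate" combined with top_k_join does, here is a minimal plain-Python sketch. It is not Haystack's implementation. The Doc class and the concatenate_join function are hypothetical names introduced only for illustration. The sketch assumes that duplicates are identified by document id, that the higher-scoring copy is kept, and that the merged list is ranked by score before it is cut to top_k_join.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (not Haystack's Document class)."""
    id: str
    content: str
    score: float


def concatenate_join(branches: List[List[Doc]], top_k_join: Optional[int] = None) -> List[Doc]:
    """Merge the results of several branches into one ranked list.

    Assumption: documents with the same id are duplicates, and the copy
    with the higher score is the one that is kept.
    """
    merged: Dict[str, Doc] = {}
    for docs in branches:
        for doc in docs:
            if doc.id not in merged or doc.score > merged[doc.id].score:
                merged[doc.id] = doc
    # Rank the merged documents by score, best first.
    ranked = sorted(merged.values(), key=lambda d: d.score, reverse=True)
    # Keep everything if no limit is given, otherwise only the top_k_join best.
    return ranked if top_k_join is None else ranked[:top_k_join]


# Two made-up result lists, standing in for the output of two retrievers.
bm25_hits = [Doc("a", "text a", 0.9), Doc("b", "text b", 0.4)]
dpr_hits = [Doc("b", "text b", 0.7), Doc("c", "text c", 0.6)]

joined = concatenate_join([bm25_hits, dpr_hits], top_k_join=2)
print([(d.id, d.score) for d in joined])
```

In this sketch, document "b" appears in both branches and survives only once. Note that scores produced by different retrievers are not necessarily on the same scale, so ranking a concatenated list by raw score is a simplification.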
