
Join Documents

The JoinDocuments node receives Documents from multiple nodes and joins them into a single list. This lets you merge the results of separate pipeline branches.

Position in a Pipeline: Generally used in cases where two separate branches have two different Retrievers whose results need to be amalgamated
Input: Documents
Output: Documents
Classes: JoinDocuments

Usage

  • To initialize the Node, run:
from haystack.nodes import JoinDocuments

join_documents = JoinDocuments(
    join_mode="concatenate",
    top_k_join=10
)
  • To use JoinDocuments in a Pipeline, run the following code. Here the outputs of the BM25Retriever and the DensePassageRetriever are combined by JoinDocuments.
from haystack.pipelines import Pipeline
from haystack.nodes import JoinDocuments, TransformersQueryClassifier, BM25Retriever, DensePassageRetriever

# Here's how you initialize the JoinDocuments node.
# Note that before running the Pipeline, you need to initialize all the other nodes as well.
join_documents = JoinDocuments(
    join_mode="concatenate",
    top_k_join=10
)

pipe = Pipeline()
pipe.add_node(component=TransformersQueryClassifier(), name="TransformersQueryClassifier", inputs=["Query"])
pipe.add_node(component=bm25_retriever, name="BM25Retriever", inputs=["TransformersQueryClassifier.output_1"])
pipe.add_node(component=dpr_retriever, name="DensePassageRetriever", inputs=["TransformersQueryClassifier.output_2"])
# The inputs must match the names the Retrievers were registered under above.
pipe.add_node(component=join_documents, name="JoinDocuments",
              inputs=["BM25Retriever", "DensePassageRetriever"])
res = pipe.run(query="What did Einstein work on?")
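
To make the join_mode="concatenate" and top_k_join settings used above more concrete, here is a minimal pure-Python sketch of what joining two result lists can look like. It is an illustration only, not Haystack's implementation: the Doc class and the concatenate_join function are hypothetical names, and the sketch assumes that duplicates are identified by document id and that the higher-scored copy is kept.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (not Haystack's Document class)."""
    id: str
    content: str
    score: float


def concatenate_join(branches: List[List[Doc]], top_k_join: int) -> List[Doc]:
    """Merge results from several branches, drop duplicate ids, keep the top_k_join best."""
    best_by_id: Dict[str, Doc] = {}
    for docs in branches:
        for doc in docs:
            current = best_by_id.get(doc.id)
            # Assumption: if two branches return the same document,
            # keep the copy with the higher score.
            if current is None or doc.score > current.score:
                best_by_id[doc.id] = doc
    # Rank the merged documents by score, highest first, then truncate.
    ranked = sorted(best_by_id.values(), key=lambda d: d.score, reverse=True)
    return ranked[:top_k_join]


# Made-up results standing in for the outputs of two Retriever branches.
bm25_results = [Doc("a", "Relativity", 0.9), Doc("b", "Photoelectric effect", 0.4)]
dpr_results = [Doc("b", "Photoelectric effect", 0.7), Doc("c", "Brownian motion", 0.5)]

joined = concatenate_join([bm25_results, dpr_results], top_k_join=2)
print([(d.id, d.score) for d in joined])  # [('a', 0.9), ('b', 0.7)]
```

In this sketch, document "b" appears in both branches and is kept only once, and top_k_join=2 cuts the merged list of three documents down to the two highest-scored ones.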
