API Reference

Components that join lists of different objects

Module document_joiner

DocumentJoiner

@component
class DocumentJoiner()

A component that joins multiple lists of Documents into a single list.

It supports different join modes:

  • concatenate: Keeps the highest-scored Document in case of duplicates.
  • merge: Merges the lists and calculates a weighted sum of the scores of duplicate Documents.
  • reciprocal_rank_fusion: Merges the lists and assigns scores based on reciprocal rank fusion.
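
The three modes can be illustrated with a minimal, library-independent sketch. It works on plain (id, score) tuples rather than Document objects. The function names, the equal default weights, and the constant k = 60 in the rank fusion formula are assumptions made for illustration and may differ from what the component does internally.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a Document: (id, score)
Scored = Tuple[str, Optional[float]]


def concatenate(lists: List[List[Scored]]) -> Dict[str, float]:
    """Keep the highest score seen for each duplicate id."""
    best: Dict[str, float] = {}
    for docs in lists:
        for doc_id, score in docs:
            value = score if score is not None else float("-inf")
            if doc_id not in best or value > best[doc_id]:
                best[doc_id] = value
    return best


def merge(lists: List[List[Scored]], weights: Optional[List[float]] = None) -> Dict[str, float]:
    """Sum the weighted scores of duplicate ids (equal weights assumed by default)."""
    weights = weights or [1 / len(lists)] * len(lists)
    totals: Dict[str, float] = defaultdict(float)
    for docs, weight in zip(lists, weights):
        for doc_id, score in docs:
            totals[doc_id] += (score or 0.0) * weight
    return dict(totals)


def reciprocal_rank_fusion(lists: List[List[Scored]], k: int = 60) -> Dict[str, float]:
    """Score each id by the sum of 1 / (k + rank) over the lists it appears in."""
    totals: Dict[str, float] = defaultdict(float)
    for docs in lists:
        for rank, (doc_id, _) in enumerate(docs, start=1):
            totals[doc_id] += 1 / (k + rank)
    return dict(totals)


# Two ranked result lists that share one item ("doc_b")
bm25_hits = [("doc_a", 0.9), ("doc_b", 0.5)]
embedding_hits = [("doc_b", 0.8), ("doc_c", 0.3)]
lists = [bm25_hits, embedding_hits]

concatenated = concatenate(lists)
merged = merge(lists, weights=[0.5, 0.5])
fused = reciprocal_rank_fusion(lists)
```

In this sketch, "doc_b" appears in both lists, so rank fusion scores it above items that appear in only one list, even though it is not ranked first in either.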

Usage example:

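# Import paths follow Haystack 2.x and may differ in other versions.
from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.joiners import DocumentJoiner
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever, InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()
p = Pipeline()
p.add_component(instance=InMemoryBM25Retriever(document_store=document_store), name="bm25_retriever")
p.add_component(
    instance=SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"),
    name="text_embedder",
)
p.add_component(instance=InMemoryEmbeddingRetriever(document_store=document_store), name="embedding_retriever")
p.add_component(instance=DocumentJoiner(), name="joiner")
# Both retrievers feed their result lists into the joiner.
p.connect("bm25_retriever", "joiner")
p.connect("embedding_retriever", "joiner")
p.connect("text_embedder", "embedding_retriever")
query = "What is the capital of France?"
# The BM25 retriever takes the query as "query", the text embedder takes it as "text".
p.run(data={"bm25_retriever": {"query": query}, "text_embedder": {"text": query}})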

DocumentJoiner.__init__

def __init__(join_mode: str = "concatenate",
             weights: Optional[List[float]] = None,
             top_k: Optional[int] = None,
             sort_by_score: bool = True)

Creates a DocumentJoiner component.

Arguments:

  • join_mode: Specifies the join mode to use. Available modes:
      • concatenate
      • merge
      • reciprocal_rank_fusion
  • weights: A weight for each list of Documents received. The list must have the same length as the number of inputs. If join_mode is concatenate, this parameter is ignored.
  • top_k: The maximum number of Documents to return.
  • sort_by_score: If True, sorts the Documents by score in descending order. A Document without a score is treated as if its score were -infinity.
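
The effect of sort_by_score and top_k can be sketched without the library. The helper below is hypothetical and uses (id, score) tuples in place of Documents. It assumes that sorting happens before truncation, which the reference above does not state explicitly.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for a Document: (id, score)
Scored = Tuple[str, Optional[float]]


def sort_and_truncate(
    docs: List[Scored],
    top_k: Optional[int] = None,
    sort_by_score: bool = True,
) -> List[Scored]:
    """Sort by score in descending order, then keep at most top_k items."""
    if sort_by_score:
        # A missing score is treated as -infinity, so unscored items sort last.
        docs = sorted(
            docs,
            key=lambda d: d[1] if d[1] is not None else float("-inf"),
            reverse=True,
        )
    return list(docs[:top_k]) if top_k is not None else list(docs)


joined = [("doc_a", 0.2), ("doc_b", None), ("doc_c", 0.7)]
ranked = sort_and_truncate(joined)
top_two = sort_and_truncate(joined, top_k=2)
unsorted_top_two = sort_and_truncate(joined, top_k=2, sort_by_score=False)
```

With sort_by_score disabled, the sketch keeps the input order and simply returns the first top_k items.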

DocumentJoiner.run

@component.output_types(documents=List[Document])
def run(documents: Variadic[List[Document]])

Joins multiple lists of Documents into a single list depending on the join_mode parameter.

Arguments:

  • documents: Lists of Documents to be merged.

Returns:

A dictionary with the following keys:

  • documents: The merged list of Documents.