Components that join lists of different objects
Module document_joiner
DocumentJoiner
@component
class DocumentJoiner()
A component that joins multiple lists of Documents into a single list.
It supports different join modes:
- concatenate: Keeps the highest-scored Document in case of duplicates.
- merge: Merges duplicate Documents and calculates a weighted sum of their scores.
- reciprocal_rank_fusion: Merges the lists and assigns scores based on reciprocal rank fusion.
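To make the reciprocal_rank_fusion mode more concrete, here is a minimal pure-Python sketch of the general technique. It is not the component's implementation: the function name `reciprocal_rank_fusion`, the use of plain string IDs instead of Documents, and the constant `k = 60` are assumptions made for illustration, and the library may use a different constant or normalize the scores.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(ranked_lists: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Score each document ID by summing 1 / (k + rank) over every list it appears in."""
    scores: Dict[str, float] = defaultdict(float)
    for ranked in ranked_lists:
        # Ranks are 1-based: the first document in a list has rank 1.
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return dict(scores)


# Hypothetical rankings from two retrievers (IDs only, best match first).
bm25_ranking = ["a", "b", "c"]
embedding_ranking = ["b", "c", "d"]

fused = reciprocal_rank_fusion([bm25_ranking, embedding_ranking])
# Order the IDs by fused score, highest first.
ranking = sorted(fused, key=fused.get, reverse=True)
```

Documents that appear near the top of several lists accumulate the largest scores, so only the rank positions matter and the retrievers' raw scores do not need to be on a comparable scale.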
Usage example:
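from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.joiners import DocumentJoiner
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever, InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()
p = Pipeline()
p.add_component(instance=InMemoryBM25Retriever(document_store=document_store), name="bm25_retriever")
p.add_component(
    instance=SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"),
    name="text_embedder",
)
p.add_component(instance=InMemoryEmbeddingRetriever(document_store=document_store), name="embedding_retriever")
p.add_component(instance=DocumentJoiner(), name="joiner")
# Both retrievers feed the joiner; the embedder supplies the query embedding.
p.connect("bm25_retriever", "joiner")
p.connect("embedding_retriever", "joiner")
p.connect("text_embedder", "embedding_retriever")
query = "What is the capital of France?"
# The BM25 retriever takes the query string; the text embedder takes it as "text".
p.run(data={"bm25_retriever": {"query": query}, "text_embedder": {"text": query}})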
DocumentJoiner.__init__
def __init__(join_mode: str = "concatenate",
weights: Optional[List[float]] = None,
top_k: Optional[int] = None,
sort_by_score: bool = True)
Create a DocumentJoiner component.
Arguments:
join_mode
: Specifies the join mode to use. Available modes: concatenate, merge, reciprocal_rank_fusion.
weights
: Weight for each list of Documents received; must have the same length as the number of inputs. If join_mode is concatenate, this parameter is ignored.
top_k
: The maximum number of Documents to return.
sort_by_score
: If True, sorts the Documents by score in descending order. If a Document has no score, it is handled as if its score is -infinity.
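The interaction of weights, top_k, and sort_by_score can be sketched in plain Python. This is an illustration under stated assumptions, not the component's code: `weighted_merge` and the `ScoredDoc` tuple are hypothetical stand-ins for the merge mode and for Documents, and defaulting to equal weights when none are given is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a Document: (id, score), where score may be None.
ScoredDoc = Tuple[str, Optional[float]]


def weighted_merge(
    inputs: List[List[ScoredDoc]],
    weights: Optional[List[float]] = None,
    top_k: Optional[int] = None,
    sort_by_score: bool = True,
) -> List[ScoredDoc]:
    # Assumption: fall back to equal weights when none are provided.
    if weights is None:
        weights = [1.0 / len(inputs)] * len(inputs)
    if len(weights) != len(inputs):
        raise ValueError("weights must have one entry per input list")

    merged: Dict[str, Optional[float]] = {}
    for docs, weight in zip(inputs, weights):
        for doc_id, score in docs:
            if score is None:
                # Remember the document, but leave it unscored.
                merged.setdefault(doc_id, None)
            else:
                # Duplicates accumulate a weighted sum of their scores.
                merged[doc_id] = (merged.get(doc_id) or 0.0) + weight * score

    result = list(merged.items())
    if sort_by_score:
        # A missing score is treated as -infinity, so unscored documents sort last.
        result.sort(key=lambda d: d[1] if d[1] is not None else float("-inf"), reverse=True)
    return result[:top_k] if top_k is not None else result


inputs = [[("a", 0.9), ("b", 0.4), ("x", None)], [("b", 0.8), ("c", 0.5)]]
top_three = weighted_merge(inputs, weights=[0.25, 0.75], top_k=3)
everything = weighted_merge(inputs, weights=[0.25, 0.75])
```

In this sketch, document "b" appears in both inputs and ends up first because its two weighted scores are summed, while the unscored document "x" sorts last and is dropped once top_k is applied.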
DocumentJoiner.run
@component.output_types(documents=List[Document])
def run(documents: Variadic[List[Document]])
Joins multiple lists of Documents into a single list depending on the join_mode parameter.
Arguments:
documents
: List of lists of Documents to be merged.
Returns:
A dictionary with the following keys:
documents
: Merged list of Documents
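As a rough illustration of the input and output shape of run in the default concatenate mode, the sketch below deduplicates several lists and keeps the highest score for each duplicate. The `concatenate` function and the `(id, score)` tuples are hypothetical simplifications; the real component works on Document objects, and how it identifies duplicates is not described in this section.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a Document: (id, score), where score may be None.
ScoredDoc = Tuple[str, Optional[float]]


def concatenate(inputs: List[List[ScoredDoc]]) -> Dict[str, List[ScoredDoc]]:
    best: Dict[str, Optional[float]] = {}
    for docs in inputs:
        for doc_id, score in docs:
            current = best.get(doc_id)
            # Keep a new document, or replace a duplicate that has a lower (or missing) score.
            if doc_id not in best or (score is not None and (current is None or score > current)):
                best[doc_id] = score
    # Mirror the documented return shape: a dictionary with a "documents" key.
    return {"documents": list(best.items())}


result = concatenate([[("a", 0.2), ("b", 0.9)], [("a", 0.7)]])
```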