Version: 2.29-unstable

MultiRetriever

Runs multiple text retrievers in parallel and combines their deduplicated results.

Experimental

MultiRetriever is experimental and may change or be removed in future releases without prior deprecation notice. An ExperimentalWarning is printed when initializing this component.

Most common position in a pipeline: After query input, before a ChatPromptBuilder in RAG pipelines
Mandatory init variables: "retrievers": A dictionary mapping names to text retrievers (implementing the TextRetriever protocol)
Mandatory run variables: "query": A query string
Output variables: "documents": A deduplicated list of retrieved documents
API reference: Retrievers
GitHub link: https://github.com/deepset-ai/haystack/blob/main/haystack/components/retrievers/multi_retriever.py
Package name: haystack-ai

Overview

MultiRetriever composes any number of text retrievers into a single component. All retrievers are queried in parallel using a thread pool, and their results are deduplicated before being returned.

The component:

  • Queries all retrievers concurrently for better performance
  • Automatically deduplicates results across retrievers
  • Supports selectively enabling retrievers at runtime via active_retrievers
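
The concurrency-plus-deduplication pattern described above can be sketched with the standard library alone. This is a minimal illustration, not MultiRetriever's actual implementation: the toy retrievers and the dedup-by-id rule (keep the first occurrence) are assumptions for the sake of the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Two toy "retrievers": each returns (doc_id, content) pairs.
def bm25_retriever(query):
    return [("d1", "solar"), ("d2", "wind")]

def embedding_retriever(query):
    return [("d2", "wind"), ("d3", "hydro")]

def run_all(retrievers, query):
    # Query every retriever concurrently in a thread pool.
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda r: r(query), retrievers))
    # Deduplicate across retrievers by document id, keeping the first occurrence.
    seen, merged = set(), []
    for results in result_lists:
        for doc_id, content in results:
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append((doc_id, content))
    return merged

print(run_all([bm25_retriever, embedding_retriever], "green energy"))
# "d2" appears only once in the merged output
```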

All retrievers passed to MultiRetriever must implement the TextRetriever protocol — their run method must accept a text query, filters, and top_k. Use TextEmbeddingRetriever to wrap an embedding-based retriever so it can be used with this component.
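
To make the contract concrete, here is a rough sketch of the run-method shape described above, expressed as a typing.Protocol. The exact signature and return type of the real TextRetriever protocol are assumptions based on the description (a text query, filters, and top_k); check the API reference for the authoritative definition.

```python
from typing import Any, Dict, List, Optional, Protocol

class TextRetrieverLike(Protocol):
    """Assumed shape of the TextRetriever contract: run(query, filters, top_k)."""

    def run(
        self,
        query: str,
        filters: Optional[Dict[str, Any]] = None,
        top_k: Optional[int] = None,
    ) -> Dict[str, List[Any]]:
        ...

class KeywordRetriever:
    """Toy retriever satisfying the sketched contract via simple substring matching."""

    def __init__(self, corpus: List[str]):
        self.corpus = corpus

    def run(self, query: str, filters=None, top_k=None):
        hits = [doc for doc in self.corpus if query.lower() in doc.lower()]
        return {"documents": hits[:top_k]}

retriever: TextRetrieverLike = KeywordRetriever(["Solar power", "Wind power"])
print(retriever.run("solar", top_k=1))
```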

Usage

On its own

This example sets up a MultiRetriever combining a BM25 retriever and an embedding-based retriever (wrapped with TextEmbeddingRetriever). Both are queried in parallel and the deduplicated results are returned.

python
from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.document_stores.types import DuplicatePolicy
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack.components.retrievers import (
    InMemoryBM25Retriever,
    InMemoryEmbeddingRetriever,
    MultiRetriever,
    TextEmbeddingRetriever,
)
from haystack.components.writers import DocumentWriter

documents = [
    Document(content="Renewable energy is energy that is collected from renewable resources."),
    Document(content="Solar energy is a type of green energy that is harnessed from the sun."),
    Document(content="Wind energy is another type of green energy that is generated by wind turbines."),
]

doc_store = InMemoryDocumentStore()
doc_embedder = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2",
)
doc_embedder.warm_up()  # load the model before running the component standalone
doc_writer = DocumentWriter(document_store=doc_store, policy=DuplicatePolicy.SKIP)
doc_writer.run(documents=doc_embedder.run(documents)["documents"])

retriever = MultiRetriever(
    retrievers={
        "bm25": InMemoryBM25Retriever(document_store=doc_store),
        "embedding": TextEmbeddingRetriever(
            retriever=InMemoryEmbeddingRetriever(document_store=doc_store),
            text_embedder=SentenceTransformersTextEmbedder(
                model="sentence-transformers/all-MiniLM-L6-v2",
            ),
        ),
    },
    top_k=3,
)

result = retriever.run(query="green energy sources")
for doc in result["documents"]:
    print(doc.content)

Selecting retrievers at runtime

Use the active_retrievers parameter to run only a subset of retrievers. Names must match the keys in the retrievers dictionary. Building on the example above:

python
# Run only the BM25 retriever
result = retriever.run(query="green energy sources", active_retrievers=["bm25"])
for doc in result["documents"]:
    print(doc.content)

In a RAG pipeline

This RAG pipeline uses MultiRetriever to combine BM25 and embedding retrieval before generating an answer with an LLM.

python
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.document_stores.types import DuplicatePolicy
from haystack.components.builders import ChatPromptBuilder
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.retrievers import (
    InMemoryBM25Retriever,
    InMemoryEmbeddingRetriever,
    MultiRetriever,
    TextEmbeddingRetriever,
)
from haystack.components.writers import DocumentWriter
from haystack.dataclasses import ChatMessage

documents = [
    Document(content="Renewable energy is energy that is collected from renewable resources."),
    Document(content="Solar energy is a type of green energy that is harnessed from the sun."),
    Document(content="Wind energy is another type of green energy that is generated by wind turbines."),
]

doc_store = InMemoryDocumentStore()
doc_embedder = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2",
)
doc_embedder.warm_up()  # load the model before running the component standalone
doc_writer = DocumentWriter(document_store=doc_store, policy=DuplicatePolicy.SKIP)
doc_writer.run(documents=doc_embedder.run(documents)["documents"])

prompt_template = [
    ChatMessage.from_system(
        "You are a helpful assistant that answers questions based on the provided documents."
    ),
    ChatMessage.from_user(
        "Given these documents, answer the question.\nDocuments:\n"
        "{% for doc in documents %}{{ doc.content }}\n{% endfor %}\n"
        "Question: {{ question }}"
    ),
]

pipeline = Pipeline()
pipeline.add_component(
    "retriever",
    MultiRetriever(
        retrievers={
            "bm25": InMemoryBM25Retriever(document_store=doc_store),
            "embedding": TextEmbeddingRetriever(
                retriever=InMemoryEmbeddingRetriever(document_store=doc_store),
                text_embedder=SentenceTransformersTextEmbedder(
                    model="sentence-transformers/all-MiniLM-L6-v2",
                ),
            ),
        },
        top_k=3,
    ),
)
pipeline.add_component(
    "prompt_builder",
    ChatPromptBuilder(
        template=prompt_template,
        required_variables=["documents", "question"],
    ),
)
pipeline.add_component("llm", OpenAIChatGenerator())

pipeline.connect("retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "llm.messages")

result = pipeline.run(
    {
        "retriever": {"query": "green energy sources"},
        "prompt_builder": {"question": "What types of green energy exist?"},
    }
)
print(result["llm"]["replies"][0].text)