MultiRetriever
Runs multiple text retrievers in parallel and combines their deduplicated results.
MultiRetriever is experimental and may change or be removed in future releases without prior deprecation notice. An ExperimentalWarning is emitted when the component is initialized.
|  |  |
| --- | --- |
| Most common position in a pipeline | After query input, before a ChatPromptBuilder in RAG pipelines |
| Mandatory init variables | retrievers: A dictionary mapping names to text retrievers (implementing the TextRetriever protocol) |
| Mandatory run variables | query: A query string |
| Output variables | documents: A deduplicated list of retrieved documents |
| API reference | Retrievers |
| GitHub link | https://github.com/deepset-ai/haystack/blob/main/haystack/components/retrievers/multi_retriever.py |
| Package name | haystack-ai |
Overview
MultiRetriever composes any number of text retrievers into a single component. All retrievers are queried in parallel using a thread pool, and their results are deduplicated before being returned.
The component:
- Queries all retrievers concurrently for better performance
- Automatically deduplicates results across retrievers
- Supports selectively enabling retrievers at runtime via active_retrievers
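The fan-out-and-merge behavior can be pictured with plain Python. This is an illustrative sketch, not the component's actual internals: the retrievers are stand-in functions, and deduplication is shown as keeping the first occurrence of each document id.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Doc:
    id: str
    content: str


# Stand-ins for two retrievers; note the overlapping document id "2"
def fake_bm25(query: str) -> list[Doc]:
    return [Doc("1", "solar"), Doc("2", "wind")]


def fake_embedding(query: str) -> list[Doc]:
    return [Doc("2", "wind"), Doc("3", "hydro")]


def run_all(retrievers, query: str) -> list[Doc]:
    # Query every retriever concurrently in a thread pool
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda retrieve: retrieve(query), retrievers))
    # Deduplicate across retrievers, keeping the first hit for each id
    seen: dict[str, Doc] = {}
    for docs in results:
        for doc in docs:
            seen.setdefault(doc.id, doc)
    return list(seen.values())


docs = run_all([fake_bm25, fake_embedding], "green energy")
print([d.id for d in docs])  # → ['1', '2', '3']
```

The duplicate id "2" appears only once in the merged output because the first retriever's copy wins.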
All retrievers passed to MultiRetriever must implement the TextRetriever protocol — their run method must accept a text query, filters, and top_k. Use TextEmbeddingRetriever to wrap an embedding-based retriever so it can be used with this component.
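The required run signature can be pictured with a typing.Protocol. The names TextRetrieverLike and KeywordRetriever below are illustrative, not definitions from the haystack package:

```python
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class TextRetrieverLike(Protocol):
    """Anything whose run method accepts a query, filters, and top_k."""

    def run(
        self,
        query: str,
        filters: Optional[dict[str, Any]] = None,
        top_k: Optional[int] = None,
    ) -> dict[str, Any]: ...


class KeywordRetriever:
    """A minimal class satisfying the protocol shape."""

    def run(self, query, filters=None, top_k=None):
        return {"documents": []}


# runtime_checkable only verifies the method exists, not its signature
print(isinstance(KeywordRetriever(), TextRetrieverLike))  # → True
```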
Usage
On its own
This example sets up a MultiRetriever combining a BM25 retriever and an embedding-based retriever (wrapped with TextEmbeddingRetriever). Both are queried in parallel and the deduplicated results are returned.
```python
from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.document_stores.types import DuplicatePolicy
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack.components.retrievers import (
    InMemoryBM25Retriever,
    InMemoryEmbeddingRetriever,
    MultiRetriever,
    TextEmbeddingRetriever,
)
from haystack.components.writers import DocumentWriter

documents = [
    Document(content="Renewable energy is energy that is collected from renewable resources."),
    Document(content="Solar energy is a type of green energy that is harnessed from the sun."),
    Document(content="Wind energy is another type of green energy that is generated by wind turbines."),
]

doc_store = InMemoryDocumentStore()
doc_embedder = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2",
)
doc_embedder.warm_up()  # load the model before running the embedder standalone
doc_writer = DocumentWriter(document_store=doc_store, policy=DuplicatePolicy.SKIP)
doc_writer.run(documents=doc_embedder.run(documents=documents)["documents"])

retriever = MultiRetriever(
    retrievers={
        "bm25": InMemoryBM25Retriever(document_store=doc_store),
        "embedding": TextEmbeddingRetriever(
            retriever=InMemoryEmbeddingRetriever(document_store=doc_store),
            text_embedder=SentenceTransformersTextEmbedder(
                model="sentence-transformers/all-MiniLM-L6-v2",
            ),
        ),
    },
    top_k=3,
)

result = retriever.run(query="green energy sources")
for doc in result["documents"]:
    print(doc.content)
```
Selecting retrievers at runtime
Use the active_retrievers parameter to run only a subset of retrievers. Names must match the keys in the retrievers dictionary. Building on the example above:
```python
# Run only the BM25 retriever
result = retriever.run(query="green energy sources", active_retrievers=["bm25"])
for doc in result["documents"]:
    print(doc.content)
```
In a RAG pipeline
This RAG pipeline uses MultiRetriever to combine BM25 and embedding retrieval before generating an answer with an LLM.
```python
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.document_stores.types import DuplicatePolicy
from haystack.components.builders import ChatPromptBuilder
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.retrievers import (
    InMemoryBM25Retriever,
    InMemoryEmbeddingRetriever,
    MultiRetriever,
    TextEmbeddingRetriever,
)
from haystack.components.writers import DocumentWriter
from haystack.dataclasses import ChatMessage

documents = [
    Document(content="Renewable energy is energy that is collected from renewable resources."),
    Document(content="Solar energy is a type of green energy that is harnessed from the sun."),
    Document(content="Wind energy is another type of green energy that is generated by wind turbines."),
]

doc_store = InMemoryDocumentStore()
doc_embedder = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2",
)
doc_embedder.warm_up()  # load the model before running the embedder standalone
doc_writer = DocumentWriter(document_store=doc_store, policy=DuplicatePolicy.SKIP)
doc_writer.run(documents=doc_embedder.run(documents=documents)["documents"])

prompt_template = [
    ChatMessage.from_system(
        "You are a helpful assistant that answers questions based on the provided documents."
    ),
    ChatMessage.from_user(
        "Given these documents, answer the question.\nDocuments:\n"
        "{% for doc in documents %}{{ doc.content }}\n{% endfor %}\n"
        "Question: {{ question }}"
    ),
]

pipeline = Pipeline()
pipeline.add_component(
    "retriever",
    MultiRetriever(
        retrievers={
            "bm25": InMemoryBM25Retriever(document_store=doc_store),
            "embedding": TextEmbeddingRetriever(
                retriever=InMemoryEmbeddingRetriever(document_store=doc_store),
                text_embedder=SentenceTransformersTextEmbedder(
                    model="sentence-transformers/all-MiniLM-L6-v2",
                ),
            ),
        },
        top_k=3,
    ),
)
pipeline.add_component(
    "prompt_builder",
    ChatPromptBuilder(
        template=prompt_template,
        required_variables=["documents", "question"],
    ),
)
pipeline.add_component("llm", OpenAIChatGenerator())

pipeline.connect("retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "llm.messages")

result = pipeline.run(
    {
        "retriever": {"query": "green energy sources"},
        "prompt_builder": {"question": "What types of green energy exist?"},
    }
)
print(result["llm"]["replies"][0].text)
```