MultiFilterRetriever
Use this Retriever with any Document Store to retrieve Documents matching multiple filters in parallel.
| Most common position in a pipeline | At the beginning of a Pipeline |
| Mandatory init variables | document_store: An instance of a Document Store |
| Mandatory run variables | filters: A list of filter dictionaries in the same syntax supported by the Document Stores |
| Output variables | documents: All the documents that match at least one of the provided filters, deduplicated |
| API reference | Retrievers |
| GitHub link | https://github.com/deepset-ai/haystack/blob/main/haystack/components/retrievers/multi_filter_retriever.py |
Overview
MultiFilterRetriever is an extension of FilterRetriever that accepts a list of filter dictionaries and runs each filter against the Document Store in parallel. Results from all filters are merged and deduplicated before being returned.
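The merge-and-deduplicate behavior described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the component's source code. It uses a hypothetical `Doc` dataclass in place of Haystack's `Document` and a toy `matches` helper that only understands `==` on `meta.*` fields. It assumes that deduplication is by Document ID and keeps the first occurrence, which may differ from what MultiFilterRetriever actually does. Filters are evaluated concurrently with `concurrent.futures.ThreadPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Doc:
    # Hypothetical stand-in for haystack.Document
    id: str
    content: str
    meta: dict[str, Any] = field(default_factory=dict)


def matches(doc: Doc, flt: dict[str, Any]) -> bool:
    # Toy evaluator: supports only {"field": "meta.<key>", "operator": "==", "value": ...}
    if flt["operator"] != "==":
        raise ValueError(f"unsupported operator: {flt['operator']}")
    key = flt["field"].removeprefix("meta.")
    return doc.meta.get(key) == flt["value"]


def run_filter(store: list[Doc], flt: dict[str, Any]) -> list[Doc]:
    # Plays the role of one filter query against the Document Store
    return [doc for doc in store if matches(doc, flt)]


def multi_filter_retrieve(store: list[Doc], filters: list[dict[str, Any]]) -> list[Doc]:
    if not filters:
        return []  # an empty filter list yields no Documents
    # Evaluate every filter concurrently; map() returns results in filter order
    with ThreadPoolExecutor() as pool:
        per_filter = list(pool.map(lambda f: run_filter(store, f), filters))
    # Merge and deduplicate by ID, keeping the first occurrence
    seen: set[str] = set()
    merged: list[Doc] = []
    for docs in per_filter:
        for doc in docs:
            if doc.id not in seen:
                seen.add(doc.id)
                merged.append(doc)
    return merged


store = [
    Doc("1", "Python is a popular programming language", {"lang": "en", "topic": "python"}),
    Doc("2", "Python ist eine beliebte Programmiersprache", {"lang": "de", "topic": "python"}),
    Doc("3", "Rust is a systems programming language", {"lang": "en", "topic": "rust"}),
]
filters = [
    {"field": "meta.lang", "operator": "==", "value": "en"},
    {"field": "meta.topic", "operator": "==", "value": "python"},
]
# Document "1" matches both filters but appears only once
print([d.id for d in multi_filter_retrieve(store, filters)])  # ['1', '3', '2']
```

Using a thread pool fits this kind of work because each real filter query is typically I/O-bound (a call to a database or search service), not CPU-bound.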
Use it when you need to retrieve Documents matching different criteria in a single pipeline step, for example to fetch English and German Documents at the same time, or to combine the results of several independent filter conditions.
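Be careful when using MultiFilterRetriever on a Document Store that contains many Documents: each filter can return a large number of results, and the merged output contains all of them (minus duplicates), which can be a lot for downstream components to process. Passing an empty filter list returns no Documents.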
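Because a Document is returned when it matches at least one filter, the result is the set union of the per-filter results, and an empty filter list has nothing to take the union of. The sketch below checks this with a toy evaluator over plain dicts. It is a hypothetical stand-in, not Haystack code, and it understands only `==` comparisons and the `{"operator": "OR", "conditions": [...]}` form of the filter syntax. It also shows that, in this toy setting, the union selects the same Documents as a single `OR` filter over the same conditions.

```python
from typing import Any

# Hypothetical toy store: each Document is a plain dict with "id" and "meta"
STORE = [
    {"id": "a", "meta": {"year": 2018}},
    {"id": "b", "meta": {"year": 2021}},
    {"id": "c", "meta": {"year": 2021}},
    {"id": "d", "meta": {"year": 2023}},
]


def matches(meta: dict[str, Any], flt: dict[str, Any]) -> bool:
    # Toy evaluator for a small subset of the filter syntax: "==" leaves and "OR" groups
    if flt["operator"] == "OR":
        return any(matches(meta, cond) for cond in flt["conditions"])
    if flt["operator"] == "==":
        return meta.get(flt["field"].removeprefix("meta.")) == flt["value"]
    raise ValueError(f"unsupported operator: {flt['operator']}")


def ids_matching(flt: dict[str, Any]) -> set[str]:
    return {doc["id"] for doc in STORE if matches(doc["meta"], flt)}


def union_of_filters(filters: list[dict[str, Any]]) -> set[str]:
    # "Matches at least one filter" is the union of the per-filter result sets
    result: set[str] = set()
    for flt in filters:
        result |= ids_matching(flt)
    return result


filters = [
    {"field": "meta.year", "operator": "==", "value": 2021},
    {"field": "meta.year", "operator": "==", "value": 2023},
]
print(sorted(union_of_filters(filters)))  # ['b', 'c', 'd']
# The same set as one OR filter over the same conditions
print(sorted(ids_matching({"operator": "OR", "conditions": filters})))  # ['b', 'c', 'd']
# An empty filter list selects nothing
print(union_of_filters([]))  # set()
```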
MultiFilterRetriever does not score or rank Documents. If you need to rank the results by similarity to a query, consider using Ranker components after retrieval.
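Since the merged list carries no scores, any ordering by relevance has to come from a later step. As a rough illustration of what such a step adds, the sketch below attaches a score to each retrieved text and sorts by it. The scoring function `rank_by_overlap` is hypothetical and deliberately naive (the fraction of query words that appear in the text). It is not how Haystack's Ranker components score Documents.

```python
import re
from typing import Optional


def tokenize(text: str) -> set[str]:
    # Lowercased word tokens; punctuation is dropped
    return set(re.findall(r"\w+", text.lower()))


def rank_by_overlap(query: str, texts: list[str], top_k: Optional[int] = None) -> list[tuple[float, str]]:
    query_terms = tokenize(query)
    scored = []
    for text in texts:
        # Fraction of query terms that also appear in the text
        score = len(query_terms & tokenize(text)) / len(query_terms) if query_terms else 0.0
        scored.append((score, text))
    # list.sort is stable even with reverse=True, so ties keep their retrieval order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k] if top_k is not None else scored


# Unscored output of a retriever, in whatever order it arrived
retrieved = [
    "Python ist eine beliebte Programmiersprache",
    "Python is a popular programming language",
]
for score, text in rank_by_overlap("popular programming language", retrieved):
    print(f"{score:.2f}  {text}")
```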
Usage
On its own
from haystack import Document
from haystack.components.retrievers import MultiFilterRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.writers import DocumentWriter
from haystack.document_stores.types import DuplicatePolicy
documents = [
    Document(content="Python is a popular programming language", meta={"lang": "en"}),
    Document(content="Python ist eine beliebte Programmiersprache", meta={"lang": "de"}),
]
# Write the Documents to the store, skipping any that are already there
document_store = InMemoryDocumentStore()
DocumentWriter(document_store=document_store, policy=DuplicatePolicy.SKIP).run(documents=documents)
retriever = MultiFilterRetriever(document_store=document_store)
# A Document is returned if it matches at least one of these filters
filters = [
    {"field": "meta.lang", "operator": "==", "value": "en"},
    {"field": "meta.lang", "operator": "==", "value": "de"},
]
result = retriever.run(filters=filters)
for doc in result["documents"]:
    print(doc.content)
In a RAG pipeline
Set your OPENAI_API_KEY as an environment variable and then run the following code:
from haystack import Document, Pipeline
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers import MultiFilterRetriever
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.document_stores.types import DuplicatePolicy
document_store = InMemoryDocumentStore()
documents = [
    Document(content="Mark lives in Berlin.", meta={"year": 2018}),
    Document(content="Mark lives in Paris.", meta={"year": 2021}),
    Document(content="Mark is Danish.", meta={"year": 2021}),
    Document(content="Mark lives in New York.", meta={"year": 2023}),
]
DocumentWriter(document_store=document_store, policy=DuplicatePolicy.SKIP).run(documents=documents)
prompt_template = """
Given these documents, answer the question.\nDocuments:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}
\nQuestion: {{question}}
\nAnswer:
"""
rag_pipeline = Pipeline()
rag_pipeline.add_component(name="retriever", instance=MultiFilterRetriever(document_store=document_store))
rag_pipeline.add_component(instance=PromptBuilder(template=prompt_template), name="prompt_builder")
# OpenAIGenerator reads the API key from the OPENAI_API_KEY environment variable
rag_pipeline.add_component(instance=OpenAIGenerator(), name="llm")
# Retrieved Documents fill the template's `documents` variable; the rendered prompt goes to the LLM
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm")
# Retrieve the Documents from 2021 or 2023, then ask the question about them
result = rag_pipeline.run(
    {
        "retriever": {
            "filters": [
                {"field": "meta.year", "operator": "==", "value": 2021},
                {"field": "meta.year", "operator": "==", "value": 2023},
            ]
        },
        "prompt_builder": {"question": "Where does Mark live?"},
    }
)
print(result["llm"]["replies"][0])
Here's an example output you might get: