Version: 2.28-unstable

MultiFilterRetriever

Use this Retriever with any Document Store to retrieve Documents matching multiple filters in parallel.


Most common position in a pipeline	At the beginning of a Pipeline
Mandatory init variables	`document_store`: An instance of a Document Store
Mandatory run variables	`filters`: A list of filter dictionaries in the same syntax supported by the Document Stores
Output variables	`documents`: All the documents that match at least one of the provided filters, deduplicated
API reference	Retrievers
GitHub link	https://github.com/deepset-ai/haystack/blob/main/haystack/components/retrievers/multi_filter_retriever.py

Overview

MultiFilterRetriever is an extension of FilterRetriever that accepts a list of filter dictionaries and runs each filter against the Document Store in parallel. Results from all filters are merged and deduplicated before being returned.
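
The fan-out, merge, and deduplicate behavior described above can be pictured with a small, library-free sketch. This is not Haystack's implementation: `ToyStore` and `multi_filter` are hypothetical names, only the `==` operator is handled, and deduplicating by document `id` is an assumption made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyStore:
    """Hypothetical stand-in for a Document Store; not a Haystack class."""

    def __init__(self, docs):
        self.docs = docs  # list of dicts with "id", "content", "meta"

    def filter_documents(self, filters):
        # Only the "==" comparison on "meta.<key>" fields is supported here.
        key = filters["field"].removeprefix("meta.")
        return [d for d in self.docs if d["meta"].get(key) == filters["value"]]


def multi_filter(store, filters):
    """Run every filter concurrently, then merge and deduplicate by id."""
    if not filters:
        return []  # an empty filter list yields no documents
    with ThreadPoolExecutor(max_workers=len(filters)) as pool:
        # map() returns results in the same order as the input filters
        result_lists = list(pool.map(store.filter_documents, filters))
    merged = {}
    for docs in result_lists:
        for doc in docs:
            merged.setdefault(doc["id"], doc)  # keep the first occurrence
    return list(merged.values())


store = ToyStore([
    {"id": "1", "content": "Python is popular", "meta": {"lang": "en", "year": 2021}},
    {"id": "2", "content": "Python ist beliebt", "meta": {"lang": "de", "year": 2021}},
    {"id": "3", "content": "Python est populaire", "meta": {"lang": "fr", "year": 2023}},
])

docs = multi_filter(store, [
    {"field": "meta.lang", "operator": "==", "value": "en"},
    {"field": "meta.year", "operator": "==", "value": 2021},
])
print([d["id"] for d in docs])  # ['1', '2']
```

Document `1` matches both filters but appears only once in the merged result, which is the deduplication step in miniature.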

Use it when you need to retrieve Documents matching different criteria in a single pipeline step, for example fetching English and German documents at the same time, or combining results from several independent filter conditions.
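
As an illustration of these use cases, each entry in the list is an ordinary filter dictionary, so a compound filter can sit next to a simple comparison. The sketch assumes Documents carry `meta.lang` and `meta.year` fields. Those field names and values are illustrative only, and the `AND`/`conditions` form is the logical-filter syntax used by Haystack 2.x Document Stores.

```python
# Each entry is evaluated as its own filter; a Document is returned if it
# matches at least one entry.
filters = [
    # Simple comparison: all English documents
    {"field": "meta.lang", "operator": "==", "value": "en"},
    # Compound condition: German documents from 2021 onwards
    {
        "operator": "AND",
        "conditions": [
            {"field": "meta.lang", "operator": "==", "value": "de"},
            {"field": "meta.year", "operator": ">=", "value": 2021},
        ],
    },
]
```

A list like this would be passed as the `filters` run variable.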

Be careful when using MultiFilterRetriever on a Document Store that contains many Documents: each filter can return a large number of results, and the results of all filters are combined. Passing an empty filter list returns no documents.

MultiFilterRetriever does not score or rank Documents. If you need to rank the results by similarity to a query, consider using Ranker components after retrieval.
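
To make the point about ordering concrete, the following library-free sketch reorders already-retrieved texts by a naive word-overlap score. The function `overlap_score` is a hypothetical stand-in for what a Ranker component would do with a real similarity model; it is not part of Haystack.

```python
def overlap_score(query, text):
    """Toy relevance score: number of distinct query words found in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().replace(".", "").split())
    return len(query_words & text_words)


# Retrieved contents arrive in no meaningful relevance order.
retrieved = [
    "Mark is Danish.",
    "Mark lives in New York.",
    "Mark lives in Paris.",
]

query = "does mark live in new york"
ranked = sorted(retrieved, key=lambda text: overlap_score(query, text), reverse=True)
print(ranked[0])  # Mark lives in New York.
```

In a real pipeline this reordering step would be handled by a Ranker placed after the retriever, not by hand-written scoring.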

Usage

On its own

python

from haystack import Document
from haystack.components.retrievers import MultiFilterRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.writers import DocumentWriter
from haystack.document_stores.types import DuplicatePolicy

documents = [
    Document(content="Python is a popular programming language", meta={"lang": "en"}),
    Document(content="Python ist eine beliebte Programmiersprache", meta={"lang": "de"}),
]

document_store = InMemoryDocumentStore()
DocumentWriter(document_store=document_store, policy=DuplicatePolicy.SKIP).run(documents=documents)

retriever = MultiFilterRetriever(document_store=document_store)

filters = [
    {"field": "meta.lang", "operator": "==", "value": "en"},
    {"field": "meta.lang", "operator": "==", "value": "de"},
]

result = retriever.run(filters=filters)
for doc in result["documents"]:
    print(doc.content)

In a RAG pipeline

Set your OPENAI_API_KEY as an environment variable and then run the following code:

python

from haystack import Document, Pipeline
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers import MultiFilterRetriever
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.document_stores.types import DuplicatePolicy

document_store = InMemoryDocumentStore()
documents = [
    Document(content="Mark lives in Berlin.", meta={"year": 2018}),
    Document(content="Mark lives in Paris.", meta={"year": 2021}),
    Document(content="Mark is Danish.", meta={"year": 2021}),
    Document(content="Mark lives in New York.", meta={"year": 2023}),
]
DocumentWriter(document_store=document_store, policy=DuplicatePolicy.SKIP).run(documents=documents)

prompt_template = """
    Given these documents, answer the question.\nDocuments:
    {% for doc in documents %}
        {{ doc.content }}
    {% endfor %}

    \nQuestion: {{question}}
    \nAnswer:
    """

rag_pipeline = Pipeline()
rag_pipeline.add_component(name="retriever", instance=MultiFilterRetriever(document_store=document_store))
rag_pipeline.add_component(instance=PromptBuilder(template=prompt_template), name="prompt_builder")
rag_pipeline.add_component(instance=OpenAIGenerator(), name="llm")
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm")

result = rag_pipeline.run(
    {
        "retriever": {
            "filters": [
                {"field": "meta.year", "operator": "==", "value": 2021},
                {"field": "meta.year", "operator": "==", "value": 2023},
            ]
        },
        "prompt_builder": {"question": "Where does Mark live?"},
    }
)
print(result["llm"]["replies"][0])

Here's an example output you might get:

According to the provided documents, Mark lives in New York.
