Metadata Filtering
This page provides a detailed explanation of how to apply metadata filters at query time.
When you index documents into your Document Store, you can attach metadata to them. One example is the DocumentLanguageClassifier, which adds the language of the document's content to its metadata. Components like MetadataRouter can then route documents based on their metadata. Additionally, you can apply filters to queries used with Retrievers to limit the scope of your search based on this metadata and ensure that your Answers come from a specific slice of your data.
Filters
To illustrate how filters work, imagine you have a set of annual reports from various companies. You may want to restrict a search to a specific year and a small selection of companies. This can reduce the workload of the Retriever and also ensure that you get more relevant results.
Filters are applied via the filters argument of the Retriever class. When working with a pipeline, the filter can be given to Pipeline.run(), which will then route it to the Retriever class (see pipelines docs on how to work with a pipeline).
For example, you can supply a filter in the form of a nested dictionary in which each condition sets field to a document metadata field, operator to a comparison operator such as == or in, and value to a single accepted value or a list of accepted values. In the example below, the filter ensures that any returned document has a value of 2019 in the years metadata field and either BMW or Mercedes in the companies metadata field.
# Assumes `pipeline` is an existing Pipeline with a component named "retriever".
pipeline.run(
    data={
        "retriever": {
            "query": "Why did the revenue increase?",
            "filters": {
                "operator": "AND",
                "conditions": [
                    {"field": "meta.years", "operator": "==", "value": "2019"},
                    {"field": "meta.companies", "operator": "in", "value": ["BMW", "Mercedes"]},
                ],
            },
        }
    }
)
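Because a filter is plain nested data, it can also be assembled programmatically before it is passed to the Retriever. The following is a minimal sketch of that idea; build_report_filter is a hypothetical helper introduced here for illustration and is not part of Haystack.

```python
from typing import Any, Dict, List


def build_report_filter(year: str, companies: List[str]) -> Dict[str, Any]:
    """Hypothetical helper: build the nested filter for one year and several companies."""
    return {
        "operator": "AND",
        "conditions": [
            # Comparison: the years field must equal the given year.
            {"field": "meta.years", "operator": "==", "value": year},
            # Comparison: the companies field must be one of the given names.
            {"field": "meta.companies", "operator": "in", "value": companies},
        ],
    }


# Produces the same dictionary as the "filters" entry in the example above.
filters = build_report_filter("2019", ["BMW", "Mercedes"])
```

The resulting dictionary could then be supplied as the filters value in the Pipeline.run() call shown above.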
Filtering Logic
Technically speaking, filters are defined as nested dictionaries that can be of two types: Comparison or Logic.
Comparison
Comparison dictionaries must contain the following keys:
- field
- operator
- value
The field value in Comparison dictionaries must be the name of one of the meta fields of a document, such as meta.years.
The operator value in Comparison dictionaries must be one of the following:
- ==
- !=
- >
- >=
- <
- <=
- in
- not in
The value key takes a single value or, in the case of "in" and "not in", a list of values.
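To make the Comparison rules concrete, here is a minimal sketch of how a single Comparison dictionary could be checked against a document's metadata. It is not the Haystack implementation: matches_comparison and COMPARATORS are hypothetical names, and the handling of the meta. prefix and of missing fields are assumptions made for this illustration.

```python
import operator
from typing import Any, Dict

# Map each comparison operator string to a Python callable.
COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda actual, allowed: actual in allowed,
    "not in": lambda actual, allowed: actual not in allowed,
}


def matches_comparison(meta: Dict[str, Any], condition: Dict[str, Any]) -> bool:
    """Return True if the metadata satisfies one Comparison dictionary."""
    field = condition["field"]
    # Assumption: a leading "meta." prefix refers to a key of the metadata dict.
    key = field[len("meta."):] if field.startswith("meta.") else field
    if key not in meta:
        # Assumption: a document that lacks the field never matches.
        return False
    compare = COMPARATORS[condition["operator"]]
    return compare(meta[key], condition["value"])


meta = {"years": "2019", "companies": "BMW"}
# True: the years field equals "2019".
print(matches_comparison(meta, {"field": "meta.years", "operator": "==", "value": "2019"}))
# False: "BMW" is in the list, so "not in" fails.
print(matches_comparison(meta, {"field": "meta.companies", "operator": "not in", "value": ["BMW", "Mercedes"]}))
```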
Logic
Logic dictionaries must contain the following keys:
- operator
- conditions
The conditions key must be a list of dictionaries, either of type Comparison or Logic.
The operator values in Logic dictionaries must be one of the following:
- NOT
- OR
- AND
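Since Logic dictionaries can contain other Logic dictionaries, evaluating a filter is naturally recursive. The sketch below illustrates one way to do this over plain metadata dictionaries. It is an illustration under stated assumptions, not the Haystack code: matches is a hypothetical function, missing fields are treated as non-matching, and NOT is assumed to negate the conjunction of its conditions.

```python
import operator
from typing import Any, Dict

COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda actual, allowed: actual in allowed,
    "not in": lambda actual, allowed: actual not in allowed,
}


def matches(meta: Dict[str, Any], node: Dict[str, Any]) -> bool:
    """Recursively evaluate a Logic or Comparison dictionary against metadata."""
    if "conditions" in node:
        # Logic dictionary: evaluate every child, then combine the results.
        results = [matches(meta, child) for child in node["conditions"]]
        op = node["operator"]
        if op == "AND":
            return all(results)
        if op == "OR":
            return any(results)
        if op == "NOT":
            # Assumption: NOT is true unless all of its conditions hold.
            return not all(results)
        raise ValueError(f"Unknown logic operator: {op}")
    # Comparison dictionary: look up the field and apply the operator.
    field = node["field"]
    key = field[len("meta."):] if field.startswith("meta.") else field
    if key not in meta:
        # Assumption: a document that lacks the field never matches.
        return False
    return COMPARATORS[node["operator"]](meta[key], node["value"])


filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.years", "operator": "==", "value": "2019"},
        {
            "operator": "OR",
            "conditions": [
                {"field": "meta.companies", "operator": "==", "value": "BMW"},
                {"field": "meta.companies", "operator": "==", "value": "Mercedes"},
            ],
        },
    ],
}

all_meta = [
    {"years": "2019", "companies": "BMW"},
    {"years": "2019", "companies": "Audi"},
    {"years": "2018", "companies": "Mercedes"},
]
# Only the first entry satisfies both the year and the company condition.
selected = [m for m in all_meta if matches(m, filters)]
```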
In the Haystack code base, the filtering logic is defined in the DocumentStore protocol.