Metadata Filtering
This page provides a detailed explanation of how to apply metadata filters at query time.
When you index Documents into your Document Store, you can attach metadata to them. For example, the DocumentLanguageClassifier adds the language of a Document's content to its metadata. Components like MetadataRouter can then route Documents based on their metadata. You can also apply filters to the queries you run with Retrievers. This limits the scope of your search based on this metadata and ensures that your Answers come from a specific slice of your data.
Filters
To illustrate how filters work, imagine you have a set of annual reports from various companies. You may want to search only the reports from a specific year and from a small selection of companies. Narrowing the search this way reduces the workload of the Retriever and helps ensure more relevant results.
Filters are applied through the filters argument of a Retriever. When you work with a Pipeline, you can pass the filter to Pipeline.run(), which then routes it to the Retriever (see the Pipelines docs for how to work with a Pipeline).
For example, you can supply a filter as a nested dictionary. A top-level "AND" operator combines a list of conditions. In each condition, field is set to a Document metadata field, operator is set to a comparison such as == or in, and value holds the accepted value (or, for in, a list of accepted values). In the example below, the filter ensures that any returned Document has a value of 2019 in the years metadata field and either BMW or Mercedes in the companies metadata field.
# "retriever" is the name the Retriever component was given when it was added to the pipeline
pipeline.run(
    data={
        "retriever": {
            "query": "Why did the revenue increase?",
            "filters": {
                "operator": "AND",
                "conditions": [
                    {"field": "meta.years", "operator": "==", "value": "2019"},
                    {"field": "meta.companies", "operator": "in", "value": ["BMW", "Mercedes"]},
                ],
            },
        }
    }
)
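In plain Python terms, the filter above keeps a Document only when both conditions hold. The sketch below spells that out over a few made-up metadata dictionaries. The report data and the keep function are hypothetical and only illustrate what the filter means; this is not how a Document Store actually evaluates filters.

```python
# Hypothetical metadata for a handful of annual reports (illustrative only).
reports = [
    {"years": "2019", "companies": "BMW"},
    {"years": "2019", "companies": "Audi"},
    {"years": "2020", "companies": "Mercedes"},
    {"years": "2019", "companies": "Mercedes"},
]


def keep(meta: dict) -> bool:
    # "AND" of the two conditions from the filter:
    #   meta.years == "2019"
    #   meta.companies in ["BMW", "Mercedes"]
    return meta.get("years") == "2019" and meta.get("companies") in ["BMW", "Mercedes"]


selected = [meta for meta in reports if keep(meta)]
print(selected)
# → [{'years': '2019', 'companies': 'BMW'}, {'years': '2019', 'companies': 'Mercedes'}]
```

The 2019 Audi report fails the in condition and the 2020 Mercedes report fails the == condition, so only two reports remain.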
Filtering Logic
Technically speaking, filters are defined as nested dictionaries that can be of two types: Comparison or Logic.
Comparison
Comparison dictionaries must contain the following keys:
- field
- operator
- value
The field key in a Comparison dictionary must name one of the Document's metadata fields, such as meta.years.
The operator value in Comparison dictionaries must be one of the following:
- ==
- !=
- >
- >=
- <
- <=
- in
- not in
The value key takes a single value, or a list of values when the operator is "in" or "not in".
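The rules for Comparison dictionaries can be checked mechanically. Below is a minimal sketch of a validator for a single Comparison dictionary. It checks the required keys, the allowed operators, and the list requirement for in and not in. It does not check that field names an existing metadata field. The check_comparison name and the choice to raise ValueError are assumptions for illustration and are not part of Haystack's API.

```python
# The eight operators a Comparison dictionary may use.
COMPARISON_OPERATORS = {"==", "!=", ">", ">=", "<", "<=", "in", "not in"}


def check_comparison(condition: dict) -> None:
    """Raise ValueError if condition is not a well-formed Comparison dictionary."""
    missing = {"field", "operator", "value"} - set(condition)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    op = condition["operator"]
    if op not in COMPARISON_OPERATORS:
        raise ValueError(f"unknown comparison operator: {op!r}")
    # "in" and "not in" take a list of accepted values; the others take one value.
    if op in {"in", "not in"} and not isinstance(condition["value"], list):
        raise ValueError(f"operator {op!r} expects a list as value")


# Both conditions from the earlier pipeline example pass the check.
check_comparison({"field": "meta.years", "operator": "==", "value": "2019"})
check_comparison({"field": "meta.companies", "operator": "in", "value": ["BMW", "Mercedes"]})
```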
Logic
Logic dictionaries must contain the following keys:
- operator
- conditions
The conditions key must be a list of dictionaries, each of which is either a Comparison or a Logic dictionary.
The operator value in Logic dictionaries must be one of the following:
- NOT
- OR
- AND
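Logic dictionaries can contain other Logic dictionaries, so a filter is naturally evaluated recursively. The sketch below applies a filter to a single Document's metadata dictionary. It is not Haystack's implementation. The matches function and the COMPARE table are hypothetical. A missing metadata field simply makes a comparison false, and NOT is assumed to negate the AND of its conditions. Each Document Store implements filtering itself, so check your store's behavior for edge cases.

```python
import operator

# Hypothetical mapping from the Comparison operators listed above to Python callables.
COMPARE = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda field_value, accepted: field_value in accepted,
    "not in": lambda field_value, accepted: field_value not in accepted,
}


def matches(meta: dict, filters: dict) -> bool:
    """Return True if a Document's metadata satisfies a Comparison or Logic filter."""
    if "conditions" in filters:
        # Logic dictionary: combine the results of its nested conditions.
        conditions = filters["conditions"]
        op = filters["operator"]
        if op == "AND":
            return all(matches(meta, c) for c in conditions)
        if op == "OR":
            return any(matches(meta, c) for c in conditions)
        if op == "NOT":
            # Assumption: NOT negates the AND of its conditions.
            return not all(matches(meta, c) for c in conditions)
        raise ValueError(f"unknown logic operator: {op!r}")

    # Comparison dictionary: "meta.years" refers to the "years" metadata key.
    field = filters["field"]
    key = field[len("meta."):] if field.startswith("meta.") else field
    if key not in meta:
        return False  # simplification: a missing field never matches
    return COMPARE[filters["operator"]](meta[key], filters["value"])


# A Logic dictionary nested inside another Logic dictionary.
nested = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.years", "operator": "==", "value": "2019"},
        {
            "operator": "NOT",
            "conditions": [
                {"field": "meta.companies", "operator": "in", "value": ["BMW", "Mercedes"]},
            ],
        },
    ],
}
print(matches({"years": "2019", "companies": "Audi"}, nested))  # → True
print(matches({"years": "2019", "companies": "BMW"}, nested))   # → False
```

Here the NOT wraps a single condition, so the nested filter reads "the year is 2019 and the company is neither BMW nor Mercedes".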
In the Haystack code base, the filtering logic is defined in the DocumentStore protocol.