
Metadata Filtering

This page provides a detailed explanation of how to apply metadata filters at query time.

When you index Documents into your Document Store, you can attach metadata to them. One example is the DocumentLanguageClassifier, which adds the language of the Document's content to its metadata. Components like MetadataRouter can then route Documents based on their metadata. Additionally, you can apply filters to queries used with Retrievers to limit the scope of your search based on this metadata and ensure that your Answers come from a specific slice of your data.

Filters

To illustrate how filters work, imagine you have a set of annual reports from various companies. You may want to restrict a search to a specific year and a small selection of companies. This can reduce the workload of the Retriever and help ensure that you get more relevant results.

Filters are applied via the filters argument of the Retriever class. When working with a Pipeline, the filter can be given to Pipeline.run(), which will then route it to the Retriever class (see the Pipelines docs on how to work with a Pipeline).

For example, you can supply a filter in the form of a nested dictionary in which field is set to a Document metadata field, operator is set to a comparison operator such as in, and value holds the accepted value or list of accepted values. In the example below, the filter ensures that any returned Document has a value of 2019 in the years metadata field and either BMW or Mercedes in the companies metadata field.

pipeline.run(
    data={
        "retriever": {
            "query": "Why did the revenue increase?",
            "filters": {
                "operator": "AND",
                "conditions": [
                    {"field": "meta.years", "operator": "==", "value": "2019"},
                    {"field": "meta.companies", "operator": "in", "value": ["BMW", "Mercedes"]},
                ],
            },
        }
    }
)
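When the year and the company list come from user input rather than literals, it can help to build the filter dictionary in one place. The following is a minimal sketch of that idea; build_report_filter is a hypothetical helper name introduced here for illustration, not part of Haystack, and the field names are the ones used in the example above.

```python
from typing import Any, Dict, List


def build_report_filter(year: str, companies: List[str]) -> Dict[str, Any]:
    """Build an AND filter: exact match on year, membership match on companies."""
    return {
        "operator": "AND",
        "conditions": [
            {"field": "meta.years", "operator": "==", "value": year},
            {"field": "meta.companies", "operator": "in", "value": list(companies)},
        ],
    }


filters = build_report_filter("2019", ["BMW", "Mercedes"])

# The dictionary goes under the "filters" key of the retriever's inputs,
# in the same shape as the data argument shown above.
run_data = {"retriever": {"query": "Why did the revenue increase?", "filters": filters}}
```

The resulting run_data has the same structure as the data argument passed to pipeline.run() above.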

Filtering Logic

Technically speaking, filters are defined as nested dictionaries that can be of two types: Comparison or Logic.

Comparison

Comparison dictionaries must contain the following keys:

- field

- operator

- value

The field value in Comparison dictionaries must be the name of one of the meta fields of a document, such as meta.years.

The operator value in Comparison dictionaries must be one of the following:

- ==

- !=

- >

- >=

- <

- <=

- in

- not in

The value key takes a single value or (in the case of "in" and "not in") a list of values.
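The rules above can be checked before a filter is sent to a Retriever. The following is a minimal sketch of such a check in plain Python; validate_comparison and the two constants are hypothetical names introduced here, and the check only covers the rules listed in this section, not everything a particular Document Store may enforce.

```python
from typing import Any, Dict

# Operators and keys as listed in this section.
COMPARISON_OPERATORS = {"==", "!=", ">", ">=", "<", "<=", "in", "not in"}
REQUIRED_KEYS = {"field", "operator", "value"}


def validate_comparison(condition: Dict[str, Any]) -> None:
    """Raise ValueError if the dictionary is not a well-formed Comparison."""
    missing = REQUIRED_KEYS - set(condition)
    if missing:
        raise ValueError(f"Comparison is missing keys: {sorted(missing)}")
    op = condition["operator"]
    if op not in COMPARISON_OPERATORS:
        raise ValueError(f"Unknown comparison operator: {op!r}")
    # "in" and "not in" compare against a list of values.
    if op in ("in", "not in") and not isinstance(condition["value"], list):
        raise ValueError(f"Operator {op!r} expects a list of values")


# A well-formed Comparison passes silently.
validate_comparison({"field": "meta.companies", "operator": "in", "value": ["BMW", "Mercedes"]})
```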

Logic

Logic dictionaries must contain the following keys:

- operator

- conditions

The conditions key must be a list of dictionaries, either of type Comparison or Logic.

The operator values in Logic dictionaries must be one of the following:

- NOT

- OR

- AND
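To make the nesting concrete, here is a minimal sketch of how a filter of this shape could be evaluated against one Document's metadata in plain Python. It is an illustration only, not Haystack's implementation: the matches function is a hypothetical name, NOT is assumed to negate the conjunction of its conditions, and a Document that lacks the field is assumed never to match a Comparison.

```python
import operator
from typing import Any, Callable, Dict

# Comparison operators mapped to plain Python predicates.
COMPARISONS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda field_value, allowed: field_value in allowed,
    "not in": lambda field_value, excluded: field_value not in excluded,
}


def matches(filters: Dict[str, Any], meta: Dict[str, Any]) -> bool:
    """Return True if the metadata dictionary satisfies the filter."""
    op = filters["operator"]
    if "conditions" in filters:
        # Logic dictionary: evaluate every nested condition recursively.
        results = [matches(condition, meta) for condition in filters["conditions"]]
        if op == "AND":
            return all(results)
        if op == "OR":
            return any(results)
        if op == "NOT":
            # Assumption of this sketch: NOT negates the AND of its conditions.
            return not all(results)
        raise ValueError(f"Unknown logic operator: {op!r}")
    # Comparison dictionary: drop the "meta." prefix and look up the field.
    field_name = filters["field"]
    if field_name.startswith("meta."):
        field_name = field_name[len("meta."):]
    if field_name not in meta:
        # Assumption of this sketch: a missing field never matches.
        return False
    return COMPARISONS[op](meta[field_name], filters["value"])


# A Logic dictionary whose conditions mix a Comparison and another Logic dictionary.
example_filter = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.years", "operator": "==", "value": "2019"},
        {
            "operator": "OR",
            "conditions": [
                {"field": "meta.companies", "operator": "==", "value": "BMW"},
                {"field": "meta.companies", "operator": "==", "value": "Mercedes"},
            ],
        },
    ],
}

print(matches(example_filter, {"years": "2019", "companies": "BMW"}))  # True
print(matches(example_filter, {"years": "2018", "companies": "BMW"}))  # False
```

Because conditions may contain further Logic dictionaries, a single recursive function is enough to handle any depth of nesting.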

In the Haystack code base, the filtering logic is defined in the DocumentStore protocol.