MetaFieldRanker

MetaFieldRanker ranks Documents based on the value of a meta field that you specify. It's a lightweight Ranker that can improve your pipeline's results without slowing it down.

Most common position in a pipeline: In a query pipeline, after a component that returns a list of documents, such as a Retriever
Mandatory init variables: "meta_field": The name of the meta field to rank by
Mandatory run variables: "documents": A list of documents. "top_k" (optional): The maximum number of documents to return. If not provided, the Ranker returns all documents it received.
Output variables: "documents": A list of documents
API reference: Rankers
GitHub link: https://github.com/deepset-ai/haystack/blob/main/haystack/components/rankers/meta_field.py

Overview

MetaFieldRanker sorts documents by the value of a specific meta field, in descending or ascending order. The returned list of Document objects is arranged in the selected order; string values are sorted alphabetically or in reverse alphabetical order (for example, Tokyo, Paris, Berlin).
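To make the two sort orders concrete, here is a minimal pure-Python sketch. It is not the Ranker's implementation: plain dictionaries stand in for Document objects, and `sort_by_meta` and the `city` meta field are hypothetical names chosen for illustration.

```python
# Illustrative sketch only: plain dicts stand in for Document objects,
# and sort_by_meta is a hypothetical helper, not a Haystack function.
def sort_by_meta(docs, meta_field, descending=True):
    """Return a new list of docs ordered by the value stored under meta_field."""
    return sorted(docs, key=lambda d: d["meta"][meta_field], reverse=descending)


docs = [
    {"content": "Doc A", "meta": {"city": "Paris"}},
    {"content": "Doc B", "meta": {"city": "Tokyo"}},
    {"content": "Doc C", "meta": {"city": "Berlin"}},
]

# Descending order on strings means reverse alphabetical order
desc_cities = [d["meta"]["city"] for d in sort_by_meta(docs, "city")]
# Ascending order on strings means alphabetical order
asc_cities = [d["meta"]["city"] for d in sort_by_meta(docs, "city", descending=False)]

print(desc_cities)  # ['Tokyo', 'Paris', 'Berlin']
print(asc_cities)   # ['Berlin', 'Paris', 'Tokyo']
```

The same idea applies to numeric meta values such as a rating, where descending order puts the highest value first.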

MetaFieldRanker comes with the optional parameters weight and ranking_mode you can use to combine a document’s score assigned by the Retriever and the value of its meta field for the ranking. The weight parameter lets you balance the importance of the Document's content and the meta field in the ranking process. The ranking_mode parameter defines how the scores from the Retriever and the Ranker are combined.
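As a rough intuition for how a weight can balance the two signals, the sketch below blends a ranking by Retriever score with a ranking by meta value using a weighted form of reciprocal rank fusion. This is an assumption-laden illustration, not the component's actual code: `fuse_rankings`, the constant `K = 60`, and the dictionary-based documents are all hypothetical.

```python
# Illustrative sketch only: a weighted form of reciprocal rank fusion (RRF).
# fuse_rankings, K, and the dict-based documents are assumptions made for
# this example; they are not Haystack's implementation.
K = 60  # smoothing constant conventionally used with RRF (assumed here)


def rrf(rank, k=K):
    """Reciprocal-rank contribution of an item at a 1-based rank."""
    return 1.0 / (k + rank)


def fuse_rankings(docs, meta_field, weight):
    """Order docs by a blend of their score rank and their meta-value rank.

    weight=1.0 uses only the meta field; weight=0.0 uses only the score.
    """
    by_score = sorted(docs, key=lambda d: d["score"], reverse=True)
    by_meta = sorted(docs, key=lambda d: d["meta"][meta_field], reverse=True)

    fused = {}
    for rank, doc in enumerate(by_score, start=1):
        fused[doc["id"]] = (1 - weight) * rrf(rank)
    for rank, doc in enumerate(by_meta, start=1):
        fused[doc["id"]] += weight * rrf(rank)

    return sorted(docs, key=lambda d: fused[d["id"]], reverse=True)


docs = [
    {"id": "a", "content": "Paris", "score": 0.9, "meta": {"rating": 0.7}},
    {"id": "b", "content": "Berlin", "score": 0.5, "meta": {"rating": 1.3}},
    {"id": "c", "content": "Barcelona", "score": 0.1, "meta": {"rating": 2.1}},
]

meta_only = [d["content"] for d in fuse_rankings(docs, "rating", weight=1.0)]
score_only = [d["content"] for d in fuse_rankings(docs, "rating", weight=0.0)]

print(meta_only)   # ['Barcelona', 'Berlin', 'Paris']
print(score_only)  # ['Paris', 'Berlin', 'Barcelona']
```

Values of weight between 0 and 1 shift the final order gradually from the score-based ranking toward the meta-based one.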

This Ranker is useful in query pipelines, like retrieval-augmented generation (RAG) pipelines or document search pipelines. It ensures the documents are ordered by their meta field value. You can also use it after a Retriever (such as the InMemoryEmbeddingRetriever) to combine the Retriever’s score with a document’s meta value for improved ranking.

By default, MetaFieldRanker sorts documents only based on the meta field. You can adjust this by setting the weight to less than 1 when initializing this component. For more details on different initialization settings, check out the API reference for this component.

Usage

On its own

You can use this Ranker outside of a pipeline to sort documents.

This example uses the MetaFieldRanker to rank two simple documents. When running the Ranker, you provide the documents and, optionally, set the maximum number of documents to return with the top_k parameter.

from haystack import Document
from haystack.components.rankers import MetaFieldRanker

# Two documents with a numeric "rating" meta field
docs = [Document(content="Paris", meta={"rating": 1.3}), Document(content="Berlin", meta={"rating": 0.7})]

# Rank by the "rating" meta field
ranker = MetaFieldRanker(meta_field="rating")

# Return at most one document
result = ranker.run(documents=docs, top_k=1)
print(result["documents"])

In a pipeline

Below is an example of a pipeline that retrieves documents from an InMemoryDocumentStore based on keyword search (using InMemoryBM25Retriever). It then uses the MetaFieldRanker to rank the retrieved documents based on the meta field rating, using the Ranker's default settings:

from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.rankers import MetaFieldRanker

docs = [Document(content="Paris", meta={"rating": 1.3}),
        Document(content="Berlin", meta={"rating": 0.7}),
        Document(content="Barcelona", meta={"rating": 2.1})]
document_store = InMemoryDocumentStore()
document_store.write_documents(docs)

retriever = InMemoryBM25Retriever(document_store=document_store)
ranker = MetaFieldRanker(meta_field="rating")

document_ranker_pipeline = Pipeline()
document_ranker_pipeline.add_component(instance=retriever, name="retriever")
document_ranker_pipeline.add_component(instance=ranker, name="ranker")

document_ranker_pipeline.connect("retriever.documents", "ranker.documents")

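# Only the Retriever takes the query; the Ranker receives the retrieved documents
query = "Cities in France"
result = document_ranker_pipeline.run(data={"retriever": {"query": query, "top_k": 3},
                                            "ranker": {"top_k": 2}})
print(result["ranker"]["documents"])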

Related Links

See the parameter details in our API reference: Rankers