MetaFieldRanker
MetaFieldRanker ranks Documents based on the value of a meta field you specify. It's a lightweight Ranker that can improve your pipeline's results without slowing it down.
Most common position in a pipeline | In a query pipeline, after a component that returns a list of documents, such as a Retriever |
Mandatory init variables | "meta_field": The name of the meta field to rank by |
Mandatory run variables | “documents”: A list of documents. “top_k” (optional): The maximum number of documents to return. If not provided, the Ranker returns all documents it received. |
Output variables | “documents”: A list of documents |
API reference | Rankers |
GitHub link | https://github.com/deepset-ai/haystack/blob/main/haystack/components/rankers/meta_field.py |
Overview
MetaFieldRanker sorts documents based on the value of a specific meta field in descending or ascending order. This means the returned list of Document objects is arranged in the selected order. String values are sorted alphabetically or in reverse alphabetical order (for example, Tokyo, Paris, Berlin).
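The ordering described above can be illustrated with a short plain-Python sketch. It does not use Haystack: documents are modeled as dictionaries, and the helper `rank_by_meta` is a hypothetical name introduced only for this illustration.

```python
# Plain-Python sketch of sorting by a meta field; not the Haystack implementation.
docs = [
    {"content": "Capital of France", "meta": {"city": "Paris"}},
    {"content": "Capital of Japan", "meta": {"city": "Tokyo"}},
    {"content": "Capital of Germany", "meta": {"city": "Berlin"}},
]


def rank_by_meta(documents, meta_field, descending=True, top_k=None):
    """Sort documents by one meta field and optionally keep only the first top_k."""
    ranked = sorted(documents, key=lambda d: d["meta"][meta_field], reverse=descending)
    return ranked if top_k is None else ranked[:top_k]


descending_cities = [d["meta"]["city"] for d in rank_by_meta(docs, "city")]
ascending_cities = [d["meta"]["city"] for d in rank_by_meta(docs, "city", descending=False)]

print(descending_cities)  # ['Tokyo', 'Paris', 'Berlin']
print(ascending_cities)   # ['Berlin', 'Paris', 'Tokyo']
```

Descending order on strings gives reverse alphabetical order, which is why Tokyo comes first in the example above.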
MetaFieldRanker comes with the optional parameters weight and ranking_mode, which you can use to combine a document’s score assigned by the Retriever with the value of its meta field for the ranking. The weight parameter lets you balance the importance of the Document's content and the meta field in the ranking process. The ranking_mode parameter defines how the scores from the Retriever and the Ranker are combined.
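To make the role of weight more concrete, here is a conceptual plain-Python sketch of two common ways to merge a retriever score with a meta-field ranking: a linear blend and reciprocal rank fusion. The function names, the exact formulas, and the constant `k=60` are assumptions chosen for illustration. They are not taken from the Haystack source, so check the API reference for the actual behavior.

```python
# Conceptual sketch only: these formulas are assumptions, not Haystack's exact implementation.

def meta_ranks(meta_values):
    """Return each document's 0-based rank when sorted by meta value, highest first."""
    order = sorted(range(len(meta_values)), key=lambda i: meta_values[i], reverse=True)
    ranks = [0] * len(meta_values)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


def linear_blend(retriever_scores, meta_values, weight):
    """Blend the retriever score with a rank-based meta score in (0, 1]."""
    n = len(meta_values)
    return [
        (1 - weight) * score + weight * (n - rank) / n
        for score, rank in zip(retriever_scores, meta_ranks(meta_values))
    ]


def rank_fusion(meta_values, weight, k=60):
    """Fuse the retriever order (list position) with the meta order.

    k is a hypothetical smoothing constant, not a value taken from the library.
    """
    return [
        (1 - weight) / (k + retriever_rank) + weight / (k + meta_rank)
        for retriever_rank, meta_rank in enumerate(meta_ranks(meta_values))
    ]


# Documents are listed in retriever order (best retriever score first).
retriever_scores = [0.9, 0.5, 0.1]
ratings = [0.7, 1.3, 2.1]

meta_only = linear_blend(retriever_scores, ratings, weight=1.0)
blended = linear_blend(retriever_scores, ratings, weight=0.5)
fused = rank_fusion(ratings, weight=0.5)

print(meta_only.index(max(meta_only)))  # 2: the highest-rated document wins
print(blended.index(max(blended)))      # 0: the retriever score now tips the balance
```

In this sketch, a weight of 1.0 ignores the retriever score entirely, while lower values let the retriever's opinion influence the final order.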
This Ranker is useful in query pipelines, like retrieval-augmented generation (RAG) pipelines or document search pipelines. It ensures the documents are ordered by their meta field value. You can also use it after a Retriever (such as the InMemoryEmbeddingRetriever) to combine the Retriever’s score with a document’s meta value for improved ranking.
By default, MetaFieldRanker sorts documents based only on the meta field. You can adjust this by setting the weight parameter to a value less than 1 when initializing this component. For more details on different initialization settings, check out the API reference for this component.
Usage
On its own
You can use this Ranker outside of a pipeline to sort documents.
This example uses the MetaFieldRanker to rank two simple documents. When running the Ranker, you provide the documents and set the maximum number of documents to return using the top_k parameter.
from haystack import Document
from haystack.components.rankers import MetaFieldRanker
# Two documents with a numeric "rating" meta field
docs = [Document(content="Paris", meta={"rating": 1.3}), Document(content="Berlin", meta={"rating": 0.7})]
# Rank by the "rating" meta field
ranker = MetaFieldRanker(meta_field="rating")
# The Ranker takes documents and an optional top_k; it does not take a query
ranker.run(documents=docs, top_k=1)
In a pipeline
Below is an example of a pipeline that retrieves documents from an InMemoryDocumentStore based on keyword search (using InMemoryBM25Retriever). It then uses the MetaFieldRanker to rank the retrieved documents based on the meta field rating, using the Ranker's default settings:
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.rankers import MetaFieldRanker
docs = [
    Document(content="Paris", meta={"rating": 1.3}),
    Document(content="Berlin", meta={"rating": 0.7}),
    Document(content="Barcelona", meta={"rating": 2.1}),
]
document_store = InMemoryDocumentStore()
document_store.write_documents(docs)
retriever = InMemoryBM25Retriever(document_store=document_store)
ranker = MetaFieldRanker(meta_field="rating")
document_ranker_pipeline = Pipeline()
document_ranker_pipeline.add_component(instance=retriever, name="retriever")
document_ranker_pipeline.add_component(instance=ranker, name="ranker")
document_ranker_pipeline.connect("retriever.documents", "ranker.documents")
query = "Cities in France"
# Only the Retriever needs the query; the Ranker receives documents from the Retriever
document_ranker_pipeline.run(data={"retriever": {"query": query, "top_k": 3},
                                   "ranker": {"top_k": 2}})