SentenceTransformersDiversityRanker

This is a Diversity Ranker based on Sentence Transformers.

Most common position in a pipeline: In a query pipeline, after a component that returns a list of documents, such as a Retriever
Mandatory init variables: "token": The Hugging Face API token. Can be set with the HF_API_TOKEN or HF_TOKEN environment variable.
Mandatory run variables: "documents": A list of documents; "query": A query string
Output variables: "documents": A list of documents
API reference: Rankers
GitHub link: https://github.com/deepset-ai/haystack/blob/main/haystack/components/rankers/sentence_transformers_diversity.py

Overview

The SentenceTransformersDiversityRanker orders a list of documents so that the returned list maximizes overall diversity while remaining relevant to the query. To do this, the component embeds the query and the documents with a pre-trained Sentence Transformers model and compares the resulting embeddings.

This Ranker’s default model is sentence-transformers/all-MiniLM-L6-v2.
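To make the idea concrete, below is a rough, self-contained sketch of a greedy diversity ordering over pre-computed embeddings: pick the document most similar to the query first, then repeatedly add the document least similar, on average, to those already chosen. This is only an illustration of the general strategy, not the component's actual implementation (see the GitHub link above for that).

import numpy as np

def greedy_diversity_order(query_emb: np.ndarray, doc_embs: np.ndarray) -> list[int]:
    # Normalize so that dot products behave like cosine similarities
    query_emb = query_emb / np.linalg.norm(query_emb)
    doc_embs = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)

    # Start with the document most similar to the query
    selected = [int(np.argmax(doc_embs @ query_emb))]
    remaining = set(range(len(doc_embs))) - set(selected)

    while remaining:
        # Pick the document with the lowest mean similarity to the documents already selected
        mean_sims = {i: float((doc_embs[selected] @ doc_embs[i]).mean()) for i in remaining}
        next_doc = min(mean_sims, key=mean_sims.get)
        selected.append(next_doc)
        remaining.remove(next_doc)

    return selected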

Parameters Overview

You can choose between two diversity ranking strategies:

  • Greedy Diversity Order (default): Ranks documents to maximize the overall diversity of the output, based on their similarity to the query.
  • Maximum Margin Relevance: Ranks documents by their Maximum Margin Relevance (MMR) scores. Use the lambda_threshold parameter to adjust the balance between relevance to the query and diversity with respect to the documents already selected: values closer to 1 favor relevance, while values closer to 0 favor diversity (the default is 0.5). See the configuration sketch below.

You can also choose between two similarity metrics:

  • cosine (default): Measures cosine similarity between embeddings.
  • dot_product: Computes the dot product of embeddings.

Additionally, you can set the optional top_k parameter, which specifies the maximum number of documents to return. If you don’t set this parameter, the component returns all documents it receives.
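As an example, here is a minimal configuration sketch that combines these parameters. It assumes the strategy is chosen through a strategy init parameter that accepts "maximum_margin_relevance" alongside the default greedy diversity order; check the API reference for the exact parameter names and accepted values in your Haystack version.

from haystack.components.rankers import SentenceTransformersDiversityRanker

# strategy and lambda_threshold are assumed parameter names; verify them in the API reference
ranker = SentenceTransformersDiversityRanker(
    model="sentence-transformers/all-MiniLM-L6-v2",
    strategy="maximum_margin_relevance",  # switch from the default greedy diversity order
    lambda_threshold=0.7,                 # lean towards relevance over diversity
    similarity="dot_product",             # use dot product instead of the default cosine
    top_k=5,                              # return at most five documents
)
ranker.warm_up()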

Find the full list of optional initialization parameters in our API reference.

Usage

On its own

from haystack import Document
from haystack.components.rankers import SentenceTransformersDiversityRanker

ranker = SentenceTransformersDiversityRanker(model="sentence-transformers/all-MiniLM-L6-v2", similarity="cosine")
# Load the Sentence Transformers model before running the component
ranker.warm_up()

docs = [Document(content="Regular Exercise"), Document(content="Balanced Nutrition"), Document(content="Positive Mindset"),
        Document(content="Eating Well"), Document(content="Doing physical activities"), Document(content="Thinking positively")]

query = "How can I maintain physical fitness?"
output = ranker.run(query=query, documents=docs)
docs = output["documents"]

print(docs)

In a pipeline

from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.rankers import SentenceTransformersDiversityRanker

docs = [Document(content="The iconic Eiffel Tower is a symbol of Paris"),
        Document(content="Visit Luxembourg Gardens for a haven of tranquility in Paris"),
        Document(content="The Pont Alexandre III bridge in Paris is famous for its Beaux-Arts style")]
document_store = InMemoryDocumentStore()
document_store.write_documents(docs)

retriever = InMemoryBM25Retriever(document_store=document_store)
ranker = SentenceTransformersDiversityRanker()

document_ranker_pipeline = Pipeline()
document_ranker_pipeline.add_component(instance=retriever, name="retriever")
document_ranker_pipeline.add_component(instance=ranker, name="ranker")

# Feed the retrieved documents into the ranker
document_ranker_pipeline.connect("retriever.documents", "ranker.documents")

query = "Most famous iconic sight in Paris"
result = document_ranker_pipeline.run(data={"retriever": {"query": query, "top_k": 3},
                                            "ranker": {"query": query, "top_k": 2}})
print(result["ranker"]["documents"])
