SentenceTransformersDiversityRanker
This is a Diversity Ranker based on Sentence Transformers.
Most common position in a pipeline | In a query pipeline, after a component that returns a list of documents such as a Retriever |
Mandatory init variables | "token": The Hugging Face API token. Can be set with HF_API_TOKEN or HF_TOKEN env var. |
Mandatory run variables | “documents”: A list of documents; “query”: A query string |
Output variables | “documents”: A list of documents |
API reference | Rankers |
GitHub link | https://github.com/deepset-ai/haystack/blob/main/haystack/components/rankers/sentence_transformers_diversity.py |
Overview
The SentenceTransformersDiversityRanker embeds the query and the documents using a pre-trained Sentence Transformers model. It then uses a ranking algorithm to order the documents so that their overall diversity is maximized, taking their similarity to the query into account.
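To make the idea of a diversity ordering concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the component's actual implementation: the function names `cosine` and `greedy_diversity_order`, the toy 2-D vectors, and the specific greedy rule (start with the document closest to the query, then repeatedly pick the document least similar on average to those already chosen) are all assumed for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_diversity_order(query_vec, doc_vecs):
    """Return document indices in a diversity-oriented order.

    Hypothetical greedy rule, for illustration only:
    1. pick the document most similar to the query;
    2. repeatedly pick the remaining document with the lowest
       average similarity to the documents selected so far.
    """
    if not doc_vecs:
        return []
    remaining = list(range(len(doc_vecs)))
    first = max(remaining, key=lambda i: cosine(query_vec, doc_vecs[i]))
    selected = [first]
    remaining.remove(first)
    while remaining:
        def avg_sim(i):
            sims = [cosine(doc_vecs[i], doc_vecs[j]) for j in selected]
            return sum(sims) / len(sims)

        nxt = min(remaining, key=avg_sim)
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


# Toy "embeddings" made up for this example; a real setup would
# obtain them from a Sentence Transformers model.
query = [1.0, 0.0]
docs = [[0.9, 0.1], [1.0, 0.05], [0.0, 1.0]]
order = greedy_diversity_order(query, docs)
print(order)  # [1, 2, 0]
```

Documents 0 and 1 are near-duplicates, so after document 1 is chosen as the closest match to the query, the dissimilar document 2 is placed ahead of document 0.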
This Ranker’s default model is sentence-transformers/all-MiniLM-L6-v2.
You can optionally set the top_k parameter, which specifies the maximum number of documents to return. If you don’t set this parameter, the component returns all documents it receives.
Find the full list of optional initialization parameters in our API reference.
Usage
On its own
from haystack import Document
from haystack.components.rankers import SentenceTransformersDiversityRanker
# Name the model explicitly and compare embeddings with cosine similarity.
ranker = SentenceTransformersDiversityRanker(model="sentence-transformers/all-MiniLM-L6-v2", similarity="cosine")
# Load the model before the first run() call.
ranker.warm_up()
docs = [
    Document(content="Regular Exercise"),
    Document(content="Balanced Nutrition"),
    Document(content="Positive Mindset"),
    Document(content="Eating Well"),
    Document(content="Doing physical activities"),
    Document(content="Thinking positively"),
]
query = "How can I maintain physical fitness?"
output = ranker.run(query=query, documents=docs)
docs = output["documents"]
print(docs)
In a pipeline
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.rankers import SentenceTransformersDiversityRanker
docs = [
    Document(content="The iconic Eiffel Tower is a symbol of Paris"),
    Document(content="Visit Luxembourg Gardens for a haven of tranquility in Paris"),
    Document(content="The Pont Alexandre III bridge in Paris is famous for its Beaux-Arts style"),
]
document_store = InMemoryDocumentStore()
document_store.write_documents(docs)
retriever = InMemoryBM25Retriever(document_store=document_store)
# Use the default model and settings.
ranker = SentenceTransformersDiversityRanker()
document_ranker_pipeline = Pipeline()
document_ranker_pipeline.add_component(instance=retriever, name="retriever")
document_ranker_pipeline.add_component(instance=ranker, name="ranker")
document_ranker_pipeline.connect("retriever.documents", "ranker.documents")
query = "Most famous iconic sight in Paris"
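# The retriever returns up to 3 documents; the ranker keeps at most 2 of them.
result = document_ranker_pipeline.run(
    data={
        "retriever": {"query": query, "top_k": 3},
        "ranker": {"query": query, "top_k": 2},
    }
)
print(result["ranker"]["documents"])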