pyversity
haystack_integrations.components.rankers.pyversity.ranker
Haystack integration for pyversity (https://github.com/Pringled/pyversity).
Wraps pyversity's diversification algorithms as a Haystack @component,
making it easy to drop result diversification into any Haystack pipeline.
PyversityRanker
Reranks documents using pyversity's diversification algorithms.
Balances relevance and diversity in a ranked list of documents. Documents
must have both score and embedding populated (e.g. as returned by
a dense retriever with return_embedding=True).
Usage example:
from haystack import Document
from haystack_integrations.components.rankers.pyversity import PyversityRanker
from pyversity import Strategy
ranker = PyversityRanker(top_k=5, strategy=Strategy.MMR, diversity=0.5)
docs = [
    Document(content="Paris", score=0.9, embedding=[0.1, 0.2]),
    Document(content="Berlin", score=0.8, embedding=[0.3, 0.4]),
]
output = ranker.run(documents=docs)
docs = output["documents"]
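To make the relevance/diversity trade-off concrete, here is a minimal pure-Python sketch of greedy MMR (maximal marginal relevance) selection. It is an illustration only, not pyversity's implementation: the helper names `cosine` and `mmr_order` are hypothetical, and the real library may normalise scores or compute similarity differently.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_order(scores, embeddings, k, diversity=0.5):
    """Greedy MMR (hypothetical helper): return the indices of up to k items."""
    selected = []
    remaining = list(range(len(scores)))
    while remaining and len(selected) < k:

        def gain(i):
            # Redundancy is the similarity to the closest already-selected item.
            redundancy = max(
                (cosine(embeddings[i], embeddings[j]) for j in selected),
                default=0.0,
            )
            return (1.0 - diversity) * scores[i] - diversity * redundancy

        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


scores = [0.9, 0.85, 0.6]
# Items 0 and 1 point in nearly the same direction; item 2 is orthogonal to item 0.
embeddings = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]

relevance_only = mmr_order(scores, embeddings, k=2, diversity=0.0)
balanced = mmr_order(scores, embeddings, k=2, diversity=0.5)
```

With `diversity=0.0` the two highest-scoring items are picked even though they are near-duplicates. With `diversity=0.5` the second pick moves to the lower-scoring but dissimilar item.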
init
__init__(
top_k: int | None = None,
*,
strategy: Strategy = Strategy.DPP,
diversity: float = 0.5
) -> None
Creates an instance of PyversityRanker.
Parameters:
- top_k (int | None) – Number of documents to return after diversification. If None, all documents are returned in diversified order.
- strategy (Strategy) – Pyversity diversification strategy (e.g. Strategy.MMR). Defaults to Strategy.DPP.
- diversity (float) – Trade-off between relevance and diversity in [0, 1]. 0.0 keeps only the most relevant documents; 1.0 maximises diversity regardless of relevance. Defaults to 0.5.
Raises:
ValueError – If top_k is not a positive integer or diversity is not in [0, 1].
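The documented error conditions can be sketched as a small stand-alone check. `validate_ranker_params` and `is_valid` are hypothetical helpers written for illustration; the component's actual validation code and error messages may differ.

```python
def validate_ranker_params(top_k, diversity):
    """Hypothetical helper mirroring the documented ValueError conditions."""
    if top_k is not None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(top_k, bool) or not isinstance(top_k, int) or top_k <= 0:
            raise ValueError(f"top_k must be a positive integer, got {top_k!r}")
    if not 0.0 <= diversity <= 1.0:
        raise ValueError(f"diversity must be in [0, 1], got {diversity!r}")


def is_valid(top_k, diversity):
    """Return True when the parameter pair passes the check above."""
    try:
        validate_ranker_params(top_k, diversity)
    except ValueError:
        return False
    return True
```

Under these assumptions, `top_k=None` is accepted (all documents are returned), while `top_k=0` or `diversity=1.5` raises.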
to_dict
Serializes the component to a dictionary.
Returns:
dict[str, Any] – Dictionary with serialized data.
from_dict
Deserializes the component from a dictionary.
Parameters:
- data (dict[str, Any]) – The dictionary to deserialize from.
Returns:
PyversityRanker – The deserialized component instance.
run
run(
documents: list[Document],
top_k: int | None = None,
strategy: Strategy | None = None,
diversity: float | None = None,
) -> dict[str, list[Document]]
Rerank the list of documents using pyversity's diversification algorithm.
Documents missing score or embedding are skipped with a warning.
Parameters:
- documents (list[Document]) – List of Documents to rerank. Each document must have score and embedding set.
- top_k (int | None) – Overrides the initialized top_k for this call. None falls back to the initialized value.
- strategy (Strategy | None) – Overrides the initialized strategy for this call. None falls back to the initialized value.
- diversity (float | None) – Overrides the initialized diversity for this call. None falls back to the initialized value.
Returns:
dict[str, list[Document]] – A dictionary with the following keys:
- documents: List of up to top_k reranked Documents, ordered by the diversification algorithm.
Raises:
ValueError – If top_k is not a positive integer or diversity is not in [0, 1].
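The two behaviours described for run (per-call overrides that fall back to the initialized values, and skipping documents without a score or embedding) can be sketched as follows. `Doc` is a stand-in for the Haystack Document class, and `resolve` and `usable` are hypothetical helpers; none of this is taken from the component's source.

```python
import logging
from dataclasses import dataclass
from typing import List, Optional

logger = logging.getLogger(__name__)


@dataclass
class Doc:
    """Stand-in for a Haystack Document with only the fields used here."""
    content: str
    score: Optional[float] = None
    embedding: Optional[List[float]] = None


def resolve(override, default):
    """Return the per-call override unless it is None."""
    # An explicit None check matters: `override or default` would wrongly
    # discard a legitimate diversity of 0.0.
    return default if override is None else override


def usable(documents):
    """Keep documents that have both a score and an embedding; warn on the rest."""
    kept = []
    for doc in documents:
        if doc.score is None or doc.embedding is None:
            logger.warning("Skipping document without score or embedding: %r", doc.content)
            continue
        kept.append(doc)
    return kept


docs = [
    Doc("Paris", 0.9, [0.1, 0.2]),
    Doc("Berlin", None, [0.3, 0.4]),  # no score: skipped
    Doc("Rome", 0.7, None),           # no embedding: skipped
]
kept = usable(docs)
```

Only the fully populated document survives the filter, so a retriever configured without return_embedding=True would leave the ranker with nothing to diversify.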