Choosing the Right Ranker
This page provides guidance on selecting the right Ranker for your pipeline in Haystack. It explains the distinctions between API-based, on-premise, and rule-based (heuristic) Rankers, and offers advice based on latency, privacy, and diversity requirements.
Rankers in Haystack reorder a set of retrieved documents based on their estimated relevance to a user query. Rankers operate after retrieval and aim to refine the result list before it's passed to a downstream component like a Generator or Reader.
This reordering is based on additional signals beyond simple vector similarity. Depending on the Ranker used, these signals can include semantic similarity (with cross-encoders), structured metadata (such as timestamps or categories), or position-based heuristics (for example, placing relevant content at the start and end).
A typical question answering pipeline using a Ranker includes:
- Retrieve: Use a Retriever to find a candidate set of documents.
- Rank: Reorder those documents using a Ranker component.
- Answer: Pass the re-ranked documents to a downstream Generator or Reader.
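As a minimal sketch of this flow, the following pipeline connects an in-memory BM25 Retriever to a similarity Ranker. The toy documents and component names are illustrative, and the example assumes a recent Haystack 2.x release where SentenceTransformersSimilarityRanker is available:

```python
from haystack import Document, Pipeline
from haystack.components.rankers import SentenceTransformersSimilarityRanker
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Index a few toy documents.
document_store = InMemoryDocumentStore()
document_store.write_documents([
    Document(content="Rankers reorder retrieved documents by relevance."),
    Document(content="Retrievers fetch candidate documents from a store."),
    Document(content="Generators produce answers from ranked context."),
])

# Retrieve a broad candidate set, then rerank it down to the best few.
pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store=document_store, top_k=20))
pipeline.add_component("ranker", SentenceTransformersSimilarityRanker(top_k=5))
pipeline.connect("retriever.documents", "ranker.documents")

query = "What does a Ranker do?"
result = pipeline.run({"retriever": {"query": query}, "ranker": {"query": query}})
print(result["ranker"]["documents"])
```

The re-ranked documents would then feed a Generator or Reader as the final step.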
This guide helps you choose the right Ranker for your use case, whether you're optimizing for performance, cost, accuracy, or diversity in results. It focuses on the general types of Rankers in Haystack and the mechanism and interface that best suit your setup, rather than on specific models.
API-Based Rankers
These Rankers use external APIs to reorder documents using powerful models hosted remotely. They offer high-quality relevance scoring without local compute, but can be slower due to network latency and costly at scale.
Pricing models vary by provider: some charge per token processed, while others bill by usage time or number of API calls. Refer to the respective provider documentation for precise cost structures.
Most API-based Rankers in Haystack currently rely on cross-encoder models (though this may change in the future), which evaluate the query and document together to produce highly accurate relevance scores. Examples include AmazonBedrockRanker, CohereRanker, and JinaRanker.
In contrast, the NvidiaRanker uses large language models (LLMs) for ranking. These models treat relevance as a semantic reasoning task, which can yield better results for complex or multi-step queries, though often at higher computational cost.
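Regardless of the underlying model type, API-based Rankers share the same component interface. As a hedged sketch, here is how CohereRanker might be used; the import path assumes the cohere-haystack integration package is installed, and the component typically reads the API key from the COHERE_API_KEY environment variable (check the integration docs for your version):

```python
from haystack import Document

# Assumes `pip install cohere-haystack`; the import path may differ across versions.
from haystack_integrations.components.rankers.cohere import CohereRanker

# Expects the COHERE_API_KEY environment variable to be set.
ranker = CohereRanker(top_k=3)

docs = [
    Document(content="Rankers reorder documents by estimated relevance."),
    Document(content="Retrievers fetch candidate documents from a store."),
    Document(content="Generators produce answers from ranked context."),
]
result = ranker.run(query="How are documents reordered?", documents=docs)
for doc in result["documents"]:
    print(doc.score, doc.content)
```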
On-Premise Rankers
These Rankers run entirely on your local infrastructure. They are ideal for teams prioritizing data privacy, cost control, or low-latency inference without depending on external APIs. Since the models are executed locally, they avoid network bottlenecks and recurring usage costs, but require sufficient compute resources, typically GPU-backed, especially for cross-encoder models.
All on-premise Rankers in Haystack use cross-encoder architectures. These models jointly process the query and each document to assess relevance with deep contextual awareness. For example:
- SentenceTransformersSimilarityRanker ranks documents based on semantic similarity to the query. In addition to the default PyTorch backend (optimal for GPU), it offers memory-efficient ONNX and OpenVINO backends suitable for CPU-only environments (see the sketch after this list).
- TransformersSimilarityRanker is its legacy predecessor and should generally be avoided in favor of the newer, more flexible SentenceTransformersSimilarityRanker.
- HuggingFaceTEIRanker is based on the Text Embeddings Inference project, which offers high-performance local model serving with or without GPU resources. You can also use this component to perform inference with reranking models hosted on Hugging Face Inference Endpoints.
- FastembedRanker supports a variety of cross-encoder models and is optimal for CPU-only environments.
- SentenceTransformersDiversityRanker reorders documents to maximize diversity, helping reduce redundancy and cover a broader range of relevant topics.
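As a standalone sketch of the cross-encoder interface, here is SentenceTransformersSimilarityRanker run outside a pipeline. The model name is a common cross-encoder default, and the CPU-friendly backends are noted as an assumption that depends on your Haystack and sentence-transformers versions:

```python
from haystack import Document
from haystack.components.rankers import SentenceTransformersSimilarityRanker

# The default backend is PyTorch; backend="onnx" or backend="openvino" are the
# CPU-friendly options mentioned above (availability depends on your versions).
ranker = SentenceTransformersSimilarityRanker(
    model="cross-encoder/ms-marco-MiniLM-L-6-v2",
    top_k=2,
)
ranker.warm_up()  # loads the model; required when running the component standalone

docs = [
    Document(content="Berlin is the capital of Germany."),
    Document(content="Paris is the capital of France."),
    Document(content="The Eiffel Tower is located in Paris."),
]
result = ranker.run(query="What is the capital of France?", documents=docs)
for doc in result["documents"]:
    print(f"{doc.score:.3f}  {doc.content}")
```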
These Rankers give you full control over model selection, optimization, and deployment, making them well-suited for production environments with strict SLAs or compliance requirements.
Rule-Based Rankers
Rule-Based Rankers in Haystack prioritize or reorder documents based on heuristic logic rather than semantic understanding. They operate on document metadata or simple structural patterns, making them computationally efficient and useful for enforcing domain-specific rules or structuring inputs in a retrieval pipeline. While they do not assess semantic relevance directly, they serve as valuable complements to more advanced methods like cross-encoder or LLM-based Rankers.
For example:
- MetaFieldRanker scores and orders documents based on metadata values such as recency, source reliability, or custom-defined priorities.
- MetaFieldGroupingRanker groups documents by a specified metadata field and returns every document in each group together, ensuring that related documents (for example, from the same file) are processed as a single block, which has been shown to improve LLM performance.
- LostInTheMiddleRanker reorders documents after ranking to mitigate position bias in models with limited context windows, ensuring that highly relevant items are not overlooked.
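For instance, here is a minimal MetaFieldRanker sketch that promotes more recent documents; the "date" metadata field and its values are illustrative:

```python
from haystack import Document
from haystack.components.rankers import MetaFieldRanker

# Rank documents purely by the "date" metadata value, newest first.
ranker = MetaFieldRanker(meta_field="date", sort_order="descending")

docs = [
    Document(content="Old report", meta={"date": "2022-01-01"}),
    Document(content="Recent report", meta={"date": "2024-06-01"}),
    Document(content="Mid report", meta={"date": "2023-03-15"}),
]
result = ranker.run(documents=docs)
print([doc.meta["date"] for doc in result["documents"]])
# ['2024-06-01', '2023-03-15', '2022-01-01']
```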
MetaFieldRanker is typically used before semantic ranking to filter or restructure documents according to business logic.
In contrast, LostInTheMiddleRanker and MetaFieldGroupingRanker are intended for use after ranking, to improve the effectiveness of downstream components like LLMs. These deterministic approaches provide speed, transparency, and fine-grained control, making them well-suited for pipelines requiring explainability or strict operational logic.
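To illustrate the post-ranking step, here is a small LostInTheMiddleRanker sketch. It assumes the input documents are already ordered from most to least relevant, for example by a similarity Ranker earlier in the pipeline:

```python
from haystack import Document
from haystack.components.rankers import LostInTheMiddleRanker

ranker = LostInTheMiddleRanker(word_count_threshold=1024)

# Input order: most relevant first (as produced by a preceding Ranker).
docs = [Document(content=f"doc {i}") for i in range(1, 6)]
result = ranker.run(documents=docs)
print([doc.content for doc in result["documents"]])
# The most relevant documents end up at the start and end of the list,
# with the least relevant in the middle.
```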