Version: 2.28

VLLMRanker

This component ranks documents based on their similarity to the query using reranker models served with vLLM.

Most common position in a pipeline: In a query pipeline, after a component that returns a list of documents, such as a Retriever
Mandatory init variables: model: The name of the reranker model served by vLLM
Mandatory run variables: query: A query string

documents: A list of document objects

Output variables: documents: A list of document objects
API reference: vLLM
GitHub link: https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/vllm

Overview

vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It exposes an HTTP server, which VLLMRanker uses to rerank documents through the /rerank endpoint.

VLLMRanker expects a vLLM server to be running and reachable at the URL set by the api_base_url parameter (http://localhost:8000/v1 by default). Use this component after a Retriever in a query pipeline to reorder the retrieved documents by relevance to the query.

You can also specify the top_k parameter to set the maximum number of documents to return, and the score_threshold parameter to drop documents with a relevance score below a given value.

If the vLLM server was started with --api-key, provide the API key through the VLLM_API_KEY environment variable or the api_key init parameter using Haystack's Secret API.

Compatible models

vLLM supports a range of reranker models. Check the vLLM supported models docs for the list of supported architectures and models.

vLLM-specific parameters

You can pass vLLM-specific parameters through the extra_parameters dictionary. These are merged into the request body sent to the /rerank endpoint. Use this to pass parameters that are not part of the standard rerank API, such as truncate_prompt_tokens. See the vLLM rerank API docs for details.

python
ranker = VLLMRanker(
    model="BAAI/bge-reranker-base",
    extra_parameters={"truncate_prompt_tokens": 256},
)

Embedding meta fields

Some use cases benefit from including meta information (such as a title) alongside the document content when reranking. Pass the names of the meta fields to include through the meta_fields_to_embed parameter; they will be concatenated with the document content using meta_data_separator.

python
ranker = VLLMRanker(
    model="BAAI/bge-reranker-base",
    meta_fields_to_embed=["title"],
    meta_data_separator="\n",
)

Usage

Install the vllm-haystack package to use VLLMRanker:

shell
pip install vllm-haystack

Starting the vLLM server

Before using this component, start a vLLM server with a reranker model:

bash
vllm serve BAAI/bge-reranker-base

For details on server options, see the vLLM CLI docs.

On its own

python
from haystack import Document
from haystack_integrations.components.rankers.vllm import VLLMRanker

ranker = VLLMRanker(model="BAAI/bge-reranker-base")

docs = [
    Document(content="The capital of Brazil is Brasilia."),
    Document(content="The capital of France is Paris."),
]
result = ranker.run(query="What is the capital of France?", documents=docs)
print(result["documents"][0].content)

## The capital of France is Paris.

In a pipeline

python
from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.rankers.vllm import VLLMRanker

docs = [
    Document(content="Paris is in France"),
    Document(content="Berlin is in Germany"),
    Document(content="Lyon is in France"),
]
document_store = InMemoryDocumentStore()
document_store.write_documents(docs)

retriever = InMemoryBM25Retriever(document_store=document_store)
ranker = VLLMRanker(model="BAAI/bge-reranker-base")

document_ranker_pipeline = Pipeline()
document_ranker_pipeline.add_component(instance=retriever, name="retriever")
document_ranker_pipeline.add_component(instance=ranker, name="ranker")

document_ranker_pipeline.connect("retriever.documents", "ranker.documents")

query = "Cities in France"
result = document_ranker_pipeline.run(
    data={
        "retriever": {"query": query, "top_k": 3},
        "ranker": {"query": query, "top_k": 2},
    },
)

print(result["ranker"]["documents"][0])

## Document(id=..., content: 'Paris is in France', score: ...)