VLLMRanker
This component ranks documents based on their similarity to the query using reranker models served with vLLM.
| | |
| --- | --- |
| Most common position in a pipeline | In a query pipeline, after a component that returns a list of documents, such as a Retriever |
| Mandatory init variables | `model`: The name of the reranker model served by vLLM |
| Mandatory run variables | `query`: A query string<br>`documents`: A list of document objects |
| Output variables | `documents`: A list of document objects |
| API reference | vLLM |
| GitHub link | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/vllm |
Overview
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It exposes an HTTP server, which VLLMRanker uses to rerank documents through the /rerank endpoint.
VLLMRanker expects a vLLM server to be running and accessible at the URL set by the api_base_url parameter (http://localhost:8000/v1 by default). Use this component after a Retriever in a query pipeline to reorder the retrieved documents by relevance to the query.
You can also specify the top_k parameter to set the maximum number of documents to return, and the score_threshold parameter to drop documents with a relevance score below a given value.
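The effect of the two parameters can be sketched in plain Python (a minimal sketch with a hypothetical helper, not the component's actual internals; scores are assumed to come back from the server):

```python
# Sketch: documents are sorted by relevance score, documents below
# score_threshold are dropped, and at most top_k documents are kept.
def apply_top_k_and_threshold(scored_docs, top_k=None, score_threshold=None):
    ranked = sorted(scored_docs, key=lambda d: d["score"], reverse=True)
    if score_threshold is not None:
        ranked = [d for d in ranked if d["score"] >= score_threshold]
    if top_k is not None:
        ranked = ranked[:top_k]
    return ranked

docs = [
    {"content": "a", "score": 0.2},
    {"content": "b", "score": 0.9},
    {"content": "c", "score": 0.6},
]
print(apply_top_k_and_threshold(docs, top_k=2, score_threshold=0.5))
## [{'content': 'b', 'score': 0.9}, {'content': 'c', 'score': 0.6}]
```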
If the vLLM server was started with --api-key, provide the API key through the VLLM_API_KEY environment variable or the api_key init parameter using Haystack's Secret API.
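For example, assuming the server was started with `--api-key my-secret-token` (the token value here is illustrative):

```shell
# Expose the same key to VLLMRanker via the environment variable it reads by default:
export VLLM_API_KEY=my-secret-token
```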
Compatible models
vLLM supports a range of reranker models. Check the vLLM supported models docs for the list of supported architectures and models.
vLLM-specific parameters
You can pass vLLM-specific parameters through the extra_parameters dictionary. These are merged into the request body sent to the /rerank endpoint. Use this to pass parameters that are not part of the standard rerank API, such as truncate_prompt_tokens. See the vLLM rerank API docs for details.
```python
from haystack_integrations.components.rankers.vllm import VLLMRanker

ranker = VLLMRanker(
    model="BAAI/bge-reranker-base",
    extra_parameters={"truncate_prompt_tokens": 256},
)
```
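Conceptually, the merge is a plain dictionary update (a sketch, not the integration's actual code; the payload fields shown follow the rerank request shape described above):

```python
# Base request body for the /rerank endpoint (sketch).
payload = {
    "model": "BAAI/bge-reranker-base",
    "query": "What is the capital of France?",
    "documents": [
        "The capital of Brazil is Brasilia.",
        "The capital of France is Paris.",
    ],
}

# extra_parameters are merged in on top of the standard fields.
extra_parameters = {"truncate_prompt_tokens": 256}
payload = {**payload, **extra_parameters}

print(payload["truncate_prompt_tokens"])
## 256
```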
Embedding meta fields
Some use cases benefit from including meta information (such as a title) alongside the document content when reranking. Pass the names of the meta fields to include through the meta_fields_to_embed parameter; they will be concatenated with the document content using meta_data_separator.
```python
from haystack_integrations.components.rankers.vllm import VLLMRanker

ranker = VLLMRanker(
    model="BAAI/bge-reranker-base",
    meta_fields_to_embed=["title"],
    meta_data_separator="\n",
)
```
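The concatenation works roughly like this (a sketch with a hypothetical helper, not the component's real internals):

```python
# Sketch: selected meta fields are prepended to the document content,
# joined with meta_data_separator.
def embed_meta(doc_content, meta, meta_fields_to_embed, separator="\n"):
    values = [str(meta[f]) for f in meta_fields_to_embed if f in meta]
    return separator.join(values + [doc_content])

text = embed_meta(
    "Paris is the capital of France.",
    meta={"title": "France"},
    meta_fields_to_embed=["title"],
    separator="\n",
)
print(text)
## France
## Paris is the capital of France.
```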
Usage
Install the vllm-haystack package to use VLLMRanker:
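```shell
pip install vllm-haystack
```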
Starting the vLLM server
Before using this component, start a vLLM server with a reranker model:
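For example (a minimal sketch; swap in your model name, and note that some models or vLLM versions may need additional flags):

```shell
vllm serve BAAI/bge-reranker-base
```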
For details on server options, see the vLLM CLI docs.
On its own
```python
from haystack import Document
from haystack_integrations.components.rankers.vllm import VLLMRanker

ranker = VLLMRanker(model="BAAI/bge-reranker-base")

docs = [
    Document(content="The capital of Brazil is Brasilia."),
    Document(content="The capital of France is Paris."),
]

result = ranker.run(query="What is the capital of France?", documents=docs)
print(result["documents"][0].content)
## The capital of France is Paris.
```
In a pipeline
```python
from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.rankers.vllm import VLLMRanker

docs = [
    Document(content="Paris is in France"),
    Document(content="Berlin is in Germany"),
    Document(content="Lyon is in France"),
]

document_store = InMemoryDocumentStore()
document_store.write_documents(docs)

retriever = InMemoryBM25Retriever(document_store=document_store)
ranker = VLLMRanker(model="BAAI/bge-reranker-base")

document_ranker_pipeline = Pipeline()
document_ranker_pipeline.add_component(instance=retriever, name="retriever")
document_ranker_pipeline.add_component(instance=ranker, name="ranker")
document_ranker_pipeline.connect("retriever.documents", "ranker.documents")

query = "Cities in France"
result = document_ranker_pipeline.run(
    data={
        "retriever": {"query": query, "top_k": 3},
        "ranker": {"query": query, "top_k": 2},
    },
)

print(result["ranker"]["documents"][0])
## Document(id=..., content: 'Paris is in France', score: ...)
```