JinaRanker
Use this component to rank Documents based on their similarity to the query using Jina AI models.
Name | JinaRanker |
Path | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/jina |
Most common Position in a Pipeline | In a query pipeline, after a component that returns a list of documents (such as a Retriever). |
Mandatory Input variables | "query": A query string; "documents": A list of Document objects |
Output variables | "documents": A list of Document objects |
Overview
JinaRanker ranks the given documents based on how similar they are to the given query. It uses Jina AI ranking models; check out the full list at Jina AI's website. The default model for this Ranker is jina-reranker-v1-base-en.
Additionally, you can use the optional top_k and score_threshold parameters with JinaRanker:
- The Ranker's top_k is the number of documents it returns (if it's the last component in the pipeline) or forwards to the next component.
- If you set the score_threshold for the Ranker, it will only return documents with a similarity score (computed by the Jina AI model) above this threshold.
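The effect of the two parameters can be sketched in plain Python. This is only an illustration of the selection logic described above, not the component's actual implementation: ScoredDoc and select are hypothetical names, the scores are placeholder values, and treating the threshold as a strict "greater than" comparison is an assumption based on the wording "above this threshold".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoredDoc:
    """Hypothetical stand-in for a Document that already carries a score."""
    content: str
    score: float


def select(docs: List[ScoredDoc],
           top_k: Optional[int] = None,
           score_threshold: Optional[float] = None) -> List[ScoredDoc]:
    """Order docs by score (highest first), then apply the optional limits."""
    ranked = sorted(docs, key=lambda d: d.score, reverse=True)
    if score_threshold is not None:
        # Assumption: "above" is read as strictly greater than the threshold
        ranked = [d for d in ranked if d.score > score_threshold]
    if top_k is not None:
        # Keep only the first top_k documents of the ranked list
        ranked = ranked[:top_k]
    return ranked


# Placeholder scores, chosen only to illustrate the behavior
docs = [ScoredDoc("a", 0.9), ScoredDoc("b", 0.2), ScoredDoc("c", 0.6)]
both = select(docs, top_k=1, score_threshold=0.5)
```

Because the list is sorted before either limit is applied, the order in which the two limits run does not change the result in this sketch.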
Installation
To start using this integration with Haystack, install the package with:
pip install jina-haystack
Authorization
The component uses a JINA_API_KEY environment variable by default. Otherwise, you can pass a Jina API key at initialization with api_key like this:
from haystack.utils import Secret
ranker = JinaRanker(api_key=Secret.from_token("<your-api-key>"))
To get your API key, head to Jina AI's website.
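If you rely on the environment variable, it can help to check that it is set before building a pipeline, so a missing key fails early with a clear message. The following is a minimal sketch using only the standard library; require_jina_api_key is a hypothetical helper, not part of the integration.

```python
import os
from typing import Mapping, Optional


def require_jina_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the JINA_API_KEY value, or raise if it is missing or blank."""
    # Fall back to the real process environment when no mapping is given
    env = os.environ if env is None else env
    key = env.get("JINA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "JINA_API_KEY is not set; export it or pass api_key to JinaRanker"
        )
    return key
```

Accepting the environment as an argument keeps the helper easy to test without touching the real process environment.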
Usage
On its own
You can use JinaRanker outside of a pipeline to order documents based on your query.
To run the Ranker, pass a query, provide the documents, and set the number of documents to return in the top_k parameter.
from haystack import Document
from haystack_integrations.components.rankers.jina import JinaRanker
docs = [Document(content="Paris"), Document(content="Berlin")]
ranker = JinaRanker()
ranker.run(query="City in France", documents=docs, top_k=1)
In a Pipeline
This is an example of a pipeline that retrieves documents from an InMemoryDocumentStore based on keyword search (using InMemoryBM25Retriever). It then uses the JinaRanker to rank the retrieved documents according to their similarity to the query.
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack_integrations.components.rankers.jina import JinaRanker
docs = [
    Document(content="Paris is in France"),
    Document(content="Berlin is in Germany"),
    Document(content="Lyon is in France"),
]
document_store = InMemoryDocumentStore()
document_store.write_documents(docs)
retriever = InMemoryBM25Retriever(document_store=document_store)
ranker = JinaRanker()
document_ranker_pipeline = Pipeline()
document_ranker_pipeline.add_component(instance=retriever, name="retriever")
document_ranker_pipeline.add_component(instance=ranker, name="ranker")
# Feed the retriever's documents into the ranker
document_ranker_pipeline.connect("retriever.documents", "ranker.documents")
query = "Cities in France"
# Retrieve up to 3 documents, then keep the 2 most similar to the query
document_ranker_pipeline.run(data={"retriever": {"query": query, "top_k": 3},
                                   "ranker": {"query": query, "top_k": 2}})
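The pipeline returns a dictionary keyed by component name, so the ranked documents from the example above sit under the "ranker" key in its "documents" output. The helper below is a rough sketch of how you might read them; ranked_contents is a hypothetical name, the example relies on each document exposing content and score attributes, and the stand-in result uses made-up scores rather than real model output.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Tuple


def ranked_contents(result: Dict[str, Dict[str, Any]],
                    component: str = "ranker") -> List[Tuple[str, Any]]:
    """Return (content, score) pairs from a pipeline result, in ranked order."""
    documents = result[component]["documents"]
    # A missing score attribute is reported as None rather than raising
    return [(doc.content, getattr(doc, "score", None)) for doc in documents]


# Hypothetical stand-in for a pipeline result; the scores are placeholders
fake_result = {
    "ranker": {
        "documents": [
            SimpleNamespace(content="Paris is in France", score=0.9),
            SimpleNamespace(content="Lyon is in France", score=0.8),
        ]
    }
}
pairs = ranked_contents(fake_result)
```

With a real pipeline, you would pass the return value of the run call instead of fake_result.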