JinaRanker
Use this component to rank documents based on their similarity to the query using Jina AI models.
Most common position in a pipeline | In a query pipeline, after a component that returns a list of documents (such as a Retriever) |
Mandatory init variables | "api_key": The Jina API key. Can be set with JINA_API_KEY env var. |
Mandatory run variables | "query": A query string; "documents": A list of documents |
Output variables | "documents": A list of documents |
API reference | Jina |
GitHub link | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/jina |
Overview
JinaRanker ranks the given documents based on how similar they are to the given query. It uses Jina AI ranking models; the full list is available on Jina AI's website. The default model for this Ranker is jina-reranker-v1-base-en.
Additionally, you can use the optional top_k and score_threshold parameters with JinaRanker:
- The Ranker's top_k is the number of documents it returns (if it's the last component in the pipeline) or forwards to the next component.
- If you set the score_threshold for the Ranker, it only returns documents with a similarity score (computed by the Jina AI model) above this threshold.
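To make the effect of the two parameters concrete, here is a minimal sketch in plain Python. It does not call the Jina API and is not the component's implementation: select_ranked is a hypothetical helper, the scores are made up, and the strict "above" comparison and the order of the two steps are assumptions based on the description above.

```python
from typing import List, Optional, Tuple


def select_ranked(
    scored_docs: List[Tuple[str, float]],
    top_k: Optional[int] = None,
    score_threshold: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Hypothetical stand-in showing how top_k and score_threshold narrow a ranked list."""
    # Order by similarity score, highest first.
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    # Keep at most top_k documents when top_k is set.
    if top_k is not None:
        ranked = ranked[:top_k]
    # Keep only scores above the threshold (strict ">" is an assumption).
    if score_threshold is not None:
        ranked = [pair for pair in ranked if pair[1] > score_threshold]
    return ranked


# Made-up scores, for illustration only.
scored = [("Berlin", 0.12), ("Paris", 0.91), ("Lyon", 0.47)]
print(select_ranked(scored, top_k=2))
print(select_ranked(scored, top_k=2, score_threshold=0.5))
```

With both parameters set, the Ranker can return fewer than top_k documents, because documents that do not clear the threshold are dropped.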
Installation
To start using this integration with Haystack, install the package with:
pip install jina-haystack
Authorization
The component uses a JINA_API_KEY environment variable by default. Otherwise, you can pass a Jina API key at initialization with api_key like this:
from haystack.utils import Secret
ranker = JinaRanker(api_key=Secret.from_token("<your-api-key>"))
To get your API key, head to Jina AI’s website.
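If you rely on the environment variable, it can help to check that it is set before you build the component. The sketch below uses only the standard library; require_jina_api_key is a hypothetical helper and not part of the integration.

```python
import os
from typing import Mapping


def require_jina_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: return the key from JINA_API_KEY or raise a clear error."""
    # Treat a missing or blank variable the same way.
    key = env.get("JINA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "JINA_API_KEY is not set; export it or pass api_key at initialization."
        )
    return key
```

Calling require_jina_api_key() once at startup reports a missing key right away, before the first ranking request.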
Usage
On its own
You can use JinaRanker outside of a pipeline to order documents based on your query.
To run the Ranker, pass a query, provide the documents, and set the number of documents to return in the top_k parameter.
from haystack import Document
from haystack_integrations.components.rankers.jina import JinaRanker
docs = [Document(content="Paris"), Document(content="Berlin")]
ranker = JinaRanker()
ranker.run(query="City in France", documents=docs, top_k=1)
In a pipeline
This is an example of a pipeline that retrieves documents from an InMemoryDocumentStore based on keyword search (using InMemoryBM25Retriever). It then uses the JinaRanker to rank the retrieved documents according to their similarity to the query.
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack_integrations.components.rankers.jina import JinaRanker
docs = [Document(content="Paris is in France"),
        Document(content="Berlin is in Germany"),
        Document(content="Lyon is in France")]
document_store = InMemoryDocumentStore()
document_store.write_documents(docs)
retriever = InMemoryBM25Retriever(document_store=document_store)
ranker = JinaRanker()
ranker_pipeline = Pipeline()
ranker_pipeline.add_component(instance=retriever, name="retriever")
ranker_pipeline.add_component(instance=ranker, name="ranker")
ranker_pipeline.connect("retriever.documents", "ranker.documents")
query = "Cities in France"
ranker_pipeline.run(data={"retriever": {"query": query, "top_k": 3},
                          "ranker": {"query": query, "top_k": 2}})
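The pipeline returns a dictionary keyed by component name, so the ranked documents are under the "ranker" entry. The sketch below shows one way to read them. ranked_contents is a hypothetical helper, and the result is a stand-in built from SimpleNamespace objects with made-up scores so that the snippet runs without an API key. With a real run, you would pass the return value of ranker_pipeline.run(...) instead.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Tuple


def ranked_contents(
    pipeline_result: Dict[str, Dict[str, Any]], component: str = "ranker"
) -> List[Tuple[str, float]]:
    """Hypothetical helper: pull (content, score) pairs out of a pipeline result."""
    docs = pipeline_result[component]["documents"]
    return [(doc.content, doc.score) for doc in docs]


# Stand-in for the pipeline output; the scores are made up for illustration.
fake_result = {
    "ranker": {
        "documents": [
            SimpleNamespace(content="Paris is in France", score=0.83),
            SimpleNamespace(content="Lyon is in France", score=0.79),
        ]
    }
}

for content, score in ranked_contents(fake_result):
    print(f"{score:.2f}  {content}")
```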