Jina integration for Haystack
Module haystack_integrations.components.embedders.jina.document_embedder
JinaDocumentEmbedder
A component for computing Document embeddings using Jina AI models.
The embedding of each Document is stored in the embedding field of the Document.
Usage example:
from haystack import Document
from haystack_integrations.components.embedders.jina import JinaDocumentEmbedder
# Make sure that the environment variable JINA_API_KEY is set
document_embedder = JinaDocumentEmbedder(task="retrieval.query")
doc = Document(content="I love pizza!")
result = document_embedder.run([doc])
print(result['documents'][0].embedding)
# [0.017020374536514282, -0.023255806416273117, ...]
JinaDocumentEmbedder.__init__
def __init__(api_key: Secret = Secret.from_env_var("JINA_API_KEY"),
             model: str = "jina-embeddings-v3",
             prefix: str = "",
             suffix: str = "",
             batch_size: int = 32,
             progress_bar: bool = True,
             meta_fields_to_embed: Optional[List[str]] = None,
             embedding_separator: str = "\n",
             task: Optional[str] = None,
             dimensions: Optional[int] = None,
             late_chunking: Optional[bool] = None)
Create a JinaDocumentEmbedder component.
Arguments:
api_key
: The Jina API key.
model
: The name of the Jina model to use. Check the list of available models in the Jina documentation.
prefix
: A string to add to the beginning of each text.
suffix
: A string to add to the end of each text.
batch_size
: Number of Documents to encode at once.
progress_bar
: Whether to show a progress bar. Disabling it can help keep the logs clean in production deployments.
meta_fields_to_embed
: List of meta fields to embed along with the Document text.
embedding_separator
: Separator used to concatenate the meta fields to the Document text.
task
: The downstream task for which the embeddings will be used. The model returns embeddings optimized for that task. Check the list of available tasks in the Jina documentation.
dimensions
: The desired number of dimensions. Smaller embeddings are easier to store and retrieve, with minimal performance impact thanks to Matryoshka Representation Learning (MRL).
late_chunking
: A boolean to enable or disable late chunking. Late chunking leverages the model's long-context capabilities to generate contextual chunk embeddings.
The task and late_chunking parameters are supported only by jina-embeddings-v3.
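As a rough illustration of how prefix, suffix, meta_fields_to_embed, and embedding_separator might combine into the text that gets embedded, here is a minimal sketch. The helper build_text_to_embed and the plain-dict metadata are hypothetical and not part of the integration; the component may assemble the text differently.

```python
from typing import Any, Dict, List, Optional


def build_text_to_embed(
    content: str,
    meta: Dict[str, Any],
    prefix: str = "",
    suffix: str = "",
    meta_fields_to_embed: Optional[List[str]] = None,
    embedding_separator: str = "\n",
) -> str:
    """Hypothetical helper: join selected meta values and the content, then add prefix/suffix."""
    fields = meta_fields_to_embed or []
    # Keep only the requested meta fields that are present and not None.
    meta_values = [str(meta[key]) for key in fields if meta.get(key) is not None]
    return prefix + embedding_separator.join(meta_values + [content]) + suffix


text = build_text_to_embed(
    content="I love pizza!",
    meta={"title": "Food notes", "author": None},
    prefix="passage: ",
    meta_fields_to_embed=["title", "author"],
)
print(repr(text))  # 'passage: Food notes\nI love pizza!'
```

In this sketch a meta field that is missing or None is skipped rather than embedded as an empty string, which is an assumption made for the example.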
JinaDocumentEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
JinaDocumentEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "JinaDocumentEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
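The shape of the serialized dictionary is not spelled out here. As a hedged sketch, Haystack components commonly serialize to a dictionary with the class path under type and the constructor arguments under init_parameters. The ToyEmbedder class below is hypothetical and only mimics that shape to show a to_dict/from_dict round trip; treat the exact keys as an assumption.

```python
from typing import Any, Dict


class ToyEmbedder:
    """Hypothetical stand-in for a component; not part of the Jina integration."""

    def __init__(self, model: str = "jina-embeddings-v3", batch_size: int = 32):
        self.model = model
        self.batch_size = batch_size

    def to_dict(self) -> Dict[str, Any]:
        # Assumed layout: a type path plus the arguments needed to rebuild the object.
        return {
            "type": f"{type(self).__module__}.{type(self).__name__}",
            "init_parameters": {"model": self.model, "batch_size": self.batch_size},
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyEmbedder":
        # Rebuild the object from the stored constructor arguments.
        return cls(**data["init_parameters"])


original = ToyEmbedder(batch_size=8)
restored = ToyEmbedder.from_dict(original.to_dict())
print(restored.model, restored.batch_size)  # jina-embeddings-v3 8
```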
JinaDocumentEmbedder.run
@component.output_types(documents=List[Document], meta=Dict[str, Any])
def run(documents: List[Document])
Compute the embeddings for a list of Documents.
Arguments:
documents
: A list of Documents to embed.
Raises:
TypeError
: If the input is not a list of Documents.
Returns:
A dictionary with the following keys:
documents
: List of Documents, each with an embedding field containing the computed embedding.
meta
: A dictionary with metadata including the model name and usage statistics.
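The batch_size argument controls how many Documents are encoded at once. A minimal sketch of that kind of batching, using a hypothetical iter_batches helper; the component's internal loop and its progress bar handling may differ.

```python
from typing import Iterator, List


def iter_batches(texts: List[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Hypothetical helper: yield consecutive slices of at most batch_size texts."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


texts = [f"document {i}" for i in range(70)]
batch_sizes = [len(batch) for batch in iter_batches(texts, batch_size=32)]
print(batch_sizes)  # [32, 32, 6]
```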
Module haystack_integrations.components.embedders.jina.text_embedder
JinaTextEmbedder
A component for embedding strings using Jina AI models.
Usage example:
from haystack_integrations.components.embedders.jina import JinaTextEmbedder
# Make sure that the environment variable JINA_API_KEY is set
text_embedder = JinaTextEmbedder(task="retrieval.query")
text_to_embed = "I love pizza!"
print(text_embedder.run(text_to_embed))
# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
# 'meta': {'model': 'jina-embeddings-v3',
# 'usage': {'prompt_tokens': 4, 'total_tokens': 4}}}
JinaTextEmbedder.__init__
def __init__(api_key: Secret = Secret.from_env_var("JINA_API_KEY"),
             model: str = "jina-embeddings-v3",
             prefix: str = "",
             suffix: str = "",
             task: Optional[str] = None,
             dimensions: Optional[int] = None,
             late_chunking: Optional[bool] = None)
Create a JinaTextEmbedder component.
Arguments:
api_key
: The Jina API key. It can be explicitly provided or automatically read from the environment variable JINA_API_KEY (recommended).
model
: The name of the Jina model to use. Check the list of available models in the Jina documentation.
prefix
: A string to add to the beginning of each text.
suffix
: A string to add to the end of each text.
task
: The downstream task for which the embeddings will be used. The model returns embeddings optimized for that task. Check the list of available tasks in the Jina documentation.
dimensions
: The desired number of dimensions. Smaller embeddings are easier to store and retrieve, with minimal performance impact thanks to Matryoshka Representation Learning (MRL).
late_chunking
: A boolean to enable or disable late chunking. Late chunking leverages the model's long-context capabilities to generate contextual chunk embeddings.
The task and late_chunking parameters are supported only by jina-embeddings-v3.
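The dimensions parameter relies on MRL, where a prefix of the full embedding remains usable as a smaller embedding. The sketch below illustrates that general idea by truncating a vector and re-normalizing it. It is not a description of how the Jina API computes reduced embeddings, and truncate_embedding is a hypothetical helper.

```python
import math
from typing import List


def truncate_embedding(embedding: List[float], dimensions: int) -> List[float]:
    """Hypothetical helper: keep the first `dimensions` values and L2-normalize them."""
    if not 0 < dimensions <= len(embedding):
        raise ValueError("dimensions must be between 1 and len(embedding)")
    head = embedding[:dimensions]
    norm = math.sqrt(sum(value * value for value in head))
    # Guard against an all-zero prefix to avoid dividing by zero.
    return head if norm == 0 else [value / norm for value in head]


full = [3.0, 4.0, 12.0, 0.5]  # made-up values, chosen for easy arithmetic
small = truncate_embedding(full, dimensions=2)
print(small)  # [0.6, 0.8]
```

In practice, passing dimensions to the embedder is the simpler route, since the service then returns vectors of the requested size directly.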
JinaTextEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
JinaTextEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "JinaTextEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
JinaTextEmbedder.run
@component.output_types(embedding=List[float], meta=Dict[str, Any])
def run(text: str)
Embed a string.
Arguments:
text
: The string to embed.
Raises:
TypeError
: If the input is not a string.
Returns:
A dictionary with the following keys:
embedding
: The embedding of the input string.
meta
: A dictionary with metadata including the model name and usage statistics.
Module haystack_integrations.components.rankers.jina.ranker
JinaRanker
Ranks Documents based on their similarity to the query using Jina AI models.
Usage example:
from haystack import Document
from haystack_integrations.components.rankers.jina import JinaRanker
ranker = JinaRanker()
docs = [Document(content="Paris"), Document(content="Berlin")]
query = "City in Germany"
result = ranker.run(query=query, documents=docs)
docs = result["documents"]
print(docs[0].content)
JinaRanker.__init__
def __init__(model: str = "jina-reranker-v1-base-en",
             api_key: Secret = Secret.from_env_var("JINA_API_KEY"),
             top_k: Optional[int] = None,
             score_threshold: Optional[float] = None)
Creates an instance of JinaRanker.
Arguments:
api_key
: The Jina API key. It can be explicitly provided or automatically read from the environment variable JINA_API_KEY (recommended).
model
: The name of the Jina model to use. Check the list of available models on https://jina.ai/reranker/
top_k
: The maximum number of Documents to return per query. If None, all documents are returned.
score_threshold
: If provided, only returns documents with a score above this threshold.
Raises:
ValueError
: If top_k is not > 0.
JinaRanker.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
JinaRanker.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "JinaRanker"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
JinaRanker.run
@component.output_types(documents=List[Document])
def run(query: str,
        documents: List[Document],
        top_k: Optional[int] = None,
        score_threshold: Optional[float] = None)
Returns a list of Documents ranked by their similarity to the given query.
Arguments:
query
: Query string.
documents
: List of Documents.
top_k
: The maximum number of Documents you want the Ranker to return.
score_threshold
: If provided, only returns documents with a score above this threshold.
Raises:
ValueError
: If top_k is not > 0.
Returns:
A dictionary with the following keys:
documents
: List of Documents most similar to the given query in descending order of similarity.
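To make the interplay of top_k and score_threshold concrete, here is a minimal sketch of the selection logic described above. The select_ranked helper and the scores are invented for illustration; the real scores come from the Jina reranker, and the component's internal order of operations is an assumption.

```python
from typing import List, Optional, Tuple


def select_ranked(
    scored: List[Tuple[str, float]],
    top_k: Optional[int] = None,
    score_threshold: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: sort by score, then apply top_k and score_threshold."""
    if top_k is not None and top_k <= 0:
        raise ValueError(f"top_k must be > 0, but got {top_k}")
    # Highest score first, matching "descending order of similarity".
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    if score_threshold is not None:
        # Keep only items whose score is strictly above the threshold.
        ranked = [item for item in ranked if item[1] > score_threshold]
    return ranked


scored_docs = [("Paris", 0.12), ("Berlin", 0.91), ("Munich", 0.78)]  # made-up scores
print(select_ranked(scored_docs, top_k=2, score_threshold=0.5))
# [('Berlin', 0.91), ('Munich', 0.78)]
```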
Module haystack_integrations.components.connectors.jina.reader
JinaReaderConnector
A component that interacts with Jina AI's reader service to process queries and return documents.
This component supports different modes of operation: read, search, and ground.
Usage example:
from haystack_integrations.components.connectors.jina import JinaReaderConnector
reader = JinaReaderConnector(mode="read")
query = "https://example.com"
result = reader.run(query=query)
document = result["documents"][0]
print(document.content)
# This domain is for use in illustrative examples...
JinaReaderConnector.__init__
def __init__(mode: Union[JinaReaderMode, str],
             api_key: Secret = Secret.from_env_var("JINA_API_KEY"),
             json_response: bool = True)
Initialize a JinaReaderConnector instance.
Arguments:
mode
: The operation mode for the reader (read, search, or ground).
  - read: process a URL and return the textual content of the page.
  - search: search the web and return textual content of the most relevant pages.
  - ground: call the grounding engine to perform fact checking.
  For more information on the modes, see the Jina Reader documentation.
api_key
: The Jina API key. It can be explicitly provided or automatically read from the environment variable JINA_API_KEY (recommended).
json_response
: Controls the response format from the Jina Reader API. If True, requests a JSON response, resulting in Documents with rich structured metadata. If False, requests a raw response, resulting in one Document with minimal metadata.
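One way to picture the three modes is as a mapping from mode to service endpoint. The sketch below assumes the endpoints https://r.jina.ai/, https://s.jina.ai/, and https://g.jina.ai/ and a URL-encoded query appended to the path; both are assumptions to verify against the Jina Reader documentation. ReaderMode and build_request_url are hypothetical names, not the integration's API.

```python
from enum import Enum
from urllib.parse import quote


class ReaderMode(Enum):
    """Hypothetical enum mirroring the three documented modes."""
    READ = "read"
    SEARCH = "search"
    GROUND = "ground"


# Assumed endpoints; check the Jina Reader documentation before relying on them.
ENDPOINT_BY_MODE = {
    ReaderMode.READ: "https://r.jina.ai/",
    ReaderMode.SEARCH: "https://s.jina.ai/",
    ReaderMode.GROUND: "https://g.jina.ai/",
}


def build_request_url(mode: str, query: str) -> str:
    """Hypothetical helper: pick the endpoint for the mode and append the URL-encoded query."""
    reader_mode = ReaderMode(mode)  # raises ValueError for an unknown mode
    return ENDPOINT_BY_MODE[reader_mode] + quote(query, safe="")


print(build_request_url("read", "https://example.com"))
# https://r.jina.ai/https%3A%2F%2Fexample.com
```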
JinaReaderConnector.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
JinaReaderConnector.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "JinaReaderConnector"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
JinaReaderConnector.run
@component.output_types(documents=List[Document])
def run(query: str, headers: Optional[Dict[str, str]] = None)
Process the query/URL using the Jina AI reader service.
Arguments:
query
: The query string or URL to process.
headers
: Optional headers to include in the request for customization. Refer to the Jina Reader documentation for more information.
Returns:
A dictionary with the following keys:
documents
: A list of Document objects.