
Jina integration for Haystack

Module haystack_integrations.components.embedders.jina.document_embedder

JinaDocumentEmbedder

A component for computing Document embeddings using Jina AI models. The embedding of each Document is stored in the embedding field of the Document.

Usage example:

from haystack import Document
from haystack_integrations.components.embedders.jina import JinaDocumentEmbedder

# Make sure that the environment variable JINA_API_KEY is set

document_embedder = JinaDocumentEmbedder(task="retrieval.passage")

doc = Document(content="I love pizza!")

result = document_embedder.run([doc])
print(result['documents'][0].embedding)

# [0.017020374536514282, -0.023255806416273117, ...]

JinaDocumentEmbedder.__init__

def __init__(api_key: Secret = Secret.from_env_var("JINA_API_KEY"),
             model: str = "jina-embeddings-v3",
             prefix: str = "",
             suffix: str = "",
             batch_size: int = 32,
             progress_bar: bool = True,
             meta_fields_to_embed: Optional[List[str]] = None,
             embedding_separator: str = "\n",
             task: Optional[str] = None,
             dimensions: Optional[int] = None,
             late_chunking: Optional[bool] = None)

Create a JinaDocumentEmbedder component.

Arguments:

  • api_key: The Jina API key.
  • model: The name of the Jina model to use. See the Jina documentation for the list of available models.
  • prefix: A string to add to the beginning of each text.
  • suffix: A string to add to the end of each text.
  • batch_size: Number of Documents to encode at once.
  • progress_bar: Whether to show a progress bar or not. Can be helpful to disable in production deployments to keep the logs clean.
  • meta_fields_to_embed: List of meta fields that should be embedded along with the Document text.
  • embedding_separator: Separator used to concatenate the meta fields to the Document text.
  • task: The downstream task for which the embeddings will be used. The model returns embeddings optimized for that task. See the Jina documentation for the list of available tasks.
  • dimensions: The desired number of embedding dimensions. Smaller embeddings are cheaper to store and faster to search, with minimal impact on quality thanks to Matryoshka Representation Learning (MRL).
  • late_chunking: A boolean to enable or disable late chunking. Apply the late chunking technique to leverage the model's long-context capabilities for generating contextual chunk embeddings.

The task and late_chunking parameters are supported only by jina-embeddings-v3.
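To make the interplay of prefix, suffix, meta_fields_to_embed, and embedding_separator concrete, the following is a minimal sketch of how the text sent to the model might be assembled. SimpleDoc and build_text_to_embed are hypothetical stand-ins, not part of the integration, and the ordering (meta values first, then the content) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for haystack.Document (content and meta only)."""
    content: Optional[str] = None
    meta: Dict[str, Any] = field(default_factory=dict)


def build_text_to_embed(
    doc: SimpleDoc,
    prefix: str = "",
    suffix: str = "",
    meta_fields_to_embed: Optional[List[str]] = None,
    embedding_separator: str = "\n",
) -> str:
    """Join selected meta values and the content, then wrap with prefix/suffix.

    Assumption: meta values come before the content, and meta fields that
    are missing or None are skipped.
    """
    meta_values = [
        str(doc.meta[key])
        for key in (meta_fields_to_embed or [])
        if doc.meta.get(key) is not None
    ]
    parts = meta_values + [doc.content or ""]
    return prefix + embedding_separator.join(parts) + suffix


doc = SimpleDoc(content="I love pizza!", meta={"title": "Food", "lang": None})
text = build_text_to_embed(
    doc,
    prefix="passage: ",
    meta_fields_to_embed=["title", "lang", "author"],
    embedding_separator=" | ",
)
print(text)
```

In this sketch only the title value is prepended, because lang is None and author is absent from the meta dictionary.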

JinaDocumentEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

JinaDocumentEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "JinaDocumentEmbedder"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.
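The to_dict and from_dict methods are meant to be used as a pair. The sketch below shows the round-trip pattern with a hypothetical SketchEmbedder class. The dictionary layout (a type string plus init_parameters) is an assumption about the serialized form, not a guarantee of what the integration emits.

```python
from typing import Any, Dict


class SketchEmbedder:
    """Hypothetical component used only to show the to_dict/from_dict pattern."""

    def __init__(self, model: str = "jina-embeddings-v3", batch_size: int = 32):
        self.model = model
        self.batch_size = batch_size

    def to_dict(self) -> Dict[str, Any]:
        # Store the class path plus the arguments needed to rebuild the object.
        return {
            "type": f"{type(self).__module__}.{type(self).__name__}",
            "init_parameters": {"model": self.model, "batch_size": self.batch_size},
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SketchEmbedder":
        # Rebuild the instance from the stored constructor arguments.
        return cls(**data["init_parameters"])


original = SketchEmbedder(batch_size=8)
restored = SketchEmbedder.from_dict(original.to_dict())
print(restored.model, restored.batch_size)
```

With the real component, the equivalent round trip would be JinaDocumentEmbedder.from_dict(embedder.to_dict()).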

JinaDocumentEmbedder.run

@component.output_types(documents=List[Document], meta=Dict[str, Any])
def run(documents: List[Document])

Compute the embeddings for a list of Documents.

Arguments:

  • documents: A list of Documents to embed.

Raises:

  • TypeError: If the input is not a list of Documents.

Returns:

A dictionary with the following keys:

  • documents: List of Documents, each with an embedding field containing the computed embedding.
  • meta: A dictionary with metadata including the model name and usage statistics.
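The batch_size setting and the usage statistics in meta can be pictured with a small sketch. It splits the texts into batches, calls a hypothetical stand-in for the embedding request (fake_embed_batch, which returns dummy vectors), and sums the token usage across batches. How the real component aggregates usage is an assumption here.

```python
from typing import Dict, Iterator, List


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fake_embed_batch(texts: List[str]) -> Dict:
    """Hypothetical stand-in for one API call; returns dummy vectors and usage."""
    return {
        "embeddings": [[float(len(t))] for t in texts],
        "usage": {"total_tokens": sum(len(t.split()) for t in texts)},
    }


texts = [f"document number {i}" for i in range(5)]
embeddings: List[List[float]] = []
meta = {"model": "jina-embeddings-v3", "usage": {"total_tokens": 0}}

for batch in batched(texts, batch_size=2):
    response = fake_embed_batch(batch)
    embeddings.extend(response["embeddings"])  # keep input order
    meta["usage"]["total_tokens"] += response["usage"]["total_tokens"]

print(len(embeddings), meta)
```

A smaller batch_size means more requests with less data each; the embeddings are still returned in the same order as the input Documents in this sketch.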

Module haystack_integrations.components.embedders.jina.text_embedder

JinaTextEmbedder

A component for embedding strings using Jina AI models.

Usage example:

from haystack_integrations.components.embedders.jina import JinaTextEmbedder

# Make sure that the environment variable JINA_API_KEY is set

text_embedder = JinaTextEmbedder(task="retrieval.query")

text_to_embed = "I love pizza!"

print(text_embedder.run(text_to_embed))

# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
# 'meta': {'model': 'jina-embeddings-v3',
#          'usage': {'prompt_tokens': 4, 'total_tokens': 4}}}

JinaTextEmbedder.__init__

def __init__(api_key: Secret = Secret.from_env_var("JINA_API_KEY"),
             model: str = "jina-embeddings-v3",
             prefix: str = "",
             suffix: str = "",
             task: Optional[str] = None,
             dimensions: Optional[int] = None,
             late_chunking: Optional[bool] = None)

Create a JinaTextEmbedder component.

Arguments:

  • api_key: The Jina API key. It can be explicitly provided or automatically read from the environment variable JINA_API_KEY (recommended).
  • model: The name of the Jina model to use. See the Jina documentation for the list of available models.
  • prefix: A string to add to the beginning of each text.
  • suffix: A string to add to the end of each text.
  • task: The downstream task for which the embeddings will be used. The model returns embeddings optimized for that task. See the Jina documentation for the list of available tasks.
  • dimensions: The desired number of embedding dimensions. Smaller embeddings are cheaper to store and faster to search, with minimal impact on quality thanks to Matryoshka Representation Learning (MRL).
  • late_chunking: A boolean to enable or disable late chunking. Apply the late chunking technique to leverage the model's long-context capabilities for generating contextual chunk embeddings.

The task and late_chunking parameters are supported only by jina-embeddings-v3.
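The dimensions parameter builds on Matryoshka Representation Learning, in which the leading components of an embedding carry most of the information. The sketch below illustrates that idea locally on a toy vector: keep the first components and rescale to unit length. The helper name truncate_embedding and the renormalization step are assumptions for illustration; when dimensions is set, the shortening is presumably handled by the Jina service rather than on the client.

```python
import math
from typing import List


def truncate_embedding(vector: List[float], dimensions: int) -> List[float]:
    """Keep the first `dimensions` components and rescale to unit length."""
    if not 0 < dimensions <= len(vector):
        raise ValueError("dimensions must be between 1 and len(vector)")
    head = vector[:dimensions]
    norm = math.sqrt(sum(x * x for x in head))
    # Leave an all-zero prefix untouched to avoid dividing by zero.
    return [x / norm for x in head] if norm > 0 else head


full = [0.6, 0.0, 0.8, 0.0]  # toy 4-dimensional unit vector
short = truncate_embedding(full, dimensions=2)
print(len(short), short)
```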

JinaTextEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

JinaTextEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "JinaTextEmbedder"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

JinaTextEmbedder.run

@component.output_types(embedding=List[float], meta=Dict[str, Any])
def run(text: str)

Embed a string.

Arguments:

  • text: The string to embed.

Raises:

  • TypeError: If the input is not a string.

Returns:

A dictionary with the following keys:

  • embedding: The embedding of the input string.
  • meta: A dictionary with metadata including the model name and usage statistics.
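The input check and the prefix/suffix handling described above can be sketched as follows. prepare_text is a hypothetical helper, and the wording of the error message is an assumption; only the TypeError for non-string input comes from the reference.

```python
def prepare_text(text: object, prefix: str = "", suffix: str = "") -> str:
    """Hypothetical helper: validate the input, then add prefix and suffix."""
    if not isinstance(text, str):
        raise TypeError(
            "Expected a string; to embed a list of Documents, "
            "use JinaDocumentEmbedder instead."
        )
    return prefix + text + suffix


print(prepare_text("I love pizza!", prefix="query: "))

try:
    prepare_text(["I love pizza!"])  # a list is not a valid input
except TypeError as err:
    print(f"TypeError: {err}")
```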

Module haystack_integrations.components.rankers.jina.ranker

JinaRanker

Ranks Documents based on their similarity to the query using Jina AI models.

Usage example:

from haystack import Document
from haystack_integrations.components.rankers.jina import JinaRanker

ranker = JinaRanker()
docs = [Document(content="Paris"), Document(content="Berlin")]
query = "City in Germany"
result = ranker.run(query=query, documents=docs)
docs = result["documents"]
print(docs[0].content)

JinaRanker.__init__

def __init__(model: str = "jina-reranker-v1-base-en",
             api_key: Secret = Secret.from_env_var("JINA_API_KEY"),
             top_k: Optional[int] = None,
             score_threshold: Optional[float] = None)

Creates an instance of JinaRanker.

Arguments:

  • api_key: The Jina API key. It can be explicitly provided or automatically read from the environment variable JINA_API_KEY (recommended).
  • model: The name of the Jina model to use. See https://jina.ai/reranker/ for the list of available models.
  • top_k: The maximum number of Documents to return per query. If None, all Documents are returned.
  • score_threshold: If provided, only Documents with a score above this threshold are returned.

Raises:

  • ValueError: If top_k is not greater than 0.

JinaRanker.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

JinaRanker.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "JinaRanker"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

JinaRanker.run

@component.output_types(documents=List[Document])
def run(query: str,
        documents: List[Document],
        top_k: Optional[int] = None,
        score_threshold: Optional[float] = None)

Returns a list of Documents ranked by their similarity to the given query.

Arguments:

  • query: Query string.
  • documents: List of Documents.
  • top_k: The maximum number of Documents to return.
  • score_threshold: If provided, only Documents with a score above this threshold are returned.

Raises:

  • ValueError: If top_k is not greater than 0.

Returns:

A dictionary with the following keys:

  • documents: List of Documents most similar to the given query in descending order of similarity.
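How top_k and score_threshold shape the result can be sketched with plain (content, score) pairs. select_ranked is a hypothetical helper and the scores are made-up toy values. Reading "above" as strictly greater than the threshold, and applying the threshold before top_k, are assumptions.

```python
from typing import List, Optional, Tuple


def select_ranked(
    scored: List[Tuple[str, float]],
    top_k: Optional[int] = None,
    score_threshold: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Sort by score (highest first), then apply threshold and top_k."""
    if top_k is not None and top_k <= 0:
        raise ValueError(f"top_k must be > 0, but got {top_k}")
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if score_threshold is not None:
        # Assumption: "above" is read as strictly greater than the threshold.
        ranked = [pair for pair in ranked if pair[1] > score_threshold]
    return ranked if top_k is None else ranked[:top_k]


scored = [("Paris", 0.12), ("Berlin", 0.91), ("Munich", 0.78)]  # toy scores
print(select_ranked(scored, top_k=2))
print(select_ranked(scored, score_threshold=0.5, top_k=5))
```

Both calls keep Berlin and Munich in this toy data: the first because of top_k, the second because Paris falls below the threshold.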

Module haystack_integrations.components.connectors.jina.reader

JinaReaderConnector

A component that interacts with Jina AI's reader service to process queries and return documents.

This component supports different modes of operation: read, search, and ground.

Usage example:

from haystack_integrations.components.connectors.jina import JinaReaderConnector

reader = JinaReaderConnector(mode="read")
query = "https://example.com"
result = reader.run(query=query)
document = result["documents"][0]
print(document.content)

# "This domain is for use in illustrative examples..."

JinaReaderConnector.__init__

def __init__(mode: Union[JinaReaderMode, str],
             api_key: Secret = Secret.from_env_var("JINA_API_KEY"),
             json_response: bool = True)

Initialize a JinaReaderConnector instance.

Arguments:

  • mode: The operation mode for the reader (read, search, or ground). For more information on the modes, see the Jina Reader documentation.
      • read: process a URL and return the textual content of the page.
      • search: search the web and return textual content of the most relevant pages.
      • ground: call the grounding engine to perform fact checking.
  • api_key: The Jina API key. It can be explicitly provided or automatically read from the environment variable JINA_API_KEY (recommended).
  • json_response: Controls the response format from the Jina Reader API. If True, requests a JSON response, resulting in Documents with rich structured metadata. If False, requests a raw response, resulting in one Document with minimal metadata.
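Because mode accepts either a JinaReaderMode member or a plain string, some normalization has to happen at construction time. The sketch below shows one way this could work. ReaderMode and parse_mode are hypothetical stand-ins, and the case-insensitive matching and error message are assumptions.

```python
from enum import Enum
from typing import Union


class ReaderMode(Enum):
    """Hypothetical stand-in for JinaReaderMode with the three documented modes."""
    READ = "read"
    SEARCH = "search"
    GROUND = "ground"


def parse_mode(mode: Union[ReaderMode, str]) -> ReaderMode:
    """Accept either an enum member or its string value."""
    if isinstance(mode, ReaderMode):
        return mode
    try:
        return ReaderMode(mode.lower())
    except ValueError:
        valid = ", ".join(m.value for m in ReaderMode)
        raise ValueError(f"Unknown mode '{mode}'. Supported modes: {valid}") from None


print(parse_mode("search"))
print(parse_mode(ReaderMode.GROUND))
```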

JinaReaderConnector.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

JinaReaderConnector.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "JinaReaderConnector"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

JinaReaderConnector.run

@component.output_types(documents=List[Document])
def run(query: str, headers: Optional[Dict[str, str]] = None)

Process the query/URL using the Jina AI reader service.

Arguments:

  • query: The query string or URL to process.
  • headers: Optional headers to include in the request for customization. Refer to the Jina Reader documentation for more information.

Returns:

A dictionary with the following keys:

  • documents: A list of Document objects.
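To show how mode, json_response, and the optional headers might fit together, the following sketch builds the URL and headers for a request without sending it. build_request is a hypothetical helper. The per-mode endpoints, the URL-encoding of the query, and the Accept header used for JSON responses are assumptions to be checked against the Jina Reader documentation, and X-Custom-Header is a placeholder name.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import quote

# Assumed endpoints for each mode; check the Jina Reader documentation.
ENDPOINTS = {
    "read": "https://r.jina.ai/",
    "search": "https://s.jina.ai/",
    "ground": "https://g.jina.ai/",
}


def build_request(
    mode: str,
    query: str,
    api_key: str,
    json_response: bool = True,
    headers: Optional[Dict[str, str]] = None,
) -> Tuple[str, Dict[str, str]]:
    """Return the URL and headers for one request, without sending it."""
    merged = dict(headers or {})  # copy so the caller's dict is not mutated
    merged["Authorization"] = f"Bearer {api_key}"
    if json_response:
        merged["Accept"] = "application/json"
    url = ENDPOINTS[mode] + quote(query, safe="")
    return url, merged


url, request_headers = build_request(
    "read",
    "https://example.com",
    api_key="<JINA_API_KEY>",
    headers={"X-Custom-Header": "value"},  # placeholder header name
)
print(url)
print(request_headers)
```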