Module haystack_integrations.components.embedders.jina.document_embedder

JinaDocumentEmbedder

A component for computing Document embeddings using Jina AI models. The embedding of each Document is stored in the embedding field of the Document.

Usage example:

from haystack import Document
from haystack_integrations.components.embedders.jina import JinaDocumentEmbedder

# Make sure that the environment variable JINA_API_KEY is set

document_embedder = JinaDocumentEmbedder(task="retrieval.passage")

doc = Document(content="I love pizza!")

result = document_embedder.run([doc])
print(result['documents'][0].embedding)

# [0.017020374536514282, -0.023255806416273117, ...]

JinaDocumentEmbedder.__init__

def __init__(api_key: Secret = Secret.from_env_var("JINA_API_KEY"),
             model: str = "jina-embeddings-v3",
             prefix: str = "",
             suffix: str = "",
             batch_size: int = 32,
             progress_bar: bool = True,
             meta_fields_to_embed: Optional[List[str]] = None,
             embedding_separator: str = "\n",
             task: Optional[str] = None,
             dimensions: Optional[int] = None,
             late_chunking: Optional[bool] = None)

Create a JinaDocumentEmbedder component.

Arguments:

api_key: The Jina API key.
model: The name of the Jina model to use. See the list of available models in the Jina documentation.
prefix: A string to add to the beginning of each text.
suffix: A string to add to the end of each text.
batch_size: Number of Documents to encode at once.
progress_bar: Whether to show a progress bar. Disabling it can help keep logs clean in production deployments.
meta_fields_to_embed: List of meta fields that should be embedded along with the Document text.
embedding_separator: Separator used to concatenate the meta fields to the Document text.
task: The downstream task for which the embeddings will be used. The model returns embeddings optimized for that task. See the list of available tasks in the Jina documentation.
dimensions: The desired number of embedding dimensions. Smaller embeddings are cheaper to store and retrieve, with minimal performance impact thanks to Matryoshka Representation Learning (MRL).
late_chunking: Whether to enable late chunking. Late chunking leverages the model's long-context capabilities to generate contextual chunk embeddings.

The task and late_chunking parameters are supported only by jina-embeddings-v3.
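
How the text-related arguments interact can be illustrated with a small sketch. This is not the component's actual implementation: prepare_text and batched are hypothetical helpers, Documents are modelled as plain dictionaries, and the order in which prefix, meta fields, content, and suffix are joined is an assumption.

```python
from typing import Any, Dict, Iterator, List, Optional


def prepare_text(doc: Dict[str, Any],
                 prefix: str = "",
                 suffix: str = "",
                 meta_fields_to_embed: Optional[List[str]] = None,
                 embedding_separator: str = "\n") -> str:
    """Build the string sent to the embedding model (assumed layout)."""
    meta = doc.get("meta", {})
    # Keep only the requested meta fields that are present and not None.
    meta_values = [str(meta[key]) for key in (meta_fields_to_embed or [])
                   if meta.get(key) is not None]
    body = embedding_separator.join(meta_values + [doc.get("content") or ""])
    return prefix + body + suffix


def batched(items: List[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


docs = [
    {"content": "I love pizza!", "meta": {"title": "Food"}},
    {"content": "Berlin is a city.", "meta": {}},
    {"content": "Paris is a city.", "meta": {"title": "Travel"}},
]
texts = [prepare_text(d, meta_fields_to_embed=["title"]) for d in docs]
batches = list(batched(texts, batch_size=2))
print(repr(texts[0]))
print([len(b) for b in batches])
```

Under these assumptions, a Document with a title is embedded as the title, the separator, and then the content, while a Document without the field is embedded as its content alone. A smaller batch_size means more requests with fewer texts each.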

JinaDocumentEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

JinaDocumentEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "JinaDocumentEmbedder"

Deserializes the component from a dictionary.

Arguments:

data: Dictionary to deserialize from.

Returns:

Deserialized component.
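
The round trip between to_dict and from_dict can be sketched with a toy class. ToyEmbedder is a hypothetical stand-in, not the real component, and the dictionary layout with "type" and "init_parameters" keys is an assumption made for illustration.

```python
from typing import Any, Dict


class ToyEmbedder:
    """Hypothetical stand-in for a component; not the real JinaDocumentEmbedder."""

    def __init__(self, model: str = "jina-embeddings-v3", batch_size: int = 32):
        self.model = model
        self.batch_size = batch_size

    def to_dict(self) -> Dict[str, Any]:
        # Store the class path plus the init parameters needed to rebuild it.
        return {
            "type": f"{type(self).__module__}.{type(self).__name__}",
            "init_parameters": {"model": self.model, "batch_size": self.batch_size},
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyEmbedder":
        # Rebuild the instance from the stored init parameters.
        return cls(**data["init_parameters"])


original = ToyEmbedder(batch_size=8)
restored = ToyEmbedder.from_dict(original.to_dict())
print(restored.model, restored.batch_size)
```

The point of the pattern is that a restored component is configured exactly like the one that was serialized, which is what makes saving and reloading pipelines possible.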

JinaDocumentEmbedder.run

@component.output_types(documents=List[Document], meta=Dict[str, Any])
def run(documents: List[Document])

Compute the embeddings for a list of Documents.

Arguments:

documents: A list of Documents to embed.

Raises:

TypeError: If the input is not a list of Documents.

Returns:

A dictionary with the following keys:

documents: List of Documents, each with an embedding field containing the computed embedding.
meta: A dictionary with metadata including the model name and usage statistics.
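
The TypeError above typically appears when a single Document or a plain string is passed instead of a list. The following sketch shows the kind of check involved; ToyDocument and validate_documents are hypothetical names, and the real component's check and error message may differ.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class ToyDocument:
    """Hypothetical stand-in for haystack.Document."""
    content: str
    embedding: Optional[List[float]] = None


def validate_documents(documents: Any) -> None:
    """Raise TypeError unless the input is a list of ToyDocument objects."""
    if not isinstance(documents, list) or not all(
        isinstance(d, ToyDocument) for d in documents
    ):
        raise TypeError(
            "Expected a list of Documents; wrap a single Document in a list "
            "or use a text embedder for plain strings."
        )


validate_documents([ToyDocument("I love pizza!")])  # passes silently

try:
    validate_documents("I love pizza!")  # a bare string is rejected
except TypeError as err:
    rejected = str(err)
    print(rejected)
```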

Module haystack_integrations.components.embedders.jina.text_embedder

JinaTextEmbedder

A component for embedding strings using Jina AI models.

Usage example:

from haystack_integrations.components.embedders.jina import JinaTextEmbedder

# Make sure that the environment variable JINA_API_KEY is set

text_embedder = JinaTextEmbedder(task="retrieval.query")

text_to_embed = "I love pizza!"

print(text_embedder.run(text_to_embed))

# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
# 'meta': {'model': 'jina-embeddings-v3',
#          'usage': {'prompt_tokens': 4, 'total_tokens': 4}}}

JinaTextEmbedder.__init__

def __init__(api_key: Secret = Secret.from_env_var("JINA_API_KEY"),
             model: str = "jina-embeddings-v3",
             prefix: str = "",
             suffix: str = "",
             task: Optional[str] = None,
             dimensions: Optional[int] = None,
             late_chunking: Optional[bool] = None)

Create a JinaTextEmbedder component.

Arguments:

api_key: The Jina API key. It can be explicitly provided or automatically read from the environment variable JINA_API_KEY (recommended).
model: The name of the Jina model to use. See the list of available models in the Jina documentation.
prefix: A string to add to the beginning of each text.
suffix: A string to add to the end of each text.
task: The downstream task for which the embeddings will be used. The model returns embeddings optimized for that task. See the list of available tasks in the Jina documentation.
dimensions: The desired number of embedding dimensions. Smaller embeddings are cheaper to store and retrieve, with minimal performance impact thanks to Matryoshka Representation Learning (MRL).
late_chunking: Whether to enable late chunking. Late chunking leverages the model's long-context capabilities to generate contextual chunk embeddings.

The task and late_chunking parameters are supported only by jina-embeddings-v3.
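
The idea behind the dimensions argument is that, with Matryoshka-style training, the leading components of an embedding remain useful on their own. The sketch below imitates this locally by truncating a vector and rescaling it to unit length. It is only an illustration of the concept: truncate_embedding is a hypothetical helper, the vector is made up, and this is not how the component itself applies dimensions.

```python
import math
from typing import List


def truncate_embedding(vector: List[float], dimensions: int) -> List[float]:
    """Keep the first `dimensions` components and rescale to unit length."""
    if dimensions <= 0 or dimensions > len(vector):
        raise ValueError("dimensions must be between 1 and len(vector)")
    head = vector[:dimensions]
    norm = math.sqrt(sum(x * x for x in head))
    # Guard against an all-zero prefix to avoid dividing by zero.
    return [x / norm for x in head] if norm > 0 else head


full = [3.0, 4.0, 12.0, 0.5]  # made-up embedding for illustration
small = truncate_embedding(full, 2)
print(small)
```

A query embedding and the Document embeddings it is compared against need the same number of dimensions, so the same value should be used for both embedders.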

JinaTextEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

JinaTextEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "JinaTextEmbedder"

Deserializes the component from a dictionary.

Arguments:

data: Dictionary to deserialize from.

Returns:

Deserialized component.

JinaTextEmbedder.run

@component.output_types(embedding=List[float], meta=Dict[str, Any])
def run(text: str)

Embed a string.

Arguments:

text: The string to embed.

Raises:

TypeError: If the input is not a string.

Returns:

A dictionary with the following keys:

embedding: The embedding of the input string.
meta: A dictionary with metadata including the model name and usage statistics.

Module haystack_integrations.components.rankers.jina.ranker

JinaRanker

Ranks Documents based on their similarity to the query using Jina AI models.

Usage example:

from haystack import Document
from haystack_integrations.components.rankers.jina import JinaRanker

ranker = JinaRanker()
docs = [Document(content="Paris"), Document(content="Berlin")]
query = "City in Germany"
result = ranker.run(query=query, documents=docs)
docs = result["documents"]
print(docs[0].content)

JinaRanker.__init__

def __init__(model: str = "jina-reranker-v1-base-en",
             api_key: Secret = Secret.from_env_var("JINA_API_KEY"),
             top_k: Optional[int] = None,
             score_threshold: Optional[float] = None)

Creates an instance of JinaRanker.

Arguments:

api_key: The Jina API key. It can be explicitly provided or automatically read from the environment variable JINA_API_KEY (recommended).
model: The name of the Jina model to use. See the list of available models at https://jina.ai/reranker/
top_k: The maximum number of Documents to return per query. If None, all Documents are returned.
score_threshold: If provided, only Documents with a score above this threshold are returned.

Raises:

ValueError: If top_k is not greater than 0.

JinaRanker.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

JinaRanker.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "JinaRanker"

Deserializes the component from a dictionary.

Arguments:

data: Dictionary to deserialize from.

Returns:

Deserialized component.

JinaRanker.run

@component.output_types(documents=List[Document])
def run(query: str,
        documents: List[Document],
        top_k: Optional[int] = None,
        score_threshold: Optional[float] = None)

Returns a list of Documents ranked by their similarity to the given query.

Arguments:

query: Query string.
documents: List of Documents.
top_k: The maximum number of Documents to return.
score_threshold: If provided, only Documents with a score above this threshold are returned.

Raises:

ValueError: If top_k is not greater than 0.

Returns:

A dictionary with the following keys:

documents: List of Documents most similar to the given query in descending order of similarity.
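
The effect of top_k and score_threshold on the returned list can be sketched as follows. select_ranked is a hypothetical helper operating on made-up (content, score) pairs; whether the real component filters by threshold before truncating, and whether the comparison is strict, are assumptions here.

```python
from typing import List, Optional, Tuple


def select_ranked(scored: List[Tuple[str, float]],
                  top_k: Optional[int] = None,
                  score_threshold: Optional[float] = None) -> List[Tuple[str, float]]:
    """Sort by score (descending), drop low scores, then keep at most top_k."""
    if top_k is not None and top_k <= 0:
        raise ValueError(f"top_k must be > 0, but got {top_k}")
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if score_threshold is not None:
        # Assumed strict comparison: keep scores above the threshold.
        ranked = [pair for pair in ranked if pair[1] > score_threshold]
    return ranked if top_k is None else ranked[:top_k]


# Made-up relevance scores for the query "City in Germany".
scored = [("Paris", 0.12), ("Berlin", 0.91), ("Munich", 0.77)]
print(select_ranked(scored, top_k=1))
print(select_ranked(scored, score_threshold=0.5))
```

With both arguments left as None, every Document comes back, only reordered by score.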

Module haystack_integrations.components.connectors.jina.reader

JinaReaderConnector

A component that interacts with Jina AI's reader service to process queries and return documents.

This component supports different modes of operation: read, search, and ground.

Usage example:

from haystack_integrations.components.connectors.jina import JinaReaderConnector

reader = JinaReaderConnector(mode="read")
query = "https://example.com"
result = reader.run(query=query)
document = result["documents"][0]
print(document.content)

# "This domain is for use in illustrative examples..."

JinaReaderConnector.__init__

def __init__(mode: Union[JinaReaderMode, str],
             api_key: Secret = Secret.from_env_var("JINA_API_KEY"),
             json_response: bool = True)

Initialize a JinaReaderConnector instance.

Arguments:

mode: The operation mode for the reader: read, search, or ground. For more information on the modes, see the Jina Reader documentation.
    read: process a URL and return the textual content of the page.
    search: search the web and return the textual content of the most relevant pages.
    ground: call the grounding engine to perform fact checking.
api_key: The Jina API key. It can be explicitly provided or automatically read from the environment variable JINA_API_KEY (recommended).
json_response: Controls the response format from the Jina Reader API. If True, requests a JSON response, resulting in Documents with rich structured metadata. If False, requests a raw response, resulting in one Document with minimal metadata.
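
Because mode accepts either an enum member or a string, some normalization has to happen at construction time. A minimal sketch of that idea follows; ReaderMode and parse_mode are hypothetical stand-ins for illustration and may not match how JinaReaderMode handles strings, case, or errors.

```python
from enum import Enum
from typing import Union


class ReaderMode(Enum):
    """Hypothetical stand-in for JinaReaderMode."""
    READ = "read"
    SEARCH = "search"
    GROUND = "ground"


def parse_mode(mode: Union[ReaderMode, str]) -> ReaderMode:
    """Accept an enum member or its string value and return the enum member."""
    if isinstance(mode, ReaderMode):
        return mode
    try:
        # Assumed leniency: ignore surrounding whitespace and letter case.
        return ReaderMode(mode.strip().lower())
    except ValueError:
        valid = ", ".join(m.value for m in ReaderMode)
        raise ValueError(f"Unknown mode '{mode}'. Supported modes: {valid}") from None


print(parse_mode("read"))
print(parse_mode(ReaderMode.GROUND))
```

Validating the mode up front means a typo such as "serach" fails when the component is created rather than on the first request.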

JinaReaderConnector.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

JinaReaderConnector.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "JinaReaderConnector"

Deserializes the component from a dictionary.

Arguments:

data: Dictionary to deserialize from.

Returns:

Deserialized component.

JinaReaderConnector.run

@component.output_types(documents=List[Document])
def run(query: str, headers: Optional[Dict[str, str]] = None)

Process the query/URL using the Jina AI reader service.

Arguments:

query: The query string or URL to process.
headers: Optional headers to include in the request for customization. Refer to the Jina Reader documentation for more information.

Returns:

A dictionary with the following keys:

documents: A list of Document objects.
