
Jina integration for Haystack

Module haystack_integrations.components.embedders.jina.document_embedder

JinaDocumentEmbedder

A component for computing Document embeddings using Jina AI models. The embedding of each Document is stored in the embedding field of the Document.

Usage example:

from haystack import Document
from haystack_integrations.components.embedders.jina import JinaDocumentEmbedder

# Make sure that the environment variable JINA_API_KEY is set

document_embedder = JinaDocumentEmbedder()

doc = Document(content="I love pizza!")

result = document_embedder.run([doc])
print(result['documents'][0].embedding)

# [0.017020374536514282, -0.023255806416273117, ...]

JinaDocumentEmbedder.__init__

def __init__(api_key: Secret = Secret.from_env_var("JINA_API_KEY"),
             model: str = "jina-embeddings-v2-base-en",
             prefix: str = "",
             suffix: str = "",
             batch_size: int = 32,
             progress_bar: bool = True,
             meta_fields_to_embed: Optional[List[str]] = None,
             embedding_separator: str = "\n")

Create a JinaDocumentEmbedder component.

Arguments:

  • api_key: The Jina API key.
  • model: The name of the Jina model to use. See the Jina documentation for the list of available models.
  • prefix: A string to add to the beginning of each text.
  • suffix: A string to add to the end of each text.
  • batch_size: Number of Documents to encode at once.
  • progress_bar: Whether to show a progress bar. Disabling it can help keep logs clean in production deployments.
  • meta_fields_to_embed: List of meta fields that should be embedded along with the Document text.
  • embedding_separator: Separator used to concatenate the meta fields to the Document text.
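As a rough illustration of how prefix, suffix, meta_fields_to_embed, embedding_separator, and batch_size could interact, here is a minimal sketch in plain Python. It does not call the Jina API, and the helper names build_text_to_embed and batched are hypothetical. The assumption is that the selected meta values are joined ahead of the Document text with the separator, and that the prefix and suffix wrap the result; the component's actual internals may differ.

```python
from typing import Any, Dict, Iterator, List, Optional


def build_text_to_embed(
    content: str,
    meta: Dict[str, Any],
    meta_fields_to_embed: Optional[List[str]] = None,
    embedding_separator: str = "\n",
    prefix: str = "",
    suffix: str = "",
) -> str:
    """Hypothetical helper: assemble the string sent for embedding (assumed behavior)."""
    fields = meta_fields_to_embed or []
    # Keep only the requested meta fields that are present and not None.
    meta_values = [str(meta[key]) for key in fields if meta.get(key) is not None]
    # Join meta values and content with the separator, then wrap with prefix/suffix.
    return prefix + embedding_separator.join(meta_values + [content]) + suffix


def batched(texts: List[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Hypothetical helper: yield consecutive chunks of at most batch_size texts."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


text = build_text_to_embed(
    content="I love pizza!",
    meta={"title": "Food notes", "author": None},
    meta_fields_to_embed=["title", "author"],
)
print(repr(text))

batches = list(batched([f"text {i}" for i in range(70)], batch_size=32))
print([len(batch) for batch in batches])
```

Under these assumptions, the "author" field is skipped because its value is None, and 70 texts with a batch size of 32 are split into chunks of 32, 32, and 6.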

JinaDocumentEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

JinaDocumentEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "JinaDocumentEmbedder"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

JinaDocumentEmbedder.run

@component.output_types(documents=List[Document], meta=Dict[str, Any])
def run(documents: List[Document])

Compute the embeddings for a list of Documents.

Arguments:

  • documents: A list of Documents to embed.

Raises:

  • TypeError: If the input is not a list of Documents.

Returns:

A dictionary with the following keys:

  • documents: List of Documents, each with an embedding field containing the computed embedding.
  • meta: A dictionary with metadata including the model name and usage statistics.
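A common mistake is passing a bare string to run instead of a list of Documents. The sketch below mimics the documented TypeError with a stand-in class so that it runs without Haystack installed. StandInDocument, check_documents, and the error message are hypothetical; the real component may check its input differently.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class StandInDocument:
    """Stand-in for haystack.Document, used only to keep this sketch self-contained."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)
    embedding: Optional[List[float]] = None


def check_documents(documents: Any) -> None:
    """Hypothetical check: raise TypeError unless the input is a list of documents."""
    if not isinstance(documents, list) or not all(
        isinstance(doc, StandInDocument) for doc in documents
    ):
        raise TypeError("Expected a list of Documents as input.")


# A list of documents passes silently.
check_documents([StandInDocument(content="I love pizza!")])

try:
    # A bare string is not a list of Documents.
    check_documents("I love pizza!")
except TypeError as error:
    message = str(error)
    print(message)
```

To embed a single string, such as a query, JinaTextEmbedder is the better fit.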

Module haystack_integrations.components.embedders.jina.text_embedder

JinaTextEmbedder

A component for embedding strings using Jina AI models.

Usage example:

from haystack_integrations.components.embedders.jina import JinaTextEmbedder

# Make sure that the environment variable JINA_API_KEY is set

text_embedder = JinaTextEmbedder()

text_to_embed = "I love pizza!"

print(text_embedder.run(text_to_embed))

# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
# 'meta': {'model': 'jina-embeddings-v2-base-en',
#          'usage': {'prompt_tokens': 4, 'total_tokens': 4}}}

JinaTextEmbedder.__init__

def __init__(api_key: Secret = Secret.from_env_var("JINA_API_KEY"),
             model: str = "jina-embeddings-v2-base-en",
             prefix: str = "",
             suffix: str = "")

Create a JinaTextEmbedder component.

Arguments:

  • api_key: The Jina API key. It can be explicitly provided or automatically read from the environment variable JINA_API_KEY (recommended).
  • model: The name of the Jina model to use. See the Jina documentation for the list of available models.
  • prefix: A string to add to the beginning of each text.
  • suffix: A string to add to the end of each text.

JinaTextEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

JinaTextEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "JinaTextEmbedder"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

JinaTextEmbedder.run

@component.output_types(embedding=List[float], meta=Dict[str, Any])
def run(text: str)

Embed a string.

Arguments:

  • text: The string to embed.

Raises:

  • TypeError: If the input is not a string.

Returns:

A dictionary with the following keys:

  • embedding: The embedding of the input string.
  • meta: A dictionary with metadata including the model name and usage statistics.
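To show how the returned dictionary can be consumed, the following sketch reads the model name, vector length, and token usage from a sample result. The sample is written by hand to match the shape of the usage example output above, with the vector shortened to two values; summarize is a hypothetical helper, not part of the integration.

```python
from typing import Any, Dict, Tuple

# Sample result shaped like the usage example output; the vector is shortened by hand.
sample_result: Dict[str, Any] = {
    "embedding": [0.017020374536514282, -0.023255806416273117],
    "meta": {
        "model": "jina-embeddings-v2-base-en",
        "usage": {"prompt_tokens": 4, "total_tokens": 4},
    },
}


def summarize(result: Dict[str, Any]) -> Tuple[str, int, int]:
    """Hypothetical helper: return (model name, vector length, total tokens)."""
    meta = result["meta"]
    return meta["model"], len(result["embedding"]), meta["usage"]["total_tokens"]


model, dimension, total_tokens = summarize(sample_result)
print(f"{model}: {dimension} dimensions, {total_tokens} tokens")
```

With a real call, the same helper could be applied to the value returned by text_embedder.run(...), where the vector length depends on the chosen model.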

Module haystack_integrations.components.rankers.jina.ranker

JinaRanker

Ranks Documents based on their similarity to the query using Jina AI models.

Usage example:

from haystack import Document
from haystack_integrations.components.rankers.jina import JinaRanker

ranker = JinaRanker()
docs = [Document(content="Paris"), Document(content="Berlin")]
query = "City in Germany"
result = ranker.run(query=query, documents=docs)
docs = result["documents"]
print(docs[0].content)

JinaRanker.__init__

def __init__(model: str = "jina-reranker-v1-base-en",
             api_key: Secret = Secret.from_env_var("JINA_API_KEY"),
             top_k: Optional[int] = None,
             score_threshold: Optional[float] = None)

Creates an instance of JinaRanker.

Arguments:

  • api_key: The Jina API key. It can be explicitly provided or automatically read from the environment variable JINA_API_KEY (recommended).
  • model: The name of the Jina model to use. See the list of available models at https://jina.ai/reranker/
  • top_k: The maximum number of Documents to return per query. If None, all Documents are returned.
  • score_threshold: If provided, only Documents with a score above this threshold are returned.

Raises:

  • ValueError: If top_k is not greater than 0.

JinaRanker.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

JinaRanker.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "JinaRanker"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

JinaRanker.run

@component.output_types(documents=List[Document])
def run(query: str,
        documents: List[Document],
        top_k: Optional[int] = None,
        score_threshold: Optional[float] = None)

Returns a list of Documents ranked by their similarity to the given query.

Arguments:

  • query: Query string.
  • documents: List of Documents.
  • top_k: The maximum number of Documents you want the Ranker to return.
  • score_threshold: If provided, only Documents with a score above this threshold are returned.

Raises:

  • ValueError: If top_k is not greater than 0.

Returns:

A dictionary with the following keys:

  • documents: List of Documents most similar to the given query in descending order of similarity.
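The documented semantics of top_k and score_threshold can be sketched in plain Python as follows. This does not call the Jina API: select_ranked is a hypothetical helper, the scores are made up, and (content, score) tuples stand in for scored Documents. It only mirrors what the reference states (descending order, scores above the threshold, at most top_k results, and a ValueError when top_k is not greater than 0).

```python
from typing import List, Optional, Tuple

# (content, relevance score): a stand-in for a scored Document.
ScoredDoc = Tuple[str, float]


def select_ranked(
    scored: List[ScoredDoc],
    top_k: Optional[int] = None,
    score_threshold: Optional[float] = None,
) -> List[ScoredDoc]:
    """Hypothetical helper mirroring the documented top_k and score_threshold semantics."""
    if top_k is not None and top_k <= 0:
        raise ValueError(f"top_k must be > 0, but got {top_k}")
    # Highest score first, matching the documented output order.
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    if score_threshold is not None:
        # Keep only entries scoring strictly above the threshold.
        ranked = [item for item in ranked if item[1] > score_threshold]
    # Slicing with None keeps everything.
    return ranked[:top_k]


scores = [("Paris", 0.12), ("Berlin", 0.91), ("Munich", 0.78)]  # made-up scores
print(select_ranked(scores, top_k=2))
print(select_ranked(scores, score_threshold=0.5))
```

Because the list is sorted in descending order before filtering, applying the threshold before or after truncating to top_k gives the same result in this sketch.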