API Reference

Jina integration for Haystack

Module haystack_integrations.components.embedders.jina.document_embedder

JinaDocumentEmbedder

@component
class JinaDocumentEmbedder()

A component for computing Document embeddings using Jina AI models. The embedding of each Document is stored in the embedding field of the Document.

Usage example:

from haystack import Document
from haystack_integrations.components.embedders.jina import JinaDocumentEmbedder

# Make sure that the environment variable JINA_API_KEY is set

document_embedder = JinaDocumentEmbedder()

doc = Document(content="I love pizza!")

result = document_embedder.run([doc])
print(result['documents'][0].embedding)

# [0.017020374536514282, -0.023255806416273117, ...]

JinaDocumentEmbedder.__init__

def __init__(api_key: Secret = Secret.from_env_var("JINA_API_KEY"),
             model: str = "jina-embeddings-v2-base-en",
             prefix: str = "",
             suffix: str = "",
             batch_size: int = 32,
             progress_bar: bool = True,
             meta_fields_to_embed: Optional[List[str]] = None,
             embedding_separator: str = "\n")

Create a JinaDocumentEmbedder component.

Arguments:

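  • api_key: The Jina API key. It can be explicitly provided or automatically read from the environment variable JINA_API_KEY (recommended).
  • model: The name of the Jina model to use. See the Jina documentation for the list of available models.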
  • prefix: A string to add to the beginning of each text.
  • suffix: A string to add to the end of each text.
  • batch_size: Number of Documents to encode at once.
  • progress_bar: Whether to show a progress bar. Disabling it can help keep logs clean in production deployments.
  • meta_fields_to_embed: List of meta fields that should be embedded along with the Document text.
  • embedding_separator: Separator used to concatenate the meta fields to the Document text.
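The following is a minimal sketch of how prefix, suffix, meta_fields_to_embed and embedding_separator might combine into the string that is sent to the model. It does not call the Jina API. The helper name build_text_to_embed is hypothetical, and the ordering (selected meta values first, then the Document text) is an assumption; the reference above only states that the meta fields are concatenated to the Document text using the separator.

```python
from typing import Any, Dict, List, Optional


def build_text_to_embed(
    content: str,
    meta: Dict[str, Any],
    prefix: str = "",
    suffix: str = "",
    meta_fields_to_embed: Optional[List[str]] = None,
    embedding_separator: str = "\n",
) -> str:
    """Hypothetical helper: approximate the string embedded for one Document."""
    fields = meta_fields_to_embed or []
    # Keep only the requested meta fields that are present and not None.
    meta_values = [str(meta[key]) for key in fields if meta.get(key) is not None]
    # Assumption: meta values come first, then the Document text.
    body = embedding_separator.join(meta_values + [content])
    return prefix + body + suffix


text = build_text_to_embed(
    content="I love pizza!",
    meta={"title": "Food diary", "author": "unknown"},
    prefix="passage: ",
    meta_fields_to_embed=["title"],
)
print(repr(text))  # 'passage: Food diary\nI love pizza!'
```

Only the title field is included because it is the only name listed in meta_fields_to_embed; the author field is ignored.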

JinaDocumentEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

JinaDocumentEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "JinaDocumentEmbedder"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.
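As an illustration of the to_dict / from_dict contract, here is a small round-trip sketch. It uses a stand-in class rather than the real JinaDocumentEmbedder so that it runs without the integration installed. The dictionary layout (a type tag plus an init_parameters mapping) is an assumption made for this sketch and is not taken from the reference above.

```python
from typing import Any, Dict


class StandInEmbedder:
    """Hypothetical stand-in, not the real JinaDocumentEmbedder."""

    def __init__(self, model: str = "jina-embeddings-v2-base-en",
                 prefix: str = "", suffix: str = "", batch_size: int = 32):
        self.model = model
        self.prefix = prefix
        self.suffix = suffix
        self.batch_size = batch_size

    def to_dict(self) -> Dict[str, Any]:
        # Assumed layout: a type tag plus the constructor arguments.
        return {
            "type": f"{type(self).__module__}.{type(self).__name__}",
            "init_parameters": {
                "model": self.model,
                "prefix": self.prefix,
                "suffix": self.suffix,
                "batch_size": self.batch_size,
            },
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "StandInEmbedder":
        # Rebuild the component from the stored constructor arguments.
        return cls(**data["init_parameters"])


original = StandInEmbedder(prefix="passage: ", batch_size=16)
restored = StandInEmbedder.from_dict(original.to_dict())
print(restored.to_dict() == original.to_dict())  # True
```

The property worth checking with the real component is the same: a component rebuilt from its own serialized form should be configured identically to the original.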

JinaDocumentEmbedder.run

@component.output_types(documents=List[Document], meta=Dict[str, Any])
def run(documents: List[Document])

Compute the embeddings for a list of Documents.

Arguments:

  • documents: A list of Documents to embed.

Raises:

  • TypeError: If the input is not a list of Documents.
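The TypeError contract can be pictured with a small guard like the one below. This is a sketch under assumptions: Document here is a stand-in dataclass rather than haystack.Document, check_documents is a hypothetical helper, and the real component's exact check and error message may differ.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Document:
    """Stand-in for haystack.Document, reduced to the fields used here."""
    content: str
    embedding: Optional[List[float]] = None


def check_documents(documents: Any) -> None:
    """Hypothetical guard mirroring the documented TypeError contract."""
    is_list = isinstance(documents, list)
    if not is_list or not all(isinstance(d, Document) for d in documents):
        raise TypeError(
            "Expected a list of Documents; to embed a single string, "
            "use JinaTextEmbedder instead."
        )


check_documents([Document(content="I love pizza!")])  # passes silently

caught = ""
try:
    check_documents("I love pizza!")  # a bare string is not a list of Documents
except TypeError as err:
    caught = str(err)
print(caught)
```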

Returns:

A dictionary with the following keys:

  • documents: List of Documents, each with an embedding field containing the computed embedding.
  • meta: A dictionary with metadata including the model name and usage statistics.
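To make the role of batch_size concrete, the sketch below splits a list of texts into batches and combines per-batch usage statistics into one meta-style dictionary. Both helper names are hypothetical and the usage numbers are made up. The reference above says only that batch_size is the number of Documents encoded at once and that meta includes usage statistics; summing the token counts across batches is an assumption.

```python
from typing import Dict, List


def split_into_batches(texts: List[str], batch_size: int = 32) -> List[List[str]]:
    """Split texts into consecutive chunks of at most batch_size items."""
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]


def merge_usage(per_batch_usage: List[Dict[str, int]]) -> Dict[str, int]:
    """Sum token counts across batches (assumed aggregation)."""
    total = {"prompt_tokens": 0, "total_tokens": 0}
    for usage in per_batch_usage:
        for key in total:
            total[key] += usage.get(key, 0)
    return total


texts = [f"document {i}" for i in range(70)]
batches = split_into_batches(texts, batch_size=32)
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)  # [32, 32, 6]

# Made-up per-batch usage numbers, only to show the aggregation.
usage = merge_usage([
    {"prompt_tokens": 64, "total_tokens": 64},
    {"prompt_tokens": 64, "total_tokens": 64},
    {"prompt_tokens": 12, "total_tokens": 12},
])
print(usage)  # {'prompt_tokens': 140, 'total_tokens': 140}
```

With the default batch_size of 32, 70 Documents would therefore need three requests under this scheme.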

Module haystack_integrations.components.embedders.jina.text_embedder

JinaTextEmbedder

@component
class JinaTextEmbedder()

A component for embedding strings using Jina AI models.

Usage example:

from haystack_integrations.components.embedders.jina import JinaTextEmbedder

# Make sure that the environment variable JINA_API_KEY is set

text_embedder = JinaTextEmbedder()

text_to_embed = "I love pizza!"

print(text_embedder.run(text_to_embed))

# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
# 'meta': {'model': 'jina-embeddings-v2-base-en',
#          'usage': {'prompt_tokens': 4, 'total_tokens': 4}}}

JinaTextEmbedder.__init__

def __init__(api_key: Secret = Secret.from_env_var("JINA_API_KEY"),
             model: str = "jina-embeddings-v2-base-en",
             prefix: str = "",
             suffix: str = "")

Create a JinaTextEmbedder component.

Arguments:

  • api_key: The Jina API key. It can be explicitly provided or automatically read from the environment variable JINA_API_KEY (recommended).
  • model: The name of the Jina model to use. See the Jina documentation for the list of available models.
  • prefix: A string to add to the beginning of each text.
  • suffix: A string to add to the end of each text.

JinaTextEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

JinaTextEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "JinaTextEmbedder"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

JinaTextEmbedder.run

@component.output_types(embedding=List[float], meta=Dict[str, Any])
def run(text: str)

Embed a string.

Arguments:

  • text: The string to embed.

Raises:

  • TypeError: If the input is not a string.

Returns:

A dictionary with the following keys:

  • embedding: The embedding of the input string.
  • meta: A dictionary with metadata including the model name and usage statistics.
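One common use of the returned dictionary is comparing the text embedding against Document embeddings. The sketch below reads the embedding and meta keys from a stand-in result and ranks two made-up Document vectors by cosine similarity. The three-dimensional vectors and the document ids are illustrative only (real embeddings are much longer), and cosine_similarity is a helper written for this example, not part of the integration.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Stand-in for the dictionary returned by JinaTextEmbedder.run; the vector is
# shortened to three made-up values.
query_result = {
    "embedding": [0.1, 0.2, 0.3],
    "meta": {
        "model": "jina-embeddings-v2-base-en",
        "usage": {"prompt_tokens": 4, "total_tokens": 4},
    },
}

# Made-up Document embeddings keyed by hypothetical document ids.
doc_embeddings: Dict[str, List[float]] = {
    "doc-a": [0.1, 0.2, 0.3],
    "doc-b": [-0.3, 0.1, 0.0],
}

scores = {
    doc_id: cosine_similarity(query_result["embedding"], vector)
    for doc_id, vector in doc_embeddings.items()
}
best = max(scores, key=scores.get)
print(best, query_result["meta"]["usage"]["total_tokens"])
```

For the comparison to be meaningful, the query and the Documents should be embedded with the same model.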