Jina integration for Haystack
Module haystack_integrations.components.embedders.jina.document_embedder
JinaDocumentEmbedder
@component
class JinaDocumentEmbedder()
A component for computing Document embeddings using Jina AI models.
The embedding of each Document is stored in the embedding
field of the Document.
Usage example:
from haystack import Document
from haystack_integrations.components.embedders.jina import JinaDocumentEmbedder
# Make sure that the environment variable JINA_API_KEY is set
document_embedder = JinaDocumentEmbedder()
doc = Document(content="I love pizza!")
result = document_embedder.run([doc])
print(result['documents'][0].embedding)
# [0.017020374536514282, -0.023255806416273117, ...]
JinaDocumentEmbedder.__init__
def __init__(api_key: Secret = Secret.from_env_var("JINA_API_KEY"),
             model: str = "jina-embeddings-v2-base-en",
             prefix: str = "",
             suffix: str = "",
             batch_size: int = 32,
             progress_bar: bool = True,
             meta_fields_to_embed: Optional[List[str]] = None,
             embedding_separator: str = "\n")
Create a JinaDocumentEmbedder component.
Arguments:
api_key
: The Jina API key.
model
: The name of the Jina model to use. Check the list of available models in the Jina documentation.
prefix
: A string to add to the beginning of each text.
suffix
: A string to add to the end of each text.
batch_size
: Number of Documents to encode at once.
progress_bar
: Whether to show a progress bar. Disabling it can help keep logs clean in production deployments.
meta_fields_to_embed
: List of meta fields that should be embedded along with the Document text.
embedding_separator
: Separator used to concatenate the meta fields to the Document text.
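As a rough illustration of how `prefix`, `suffix`, `meta_fields_to_embed`, and `embedding_separator` interact, the sketch below builds the string that would be sent for embedding. The helper `build_text_to_embed` and the plain-dict meta argument are hypothetical stand-ins, and the concatenation order (meta values first, then the content) is an assumption rather than something this reference states.

```python
from typing import Any, Dict, List, Optional


def build_text_to_embed(
    content: str,
    meta: Dict[str, Any],
    prefix: str = "",
    suffix: str = "",
    meta_fields_to_embed: Optional[List[str]] = None,
    embedding_separator: str = "\n",
) -> str:
    """Hypothetical helper: join selected meta values and the content into one string."""
    fields = meta_fields_to_embed or []
    # Keep only the requested meta fields that are present and not None.
    meta_values = [str(meta[key]) for key in fields if meta.get(key) is not None]
    # Assumed order: meta values first, then the Document text.
    body = embedding_separator.join(meta_values + [content])
    return prefix + body + suffix


text = build_text_to_embed(
    content="I love pizza!",
    meta={"title": "Food notes", "author": None},
    prefix="passage: ",
    meta_fields_to_embed=["title", "author"],
)
print(repr(text))
```

In this sketch the `author` field is skipped because its value is `None`, so only the title is joined to the content.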
JinaDocumentEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
JinaDocumentEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "JinaDocumentEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
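The round trip between `to_dict` and `from_dict` can be pictured with a small stand-in class. `SketchEmbedder` is hypothetical and is not the Jina component; the `type` / `init_parameters` layout follows the general shape Haystack components serialize to, but the exact keys emitted by `JinaDocumentEmbedder` are an assumption here.

```python
from typing import Any, Dict


class SketchEmbedder:
    """Hypothetical stand-in used only to illustrate the to_dict/from_dict round trip."""

    def __init__(self, model: str = "jina-embeddings-v2-base-en", batch_size: int = 32):
        self.model = model
        self.batch_size = batch_size

    def to_dict(self) -> Dict[str, Any]:
        # Record the class path and the constructor arguments needed to rebuild it.
        return {
            "type": f"{type(self).__module__}.{type(self).__name__}",
            "init_parameters": {"model": self.model, "batch_size": self.batch_size},
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SketchEmbedder":
        # Rebuild the component from the stored constructor arguments.
        return cls(**data["init_parameters"])


original = SketchEmbedder(batch_size=16)
restored = SketchEmbedder.from_dict(original.to_dict())
print(restored.model, restored.batch_size)
```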
JinaDocumentEmbedder.run
@component.output_types(documents=List[Document], meta=Dict[str, Any])
def run(documents: List[Document])
Compute the embeddings for a list of Documents.
Arguments:
documents
: A list of Documents to embed.
Raises:
TypeError
: If the input is not a list of Documents.
Returns:
A dictionary with the following keys:
documents
: List of Documents, each with an embedding field containing the computed embedding.
meta
: A dictionary with metadata including the model name and usage statistics.
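The sketch below illustrates how a `run`-style method might validate its input and process Documents in chunks of `batch_size`. `FakeDocument`, `fake_embed_batch`, and `run_sketch` are hypothetical; the real component calls the Jina API, whereas the stand-in returns dummy vectors so the example works offline. The `batches` entry in `meta` is an illustrative placeholder, not the metadata the component returns.

```python
from typing import Any, Dict, List, Optional


class FakeDocument:
    """Hypothetical stand-in for haystack.Document."""

    def __init__(self, content: str):
        self.content = content
        self.embedding: Optional[List[float]] = None


def fake_embed_batch(texts: List[str]) -> List[List[float]]:
    # Placeholder for the remote API call: one dummy 2-dimensional vector per text.
    return [[float(len(t)), 0.0] for t in texts]


def run_sketch(documents: Any, batch_size: int = 32) -> Dict[str, Any]:
    # Reject anything that is not a list of documents.
    if not isinstance(documents, list) or any(not isinstance(d, FakeDocument) for d in documents):
        raise TypeError("run_sketch expects a list of FakeDocument objects.")

    batches = 0
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        vectors = fake_embed_batch([doc.content for doc in batch])
        # Store each vector on the document it was computed for.
        for doc, vector in zip(batch, vectors):
            doc.embedding = vector
        batches += 1
    return {"documents": documents, "meta": {"batches": batches}}


docs = [FakeDocument(content=f"text {i}") for i in range(5)]
result = run_sketch(docs, batch_size=2)
print(result["meta"]["batches"], result["documents"][0].embedding)
```

Passing a bare string or a single document instead of a list raises `TypeError` in this sketch, in line with the documented behaviour.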
Module haystack_integrations.components.embedders.jina.text_embedder
JinaTextEmbedder
@component
class JinaTextEmbedder()
A component for embedding strings using Jina AI models.
Usage example:
from haystack_integrations.components.embedders.jina import JinaTextEmbedder
# Make sure that the environment variable JINA_API_KEY is set
text_embedder = JinaTextEmbedder()
text_to_embed = "I love pizza!"
print(text_embedder.run(text_to_embed))
# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
# 'meta': {'model': 'jina-embeddings-v2-base-en',
# 'usage': {'prompt_tokens': 4, 'total_tokens': 4}}}
JinaTextEmbedder.__init__
def __init__(api_key: Secret = Secret.from_env_var("JINA_API_KEY"),
             model: str = "jina-embeddings-v2-base-en",
             prefix: str = "",
             suffix: str = "")
Create a JinaTextEmbedder component.
Arguments:
api_key
: The Jina API key. It can be explicitly provided or automatically read from the environment variable JINA_API_KEY (recommended).
model
: The name of the Jina model to use. Check the list of available models in the Jina documentation.
prefix
: A string to add to the beginning of each text.
suffix
: A string to add to the end of each text.
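The two ways of supplying the key, explicitly or through `JINA_API_KEY`, can be sketched with the standard library alone. `resolve_api_key` is a hypothetical helper and not part of the integration; the real component wraps the key in a `Secret`, as the signature above shows, and the error raised here for a missing variable is an assumed behaviour.

```python
import os
from typing import Optional


def resolve_api_key(explicit: Optional[str] = None, env_var: str = "JINA_API_KEY") -> str:
    """Hypothetical helper: prefer an explicit key, otherwise read the environment."""
    if explicit is not None:
        return explicit
    value = os.environ.get(env_var)
    if not value:
        # Assumed behaviour: fail early rather than send unauthenticated requests.
        raise ValueError(f"Set the {env_var} environment variable or pass a key explicitly.")
    return value


# A dummy value is set here only so the sketch runs without a real key.
os.environ["JINA_API_KEY"] = "dummy-key-for-illustration"
print(resolve_api_key())                # read from the environment
print(resolve_api_key("explicit-key"))  # explicit value wins
```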
JinaTextEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
JinaTextEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "JinaTextEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
JinaTextEmbedder.run
@component.output_types(embedding=List[float], meta=Dict[str, Any])
def run(text: str)
Embed a string.
Arguments:
text
: The string to embed.
Raises:
TypeError
: If the input is not a string.
Returns:
A dictionary with the following keys:
embedding
: The embedding of the input string.
meta
: A dictionary with metadata including the model name and usage statistics.
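A minimal sketch of the `run` contract described above: reject non-string input, apply `prefix` and `suffix`, and return a dictionary with `embedding` and `meta`. `embed_text_sketch` and `fake_embed` are hypothetical, the vector is a dummy value, and the `meta` contents are placeholders rather than real API output.

```python
from typing import Any, Dict, List


def fake_embed(text: str) -> List[float]:
    # Placeholder for the remote API call: a dummy 2-dimensional vector.
    return [float(len(text)), 0.0]


def embed_text_sketch(text: Any, prefix: str = "", suffix: str = "") -> Dict[str, Any]:
    if not isinstance(text, str):
        # Mirrors the documented TypeError for non-string input.
        raise TypeError("embed_text_sketch expects a single string.")
    # Prefix and suffix are applied before the text is embedded.
    text_to_embed = prefix + text + suffix
    return {
        "embedding": fake_embed(text_to_embed),
        "meta": {"model": "placeholder-model", "embedded_text": text_to_embed},
    }


out = embed_text_sketch("I love pizza!", prefix="query: ")
print(out["embedding"], out["meta"]["embedded_text"])
```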