API Reference

Transforms queries into vectors to look for similar or relevant Documents.

Module azure_document_embedder

AzureOpenAIDocumentEmbedder

A component for computing Document embeddings using OpenAI models on Azure.

Usage example:

from haystack import Document
from haystack.components.embedders import AzureOpenAIDocumentEmbedder

doc = Document(content="I love pizza!")

document_embedder = AzureOpenAIDocumentEmbedder()

result = document_embedder.run([doc])
print(result['documents'][0].embedding)

# [0.017020374536514282, -0.023255806416273117, ...]

AzureOpenAIDocumentEmbedder.__init__

def __init__(azure_endpoint: Optional[str] = None,
             api_version: Optional[str] = "2023-05-15",
             azure_deployment: str = "text-embedding-ada-002",
             dimensions: Optional[int] = None,
             api_key: Optional[Secret] = Secret.from_env_var(
                 "AZURE_OPENAI_API_KEY", strict=False),
             azure_ad_token: Optional[Secret] = Secret.from_env_var(
                 "AZURE_OPENAI_AD_TOKEN", strict=False),
             organization: Optional[str] = None,
             prefix: str = "",
             suffix: str = "",
             batch_size: int = 32,
             progress_bar: bool = True,
             meta_fields_to_embed: Optional[List[str]] = None,
             embedding_separator: str = "\n",
             timeout: Optional[float] = None,
             max_retries: Optional[int] = None)

Create an AzureOpenAIDocumentEmbedder component.

Arguments:

  • azure_endpoint: The endpoint of the deployed model.
  • api_version: The version of the API to use.
  • azure_deployment: The deployment of the model, usually matches the model name.
  • dimensions: The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.
  • api_key: The API key used for authentication.
  • azure_ad_token: Microsoft Entra ID token (formerly Azure Active Directory); see Microsoft's official Entra ID documentation for more information.
  • organization: The Organization ID. See OpenAI's production best practices for more information.
  • prefix: A string to add at the beginning of each text.
  • suffix: A string to add at the end of each text.
  • batch_size: Number of Documents to encode at once.
  • progress_bar: If True shows a progress bar when running.
  • meta_fields_to_embed: List of meta fields that will be embedded along with the Document text.
  • embedding_separator: Separator used to concatenate the meta fields to the Document text.
  • timeout: The timeout in seconds passed to the underlying AzureOpenAI client. If not set, it is inferred from the OPENAI_TIMEOUT environment variable, or defaults to 30.
  • max_retries: Maximum number of retries to attempt if AzureOpenAI returns an internal error. If not set, it is inferred from the OPENAI_MAX_RETRIES environment variable, or defaults to 5.
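As a sketch of how prefix, suffix, meta_fields_to_embed, and embedding_separator combine to produce the text that is actually embedded (the helper name is illustrative, not part of the API):

```python
# Illustrative sketch: selected meta fields are joined to the Document
# content with embedding_separator, then wrapped in prefix and suffix.
def build_text_to_embed(content, meta, prefix="", suffix="",
                        meta_fields_to_embed=None, embedding_separator="\n"):
    fields = meta_fields_to_embed or []
    meta_values = [str(meta[f]) for f in fields if meta.get(f) is not None]
    return prefix + embedding_separator.join(meta_values + [content]) + suffix

text = build_text_to_embed(
    "I love pizza!",
    meta={"title": "Food review"},
    meta_fields_to_embed=["title"],
)
print(text)
# Food review
# I love pizza!
```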

AzureOpenAIDocumentEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

AzureOpenAIDocumentEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "AzureOpenAIDocumentEmbedder"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

AzureOpenAIDocumentEmbedder.run

@component.output_types(documents=List[Document], meta=Dict[str, Any])
def run(documents: List[Document]) -> Dict[str, Any]

Embed a list of Documents.

Arguments:

  • documents: Documents to embed.

Returns:

A dictionary with the following keys:

  • documents: Documents with embeddings
  • meta: Information about the usage of the model.

Module azure_text_embedder

AzureOpenAITextEmbedder

A component for embedding strings using OpenAI models on Azure.

Usage example:

from haystack.components.embedders import AzureOpenAITextEmbedder

text_to_embed = "I love pizza!"

text_embedder = AzureOpenAITextEmbedder()

print(text_embedder.run(text_to_embed))

# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
# 'meta': {'model': 'text-embedding-ada-002-v2',
#          'usage': {'prompt_tokens': 4, 'total_tokens': 4}}}

AzureOpenAITextEmbedder.__init__

def __init__(azure_endpoint: Optional[str] = None,
             api_version: Optional[str] = "2023-05-15",
             azure_deployment: str = "text-embedding-ada-002",
             dimensions: Optional[int] = None,
             api_key: Optional[Secret] = Secret.from_env_var(
                 "AZURE_OPENAI_API_KEY", strict=False),
             azure_ad_token: Optional[Secret] = Secret.from_env_var(
                 "AZURE_OPENAI_AD_TOKEN", strict=False),
             organization: Optional[str] = None,
             timeout: Optional[float] = None,
             max_retries: Optional[int] = None,
             prefix: str = "",
             suffix: str = "")

Create an AzureOpenAITextEmbedder component.

Arguments:

  • azure_endpoint: The endpoint of the deployed model.
  • api_version: The version of the API to use.
  • azure_deployment: The deployment of the model, usually matches the model name.
  • dimensions: The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.
  • api_key: The API key used for authentication.
  • azure_ad_token: Microsoft Entra ID token (formerly Azure Active Directory); see Microsoft's official Entra ID documentation for more information.
  • organization: The Organization ID. See OpenAI's production best practices for more information.
  • timeout: The timeout in seconds passed to the underlying AzureOpenAI client. If not set, it is inferred from the OPENAI_TIMEOUT environment variable, or defaults to 30.
  • max_retries: Maximum number of retries to attempt if AzureOpenAI returns an internal error. If not set, it is inferred from the OPENAI_MAX_RETRIES environment variable, or defaults to 5.
  • prefix: A string to add at the beginning of each text.
  • suffix: A string to add at the end of each text.
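The timeout and max_retries fallbacks described above can be sketched as follows (illustrative helper, not the component's actual code):

```python
import os

# Illustrative fallback logic: an explicit argument wins, then the
# environment variable, then the documented default.
def resolve_client_settings(timeout=None, max_retries=None):
    if timeout is None:
        timeout = float(os.environ.get("OPENAI_TIMEOUT", 30.0))
    if max_retries is None:
        max_retries = int(os.environ.get("OPENAI_MAX_RETRIES", 5))
    return timeout, max_retries

print(resolve_client_settings())  # with neither env var set: (30.0, 5)
```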

AzureOpenAITextEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

AzureOpenAITextEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "AzureOpenAITextEmbedder"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

AzureOpenAITextEmbedder.run

@component.output_types(embedding=List[float], meta=Dict[str, Any])
def run(text: str)

Embed a single string.

Arguments:

  • text: Text to embed.

Returns:

A dictionary with the following keys:

  • embedding: The embedding of the input text.
  • meta: Information about the usage of the model.

Module hugging_face_api_document_embedder

HuggingFaceAPIDocumentEmbedder

A component that embeds documents using Hugging Face APIs.

This component can be used to compute Document embeddings using different Hugging Face APIs: the free Serverless Inference API, paid Inference Endpoints, or self-hosted Text Embeddings Inference.

Example usage with the free Serverless Inference API:

from haystack.components.embedders import HuggingFaceAPIDocumentEmbedder
from haystack.utils import Secret
from haystack.dataclasses import Document

doc = Document(content="I love pizza!")

doc_embedder = HuggingFaceAPIDocumentEmbedder(api_type="serverless_inference_api",
                                              api_params={"model": "BAAI/bge-small-en-v1.5"},
                                              token=Secret.from_token("<your-api-key>"))

result = doc_embedder.run([doc])
print(result["documents"][0].embedding)

# [0.017020374536514282, -0.023255806416273117, ...]

Example usage with paid Inference Endpoints:

from haystack.components.embedders import HuggingFaceAPIDocumentEmbedder
from haystack.utils import Secret
from haystack.dataclasses import Document

doc = Document(content="I love pizza!")

doc_embedder = HuggingFaceAPIDocumentEmbedder(api_type="inference_endpoints",
                                              api_params={"url": "<your-inference-endpoint-url>"},
                                              token=Secret.from_token("<your-api-key>"))

result = doc_embedder.run([doc])
print(result["documents"][0].embedding)

# [0.017020374536514282, -0.023255806416273117, ...]

Example usage with self-hosted Text Embeddings Inference:

from haystack.components.embedders import HuggingFaceAPIDocumentEmbedder
from haystack.dataclasses import Document

doc = Document(content="I love pizza!")

doc_embedder = HuggingFaceAPIDocumentEmbedder(api_type="text_embeddings_inference",
                                              api_params={"url": "http://localhost:8080"})

result = doc_embedder.run([doc])
print(result["documents"][0].embedding)

# [0.017020374536514282, -0.023255806416273117, ...]

HuggingFaceAPIDocumentEmbedder.__init__

def __init__(api_type: Union[HFEmbeddingAPIType, str],
             api_params: Dict[str, str],
             token: Optional[Secret] = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             prefix: str = "",
             suffix: str = "",
             truncate: bool = True,
             normalize: bool = False,
             batch_size: int = 32,
             progress_bar: bool = True,
             meta_fields_to_embed: Optional[List[str]] = None,
             embedding_separator: str = "\n")

Create a HuggingFaceAPIDocumentEmbedder component.

Arguments:

  • api_type: The type of Hugging Face API to use.
  • api_params: A dictionary containing the following keys:
      • model: The model ID on the Hugging Face Hub. Required when api_type is SERVERLESS_INFERENCE_API.
      • url: The URL of the inference endpoint. Required when api_type is INFERENCE_ENDPOINTS or TEXT_EMBEDDINGS_INFERENCE.
  • token: The HuggingFace token to use as HTTP bearer authorization. You can find your HF token in your account settings.
  • prefix: A string to add at the beginning of each text.
  • suffix: A string to add at the end of each text.
  • truncate: Truncate input text from the end to the maximum length supported by the model. This parameter takes effect when the api_type is TEXT_EMBEDDINGS_INFERENCE. It also takes effect when the api_type is INFERENCE_ENDPOINTS and the backend is based on Text Embeddings Inference. This parameter is ignored when the api_type is SERVERLESS_INFERENCE_API (it is always set to True and cannot be changed).
  • normalize: Normalize the embeddings to unit length. This parameter takes effect when the api_type is TEXT_EMBEDDINGS_INFERENCE. It also takes effect when the api_type is INFERENCE_ENDPOINTS and the backend is based on Text Embeddings Inference. This parameter is ignored when the api_type is SERVERLESS_INFERENCE_API (it is always set to False and cannot be changed).
  • batch_size: Number of Documents to process at once.
  • progress_bar: If True shows a progress bar when running.
  • meta_fields_to_embed: List of meta fields that will be embedded along with the Document text.
  • embedding_separator: Separator used to concatenate the meta fields to the Document text.
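The normalize option maps each embedding to unit length; a minimal sketch of that operation (the function name is illustrative, not part of the API):

```python
import math

# Illustrative sketch of what normalize=True does: scale each embedding
# vector so its Euclidean (L2) norm is 1.
def l2_normalize(embedding):
    norm = math.sqrt(sum(x * x for x in embedding))
    return [x / norm for x in embedding] if norm > 0 else embedding

vec = l2_normalize([3.0, 4.0])
print(vec)  # [0.6, 0.8]
```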

HuggingFaceAPIDocumentEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

HuggingFaceAPIDocumentEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "HuggingFaceAPIDocumentEmbedder"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

HuggingFaceAPIDocumentEmbedder.run

@component.output_types(documents=List[Document])
def run(documents: List[Document])

Embed a list of Documents.

Arguments:

  • documents: Documents to embed.

Returns:

A dictionary with the following keys:

  • documents: Documents with embeddings

Module hugging_face_api_text_embedder

HuggingFaceAPITextEmbedder

A component that embeds text using Hugging Face APIs.

This component can be used to embed strings using different Hugging Face APIs: the free Serverless Inference API, paid Inference Endpoints, or self-hosted Text Embeddings Inference.

Example usage with the free Serverless Inference API:

from haystack.components.embedders import HuggingFaceAPITextEmbedder
from haystack.utils import Secret

text_embedder = HuggingFaceAPITextEmbedder(api_type="serverless_inference_api",
                                           api_params={"model": "BAAI/bge-small-en-v1.5"},
                                           token=Secret.from_token("<your-api-key>"))

print(text_embedder.run("I love pizza!"))

# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],

Example usage with paid Inference Endpoints:

from haystack.components.embedders import HuggingFaceAPITextEmbedder
from haystack.utils import Secret
text_embedder = HuggingFaceAPITextEmbedder(api_type="inference_endpoints",
                                           api_params={"url": "<your-inference-endpoint-url>"},
                                           token=Secret.from_token("<your-api-key>"))

print(text_embedder.run("I love pizza!"))

# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],

Example usage with self-hosted Text Embeddings Inference:

from haystack.components.embedders import HuggingFaceAPITextEmbedder

text_embedder = HuggingFaceAPITextEmbedder(api_type="text_embeddings_inference",
                                           api_params={"url": "http://localhost:8080"})

print(text_embedder.run("I love pizza!"))

# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],

HuggingFaceAPITextEmbedder.__init__

def __init__(api_type: Union[HFEmbeddingAPIType, str],
             api_params: Dict[str, str],
             token: Optional[Secret] = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             prefix: str = "",
             suffix: str = "",
             truncate: bool = True,
             normalize: bool = False)

Create a HuggingFaceAPITextEmbedder component.

Arguments:

  • api_type: The type of Hugging Face API to use.
  • api_params: A dictionary containing the following keys:
      • model: The model ID on the Hugging Face Hub. Required when api_type is SERVERLESS_INFERENCE_API.
      • url: The URL of the inference endpoint. Required when api_type is INFERENCE_ENDPOINTS or TEXT_EMBEDDINGS_INFERENCE.
  • token: The Hugging Face token to use as HTTP bearer authorization. You can find your HF token in your account settings.
  • prefix: A string to add at the beginning of each text.
  • suffix: A string to add at the end of each text.
  • truncate: Truncate input text from the end to the maximum length supported by the model. This parameter takes effect when the api_type is TEXT_EMBEDDINGS_INFERENCE. It also takes effect when the api_type is INFERENCE_ENDPOINTS and the backend is based on Text Embeddings Inference. This parameter is ignored when the api_type is SERVERLESS_INFERENCE_API (it is always set to True and cannot be changed).
  • normalize: Normalize the embeddings to unit length. This parameter takes effect when the api_type is TEXT_EMBEDDINGS_INFERENCE. It also takes effect when the api_type is INFERENCE_ENDPOINTS and the backend is based on Text Embeddings Inference. This parameter is ignored when the api_type is SERVERLESS_INFERENCE_API (it is always set to False and cannot be changed).
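The api_type/api_params contract described above can be sketched as a small validation helper (illustrative only; the component performs its own validation):

```python
# Illustrative check of the documented api_params requirements:
# "model" for the serverless API, "url" for endpoints / TEI.
def validate_api_params(api_type, api_params):
    if api_type == "serverless_inference_api":
        if "model" not in api_params:
            raise ValueError("'model' is required for serverless_inference_api")
    elif api_type in ("inference_endpoints", "text_embeddings_inference"):
        if "url" not in api_params:
            raise ValueError(f"'url' is required for {api_type}")
    else:
        raise ValueError(f"Unknown api_type: {api_type}")

validate_api_params("serverless_inference_api", {"model": "BAAI/bge-small-en-v1.5"})
validate_api_params("text_embeddings_inference", {"url": "http://localhost:8080"})
```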

HuggingFaceAPITextEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

HuggingFaceAPITextEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "HuggingFaceAPITextEmbedder"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

HuggingFaceAPITextEmbedder.run

@component.output_types(embedding=List[float])
def run(text: str)

Embed a single string.

Arguments:

  • text: Text to embed.

Returns:

A dictionary with the following keys:

  • embedding: The embedding of the input text.

Module openai_document_embedder

OpenAIDocumentEmbedder

A component for computing Document embeddings using OpenAI models.

Usage example:

from haystack import Document
from haystack.components.embedders import OpenAIDocumentEmbedder

doc = Document(content="I love pizza!")

document_embedder = OpenAIDocumentEmbedder()

result = document_embedder.run([doc])
print(result['documents'][0].embedding)

# [0.017020374536514282, -0.023255806416273117, ...]

OpenAIDocumentEmbedder.__init__

def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
             model: str = "text-embedding-ada-002",
             dimensions: Optional[int] = None,
             api_base_url: Optional[str] = None,
             organization: Optional[str] = None,
             prefix: str = "",
             suffix: str = "",
             batch_size: int = 32,
             progress_bar: bool = True,
             meta_fields_to_embed: Optional[List[str]] = None,
             embedding_separator: str = "\n",
             timeout: Optional[float] = None,
             max_retries: Optional[int] = None)

Create an OpenAIDocumentEmbedder component.

By setting the OPENAI_TIMEOUT and OPENAI_MAX_RETRIES environment variables, you can change the timeout and max_retries parameters of the OpenAI client.

Arguments:

  • api_key: The OpenAI API key.
  • model: The name of the model to use.
  • dimensions: The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.
  • api_base_url: Overrides default base url for all HTTP requests.
  • organization: The Organization ID. See OpenAI's production best practices for more information.
  • prefix: A string to add at the beginning of each text.
  • suffix: A string to add at the end of each text.
  • batch_size: Number of Documents to encode at once.
  • progress_bar: If True shows a progress bar when running.
  • meta_fields_to_embed: List of meta fields that will be embedded along with the Document text.
  • embedding_separator: Separator used to concatenate the meta fields to the Document text.
  • timeout: Timeout for OpenAI client calls. If not set, it is inferred from the OPENAI_TIMEOUT environment variable, or defaults to 30.
  • max_retries: Maximum number of retries to attempt if OpenAI returns an internal error. If not set, it is inferred from the OPENAI_MAX_RETRIES environment variable, or defaults to 5.
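The batch_size behavior can be sketched as follows (illustrative helper, not the component's internal code): documents are sent to the API in chunks of at most batch_size texts per request.

```python
# Illustrative sketch of batch_size: split a list of items into
# consecutive chunks of at most batch_size elements.
def batched(items, batch_size=32):
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

docs = [f"doc-{n}" for n in range(70)]
batches = list(batched(docs, batch_size=32))
print([len(b) for b in batches])  # [32, 32, 6]
```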

OpenAIDocumentEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

OpenAIDocumentEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OpenAIDocumentEmbedder"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

OpenAIDocumentEmbedder.run

@component.output_types(documents=List[Document], meta=Dict[str, Any])
def run(documents: List[Document])

Embed a list of Documents.

Arguments:

  • documents: Documents to embed.

Returns:

A dictionary with the following keys:

  • documents: Documents with embeddings
  • meta: Information about the usage of the model.

Module openai_text_embedder

OpenAITextEmbedder

A component for embedding strings using OpenAI models.

Usage example:

from haystack.components.embedders import OpenAITextEmbedder

text_to_embed = "I love pizza!"

text_embedder = OpenAITextEmbedder()

print(text_embedder.run(text_to_embed))

# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
# 'meta': {'model': 'text-embedding-ada-002-v2',
#          'usage': {'prompt_tokens': 4, 'total_tokens': 4}}}

OpenAITextEmbedder.__init__

def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
             model: str = "text-embedding-ada-002",
             dimensions: Optional[int] = None,
             api_base_url: Optional[str] = None,
             organization: Optional[str] = None,
             prefix: str = "",
             suffix: str = "",
             timeout: Optional[float] = None,
             max_retries: Optional[int] = None)

Create an OpenAITextEmbedder component.

By setting the OPENAI_TIMEOUT and OPENAI_MAX_RETRIES environment variables, you can change the timeout and max_retries parameters of the OpenAI client.

Arguments:

  • api_key: The OpenAI API key.
  • model: The name of the model to use.
  • dimensions: The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.
  • api_base_url: Overrides default base url for all HTTP requests.
  • organization: The Organization ID. See OpenAI's production best practices for more information.
  • prefix: A string to add at the beginning of each text.
  • suffix: A string to add at the end of each text.
  • timeout: Timeout for OpenAI client calls. If not set, it is inferred from the OPENAI_TIMEOUT environment variable, or defaults to 30.
  • max_retries: Maximum number of retries to attempt if OpenAI returns an internal error. If not set, it is inferred from the OPENAI_MAX_RETRIES environment variable, or defaults to 5.

OpenAITextEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

OpenAITextEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OpenAITextEmbedder"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

OpenAITextEmbedder.run

@component.output_types(embedding=List[float], meta=Dict[str, Any])
def run(text: str)

Embed a single string.

Arguments:

  • text: Text to embed.

Returns:

A dictionary with the following keys:

  • embedding: The embedding of the input text.
  • meta: Information about the usage of the model.

Module sentence_transformers_document_embedder

SentenceTransformersDocumentEmbedder

A component for computing Document embeddings using Sentence Transformers models.

Usage example:

from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
doc = Document(content="I love pizza!")
doc_embedder = SentenceTransformersDocumentEmbedder()
doc_embedder.warm_up()

result = doc_embedder.run([doc])
print(result['documents'][0].embedding)

# [-0.07804739475250244, 0.1498992145061493, ...]

SentenceTransformersDocumentEmbedder.__init__

def __init__(model: str = "sentence-transformers/all-mpnet-base-v2",
             device: Optional[ComponentDevice] = None,
             token: Optional[Secret] = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             prefix: str = "",
             suffix: str = "",
             batch_size: int = 32,
             progress_bar: bool = True,
             normalize_embeddings: bool = False,
             meta_fields_to_embed: Optional[List[str]] = None,
             embedding_separator: str = "\n",
             trust_remote_code: bool = False)

Create a SentenceTransformersDocumentEmbedder component.

Arguments:

  • model: Local path or ID of the model on HuggingFace Hub.
  • device: Overrides the default device used to load the model.
  • token: The API token used to download private models from Hugging Face.
  • prefix: A string to add at the beginning of each text. Can be used to prepend the text with an instruction, as required by some embedding models, such as E5 and bge.
  • suffix: A string to add at the end of each text.
  • batch_size: Number of Documents to encode at once.
  • progress_bar: If True shows a progress bar when running.
  • normalize_embeddings: If True returned vectors will have length 1.
  • meta_fields_to_embed: List of meta fields that will be embedded along with the Document text.
  • embedding_separator: Separator used to concatenate the meta fields to the Document text.
  • trust_remote_code: If False, only Hugging Face verified model architectures are allowed. If True, custom models and scripts are allowed.
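Models such as E5 expect an instruction prefix on every text, which is what the prefix setting is for. A hedged sketch of that convention ("passage: " and "query: " follow E5's documented usage; the helper name is illustrative):

```python
# Illustrative sketch: E5-style models expect instruction prefixes, so a
# document embedder would use "passage: " and a text (query) embedder
# "query: " on each input before embedding.
def with_instruction(text, prefix, suffix=""):
    return prefix + text + suffix

doc_text = with_instruction("I love pizza!", "passage: ")
query_text = with_instruction("What food do I love?", "query: ")
print(doc_text)    # passage: I love pizza!
print(query_text)  # query: What food do I love?
```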

SentenceTransformersDocumentEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

SentenceTransformersDocumentEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str,
                              Any]) -> "SentenceTransformersDocumentEmbedder"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

SentenceTransformersDocumentEmbedder.warm_up

def warm_up()

Initializes the component.

SentenceTransformersDocumentEmbedder.run

@component.output_types(documents=List[Document])
def run(documents: List[Document])

Embed a list of Documents.

Arguments:

  • documents: Documents to embed.

Returns:

A dictionary with the following keys:

  • documents: Documents with embeddings

Module sentence_transformers_text_embedder

SentenceTransformersTextEmbedder

A component for embedding strings using Sentence Transformers models.

Usage example:

from haystack.components.embedders import SentenceTransformersTextEmbedder

text_to_embed = "I love pizza!"

text_embedder = SentenceTransformersTextEmbedder()
text_embedder.warm_up()

print(text_embedder.run(text_to_embed))

# {'embedding': [-0.07804739475250244, 0.1498992145061493, ...]}

SentenceTransformersTextEmbedder.__init__

def __init__(model: str = "sentence-transformers/all-mpnet-base-v2",
             device: Optional[ComponentDevice] = None,
             token: Optional[Secret] = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             prefix: str = "",
             suffix: str = "",
             batch_size: int = 32,
             progress_bar: bool = True,
             normalize_embeddings: bool = False,
             trust_remote_code: bool = False)

Create a SentenceTransformersTextEmbedder component.

Arguments:

  • model: Local path or ID of the model on HuggingFace Hub.
  • device: Overrides the default device used to load the model.
  • token: The API token used to download private models from Hugging Face.
  • prefix: A string to add at the beginning of each text. Can be used to prepend the text with an instruction, as required by some embedding models, such as E5 and bge.
  • suffix: A string to add at the end of each text.
  • batch_size: Number of Documents to encode at once.
  • progress_bar: If True shows a progress bar when running.
  • normalize_embeddings: If True returned vectors will have length 1.
  • trust_remote_code: If False, only Hugging Face verified model architectures are allowed. If True, custom models and scripts are allowed.

SentenceTransformersTextEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

SentenceTransformersTextEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "SentenceTransformersTextEmbedder"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

SentenceTransformersTextEmbedder.warm_up

def warm_up()

Initializes the component.

SentenceTransformersTextEmbedder.run

@component.output_types(embedding=List[float])
def run(text: str)

Embed a single string.

Arguments:

  • text: Text to embed.

Returns:

A dictionary with the following keys:

  • embedding: The embedding of the input text.