
Transforms queries and Documents into vectors that can be used to find similar or relevant Documents.

Module azure_document_embedder

AzureOpenAIDocumentEmbedder

@component
class AzureOpenAIDocumentEmbedder()

A component for computing Document embeddings using OpenAI models on Azure.

Usage example:

from haystack import Document
from haystack.components.embedders import AzureOpenAIDocumentEmbedder

doc = Document(content="I love pizza!")

document_embedder = AzureOpenAIDocumentEmbedder()

result = document_embedder.run([doc])
print(result['documents'][0].embedding)

# [0.017020374536514282, -0.023255806416273117, ...]

AzureOpenAIDocumentEmbedder.__init__

def __init__(azure_endpoint: Optional[str] = None,
             api_version: Optional[str] = "2023-05-15",
             azure_deployment: str = "text-embedding-ada-002",
             api_key: Optional[Secret] = Secret.from_env_var(
                 "AZURE_OPENAI_API_KEY", strict=False),
             azure_ad_token: Optional[Secret] = Secret.from_env_var(
                 "AZURE_OPENAI_AD_TOKEN", strict=False),
             organization: Optional[str] = None,
             prefix: str = "",
             suffix: str = "",
             batch_size: int = 32,
             progress_bar: bool = True,
             meta_fields_to_embed: Optional[List[str]] = None,
             embedding_separator: str = "\n")

Create an AzureOpenAIDocumentEmbedder component.

Arguments:

  • azure_endpoint: The endpoint of the deployed model.
  • api_version: The version of the API to use.
  • azure_deployment: The deployment of the model, which usually matches the model name.
  • api_key: The API key used for authentication.
  • azure_ad_token: Microsoft Entra ID token (formerly Azure Active Directory). See Microsoft's official Entra ID documentation for more information.
  • organization: The Organization ID. See OpenAI's production best practices for more information.
  • prefix: A string to add at the beginning of each text.
  • suffix: A string to add at the end of each text.
  • batch_size: Number of Documents to encode at once.
  • progress_bar: If True, shows a progress bar when running.
  • meta_fields_to_embed: List of meta fields that will be embedded along with the Document text.
  • embedding_separator: Separator used to concatenate the meta fields to the Document text, as shown in the example below.
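
For example, a minimal sketch of embedding a Document's title metadata together with its content. It assumes the AZURE_OPENAI_API_KEY environment variable is set and uses a placeholder endpoint with a deployment named text-embedding-ada-002:

from haystack import Document
from haystack.components.embedders import AzureOpenAIDocumentEmbedder

document_embedder = AzureOpenAIDocumentEmbedder(
    azure_endpoint="https://my-resource.openai.azure.com",  # placeholder endpoint
    azure_deployment="text-embedding-ada-002",
    meta_fields_to_embed=["title"],
    embedding_separator="\n",
)

doc = Document(content="I love pizza!", meta={"title": "Food opinions"})

# The "title" meta field is concatenated with the content before embedding.
result = document_embedder.run([doc])
print(result["documents"][0].embedding[:3])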

AzureOpenAIDocumentEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

AzureOpenAIDocumentEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "AzureOpenAIDocumentEmbedder"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.
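
A sketch of how to_dict and from_dict round-trip a component's configuration. The endpoint is a placeholder and AZURE_OPENAI_API_KEY is assumed to be set; the default environment-variable secrets are serialized as references to the variables, not as raw values:

from haystack.components.embedders import AzureOpenAIDocumentEmbedder

embedder = AzureOpenAIDocumentEmbedder(
    azure_endpoint="https://my-resource.openai.azure.com"  # placeholder endpoint
)

# Serialize the configuration to a plain dictionary ...
data = embedder.to_dict()

# ... and rebuild an equivalent component from it.
restored = AzureOpenAIDocumentEmbedder.from_dict(data)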

AzureOpenAIDocumentEmbedder.run

@component.output_types(documents=List[Document], meta=Dict[str, Any])
def run(documents: List[Document]) -> Dict[str, Any]

Embed a list of Documents.

Arguments:

  • documents: Documents to embed.

Returns:

A dictionary with the following keys:

  • documents: Documents with embeddings
  • meta: Information about the usage of the model.
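
Continuing the usage example above, the meta output can be inspected to track token usage; the exact keys mirror what the Azure OpenAI API reports and may vary by model and API version:

result = document_embedder.run([doc])

# Typically something like:
# {'model': 'text-embedding-ada-002', 'usage': {'prompt_tokens': 4, 'total_tokens': 4}}
print(result["meta"])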

Module azure_text_embedder

AzureOpenAITextEmbedder

@component
class AzureOpenAITextEmbedder()

A component for embedding strings using OpenAI models on Azure.

Usage example:

from haystack.components.embedders import AzureOpenAITextEmbedder

text_to_embed = "I love pizza!"

text_embedder = AzureOpenAITextEmbedder()

print(text_embedder.run(text_to_embed))

# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
# 'meta': {'model': 'text-embedding-ada-002-v2',
#          'usage': {'prompt_tokens': 4, 'total_tokens': 4}}}

AzureOpenAITextEmbedder.__init__

def __init__(azure_endpoint: Optional[str] = None,
             api_version: Optional[str] = "2023-05-15",
             azure_deployment: str = "text-embedding-ada-002",
             api_key: Optional[Secret] = Secret.from_env_var(
                 "AZURE_OPENAI_API_KEY", strict=False),
             azure_ad_token: Optional[Secret] = Secret.from_env_var(
                 "AZURE_OPENAI_AD_TOKEN", strict=False),
             organization: Optional[str] = None,
             prefix: str = "",
             suffix: str = "")

Create an AzureOpenAITextEmbedder component.

Arguments:

  • azure_endpoint: The endpoint of the deployed model.
  • api_version: The version of the API to use.
  • azure_deployment: The deployment of the model, which usually matches the model name.
  • api_key: The API key used for authentication.
  • azure_ad_token: Microsoft Entra ID token (formerly Azure Active Directory). See Microsoft's official Entra ID documentation for more information.
  • organization: The Organization ID. See OpenAI's production best practices for more information.
  • prefix: A string to add at the beginning of each text.
  • suffix: A string to add at the end of each text.
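
A minimal configuration sketch for the text embedder, assuming AZURE_OPENAI_API_KEY is set in the environment and using a placeholder endpoint and deployment name:

from haystack.components.embedders import AzureOpenAITextEmbedder

text_embedder = AzureOpenAITextEmbedder(
    azure_endpoint="https://my-resource.openai.azure.com",  # placeholder endpoint
    azure_deployment="text-embedding-ada-002",
    api_version="2023-05-15",
)

result = text_embedder.run("What is the best pizza topping?")
print(len(result["embedding"]))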

AzureOpenAITextEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

AzureOpenAITextEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "AzureOpenAITextEmbedder"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

AzureOpenAITextEmbedder.run

@component.output_types(embedding=List[float], meta=Dict[str, Any])
def run(text: str)

Embed a single string.

Arguments:

  • text: Text to embed.

Returns:

A dictionary with the following keys:

  • embedding: The embedding of the input text.
  • meta: Information about the usage of the model.

Module hugging_face_tei_document_embedder

HuggingFaceTEIDocumentEmbedder

@component
class HuggingFaceTEIDocumentEmbedder()

A component for computing Document embeddings using HuggingFace Text-Embeddings-Inference endpoints.

This component can be used with embedding models hosted on the rate-limited Hugging Face Inference API tier, on paid Hugging Face Inference Endpoints, or on your own self-deployed Text-Embeddings-Inference (TEI) endpoint.

Usage example:

from haystack.dataclasses import Document
from haystack.components.embedders import HuggingFaceTEIDocumentEmbedder
from haystack.utils import Secret

doc = Document(content="I love pizza!")

document_embedder = HuggingFaceTEIDocumentEmbedder(
    model="BAAI/bge-small-en-v1.5", token=Secret.from_token("<your-api-key>")
)

result = document_embedder.run([doc])
print(result["documents"][0].embedding)

# [0.017020374536514282, -0.023255806416273117, ...]

HuggingFaceTEIDocumentEmbedder.__init__

def __init__(model: str = "BAAI/bge-small-en-v1.5",
             url: Optional[str] = None,
             token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
                                                           strict=False),
             prefix: str = "",
             suffix: str = "",
             batch_size: int = 32,
             progress_bar: bool = True,
             meta_fields_to_embed: Optional[List[str]] = None,
             embedding_separator: str = "\n")

Create a HuggingFaceTEIDocumentEmbedder component.

Arguments:

  • model: ID of the model on HuggingFace Hub.
  • url: The URL of your self-deployed Text-Embeddings-Inference service or the URL of your paid HF Inference Endpoint.
  • token: The HuggingFace Hub token. This is needed if you are using a paid HF Inference Endpoint or serving a private or gated model.
  • prefix: A string to add at the beginning of each text.
  • suffix: A string to add at the end of each text.
  • batch_size: Number of Documents to encode at once.
  • progress_bar: If True, shows a progress bar when running.
  • meta_fields_to_embed: List of meta fields that will be embedded along with the Document text.
  • embedding_separator: Separator used to concatenate the meta fields to the Document text.
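
A sketch of pointing the component at a self-deployed TEI service instead of the Inference API. The URL is a placeholder for a locally running Text-Embeddings-Inference container; no token is needed for an unauthenticated local deployment:

from haystack import Document
from haystack.components.embedders import HuggingFaceTEIDocumentEmbedder

document_embedder = HuggingFaceTEIDocumentEmbedder(
    model="BAAI/bge-small-en-v1.5",
    url="http://localhost:8080",  # placeholder URL of your TEI service
)

docs = [Document(content="I love pizza!"), Document(content="Pasta is great too.")]
result = document_embedder.run(docs)
print(len(result["documents"][0].embedding))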

HuggingFaceTEIDocumentEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

HuggingFaceTEIDocumentEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "HuggingFaceTEIDocumentEmbedder"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

HuggingFaceTEIDocumentEmbedder.run

@component.output_types(documents=List[Document])
def run(documents: List[Document])

Embed a list of Documents.

Arguments:

  • documents: Documents to embed.

Returns:

A dictionary with the following keys:

  • documents: Documents with embeddings

Module hugging_face_tei_text_embedder

HuggingFaceTEITextEmbedder

@component
class HuggingFaceTEITextEmbedder()

A component for embedding strings using HuggingFace Text-Embeddings-Inference endpoints.

This component can be used with embedding models hosted on the rate-limited Hugging Face Inference API tier, on paid Hugging Face Inference Endpoints, or on your own self-deployed Text-Embeddings-Inference (TEI) endpoint.

Usage example:

from haystack.components.embedders import HuggingFaceTEITextEmbedder
from haystack.utils import Secret

text_to_embed = "I love pizza!"

text_embedder = HuggingFaceTEITextEmbedder(
    model="BAAI/bge-small-en-v1.5", token=Secret.from_token("<your-api-key>")
)

print(text_embedder.run(text_to_embed))

# {'embedding': [0.017020374536514282, -0.023255806416273117, ...]}

HuggingFaceTEITextEmbedder.__init__

def __init__(model: str = "BAAI/bge-small-en-v1.5",
             url: Optional[str] = None,
             token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
                                                           strict=False),
             prefix: str = "",
             suffix: str = "")

Create a HuggingFaceTEITextEmbedder component.

Arguments:

  • model: ID of the model on HuggingFace Hub.
  • url: The URL of your self-deployed Text-Embeddings-Inference service or the URL of your paid HF Inference Endpoint.
  • token: The HuggingFace Hub token. This is needed if you are using a paid HF Inference Endpoint or serving a private or gated model.
  • prefix: A string to add at the beginning of each text.
  • suffix: A string to add at the end of each text.
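
For a paid HF Inference Endpoint, pass both the endpoint URL and your token; a sketch with placeholder values:

from haystack.components.embedders import HuggingFaceTEITextEmbedder
from haystack.utils import Secret

text_embedder = HuggingFaceTEITextEmbedder(
    model="BAAI/bge-small-en-v1.5",
    url="https://<your-endpoint>.endpoints.huggingface.cloud",  # placeholder URL
    token=Secret.from_env_var("HF_API_TOKEN"),
)

result = text_embedder.run("I love pizza!")
print(result["embedding"][:3])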

HuggingFaceTEITextEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

HuggingFaceTEITextEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "HuggingFaceTEITextEmbedder"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

HuggingFaceTEITextEmbedder.run

@component.output_types(embedding=List[float])
def run(text: str)

Embed a single string.

Arguments:

  • text: Text to embed.

Returns:

A dictionary with the following keys:

  • embedding: The embedding of the input text.

Module openai_document_embedder

OpenAIDocumentEmbedder

@component
class OpenAIDocumentEmbedder()

A component for computing Document embeddings using OpenAI models.

Usage example:

from haystack import Document
from haystack.components.embedders import OpenAIDocumentEmbedder

doc = Document(content="I love pizza!")

document_embedder = OpenAIDocumentEmbedder()

result = document_embedder.run([doc])
print(result['documents'][0].embedding)

# [0.017020374536514282, -0.023255806416273117, ...]

OpenAIDocumentEmbedder.__init__

def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
             model: str = "text-embedding-ada-002",
             dimensions: Optional[int] = None,
             api_base_url: Optional[str] = None,
             organization: Optional[str] = None,
             prefix: str = "",
             suffix: str = "",
             batch_size: int = 32,
             progress_bar: bool = True,
             meta_fields_to_embed: Optional[List[str]] = None,
             embedding_separator: str = "\n")

Create an OpenAIDocumentEmbedder component.

Arguments:

  • api_key: The OpenAI API key.
  • model: The name of the model to use.
  • dimensions: The number of dimensions the resulting output embeddings should have. Only supported by text-embedding-3 and later models (see the example below).
  • api_base_url: Overrides default base url for all HTTP requests.
  • organization: The Organization ID. See OpenAI's production best practices for more information.
  • prefix: A string to add at the beginning of each text.
  • suffix: A string to add at the end of each text.
  • batch_size: Number of Documents to encode at once.
  • progress_bar: If True, shows a progress bar when running.
  • meta_fields_to_embed: List of meta fields that will be embedded along with the Document text.
  • embedding_separator: Separator used to concatenate the meta fields to the Document text.
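
For example, a sketch that uses a text-embedding-3 model with a reduced output dimensionality, assuming OPENAI_API_KEY is set in the environment:

from haystack import Document
from haystack.components.embedders import OpenAIDocumentEmbedder

# dimensions is only honored by text-embedding-3 and later models.
document_embedder = OpenAIDocumentEmbedder(
    model="text-embedding-3-small",
    dimensions=256,
)

doc = Document(content="I love pizza!")
result = document_embedder.run([doc])
print(len(result["documents"][0].embedding))  # 256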

OpenAIDocumentEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

OpenAIDocumentEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OpenAIDocumentEmbedder"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

OpenAIDocumentEmbedder.run

@component.output_types(documents=List[Document], meta=Dict[str, Any])
def run(documents: List[Document])

Embed a list of Documents.

Arguments:

  • documents: Documents to embed.

Returns:

A dictionary with the following keys:

  • documents: Documents with embeddings
  • meta: Information about the usage of the model.

Module openai_text_embedder

OpenAITextEmbedder

@component
class OpenAITextEmbedder()

A component for embedding strings using OpenAI models.

Usage example:

from haystack.components.embedders import OpenAITextEmbedder

text_to_embed = "I love pizza!"

text_embedder = OpenAITextEmbedder()

print(text_embedder.run(text_to_embed))

# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
# 'meta': {'model': 'text-embedding-ada-002-v2',
#          'usage': {'prompt_tokens': 4, 'total_tokens': 4}}}

OpenAITextEmbedder.__init__

def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
             model: str = "text-embedding-ada-002",
             dimensions: Optional[int] = None,
             api_base_url: Optional[str] = None,
             organization: Optional[str] = None,
             prefix: str = "",
             suffix: str = "")

Create an OpenAITextEmbedder component.

Arguments:

  • api_key: The OpenAI API key.
  • model: The name of the model to use.
  • dimensions: The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.
  • api_base_url: Overrides default base url for all HTTP requests.
  • organization: The Organization ID. See OpenAI's production best practices for more information.
  • prefix: A string to add at the beginning of each text.
  • suffix: A string to add at the end of each text.
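
A sketch of passing the key explicitly and overriding the base URL, for example to route requests through an OpenAI-compatible proxy; the proxy URL is a placeholder:

from haystack.components.embedders import OpenAITextEmbedder
from haystack.utils import Secret

text_embedder = OpenAITextEmbedder(
    api_key=Secret.from_env_var("OPENAI_API_KEY"),
    model="text-embedding-ada-002",
    api_base_url="https://my-proxy.example.com/v1",  # placeholder URL
)

result = text_embedder.run("I love pizza!")
print(result["meta"]["usage"])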

OpenAITextEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

OpenAITextEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OpenAITextEmbedder"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

OpenAITextEmbedder.run

@component.output_types(embedding=List[float], meta=Dict[str, Any])
def run(text: str)

Embed a single string.

Arguments:

  • text: Text to embed.

Returns:

A dictionary with the following keys:

  • embedding: The embedding of the input text.
  • meta: Information about the usage of the model.

Module sentence_transformers_document_embedder

SentenceTransformersDocumentEmbedder

@component
class SentenceTransformersDocumentEmbedder()

A component for computing Document embeddings using Sentence Transformers models.

Usage example:

from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
doc = Document(content="I love pizza!")
doc_embedder = SentenceTransformersDocumentEmbedder()
doc_embedder.warm_up()

result = doc_embedder.run([doc])
print(result['documents'][0].embedding)

# [-0.07804739475250244, 0.1498992145061493, ...]

SentenceTransformersDocumentEmbedder.__init__

def __init__(model: str = "sentence-transformers/all-mpnet-base-v2",
             device: Optional[ComponentDevice] = None,
             token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
                                                           strict=False),
             prefix: str = "",
             suffix: str = "",
             batch_size: int = 32,
             progress_bar: bool = True,
             normalize_embeddings: bool = False,
             meta_fields_to_embed: Optional[List[str]] = None,
             embedding_separator: str = "\n")

Create a SentenceTransformersDocumentEmbedder component.

Arguments:

  • model: Local path or ID of the model on HuggingFace Hub.
  • device: Overrides the default device used to load the model.
  • token: The API token used to download private models from Hugging Face.
  • prefix: A string to add at the beginning of each text. Can be used to prepend the text with an instruction, as required by some embedding models, such as E5 and bge.
  • suffix: A string to add at the end of each text.
  • batch_size: Number of Documents to encode at once.
  • progress_bar: If True, shows a progress bar when running.
  • normalize_embeddings: If True, the returned vectors are normalized to unit (L2) length.
  • meta_fields_to_embed: List of meta fields that will be embedded along with the Document text.
  • embedding_separator: Separator used to concatenate the meta fields to the Document text.
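
A sketch of forcing the model onto a specific device and normalizing the output vectors. ComponentDevice comes from haystack.utils; the CUDA device string is an assumption about the available hardware:

from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.utils import ComponentDevice

document_embedder = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-mpnet-base-v2",
    device=ComponentDevice.from_str("cuda:0"),  # assumes a CUDA GPU is available
    normalize_embeddings=True,
)
document_embedder.warm_up()

doc = Document(content="I love pizza!")
result = document_embedder.run([doc])
print(len(result["documents"][0].embedding))  # 768 for all-mpnet-base-v2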

SentenceTransformersDocumentEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

SentenceTransformersDocumentEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str,
                              Any]) -> "SentenceTransformersDocumentEmbedder"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

SentenceTransformersDocumentEmbedder.warm_up

def warm_up()

Initializes the component.

SentenceTransformersDocumentEmbedder.run

@component.output_types(documents=List[Document])
def run(documents: List[Document])

Embed a list of Documents.

Arguments:

  • documents: Documents to embed.

Returns:

A dictionary with the following keys:

  • documents: Documents with embeddings

Module sentence_transformers_text_embedder

SentenceTransformersTextEmbedder

@component
class SentenceTransformersTextEmbedder()

A component for embedding strings using Sentence Transformers models.

Usage example:

from haystack.components.embedders import SentenceTransformersTextEmbedder

text_to_embed = "I love pizza!"

text_embedder = SentenceTransformersTextEmbedder()
text_embedder.warm_up()

print(text_embedder.run(text_to_embed))

# {'embedding': [-0.07804739475250244, 0.1498992145061493, ...]}

SentenceTransformersTextEmbedder.__init__

def __init__(model: str = "sentence-transformers/all-mpnet-base-v2",
             device: Optional[ComponentDevice] = None,
             token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
                                                           strict=False),
             prefix: str = "",
             suffix: str = "",
             batch_size: int = 32,
             progress_bar: bool = True,
             normalize_embeddings: bool = False)

Create a SentenceTransformersTextEmbedder component.

Arguments:

  • model: Local path or ID of the model on HuggingFace Hub.
  • device: Overrides the default device used to load the model.
  • token: The API token used to download private models from Hugging Face.
  • prefix: A string to add at the beginning of each text. Can be used to prepend the text with an instruction, as required by some embedding models, such as E5 and bge (see the example below).
  • suffix: A string to add at the end of each text.
  • batch_size: Number of texts to encode at once.
  • progress_bar: If True, shows a progress bar when running.
  • normalize_embeddings: If True, the returned vectors are normalized to unit (L2) length.
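
For instruction-style models such as E5, the prefix can carry the required query instruction; a sketch (the instruction string depends on the model, so check its model card):

from haystack.components.embedders import SentenceTransformersTextEmbedder

# E5-style models expect search queries to be prefixed with "query: ".
text_embedder = SentenceTransformersTextEmbedder(
    model="intfloat/e5-base-v2",
    prefix="query: ",
)
text_embedder.warm_up()

result = text_embedder.run("What is the best pizza topping?")
print(result["embedding"][:3])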

SentenceTransformersTextEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

SentenceTransformersTextEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "SentenceTransformersTextEmbedder"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

SentenceTransformersTextEmbedder.warm_up

def warm_up()

Initializes the component.

SentenceTransformersTextEmbedder.run

@component.output_types(embedding=List[float])
def run(text: str)

Embed a single string.

Arguments:

  • text: Text to embed.

Returns:

A dictionary with the following keys:

  • embedding: The embedding of the input text.