Transforms texts and Documents into vectors used to look for similar or relevant Documents.
Module azure_document_embedder
AzureOpenAIDocumentEmbedder
@component
class AzureOpenAIDocumentEmbedder()
A component for computing Document embeddings using OpenAI models on Azure.
Usage example:
from haystack import Document
from haystack.components.embedders import AzureOpenAIDocumentEmbedder
doc = Document(content="I love pizza!")
document_embedder = AzureOpenAIDocumentEmbedder()
result = document_embedder.run([doc])
print(result['documents'][0].embedding)
# [0.017020374536514282, -0.023255806416273117, ...]
AzureOpenAIDocumentEmbedder.__init__
def __init__(azure_endpoint: Optional[str] = None,
api_version: Optional[str] = "2023-05-15",
azure_deployment: str = "text-embedding-ada-002",
api_key: Optional[Secret] = Secret.from_env_var(
"AZURE_OPENAI_API_KEY", strict=False),
azure_ad_token: Optional[Secret] = Secret.from_env_var(
"AZURE_OPENAI_AD_TOKEN", strict=False),
organization: Optional[str] = None,
prefix: str = "",
suffix: str = "",
batch_size: int = 32,
progress_bar: bool = True,
meta_fields_to_embed: Optional[List[str]] = None,
embedding_separator: str = "\n")
Create an AzureOpenAIDocumentEmbedder component.
Arguments:
azure_endpoint
: The endpoint of the deployed model.
api_version
: The version of the API to use.
azure_deployment
: The deployment of the model, usually matches the model name.
api_key
: The API key used for authentication.
azure_ad_token
: Microsoft Entra ID token; see Microsoft's official Entra ID documentation for more information. Formerly called Azure Active Directory.
organization
: The Organization ID. See OpenAI's production best practices for more information.
prefix
: A string to add at the beginning of each text.
suffix
: A string to add at the end of each text.
batch_size
: Number of Documents to encode at once.
progress_bar
: If True, shows a progress bar when running.
meta_fields_to_embed
: List of meta fields that will be embedded along with the Document text.
embedding_separator
: Separator used to concatenate the meta fields to the Document text.
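Configuration sketch (the endpoint and deployment names below are placeholders, and the AZURE_OPENAI_API_KEY environment variable is assumed to be set):
from haystack import Document
from haystack.components.embedders import AzureOpenAIDocumentEmbedder
document_embedder = AzureOpenAIDocumentEmbedder(
    azure_endpoint="https://example-resource.openai.azure.com/",  # placeholder endpoint
    azure_deployment="text-embedding-ada-002",
    meta_fields_to_embed=["title"],  # embed the "title" meta field together with the content
    embedding_separator="\n",
)
doc = Document(content="I love pizza!", meta={"title": "Food opinions"})
result = document_embedder.run([doc])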
AzureOpenAIDocumentEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
AzureOpenAIDocumentEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "AzureOpenAIDocumentEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
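Round-trip sketch using these two methods (the exact dictionary layout is not shown here; inspect the output of to_dict() for the structure):
from haystack.components.embedders import AzureOpenAIDocumentEmbedder
embedder = AzureOpenAIDocumentEmbedder(azure_endpoint="https://example-resource.openai.azure.com/")  # placeholder endpoint
data = embedder.to_dict()  # serialize the component configuration to a plain dictionary
restored = AzureOpenAIDocumentEmbedder.from_dict(data)  # rebuild an equivalent component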
AzureOpenAIDocumentEmbedder.run
@component.output_types(documents=List[Document], meta=Dict[str, Any])
def run(documents: List[Document]) -> Dict[str, Any]
Embed a list of Documents.
Arguments:
documents
: Documents to embed.
Returns:
A dictionary with the following keys:
documents
: Documents with embeddings.
meta
: Information about the usage of the model.
Module azure_text_embedder
AzureOpenAITextEmbedder
@component
class AzureOpenAITextEmbedder()
A component for embedding strings using OpenAI models on Azure.
Usage example:
from haystack.components.embedders import AzureOpenAITextEmbedder
text_to_embed = "I love pizza!"
text_embedder = AzureOpenAITextEmbedder()
print(text_embedder.run(text_to_embed))
# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
# 'meta': {'model': 'text-embedding-ada-002-v2',
# 'usage': {'prompt_tokens': 4, 'total_tokens': 4}}}
AzureOpenAITextEmbedder.__init__
def __init__(azure_endpoint: Optional[str] = None,
api_version: Optional[str] = "2023-05-15",
azure_deployment: str = "text-embedding-ada-002",
api_key: Optional[Secret] = Secret.from_env_var(
"AZURE_OPENAI_API_KEY", strict=False),
azure_ad_token: Optional[Secret] = Secret.from_env_var(
"AZURE_OPENAI_AD_TOKEN", strict=False),
organization: Optional[str] = None,
prefix: str = "",
suffix: str = "")
Create an AzureOpenAITextEmbedder component.
Arguments:
azure_endpoint
: The endpoint of the deployed model.
api_version
: The version of the API to use.
azure_deployment
: The deployment of the model, usually matches the model name.
api_key
: The API key used for authentication.
azure_ad_token
: Microsoft Entra ID token; see Microsoft's official Entra ID documentation for more information. Formerly called Azure Active Directory.
organization
: The Organization ID. See OpenAI's production best practices for more information.
prefix
: A string to add at the beginning of each text.
suffix
: A string to add at the end of each text.
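Explicit configuration sketch (endpoint and deployment are placeholders; the AZURE_OPENAI_API_KEY environment variable is assumed to be set):
from haystack.components.embedders import AzureOpenAITextEmbedder
text_embedder = AzureOpenAITextEmbedder(
    azure_endpoint="https://example-resource.openai.azure.com/",  # placeholder endpoint
    azure_deployment="text-embedding-ada-002",
    api_version="2023-05-15",
)
result = text_embedder.run("Where can I get pizza?")
# result["embedding"] holds the query vector; result["meta"] reports token usage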
AzureOpenAITextEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
AzureOpenAITextEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "AzureOpenAITextEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
AzureOpenAITextEmbedder.run
@component.output_types(embedding=List[float], meta=Dict[str, Any])
def run(text: str)
Embed a single string.
Arguments:
text
: Text to embed.
Returns:
A dictionary with the following keys:
embedding
: The embedding of the input text.
meta
: Information about the usage of the model.
Module hugging_face_tei_document_embedder
HuggingFaceTEIDocumentEmbedder
@component
class HuggingFaceTEIDocumentEmbedder()
A component for computing Document embeddings using HuggingFace Text-Embeddings-Inference endpoints.
This component can be used with embedding models hosted on a paid Hugging Face Inference Endpoint, with the rate-limited Inference API tier, or with your own self-hosted Text Embeddings Inference (TEI) endpoint.
Usage example:
from haystack.dataclasses import Document
from haystack.components.embedders import HuggingFaceTEIDocumentEmbedder
from haystack.utils import Secret
doc = Document(content="I love pizza!")
document_embedder = HuggingFaceTEIDocumentEmbedder(
model="BAAI/bge-small-en-v1.5", token=Secret.from_token("<your-api-key>")
)
result = document_embedder.run([doc])
print(result["documents"][0].embedding)
# [0.017020374536514282, -0.023255806416273117, ...]
HuggingFaceTEIDocumentEmbedder.__init__
def __init__(model: str = "BAAI/bge-small-en-v1.5",
url: Optional[str] = None,
token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
strict=False),
prefix: str = "",
suffix: str = "",
batch_size: int = 32,
progress_bar: bool = True,
meta_fields_to_embed: Optional[List[str]] = None,
embedding_separator: str = "\n")
Create a HuggingFaceTEIDocumentEmbedder component.
Arguments:
model
: ID of the model on HuggingFace Hub.
url
: The URL of your self-deployed Text-Embeddings-Inference service or the URL of your paid HF Inference Endpoint.
token
: The HuggingFace Hub token. This is needed if you are using a paid HF Inference Endpoint or serving a private or gated model.
prefix
: A string to add at the beginning of each text.
suffix
: A string to add at the end of each text.
batch_size
: Number of Documents to encode at once.
progress_bar
: If True, shows a progress bar when running.
meta_fields_to_embed
: List of meta fields that will be embedded along with the Document text.
embedding_separator
: Separator used to concatenate the meta fields to the Document text.
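Sketch of pointing the component at a self-hosted TEI container, assuming a local service at the placeholder URL http://localhost:8080 (no token is needed in that case):
from haystack import Document
from haystack.components.embedders import HuggingFaceTEIDocumentEmbedder
document_embedder = HuggingFaceTEIDocumentEmbedder(
    model="BAAI/bge-small-en-v1.5",
    url="http://localhost:8080",  # placeholder self-hosted TEI endpoint
)
result = document_embedder.run([Document(content="I love pizza!")])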
HuggingFaceTEIDocumentEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
HuggingFaceTEIDocumentEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "HuggingFaceTEIDocumentEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
HuggingFaceTEIDocumentEmbedder.run
@component.output_types(documents=List[Document])
def run(documents: List[Document])
Embed a list of Documents.
Arguments:
documents
: Documents to embed.
Returns:
A dictionary with the following keys:
documents
: Documents with embeddings
Module hugging_face_tei_text_embedder
HuggingFaceTEITextEmbedder
@component
class HuggingFaceTEITextEmbedder()
A component for embedding strings using HuggingFace Text-Embeddings-Inference endpoints.
This component can be used with embedding models hosted on a paid Hugging Face Inference Endpoint, with the rate-limited Inference API tier, or with your own self-hosted Text Embeddings Inference (TEI) endpoint.
Usage example:
from haystack.components.embedders import HuggingFaceTEITextEmbedder
from haystack.utils import Secret
text_to_embed = "I love pizza!"
text_embedder = HuggingFaceTEITextEmbedder(
model="BAAI/bge-small-en-v1.5", token=Secret.from_token("<your-api-key>")
)
print(text_embedder.run(text_to_embed))
# {'embedding': [0.017020374536514282, -0.023255806416273117, ...]}
HuggingFaceTEITextEmbedder.__init__
def __init__(model: str = "BAAI/bge-small-en-v1.5",
url: Optional[str] = None,
token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
strict=False),
prefix: str = "",
suffix: str = "")
Create a HuggingFaceTEITextEmbedder component.
Arguments:
model
: ID of the model on HuggingFace Hub.
url
: The URL of your self-deployed Text-Embeddings-Inference service or the URL of your paid HF Inference Endpoint.
token
: The HuggingFace Hub token. This is needed if you are using a paid HF Inference Endpoint or serving a private or gated model.
prefix
: A string to add at the beginning of each text.
suffix
: A string to add at the end of each text.
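For instruction-tuned models such as bge, prefix can carry the query instruction. A sketch, assuming a self-hosted endpoint at the placeholder URL below and a bge-style instruction string:
from haystack.components.embedders import HuggingFaceTEITextEmbedder
text_embedder = HuggingFaceTEITextEmbedder(
    url="http://localhost:8080",  # placeholder self-hosted TEI endpoint
    prefix="Represent this sentence for searching relevant passages: ",  # assumed bge-style query instruction
)
result = text_embedder.run("Where can I get pizza?")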
HuggingFaceTEITextEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
HuggingFaceTEITextEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "HuggingFaceTEITextEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
HuggingFaceTEITextEmbedder.run
@component.output_types(embedding=List[float])
def run(text: str)
Embed a single string.
Arguments:
text
: Text to embed.
Returns:
A dictionary with the following keys:
embedding
: The embedding of the input text.
Module openai_document_embedder
OpenAIDocumentEmbedder
@component
class OpenAIDocumentEmbedder()
A component for computing Document embeddings using OpenAI models.
Usage example:
from haystack import Document
from haystack.components.embedders import OpenAIDocumentEmbedder
doc = Document(content="I love pizza!")
document_embedder = OpenAIDocumentEmbedder()
result = document_embedder.run([doc])
print(result['documents'][0].embedding)
# [0.017020374536514282, -0.023255806416273117, ...]
OpenAIDocumentEmbedder.__init__
def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
model: str = "text-embedding-ada-002",
dimensions: Optional[int] = None,
api_base_url: Optional[str] = None,
organization: Optional[str] = None,
prefix: str = "",
suffix: str = "",
batch_size: int = 32,
progress_bar: bool = True,
meta_fields_to_embed: Optional[List[str]] = None,
embedding_separator: str = "\n")
Create an OpenAIDocumentEmbedder component.
Arguments:
api_key
: The OpenAI API key.
model
: The name of the model to use.
dimensions
: The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.
api_base_url
: Overrides the default base URL for all HTTP requests.
organization
: The Organization ID. See OpenAI's production best practices for more information.
prefix
: A string to add at the beginning of each text.
suffix
: A string to add at the end of each text.
batch_size
: Number of Documents to encode at once.
progress_bar
: If True, shows a progress bar when running.
meta_fields_to_embed
: List of meta fields that will be embedded along with the Document text.
embedding_separator
: Separator used to concatenate the meta fields to the Document text.
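Configuration sketch, assuming the OPENAI_API_KEY environment variable is set and that your account has access to the text-embedding-3-small model:
from haystack import Document
from haystack.components.embedders import OpenAIDocumentEmbedder
document_embedder = OpenAIDocumentEmbedder(
    model="text-embedding-3-small",
    dimensions=256,                  # only honored by text-embedding-3 and later models
    meta_fields_to_embed=["title"],  # embed the "title" meta field together with the content
    batch_size=32,
)
doc = Document(content="I love pizza!", meta={"title": "Food opinions"})
result = document_embedder.run([doc])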
OpenAIDocumentEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
OpenAIDocumentEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OpenAIDocumentEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
OpenAIDocumentEmbedder.run
@component.output_types(documents=List[Document], meta=Dict[str, Any])
def run(documents: List[Document])
Embed a list of Documents.
Arguments:
documents
: Documents to embed.
Returns:
A dictionary with the following keys:
documents
: Documents with embeddings.
meta
: Information about the usage of the model.
Module openai_text_embedder
OpenAITextEmbedder
@component
class OpenAITextEmbedder()
A component for embedding strings using OpenAI models.
Usage example:
from haystack.components.embedders import OpenAITextEmbedder
text_to_embed = "I love pizza!"
text_embedder = OpenAITextEmbedder()
print(text_embedder.run(text_to_embed))
# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
# 'meta': {'model': 'text-embedding-ada-002-v2',
# 'usage': {'prompt_tokens': 4, 'total_tokens': 4}}}
OpenAITextEmbedder.__init__
def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
model: str = "text-embedding-ada-002",
dimensions: Optional[int] = None,
api_base_url: Optional[str] = None,
organization: Optional[str] = None,
prefix: str = "",
suffix: str = "")
Create an OpenAITextEmbedder component.
Arguments:
api_key
: The OpenAI API key.
model
: The name of the model to use.
dimensions
: The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.
api_base_url
: Overrides the default base URL for all HTTP requests.
organization
: The Organization ID. See OpenAI's production best practices for more information.
prefix
: A string to add at the beginning of each text.
suffix
: A string to add at the end of each text.
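Sketch using the dimensions parameter, assuming OPENAI_API_KEY is set and the model name below is available to your account:
from haystack.components.embedders import OpenAITextEmbedder
text_embedder = OpenAITextEmbedder(model="text-embedding-3-small", dimensions=256)
result = text_embedder.run("Where can I get pizza?")
# len(result["embedding"]) should match the requested dimensions for text-embedding-3 models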
OpenAITextEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
OpenAITextEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OpenAITextEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
OpenAITextEmbedder.run
@component.output_types(embedding=List[float], meta=Dict[str, Any])
def run(text: str)
Embed a single string.
Arguments:
text
: Text to embed.
Returns:
A dictionary with the following keys:
embedding
: The embedding of the input text.
meta
: Information about the usage of the model.
Module sentence_transformers_document_embedder
SentenceTransformersDocumentEmbedder
@component
class SentenceTransformersDocumentEmbedder()
A component for computing Document embeddings using Sentence Transformers models.
Usage example:
from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
doc = Document(content="I love pizza!")
doc_embedder = SentenceTransformersDocumentEmbedder()
doc_embedder.warm_up()
result = doc_embedder.run([doc])
print(result['documents'][0].embedding)
# [-0.07804739475250244, 0.1498992145061493, ...]
SentenceTransformersDocumentEmbedder.__init__
def __init__(model: str = "sentence-transformers/all-mpnet-base-v2",
device: Optional[ComponentDevice] = None,
token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
strict=False),
prefix: str = "",
suffix: str = "",
batch_size: int = 32,
progress_bar: bool = True,
normalize_embeddings: bool = False,
meta_fields_to_embed: Optional[List[str]] = None,
embedding_separator: str = "\n")
Create a SentenceTransformersDocumentEmbedder component.
Arguments:
model
: Local path or ID of the model on HuggingFace Hub.
device
: Overrides the default device used to load the model.
token
: The API token used to download private models from Hugging Face.
prefix
: A string to add at the beginning of each text. Can be used to prepend the text with an instruction, as required by some embedding models, such as E5 and bge.
suffix
: A string to add at the end of each text.
batch_size
: Number of Documents to encode at once.
progress_bar
: If True, shows a progress bar when running.
normalize_embeddings
: If True, returned vectors will have length 1.
meta_fields_to_embed
: List of meta fields that will be embedded along with the Document text.
embedding_separator
: Separator used to concatenate the meta fields to the Document text.
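Local-model sketch, assuming a CUDA device is available (drop the device argument to fall back to automatic selection):
from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.utils import ComponentDevice
doc_embedder = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-mpnet-base-v2",
    device=ComponentDevice.from_str("cuda:0"),  # assumes a GPU; omit to auto-select
    normalize_embeddings=True,                  # unit-length vectors, convenient for cosine or dot-product search
    meta_fields_to_embed=["title"],
)
doc_embedder.warm_up()
result = doc_embedder.run([Document(content="I love pizza!", meta={"title": "Food opinions"})])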
SentenceTransformersDocumentEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
SentenceTransformersDocumentEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str,
Any]) -> "SentenceTransformersDocumentEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
SentenceTransformersDocumentEmbedder.warm_up
def warm_up()
Initializes the component.
SentenceTransformersDocumentEmbedder.run
@component.output_types(documents=List[Document])
def run(documents: List[Document])
Embed a list of Documents.
Arguments:
documents
: Documents to embed.
Returns:
A dictionary with the following keys:
documents
: Documents with embeddings
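This component is typically placed in an indexing pipeline in front of a writer. A minimal sketch, assuming the in-memory document store that ships with Haystack:
from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
document_store = InMemoryDocumentStore()
indexing = Pipeline()
indexing.add_component("embedder", SentenceTransformersDocumentEmbedder())
indexing.add_component("writer", DocumentWriter(document_store=document_store))
indexing.connect("embedder.documents", "writer.documents")
indexing.run({"embedder": {"documents": [Document(content="I love pizza!")]}})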
Module sentence_transformers_text_embedder
SentenceTransformersTextEmbedder
@component
class SentenceTransformersTextEmbedder()
A component for embedding strings using Sentence Transformers models.
Usage example:
from haystack.components.embedders import SentenceTransformersTextEmbedder
text_to_embed = "I love pizza!"
text_embedder = SentenceTransformersTextEmbedder()
text_embedder.warm_up()
print(text_embedder.run(text_to_embed))
# {'embedding': [-0.07804739475250244, 0.1498992145061493, ...]}
SentenceTransformersTextEmbedder.__init__
def __init__(model: str = "sentence-transformers/all-mpnet-base-v2",
device: Optional[ComponentDevice] = None,
token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
strict=False),
prefix: str = "",
suffix: str = "",
batch_size: int = 32,
progress_bar: bool = True,
normalize_embeddings: bool = False)
Create a SentenceTransformersTextEmbedder component.
Arguments:
model
: Local path or ID of the model on HuggingFace Hub.
device
: Overrides the default device used to load the model.
token
: The API token used to download private models from Hugging Face.
prefix
: A string to add at the beginning of each text. Can be used to prepend the text with an instruction, as required by some embedding models, such as E5 and bge.
suffix
: A string to add at the end of each text.
batch_size
: Number of Documents to encode at once.
progress_bar
: If True, shows a progress bar when running.
normalize_embeddings
: If True, returned vectors will have length 1.
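Sketch showing the prefix argument with an E5-style model, which expects queries to be prefixed with "query: " (the model choice and prefix are assumptions based on the E5 model card):
from haystack.components.embedders import SentenceTransformersTextEmbedder
text_embedder = SentenceTransformersTextEmbedder(
    model="intfloat/e5-base-v2",  # assumed E5 checkpoint that expects the "query: " prefix
    prefix="query: ",
    normalize_embeddings=True,
)
text_embedder.warm_up()
result = text_embedder.run("Where can I get pizza?")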
SentenceTransformersTextEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
SentenceTransformersTextEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "SentenceTransformersTextEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
SentenceTransformersTextEmbedder.warm_up
def warm_up()
Initializes the component.
SentenceTransformersTextEmbedder.run
@component.output_types(embedding=List[float])
def run(text: str)
Embed a single string.
Arguments:
text
: Text to embed.
Returns:
A dictionary with the following keys:
embedding
: The embedding of the input text.