Transforms texts and Documents into vectors used to look for similar or relevant Documents.
Module azure_document_embedder
AzureOpenAIDocumentEmbedder
@component
class AzureOpenAIDocumentEmbedder()
A component for computing Document embeddings using OpenAI models on Azure.
Usage example:
from haystack import Document
from haystack.components.embedders import AzureOpenAIDocumentEmbedder
doc = Document(content="I love pizza!")
document_embedder = AzureOpenAIDocumentEmbedder()
result = document_embedder.run([doc])
print(result['documents'][0].embedding)
# [0.017020374536514282, -0.023255806416273117, ...]
AzureOpenAIDocumentEmbedder.__init__
def __init__(azure_endpoint: Optional[str] = None,
api_version: Optional[str] = "2023-05-15",
azure_deployment: str = "text-embedding-ada-002",
api_key: Optional[Secret] = Secret.from_env_var(
"AZURE_OPENAI_API_KEY", strict=False),
azure_ad_token: Optional[Secret] = Secret.from_env_var(
"AZURE_OPENAI_AD_TOKEN", strict=False),
organization: Optional[str] = None,
prefix: str = "",
suffix: str = "",
batch_size: int = 32,
progress_bar: bool = True,
meta_fields_to_embed: Optional[List[str]] = None,
embedding_separator: str = "\n")
Create an AzureOpenAIDocumentEmbedder component.
Arguments:
azure_endpoint
: The endpoint of the deployed model.
api_version
: The version of the API to use.
azure_deployment
: The deployment of the model, usually matches the model name.
api_key
: The API key used for authentication.
azure_ad_token
: Microsoft Entra ID token; see Microsoft's official Entra ID documentation for more information. Formerly called Azure Active Directory.
organization
: The Organization ID. See OpenAI's production best practices for more information.
prefix
: A string to add at the beginning of each text.
suffix
: A string to add at the end of each text.
batch_size
: Number of Documents to encode at once.
progress_bar
: If True, shows a progress bar when running.
meta_fields_to_embed
: List of meta fields that will be embedded along with the Document text.
embedding_separator
: Separator used to concatenate the meta fields to the Document text.
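Configuration sketch (the endpoint and deployment names below are placeholders, and the AZURE_OPENAI_API_KEY environment variable is assumed to be set):
from haystack import Document
from haystack.components.embedders import AzureOpenAIDocumentEmbedder
document_embedder = AzureOpenAIDocumentEmbedder(
    azure_endpoint="https://example-resource.openai.azure.com/",  # placeholder endpoint
    azure_deployment="text-embedding-ada-002",
    meta_fields_to_embed=["title"],  # embed the "title" meta field together with the content
    embedding_separator="\n",
)
doc = Document(content="I love pizza!", meta={"title": "Food opinions"})
result = document_embedder.run([doc])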
AzureOpenAIDocumentEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
AzureOpenAIDocumentEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "AzureOpenAIDocumentEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
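Round-trip sketch using these two methods (the exact dictionary layout is not shown here; inspect the output of to_dict() for the structure):
from haystack.components.embedders import AzureOpenAIDocumentEmbedder
embedder = AzureOpenAIDocumentEmbedder(azure_endpoint="https://example-resource.openai.azure.com/")  # placeholder endpoint
data = embedder.to_dict()  # serialize the component configuration to a plain dictionary
restored = AzureOpenAIDocumentEmbedder.from_dict(data)  # rebuild an equivalent component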
AzureOpenAIDocumentEmbedder.run
@component.output_types(documents=List[Document], meta=Dict[str, Any])
def run(documents: List[Document]) -> Dict[str, Any]
Embed a list of Documents.
Arguments:
documents
: Documents to embed.
Returns:
A dictionary with the following keys:
documents
: Documents with embeddings.
meta
: Information about the usage of the model.
Module azure_text_embedder
AzureOpenAITextEmbedder
@component
class AzureOpenAITextEmbedder()
A component for embedding strings using OpenAI models on Azure.
Usage example:
from haystack.components.embedders import AzureOpenAITextEmbedder
text_to_embed = "I love pizza!"
text_embedder = AzureOpenAITextEmbedder()
print(text_embedder.run(text_to_embed))
# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
# 'meta': {'model': 'text-embedding-ada-002-v2',
# 'usage': {'prompt_tokens': 4, 'total_tokens': 4}}}
AzureOpenAITextEmbedder.__init__
def __init__(azure_endpoint: Optional[str] = None,
api_version: Optional[str] = "2023-05-15",
azure_deployment: str = "text-embedding-ada-002",
api_key: Optional[Secret] = Secret.from_env_var(
"AZURE_OPENAI_API_KEY", strict=False),
azure_ad_token: Optional[Secret] = Secret.from_env_var(
"AZURE_OPENAI_AD_TOKEN", strict=False),
organization: Optional[str] = None,
prefix: str = "",
suffix: str = "")
Create an AzureOpenAITextEmbedder component.
Arguments:
azure_endpoint
: The endpoint of the deployed model.
api_version
: The version of the API to use.
azure_deployment
: The deployment of the model, usually matches the model name.
api_key
: The API key used for authentication.
azure_ad_token
: Microsoft Entra ID token; see Microsoft's official Entra ID documentation for more information. Formerly called Azure Active Directory.
organization
: The Organization ID. See OpenAI's production best practices for more information.
prefix
: A string to add at the beginning of each text.
suffix
: A string to add at the end of each text.
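Explicit configuration sketch (endpoint and deployment are placeholders; the AZURE_OPENAI_API_KEY environment variable is assumed to be set):
from haystack.components.embedders import AzureOpenAITextEmbedder
text_embedder = AzureOpenAITextEmbedder(
    azure_endpoint="https://example-resource.openai.azure.com/",  # placeholder endpoint
    azure_deployment="text-embedding-ada-002",
    api_version="2023-05-15",
)
result = text_embedder.run("Where can I get pizza?")
# result["embedding"] holds the query vector; result["meta"] reports token usage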
AzureOpenAITextEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
AzureOpenAITextEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "AzureOpenAITextEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
AzureOpenAITextEmbedder.run
@component.output_types(embedding=List[float], meta=Dict[str, Any])
def run(text: str)
Embed a single string.
Arguments:
text
: Text to embed.
Returns:
A dictionary with the following keys:
embedding
: The embedding of the input text.
meta
: Information about the usage of the model.
Module hugging_face_tei_document_embedder
HuggingFaceTEIDocumentEmbedder
@component
class HuggingFaceTEIDocumentEmbedder()
A component for computing Document embeddings using HuggingFace Text-Embeddings-Inference endpoints.
This component can be used with embedding models hosted on a paid Hugging Face Inference Endpoint, with the rate-limited Inference API tier, or with your own self-hosted Text Embeddings Inference (TEI) endpoint.
Usage example:
from haystack.dataclasses import Document
from haystack.components.embedders import HuggingFaceTEIDocumentEmbedder
from haystack.utils import Secret
doc = Document(content="I love pizza!")
document_embedder = HuggingFaceTEIDocumentEmbedder(
model="BAAI/bge-small-en-v1.5", token=Secret.from_token("<your-api-key>")
)
result = document_embedder.run([doc])
print(result["documents"][0].embedding)
# [0.017020374536514282, -0.023255806416273117, ...]
HuggingFaceTEIDocumentEmbedder.__init__
def __init__(model: str = "BAAI/bge-small-en-v1.5",
url: Optional[str] = None,
token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
strict=False),
prefix: str = "",
suffix: str = "",
batch_size: int = 32,
progress_bar: bool = True,
meta_fields_to_embed: Optional[List[str]] = None,
embedding_separator: str = "\n")
Create a HuggingFaceTEIDocumentEmbedder component.
Arguments:
model
: ID of the model on HuggingFace Hub.
url
: The URL of your self-deployed Text-Embeddings-Inference service or the URL of your paid HF Inference Endpoint.
token
: The HuggingFace Hub token. This is needed if you are using a paid HF Inference Endpoint or serving a private or gated model.
prefix
: A string to add at the beginning of each text.
suffix
: A string to add at the end of each text.
batch_size
: Number of Documents to encode at once.
progress_bar
: If True, shows a progress bar when running.
meta_fields_to_embed
: List of meta fields that will be embedded along with the Document text.
embedding_separator
: Separator used to concatenate the meta fields to the Document text.
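Sketch of pointing the component at a self-hosted TEI container, assuming a local service at the placeholder URL http://localhost:8080 (no token is needed in that case):
from haystack import Document
from haystack.components.embedders import HuggingFaceTEIDocumentEmbedder
document_embedder = HuggingFaceTEIDocumentEmbedder(
    model="BAAI/bge-small-en-v1.5",
    url="http://localhost:8080",  # placeholder self-hosted TEI endpoint
)
result = document_embedder.run([Document(content="I love pizza!")])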
HuggingFaceTEIDocumentEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
HuggingFaceTEIDocumentEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "HuggingFaceTEIDocumentEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
HuggingFaceTEIDocumentEmbedder.run
@component.output_types(documents=List[Document])
def run(documents: List[Document])
Embed a list of Documents.
Arguments:
documents
: Documents to embed.
Returns:
A dictionary with the following keys:
documents
: Documents with embeddings
Module hugging_face_tei_text_embedder
HuggingFaceTEITextEmbedder
@component
class HuggingFaceTEITextEmbedder()
A component for embedding strings using HuggingFace Text-Embeddings-Inference endpoints.
This component can be used with embedding models hosted on a paid Hugging Face Inference Endpoint, with the rate-limited Inference API tier, or with your own self-hosted Text Embeddings Inference (TEI) endpoint.
Usage example:
from haystack.components.embedders import HuggingFaceTEITextEmbedder
from haystack.utils import Secret
text_to_embed = "I love pizza!"
text_embedder = HuggingFaceTEITextEmbedder(
model="BAAI/bge-small-en-v1.5", token=Secret.from_token("<your-api-key>")
)
print(text_embedder.run(text_to_embed))
# {'embedding': [0.017020374536514282, -0.023255806416273117, ...]}
HuggingFaceTEITextEmbedder.__init__
def __init__(model: str = "BAAI/bge-small-en-v1.5",
url: Optional[str] = None,
token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
strict=False),
prefix: str = "",
suffix: str = "")
Create a HuggingFaceTEITextEmbedder component.
Arguments:
model
: ID of the model on HuggingFace Hub.
url
: The URL of your self-deployed Text-Embeddings-Inference service or the URL of your paid HF Inference Endpoint.
token
: The HuggingFace Hub token. This is needed if you are using a paid HF Inference Endpoint or serving a private or gated model.
prefix
: A string to add at the beginning of each text.
suffix
: A string to add at the end of each text.
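For instruction-tuned models such as bge, prefix can carry the query instruction. A sketch, assuming a self-hosted endpoint at the placeholder URL below and a bge-style instruction string:
from haystack.components.embedders import HuggingFaceTEITextEmbedder
text_embedder = HuggingFaceTEITextEmbedder(
    url="http://localhost:8080",  # placeholder self-hosted TEI endpoint
    prefix="Represent this sentence for searching relevant passages: ",  # assumed bge-style query instruction
)
result = text_embedder.run("Where can I get pizza?")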
HuggingFaceTEITextEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
HuggingFaceTEITextEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "HuggingFaceTEITextEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
HuggingFaceTEITextEmbedder.run
@component.output_types(embedding=List[float])
def run(text: str)
Embed a single string.
Arguments:
text
: Text to embed.
Returns:
A dictionary with the following keys:
embedding
: The embedding of the input text.
Module openai_document_embedder
OpenAIDocumentEmbedder
@component
class OpenAIDocumentEmbedder()
A component for computing Document embeddings using OpenAI models.
Usage example:
from haystack import Document
from haystack.components.embedders import OpenAIDocumentEmbedder
doc = Document(content="I love pizza!")
document_embedder = OpenAIDocumentEmbedder()
result = document_embedder.run([doc])
print(result['documents'][0].embedding)
# [0.017020374536514282, -0.023255806416273117, ...]
OpenAIDocumentEmbedder.__init__
def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
model: str = "text-embedding-ada-002",
dimensions: Optional[int] = None,
api_base_url: Optional[str] = None,
organization: Optional[str] = None,
prefix: str = "",
suffix: str = "",
batch_size: int = 32,
progress_bar: bool = True,
meta_fields_to_embed: Optional[List[str]] = None,
embedding_separator: str = "\n")
Create an OpenAIDocumentEmbedder component.
Arguments:
api_key
: The OpenAI API key.
model
: The name of the model to use.
dimensions
: The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.
api_base_url
: Overrides the default base URL for all HTTP requests.
organization
: The Organization ID. See OpenAI's production best practices for more information.
prefix
: A string to add at the beginning of each text.
suffix
: A string to add at the end of each text.
batch_size
: Number of Documents to encode at once.
progress_bar
: If True, shows a progress bar when running.
meta_fields_to_embed
: List of meta fields that will be embedded along with the Document text.
embedding_separator
: Separator used to concatenate the meta fields to the Document text.
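Configuration sketch, assuming the OPENAI_API_KEY environment variable is set and that your account has access to the text-embedding-3-small model:
from haystack import Document
from haystack.components.embedders import OpenAIDocumentEmbedder
document_embedder = OpenAIDocumentEmbedder(
    model="text-embedding-3-small",
    dimensions=256,                  # only honored by text-embedding-3 and later models
    meta_fields_to_embed=["title"],  # embed the "title" meta field together with the content
    batch_size=32,
)
doc = Document(content="I love pizza!", meta={"title": "Food opinions"})
result = document_embedder.run([doc])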
OpenAIDocumentEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
OpenAIDocumentEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OpenAIDocumentEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
OpenAIDocumentEmbedder.run
@component.output_types(documents=List[Document], meta=Dict[str, Any])
def run(documents: List[Document])
Embed a list of Documents.
Arguments:
documents
: Documents to embed.
Returns:
A dictionary with the following keys:
documents
: Documents with embeddings.
meta
: Information about the usage of the model.
Module openai_text_embedder
OpenAITextEmbedder
@component
class OpenAITextEmbedder()
A component for embedding strings using OpenAI models.
Usage example:
from haystack.components.embedders import OpenAITextEmbedder
text_to_embed = "I love pizza!"
text_embedder = OpenAITextEmbedder()
print(text_embedder.run(text_to_embed))
# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
# 'meta': {'model': 'text-embedding-ada-002-v2',
# 'usage': {'prompt_tokens': 4, 'total_tokens': 4}}}
OpenAITextEmbedder.__init__
def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
model: str = "text-embedding-ada-002",
dimensions: Optional[int] = None,
api_base_url: Optional[str] = None,
organization: Optional[str] = None,
prefix: str = "",
suffix: str = "")
Create an OpenAITextEmbedder component.
Arguments:
api_key
: The OpenAI API key.
model
: The name of the model to use.
dimensions
: The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.
api_base_url
: Overrides the default base URL for all HTTP requests.
organization
: The Organization ID. See OpenAI's production best practices for more information.
prefix
: A string to add at the beginning of each text.
suffix
: A string to add at the end of each text.
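Sketch using the dimensions parameter, assuming OPENAI_API_KEY is set and the model name below is available to your account:
from haystack.components.embedders import OpenAITextEmbedder
text_embedder = OpenAITextEmbedder(model="text-embedding-3-small", dimensions=256)
result = text_embedder.run("Where can I get pizza?")
# len(result["embedding"]) should match the requested dimensions for text-embedding-3 models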
OpenAITextEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
OpenAITextEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OpenAITextEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
OpenAITextEmbedder.run
@component.output_types(embedding=List[float], meta=Dict[str, Any])
def run(text: str)
Embed a single string.
Arguments:
text
: Text to embed.
Returns:
A dictionary with the following keys:
embedding
: The embedding of the input text.
meta
: Information about the usage of the model.
Module sentence_transformers_document_embedder
SentenceTransformersDocumentEmbedder
@component
class SentenceTransformersDocumentEmbedder()
A component for computing Document embeddings using Sentence Transformers models.
Usage example:
from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
doc = Document(content="I love pizza!")
doc_embedder = SentenceTransformersDocumentEmbedder()
doc_embedder.warm_up()
result = doc_embedder.run([doc])
print(result['documents'][0].embedding)
# [-0.07804739475250244, 0.1498992145061493, ...]
SentenceTransformersDocumentEmbedder.__init__
def __init__(model: str = "sentence-transformers/all-mpnet-base-v2",
device: Optional[ComponentDevice] = None,
token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
strict=False),
prefix: str = "",
suffix: str = "",
batch_size: int = 32,
progress_bar: bool = True,
normalize_embeddings: bool = False,
meta_fields_to_embed: Optional[List[str]] = None,
embedding_separator: str = "\n")
Create a SentenceTransformersDocumentEmbedder component.
Arguments:
model
: Local path or ID of the model on HuggingFace Hub.
device
: Overrides the default device used to load the model.
token
: The API token used to download private models from Hugging Face.
prefix
: A string to add at the beginning of each text. Can be used to prepend the text with an instruction, as required by some embedding models, such as E5 and bge.
suffix
: A string to add at the end of each text.
batch_size
: Number of Documents to encode at once.
progress_bar
: If True, shows a progress bar when running.
normalize_embeddings
: If True, returned vectors will have length 1.
meta_fields_to_embed
: List of meta fields that will be embedded along with the Document text.
embedding_separator
: Separator used to concatenate the meta fields to the Document text.
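Local-model sketch, assuming a CUDA device is available (drop the device argument to fall back to automatic selection):
from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.utils import ComponentDevice
doc_embedder = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-mpnet-base-v2",
    device=ComponentDevice.from_str("cuda:0"),  # assumes a GPU; omit to auto-select
    normalize_embeddings=True,                  # unit-length vectors, convenient for cosine or dot-product search
    meta_fields_to_embed=["title"],
)
doc_embedder.warm_up()
result = doc_embedder.run([Document(content="I love pizza!", meta={"title": "Food opinions"})])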
SentenceTransformersDocumentEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
SentenceTransformersDocumentEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str,
Any]) -> "SentenceTransformersDocumentEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
SentenceTransformersDocumentEmbedder.warm_up
def warm_up()
Initializes the component.
SentenceTransformersDocumentEmbedder.run
@component.output_types(documents=List[Document])
def run(documents: List[Document])
Embed a list of Documents.
Arguments:
documents
: Documents to embed.
Returns:
A dictionary with the following keys:
documents
: Documents with embeddings
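This component is typically placed in an indexing pipeline in front of a writer. A minimal sketch, assuming the in-memory document store that ships with Haystack:
from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
document_store = InMemoryDocumentStore()
indexing = Pipeline()
indexing.add_component("embedder", SentenceTransformersDocumentEmbedder())
indexing.add_component("writer", DocumentWriter(document_store=document_store))
indexing.connect("embedder.documents", "writer.documents")
indexing.run({"embedder": {"documents": [Document(content="I love pizza!")]}})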
Module sentence_transformers_text_embedder
SentenceTransformersTextEmbedder
@component
class SentenceTransformersTextEmbedder()
A component for embedding strings using Sentence Transformers models.
Usage example:
from haystack.components.embedders import SentenceTransformersTextEmbedder
text_to_embed = "I love pizza!"
text_embedder = SentenceTransformersTextEmbedder()
text_embedder.warm_up()
print(text_embedder.run(text_to_embed))
# {'embedding': [-0.07804739475250244, 0.1498992145061493, ...]}
SentenceTransformersTextEmbedder.__init__
def __init__(model: str = "sentence-transformers/all-mpnet-base-v2",
device: Optional[ComponentDevice] = None,
token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
strict=False),
prefix: str = "",
suffix: str = "",
batch_size: int = 32,
progress_bar: bool = True,
normalize_embeddings: bool = False)
Create a SentenceTransformersTextEmbedder component.
Arguments:
model
: Local path or ID of the model on HuggingFace Hub.
device
: Overrides the default device used to load the model.
token
: The API token used to download private models from Hugging Face.
prefix
: A string to add at the beginning of each text. Can be used to prepend the text with an instruction, as required by some embedding models, such as E5 and bge.
suffix
: A string to add at the end of each text.
batch_size
: Number of Documents to encode at once.
progress_bar
: If True, shows a progress bar when running.
normalize_embeddings
: If True, returned vectors will have length 1.
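Sketch showing the prefix argument with an E5-style model, which expects queries to be prefixed with "query: " (the model choice and prefix are assumptions based on the E5 model card):
from haystack.components.embedders import SentenceTransformersTextEmbedder
text_embedder = SentenceTransformersTextEmbedder(
    model="intfloat/e5-base-v2",  # assumed E5 checkpoint that expects the "query: " prefix
    prefix="query: ",
    normalize_embeddings=True,
)
text_embedder.warm_up()
result = text_embedder.run("Where can I get pizza?")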
SentenceTransformersTextEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
SentenceTransformersTextEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "SentenceTransformersTextEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
SentenceTransformersTextEmbedder.warm_up
def warm_up()
Initializes the component.
SentenceTransformersTextEmbedder.run
@component.output_types(embedding=List[float])
def run(text: str)
Embed a single string.
Arguments:
text
: Text to embed.
Returns:
A dictionary with the following keys:
embedding
: The embedding of the input text.