Transforms queries into vectors to look for similar or relevant Documents.
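To make the idea of looking for similar Documents concrete, the sketch below ranks a few stored vectors by cosine similarity to a query vector. The vectors are made-up three-dimensional toy values, and cosine_similarity is a hypothetical helper used for illustration only; it is not part of the embedders API. In practice the query vector would come from a text embedder and the Document vectors from a document embedder.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings (assumed values, not real model output).
query_embedding = [1.0, 0.0, 0.0]
document_embeddings: Dict[str, List[float]] = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [-1.0, 0.0, 0.0],
}

# Rank Documents from most to least similar to the query.
ranked = sorted(
    document_embeddings,
    key=lambda name: cosine_similarity(query_embedding, document_embeddings[name]),
    reverse=True,
)
print(ranked)  # ['doc_a', 'doc_b', 'doc_c']
```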
Module azure_document_embedder
AzureOpenAIDocumentEmbedder
A component for computing Document embeddings using OpenAI models on Azure.
Usage example:
from haystack import Document
from haystack.components.embedders import AzureOpenAIDocumentEmbedder
doc = Document(content="I love pizza!")
document_embedder = AzureOpenAIDocumentEmbedder()
result = document_embedder.run([doc])
print(result['documents'][0].embedding)
# [0.017020374536514282, -0.023255806416273117, ...]
AzureOpenAIDocumentEmbedder.__init__
def __init__(azure_endpoint: Optional[str] = None,
api_version: Optional[str] = "2023-05-15",
azure_deployment: str = "text-embedding-ada-002",
dimensions: Optional[int] = None,
api_key: Optional[Secret] = Secret.from_env_var(
"AZURE_OPENAI_API_KEY", strict=False),
azure_ad_token: Optional[Secret] = Secret.from_env_var(
"AZURE_OPENAI_AD_TOKEN", strict=False),
organization: Optional[str] = None,
prefix: str = "",
suffix: str = "",
batch_size: int = 32,
progress_bar: bool = True,
meta_fields_to_embed: Optional[List[str]] = None,
embedding_separator: str = "\n")
Create an AzureOpenAIDocumentEmbedder component.
Arguments:
azure_endpoint
: The endpoint of the deployed model.
api_version
: The version of the API to use.
azure_deployment
: The deployment of the model, usually matches the model name.
dimensions
: The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.
api_key
: The API key used for authentication.
azure_ad_token
: Microsoft Entra ID token, see Microsoft's official Entra ID documentation for more information. Used to be called Azure Active Directory.
organization
: The Organization ID. See OpenAI's production best practices for more information.
prefix
: A string to add at the beginning of each text.
suffix
: A string to add at the end of each text.
batch_size
: Number of Documents to encode at once.
progress_bar
: If True, shows a progress bar when running.
meta_fields_to_embed
: List of meta fields that will be embedded along with the Document text.
embedding_separator
: Separator used to concatenate the meta fields to the Document text.
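The interaction between prefix, suffix, meta_fields_to_embed, and embedding_separator is easier to see in code. The following is a minimal sketch of how the text sent to the model could be assembled, assuming meta values are placed before the Document content and missing fields are skipped. build_text_to_embed is a hypothetical helper, not a function of the library, and the actual implementation may differ in detail.

```python
from typing import Any, Dict, List, Optional


def build_text_to_embed(
    content: str,
    meta: Dict[str, Any],
    meta_fields_to_embed: Optional[List[str]] = None,
    embedding_separator: str = "\n",
    prefix: str = "",
    suffix: str = "",
) -> str:
    """Join selected meta values and the content, then wrap with prefix/suffix."""
    fields = meta_fields_to_embed or []
    # Assumption: fields that are missing or None are silently skipped.
    meta_values = [str(meta[key]) for key in fields if meta.get(key) is not None]
    return prefix + embedding_separator.join(meta_values + [content]) + suffix


text = build_text_to_embed(
    content="I love pizza!",
    meta={"title": "Food diary", "author": None},
    meta_fields_to_embed=["title", "author"],
    embedding_separator=" | ",
)
print(text)  # Food diary | I love pizza!
```

Embedding a title or similar field together with the content can help retrieval when the content alone lacks context, at the cost of longer inputs.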
AzureOpenAIDocumentEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
AzureOpenAIDocumentEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "AzureOpenAIDocumentEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
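A serialization round trip can be sketched without calling any embedding service. The ToyEmbedder class below is hypothetical and only mimics the to_dict/from_dict contract described here. The dictionary layout (a type string plus init_parameters) is an assumption for illustration, so inspect the output of to_dict() on a real component before relying on specific keys.

```python
from typing import Any, Dict


class ToyEmbedder:
    """Hypothetical stand-in that follows the to_dict/from_dict pattern."""

    def __init__(self, prefix: str = "", suffix: str = "", batch_size: int = 32):
        self.prefix = prefix
        self.suffix = suffix
        self.batch_size = batch_size

    def to_dict(self) -> Dict[str, Any]:
        # Store the class name and the constructor arguments needed to rebuild it.
        return {
            "type": type(self).__name__,
            "init_parameters": {
                "prefix": self.prefix,
                "suffix": self.suffix,
                "batch_size": self.batch_size,
            },
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyEmbedder":
        # Rebuild the component from the stored constructor arguments.
        return cls(**data["init_parameters"])


original = ToyEmbedder(prefix="passage: ", batch_size=8)
restored = ToyEmbedder.from_dict(original.to_dict())
print(restored.to_dict() == original.to_dict())
```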
AzureOpenAIDocumentEmbedder.run
@component.output_types(documents=List[Document], meta=Dict[str, Any])
def run(documents: List[Document]) -> Dict[str, Any]
Embed a list of Documents.
Arguments:
documents
: Documents to embed.
Returns:
A dictionary with the following keys:
documents
: Documents with embeddings.
meta
: Information about the usage of the model.
Module azure_text_embedder
AzureOpenAITextEmbedder
A component for embedding strings using OpenAI models on Azure.
Usage example:
from haystack.components.embedders import AzureOpenAITextEmbedder
text_to_embed = "I love pizza!"
text_embedder = AzureOpenAITextEmbedder()
print(text_embedder.run(text_to_embed))
# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
# 'meta': {'model': 'text-embedding-ada-002-v2',
# 'usage': {'prompt_tokens': 4, 'total_tokens': 4}}}
AzureOpenAITextEmbedder.__init__
def __init__(azure_endpoint: Optional[str] = None,
api_version: Optional[str] = "2023-05-15",
azure_deployment: str = "text-embedding-ada-002",
dimensions: Optional[int] = None,
api_key: Optional[Secret] = Secret.from_env_var(
"AZURE_OPENAI_API_KEY", strict=False),
azure_ad_token: Optional[Secret] = Secret.from_env_var(
"AZURE_OPENAI_AD_TOKEN", strict=False),
organization: Optional[str] = None,
prefix: str = "",
suffix: str = "")
Create an AzureOpenAITextEmbedder component.
Arguments:
azure_endpoint
: The endpoint of the deployed model.
api_version
: The version of the API to use.
azure_deployment
: The deployment of the model, usually matches the model name.
dimensions
: The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.
api_key
: The API key used for authentication.
azure_ad_token
: Microsoft Entra ID token, see Microsoft's official Entra ID documentation for more information. Used to be called Azure Active Directory.
organization
: The Organization ID. See OpenAI's production best practices for more information.
prefix
: A string to add at the beginning of each text.
suffix
: A string to add at the end of each text.
AzureOpenAITextEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
AzureOpenAITextEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "AzureOpenAITextEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
AzureOpenAITextEmbedder.run
@component.output_types(embedding=List[float], meta=Dict[str, Any])
def run(text: str)
Embed a single string.
Arguments:
text
: Text to embed.
Returns:
A dictionary with the following keys:
embedding
: The embedding of the input text.
meta
: Information about the usage of the model.
Module hugging_face_tei_document_embedder
HuggingFaceTEIDocumentEmbedder
A component for computing Document embeddings using HuggingFace Text-Embeddings-Inference endpoints.
This component can be used with embedding models hosted on the rate-limited Hugging Face Inference API tier, on paid Hugging Face Inference Endpoints, or on your own custom TEI endpoint.
Usage example:
from haystack.dataclasses import Document
from haystack.components.embedders import HuggingFaceTEIDocumentEmbedder
from haystack.utils import Secret
doc = Document(content="I love pizza!")
document_embedder = HuggingFaceTEIDocumentEmbedder(
model="BAAI/bge-small-en-v1.5", token=Secret.from_token("<your-api-key>")
)
result = document_embedder.run([doc])
print(result["documents"][0].embedding)
# [0.017020374536514282, -0.023255806416273117, ...]
HuggingFaceTEIDocumentEmbedder.__init__
def __init__(model: str = "BAAI/bge-small-en-v1.5",
url: Optional[str] = None,
token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
strict=False),
prefix: str = "",
suffix: str = "",
truncate: bool = True,
normalize: bool = False,
batch_size: int = 32,
progress_bar: bool = True,
meta_fields_to_embed: Optional[List[str]] = None,
embedding_separator: str = "\n")
Create a HuggingFaceTEIDocumentEmbedder component.
Arguments:
model
: ID of the model on HuggingFace Hub.
url
: The URL of your self-deployed Text-Embeddings-Inference service or the URL of your paid HF Inference Endpoint.
token
: The HuggingFace Hub token. This is needed if you are using a paid HF Inference Endpoint or serving a private or gated model.
prefix
: A string to add at the beginning of each text.
suffix
: A string to add at the end of each text.
truncate
: Truncate input text from the end to the maximum length supported by the model. This option is only available for self-deployed Text Embedding Inference (TEI) endpoints and paid HF Inference Endpoints deployed with TEI. It will be ignored when used with free HF Inference endpoints or paid HF Inference endpoints deployed without TEI.
normalize
: Normalize the embeddings to unit length. This option is only available for self-deployed Text Embedding Inference (TEI) endpoints and paid HF Inference Endpoints deployed with TEI. It will be ignored when used with free HF Inference endpoints or paid HF Inference endpoints deployed without TEI.
batch_size
: Number of Documents to encode at once.
progress_bar
: If True, shows a progress bar when running.
meta_fields_to_embed
: List of meta fields that will be embedded along with the Document text.
embedding_separator
: Separator used to concatenate the meta fields to the Document text.
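The batch_size parameter controls how many Documents are sent to the model in one request. A minimal sketch of that kind of batching is shown below; iter_batches is a hypothetical helper written for this illustration and is not part of the library.

```python
from typing import Iterator, List


def iter_batches(texts: List[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


texts = [f"document {i}" for i in range(70)]
batch_lengths = [len(batch) for batch in iter_batches(texts, batch_size=32)]
print(batch_lengths)  # [32, 32, 6]
```

Larger batches mean fewer round trips but bigger payloads per request, so the best value depends on the endpoint's limits.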
HuggingFaceTEIDocumentEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
HuggingFaceTEIDocumentEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "HuggingFaceTEIDocumentEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
HuggingFaceTEIDocumentEmbedder.run
@component.output_types(documents=List[Document])
def run(documents: List[Document])
Embed a list of Documents.
Arguments:
documents
: Documents to embed.
Returns:
A dictionary with the following keys:
documents
: Documents with embeddings
Module hugging_face_tei_text_embedder
HuggingFaceTEITextEmbedder
A component for embedding strings using HuggingFace Text-Embeddings-Inference endpoints.
This component can be used with embedding models hosted on the rate-limited Hugging Face Inference API tier, on paid Hugging Face Inference Endpoints, or on your own custom TEI endpoint.
Usage example:
from haystack.components.embedders import HuggingFaceTEITextEmbedder
from haystack.utils import Secret
text_to_embed = "I love pizza!"
text_embedder = HuggingFaceTEITextEmbedder(
model="BAAI/bge-small-en-v1.5", token=Secret.from_token("<your-api-key>")
)
print(text_embedder.run(text_to_embed))
# {'embedding': [0.017020374536514282, -0.023255806416273117, ...]}
HuggingFaceTEITextEmbedder.__init__
def __init__(model: str = "BAAI/bge-small-en-v1.5",
url: Optional[str] = None,
token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
strict=False),
prefix: str = "",
suffix: str = "",
truncate: bool = True,
normalize: bool = False)
Create a HuggingFaceTEITextEmbedder component.
Arguments:
model
: ID of the model on HuggingFace Hub.
url
: The URL of your self-deployed Text-Embeddings-Inference service or the URL of your paid HF Inference Endpoint.
token
: The HuggingFace Hub token. This is needed if you are using a paid HF Inference Endpoint or serving a private or gated model.
prefix
: A string to add at the beginning of each text.
suffix
: A string to add at the end of each text.
truncate
: Truncate input text from the end to the maximum length supported by the model. This option is only available for self-deployed Text Embedding Inference (TEI) endpoints and paid HF Inference Endpoints deployed with TEI. It will be ignored when used with free HF Inference endpoints or paid HF Inference endpoints deployed without TEI.
normalize
: Normalize the embeddings to unit length. This option is only available for self-deployed Text Embedding Inference (TEI) endpoints and paid HF Inference Endpoints deployed with TEI. It will be ignored when used with free HF Inference endpoints or paid HF Inference endpoints deployed without TEI.
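Normalizing to unit length means dividing every component of the vector by its L2 norm. When an endpoint ignores the normalize option, the same result can be obtained on the client side. The sketch below shows the arithmetic; normalize_to_unit_length is a hypothetical helper, not a library function.

```python
import math
from typing import List


def normalize_to_unit_length(embedding: List[float]) -> List[float]:
    """Scale a vector so that its L2 norm is 1; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(value * value for value in embedding))
    if norm == 0.0:
        return list(embedding)
    return [value / norm for value in embedding]


unit = normalize_to_unit_length([3.0, 4.0])
print(unit)  # [0.6, 0.8]
```

For unit-length vectors the dot product equals the cosine similarity, which is why normalization is often applied before storing embeddings.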
HuggingFaceTEITextEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
HuggingFaceTEITextEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "HuggingFaceTEITextEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
HuggingFaceTEITextEmbedder.run
@component.output_types(embedding=List[float])
def run(text: str)
Embed a single string.
Arguments:
text
: Text to embed.
Returns:
A dictionary with the following keys:
embedding
: The embedding of the input text.
Module hugging_face_api_document_embedder
HuggingFaceAPIDocumentEmbedder
A component that embeds documents using Hugging Face APIs.
This component can be used to compute Document embeddings using different Hugging Face APIs:
- [Free Serverless Inference API](https://huggingface.co/inference-api)
- Paid Inference Endpoints
- Self-hosted Text Embeddings Inference
Example usage with the free Serverless Inference API:
from haystack.components.embedders import HuggingFaceAPIDocumentEmbedder
from haystack.utils import Secret
from haystack.dataclasses import Document
doc = Document(content="I love pizza!")
doc_embedder = HuggingFaceAPIDocumentEmbedder(api_type="serverless_inference_api",
api_params={"model": "BAAI/bge-small-en-v1.5"},
token=Secret.from_token("<your-api-key>"))
result = doc_embedder.run([doc])
print(result["documents"][0].embedding)
# [0.017020374536514282, -0.023255806416273117, ...]
Example usage with paid Inference Endpoints:
from haystack.components.embedders import HuggingFaceAPIDocumentEmbedder
from haystack.utils import Secret
from haystack.dataclasses import Document
doc = Document(content="I love pizza!")
doc_embedder = HuggingFaceAPIDocumentEmbedder(api_type="inference_endpoints",
api_params={"url": "<your-inference-endpoint-url>"},
token=Secret.from_token("<your-api-key>"))
result = doc_embedder.run([doc])
print(result["documents"][0].embedding)
# [0.017020374536514282, -0.023255806416273117, ...]
Example usage with self-hosted Text Embeddings Inference:
from haystack.components.embedders import HuggingFaceAPIDocumentEmbedder
from haystack.dataclasses import Document
doc = Document(content="I love pizza!")
doc_embedder = HuggingFaceAPIDocumentEmbedder(api_type="text_embeddings_inference",
api_params={"url": "http://localhost:8080"})
result = doc_embedder.run([doc])
print(result["documents"][0].embedding)
# [0.017020374536514282, -0.023255806416273117, ...]
HuggingFaceAPIDocumentEmbedder.__init__
def __init__(api_type: Union[HFEmbeddingAPIType, str],
api_params: Dict[str, str],
token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
strict=False),
prefix: str = "",
suffix: str = "",
truncate: bool = True,
normalize: bool = False,
batch_size: int = 32,
progress_bar: bool = True,
meta_fields_to_embed: Optional[List[str]] = None,
embedding_separator: str = "\n")
Create a HuggingFaceAPIDocumentEmbedder component.
Arguments:
api_type
: The type of Hugging Face API to use.
api_params
: A dictionary containing the following keys:
- model: model ID on the Hugging Face Hub. Required when api_type is SERVERLESS_INFERENCE_API.
- url: URL of the inference endpoint. Required when api_type is INFERENCE_ENDPOINTS or TEXT_EMBEDDINGS_INFERENCE.
token
: The HuggingFace token to use as HTTP bearer authorization. You can find your HF token in your account settings.
prefix
: A string to add at the beginning of each text.
suffix
: A string to add at the end of each text.
truncate
: Truncate input text from the end to the maximum length supported by the model. This parameter takes effect when the api_type is TEXT_EMBEDDINGS_INFERENCE. It also takes effect when the api_type is INFERENCE_ENDPOINTS and the backend is based on Text Embeddings Inference. This parameter is ignored when the api_type is SERVERLESS_INFERENCE_API (it is always set to True and cannot be changed).
normalize
: Normalize the embeddings to unit length. This parameter takes effect when the api_type is TEXT_EMBEDDINGS_INFERENCE. It also takes effect when the api_type is INFERENCE_ENDPOINTS and the backend is based on Text Embeddings Inference. This parameter is ignored when the api_type is SERVERLESS_INFERENCE_API (it is always set to False and cannot be changed).
batch_size
: Number of Documents to process at once.
progress_bar
: If True, shows a progress bar when running.
meta_fields_to_embed
: List of meta fields that will be embedded along with the Document text.
embedding_separator
: Separator used to concatenate the meta fields to the Document text.
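The rule for which api_params key is required by each api_type can be summarized as a small check. This is a sketch of the documented rule only, using the lowercase string values from the usage examples; check_api_params is a hypothetical helper, and the component's own validation and error messages may differ.

```python
from typing import Dict


def check_api_params(api_type: str, api_params: Dict[str, str]) -> None:
    """Raise ValueError if the key required by api_type is missing."""
    required_key = {
        "serverless_inference_api": "model",
        "inference_endpoints": "url",
        "text_embeddings_inference": "url",
    }
    if api_type not in required_key:
        raise ValueError(f"Unknown api_type: {api_type!r}")
    key = required_key[api_type]
    if not api_params.get(key):
        raise ValueError(f"api_type {api_type!r} requires api_params[{key!r}]")


# Valid combinations pass silently.
check_api_params("serverless_inference_api", {"model": "BAAI/bge-small-en-v1.5"})
check_api_params("text_embeddings_inference", {"url": "http://localhost:8080"})

# A model ID alone is not enough for an Inference Endpoint, which needs a URL.
message = ""
try:
    check_api_params("inference_endpoints", {"model": "BAAI/bge-small-en-v1.5"})
except ValueError as error:
    message = str(error)
print(message)
```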
HuggingFaceAPIDocumentEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
HuggingFaceAPIDocumentEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "HuggingFaceAPIDocumentEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
HuggingFaceAPIDocumentEmbedder.run
@component.output_types(documents=List[Document])
def run(documents: List[Document])
Embed a list of Documents.
Arguments:
documents
: Documents to embed.
Returns:
A dictionary with the following keys:
documents
: Documents with embeddings
Module hugging_face_api_text_embedder
HuggingFaceAPITextEmbedder
A component that embeds text using Hugging Face APIs.
This component can be used to embed strings using different Hugging Face APIs:
- [Free Serverless Inference API](https://huggingface.co/inference-api)
- Paid Inference Endpoints
- Self-hosted Text Embeddings Inference
Example usage with the free Serverless Inference API:
from haystack.components.embedders import HuggingFaceAPITextEmbedder
from haystack.utils import Secret
text_embedder = HuggingFaceAPITextEmbedder(api_type="serverless_inference_api",
api_params={"model": "BAAI/bge-small-en-v1.5"},
token=Secret.from_token("<your-api-key>"))
print(text_embedder.run("I love pizza!"))
# {'embedding': [0.017020374536514282, -0.023255806416273117, ...]}
Example usage with paid Inference Endpoints:
from haystack.components.embedders import HuggingFaceAPITextEmbedder
from haystack.utils import Secret
text_embedder = HuggingFaceAPITextEmbedder(api_type="inference_endpoints",
api_params={"url": "<your-inference-endpoint-url>"},
token=Secret.from_token("<your-api-key>"))
print(text_embedder.run("I love pizza!"))
# {'embedding': [0.017020374536514282, -0.023255806416273117, ...]}
Example usage with self-hosted Text Embeddings Inference:
from haystack.components.embedders import HuggingFaceAPITextEmbedder
from haystack.utils import Secret
text_embedder = HuggingFaceAPITextEmbedder(api_type="text_embeddings_inference",
api_params={"url": "http://localhost:8080"})
print(text_embedder.run("I love pizza!"))
# {'embedding': [0.017020374536514282, -0.023255806416273117, ...]}
HuggingFaceAPITextEmbedder.__init__
def __init__(api_type: Union[HFEmbeddingAPIType, str],
api_params: Dict[str, str],
token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
strict=False),
prefix: str = "",
suffix: str = "",
truncate: bool = True,
normalize: bool = False)
Create a HuggingFaceAPITextEmbedder component.
Arguments:
api_type
: The type of Hugging Face API to use.
api_params
: A dictionary containing the following keys:
- model: model ID on the Hugging Face Hub. Required when api_type is SERVERLESS_INFERENCE_API.
- url: URL of the inference endpoint. Required when api_type is INFERENCE_ENDPOINTS or TEXT_EMBEDDINGS_INFERENCE.
token
: The HuggingFace token to use as HTTP bearer authorization. You can find your HF token in your account settings.
prefix
: A string to add at the beginning of each text.
suffix
: A string to add at the end of each text.
truncate
: Truncate input text from the end to the maximum length supported by the model. This parameter takes effect when the api_type is TEXT_EMBEDDINGS_INFERENCE. It also takes effect when the api_type is INFERENCE_ENDPOINTS and the backend is based on Text Embeddings Inference. This parameter is ignored when the api_type is SERVERLESS_INFERENCE_API (it is always set to True and cannot be changed).
normalize
: Normalize the embeddings to unit length. This parameter takes effect when the api_type is TEXT_EMBEDDINGS_INFERENCE. It also takes effect when the api_type is INFERENCE_ENDPOINTS and the backend is based on Text Embeddings Inference. This parameter is ignored when the api_type is SERVERLESS_INFERENCE_API (it is always set to False and cannot be changed).
HuggingFaceAPITextEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
HuggingFaceAPITextEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "HuggingFaceAPITextEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
HuggingFaceAPITextEmbedder.run
@component.output_types(embedding=List[float])
def run(text: str)
Embed a single string.
Arguments:
text
: Text to embed.
Returns:
A dictionary with the following keys:
embedding
: The embedding of the input text.
Module openai_document_embedder
OpenAIDocumentEmbedder
A component for computing Document embeddings using OpenAI models.
Usage example:
from haystack import Document
from haystack.components.embedders import OpenAIDocumentEmbedder
doc = Document(content="I love pizza!")
document_embedder = OpenAIDocumentEmbedder()
result = document_embedder.run([doc])
print(result['documents'][0].embedding)
# [0.017020374536514282, -0.023255806416273117, ...]
OpenAIDocumentEmbedder.__init__
def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
model: str = "text-embedding-ada-002",
dimensions: Optional[int] = None,
api_base_url: Optional[str] = None,
organization: Optional[str] = None,
prefix: str = "",
suffix: str = "",
batch_size: int = 32,
progress_bar: bool = True,
meta_fields_to_embed: Optional[List[str]] = None,
embedding_separator: str = "\n",
timeout: Optional[float] = None,
max_retries: Optional[int] = None)
Create an OpenAIDocumentEmbedder component.
By setting the OPENAI_TIMEOUT and OPENAI_MAX_RETRIES environment variables, you can change the timeout and max_retries parameters in the OpenAI client.
Arguments:
api_key
: The OpenAI API key.
model
: The name of the model to use.
dimensions
: The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.
api_base_url
: Overrides the default base URL for all HTTP requests.
organization
: The Organization ID. See OpenAI's production best practices for more information.
prefix
: A string to add at the beginning of each text.
suffix
: A string to add at the end of each text.
batch_size
: Number of Documents to encode at once.
progress_bar
: If True, shows a progress bar when running.
meta_fields_to_embed
: List of meta fields that will be embedded along with the Document text.
embedding_separator
: Separator used to concatenate the meta fields to the Document text.
timeout
: Timeout for OpenAI client calls. If not set, it is inferred from the OPENAI_TIMEOUT environment variable or set to 30.
max_retries
: Maximum retries to establish contact with OpenAI if it returns an internal error. If not set, it is inferred from the OPENAI_MAX_RETRIES environment variable or set to 5.
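The fallback order for timeout and max_retries (explicit argument, then environment variable, then default) can be sketched as follows. resolve_client_settings is a hypothetical helper that mirrors the documented behavior; it is not the component's actual code.

```python
import os
from typing import Optional, Tuple


def resolve_client_settings(
    timeout: Optional[float] = None, max_retries: Optional[int] = None
) -> Tuple[float, int]:
    """Explicit argument first, then environment variable, then the documented default."""
    if timeout is None:
        timeout = float(os.environ.get("OPENAI_TIMEOUT", 30.0))
    if max_retries is None:
        max_retries = int(os.environ.get("OPENAI_MAX_RETRIES", 5))
    return timeout, max_retries


# Set one variable and clear the other to exercise both fallback paths.
os.environ["OPENAI_TIMEOUT"] = "60"
os.environ.pop("OPENAI_MAX_RETRIES", None)

print(resolve_client_settings())              # (60.0, 5)
print(resolve_client_settings(timeout=10.0))  # (10.0, 5)
```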
OpenAIDocumentEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
OpenAIDocumentEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OpenAIDocumentEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
OpenAIDocumentEmbedder.run
@component.output_types(documents=List[Document], meta=Dict[str, Any])
def run(documents: List[Document])
Embed a list of Documents.
Arguments:
documents
: Documents to embed.
Returns:
A dictionary with the following keys:
documents
: Documents with embeddings.
meta
: Information about the usage of the model.
Module openai_text_embedder
OpenAITextEmbedder
A component for embedding strings using OpenAI models.
Usage example:
from haystack.components.embedders import OpenAITextEmbedder
text_to_embed = "I love pizza!"
text_embedder = OpenAITextEmbedder()
print(text_embedder.run(text_to_embed))
# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
# 'meta': {'model': 'text-embedding-ada-002-v2',
# 'usage': {'prompt_tokens': 4, 'total_tokens': 4}}}
OpenAITextEmbedder.__init__
def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
model: str = "text-embedding-ada-002",
dimensions: Optional[int] = None,
api_base_url: Optional[str] = None,
organization: Optional[str] = None,
prefix: str = "",
suffix: str = "",
timeout: Optional[float] = None,
max_retries: Optional[int] = None)
Create an OpenAITextEmbedder component.
By setting the OPENAI_TIMEOUT and OPENAI_MAX_RETRIES environment variables, you can change the timeout and max_retries parameters in the OpenAI client.
Arguments:
api_key
: The OpenAI API key.
model
: The name of the model to use.
dimensions
: The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.
api_base_url
: Overrides the default base URL for all HTTP requests.
organization
: The Organization ID. See OpenAI's production best practices for more information.
prefix
: A string to add at the beginning of each text.
suffix
: A string to add at the end of each text.
timeout
: Timeout for OpenAI client calls. If not set, it is inferred from the OPENAI_TIMEOUT environment variable or set to 30.
max_retries
: Maximum retries to establish contact with OpenAI if it returns an internal error. If not set, it is inferred from the OPENAI_MAX_RETRIES environment variable or set to 5.
OpenAITextEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
OpenAITextEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OpenAITextEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
OpenAITextEmbedder.run
@component.output_types(embedding=List[float], meta=Dict[str, Any])
def run(text: str)
Embed a single string.
Arguments:
text
: Text to embed.
Returns:
A dictionary with the following keys:
embedding
: The embedding of the input text.
meta
: Information about the usage of the model.
Module sentence_transformers_document_embedder
SentenceTransformersDocumentEmbedder
A component for computing Document embeddings using Sentence Transformers models.
Usage example:
from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
doc = Document(content="I love pizza!")
doc_embedder = SentenceTransformersDocumentEmbedder()
doc_embedder.warm_up()
result = doc_embedder.run([doc])
print(result['documents'][0].embedding)
# [-0.07804739475250244, 0.1498992145061493, ...]
SentenceTransformersDocumentEmbedder.__init__
def __init__(model: str = "sentence-transformers/all-mpnet-base-v2",
device: Optional[ComponentDevice] = None,
token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
strict=False),
prefix: str = "",
suffix: str = "",
batch_size: int = 32,
progress_bar: bool = True,
normalize_embeddings: bool = False,
meta_fields_to_embed: Optional[List[str]] = None,
embedding_separator: str = "\n",
trust_remote_code: bool = False)
Create a SentenceTransformersDocumentEmbedder component.
Arguments:
model
: Local path or ID of the model on HuggingFace Hub.
device
: Overrides the default device used to load the model.
token
: The API token used to download private models from Hugging Face.
prefix
: A string to add at the beginning of each text. Can be used to prepend the text with an instruction, as required by some embedding models, such as E5 and bge.
suffix
: A string to add at the end of each text.
batch_size
: Number of Documents to encode at once.
progress_bar
: If True, shows a progress bar when running.
normalize_embeddings
: If True, returned vectors will have length 1.
meta_fields_to_embed
: List of meta fields that will be embedded along with the Document text.
embedding_separator
: Separator used to concatenate the meta fields to the Document text.
trust_remote_code
: If False, only Hugging Face verified model architectures are allowed. If True, custom models and scripts are allowed.
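As an illustration of the instruction prefixes mentioned for prefix, E5-style models expect passages and queries to be marked with different prefixes. The sketch below only shows the string handling; apply_prefix_suffix is a hypothetical helper, and the exact prefix strings should be taken from the model card of the model in use.

```python
from typing import List


def apply_prefix_suffix(texts: List[str], prefix: str = "", suffix: str = "") -> List[str]:
    """Return each text wrapped with the given prefix and suffix."""
    return [prefix + text + suffix for text in texts]


# E5-style instructions: passages and queries get different prefixes.
passages = apply_prefix_suffix(["Pizza originated in Naples."], prefix="passage: ")
queries = apply_prefix_suffix(["Where does pizza come from?"], prefix="query: ")
print(passages)  # ['passage: Pizza originated in Naples.']
print(queries)   # ['query: Where does pizza come from?']
```

With this pattern the document embedder would typically be configured with the passage prefix and the matching text embedder with the query prefix.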
SentenceTransformersDocumentEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
SentenceTransformersDocumentEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str,
Any]) -> "SentenceTransformersDocumentEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
SentenceTransformersDocumentEmbedder.warm_up
def warm_up()
Initializes the component.
SentenceTransformersDocumentEmbedder.run
@component.output_types(documents=List[Document])
def run(documents: List[Document])
Embed a list of Documents.
Arguments:
documents
: Documents to embed.
Returns:
A dictionary with the following keys:
documents
: Documents with embeddings
Module sentence_transformers_text_embedder
SentenceTransformersTextEmbedder
A component for embedding strings using Sentence Transformers models.
Usage example:
from haystack.components.embedders import SentenceTransformersTextEmbedder
text_to_embed = "I love pizza!"
text_embedder = SentenceTransformersTextEmbedder()
text_embedder.warm_up()
print(text_embedder.run(text_to_embed))
# {'embedding': [-0.07804739475250244, 0.1498992145061493, ...]}
SentenceTransformersTextEmbedder.__init__
def __init__(model: str = "sentence-transformers/all-mpnet-base-v2",
device: Optional[ComponentDevice] = None,
token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
strict=False),
prefix: str = "",
suffix: str = "",
batch_size: int = 32,
progress_bar: bool = True,
normalize_embeddings: bool = False,
trust_remote_code: bool = False)
Create a SentenceTransformersTextEmbedder component.
Arguments:
model
: Local path or ID of the model on HuggingFace Hub.
device
: Overrides the default device used to load the model.
token
: The API token used to download private models from Hugging Face.
prefix
: A string to add at the beginning of each text. Can be used to prepend the text with an instruction, as required by some embedding models, such as E5 and bge.
suffix
: A string to add at the end of each text.
batch_size
: Number of Documents to encode at once.
progress_bar
: If True, shows a progress bar when running.
normalize_embeddings
: If True, returned vectors will have length 1.
trust_remote_code
: If False, only Hugging Face verified model architectures are allowed. If True, custom models and scripts are allowed.
SentenceTransformersTextEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
SentenceTransformersTextEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "SentenceTransformersTextEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Deserialized component.
SentenceTransformersTextEmbedder.warm_up
def warm_up()
Initializes the component.
SentenceTransformersTextEmbedder.run
@component.output_types(embedding=List[float])
def run(text: str)
Embed a single string.
Arguments:
text
: Text to embed.
Returns:
A dictionary with the following keys:
embedding
: The embedding of the input text.