OptimumTextEmbedder
A component to embed text using models loaded with the Hugging Face Optimum library.
| | |
| --- | --- |
| Most common position in a pipeline | Before an embedding Retriever in a query/RAG pipeline |
| Mandatory init variables | "token": A Hugging Face API token. Can be set with the `HF_API_TOKEN` env var. |
| Mandatory run variables | "text": A string |
| Output variables | "embedding": A list of floats (the embedding vector) |
| API reference | Optimum |
| GitHub link | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/optimum |
Overview
`OptimumTextEmbedder` embeds text strings using models loaded with the Hugging Face Optimum library. It uses the ONNX Runtime for high-speed inference. The default model is `sentence-transformers/all-mpnet-base-v2`.
Like other Embedders, this component lets you add prefixes and suffixes to the text, for example to include model-specific instructions, as shown in the sketch below. For more details, refer to the component's API reference.
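For instance, E5-family models expect search queries to carry a `"query: "` prefix. A minimal sketch, assuming a `prefix` init parameter like other Haystack Embedders (check the API reference for the exact names):

```python
from haystack_integrations.components.embedders.optimum import OptimumTextEmbedder

# E5 models are trained with instruction prefixes; "query: " marks search queries
embedder = OptimumTextEmbedder(
    model="intfloat/e5-base-v2",
    prefix="query: ",
)
```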
There are three useful parameters specific to the Optimum Embedder that you can control with various modes:
- Pooling: generate a fixed-size sentence embedding from the variable-length token embeddings
- Optimization: apply graph optimizations to the model to improve inference speed
- Quantization: reduce the computational and memory costs of inference
Find details on all the available modes in our Optimum API reference; a configuration sketch follows below.
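Here is a minimal sketch of the pooling and optimization settings, using only class names that also appear in the pipeline example further down; the `O1` optimization level is an assumption (see the API reference for all modes). Quantization is configured analogously through its own config class, documented in the same reference.

```python
from haystack_integrations.components.embedders.optimum import (
    OptimumTextEmbedder,
    OptimumEmbedderPooling,
    OptimumEmbedderOptimizationConfig,
    OptimumEmbedderOptimizationMode,
)

embedder = OptimumTextEmbedder(
    model="sentence-transformers/all-mpnet-base-v2",
    # Pooling: collapse per-token embeddings into one fixed-size vector
    pooling_mode=OptimumEmbedderPooling.MEAN,
    # Optimization: apply ONNX graph optimizations before inference
    optimizer_settings=OptimumEmbedderOptimizationConfig(
        mode=OptimumEmbedderOptimizationMode.O1,  # assumed basic level; O4 is used in the GPU example below
        for_gpu=False,
    ),
    working_dir="/tmp/optimum",  # directory used for the optimized model files
)
embedder.warm_up()
```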
Authentication
The component uses the `HF_API_TOKEN` environment variable by default. Otherwise, you can pass a Hugging Face API token at initialization with `token`; see the code examples below.
The token is needed:
- If you use the Serverless Inference API, or
- If you use the Inference Endpoints.
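You can supply the token programmatically with Haystack's `Secret` API instead of relying on the environment variable. A minimal sketch, assuming the component's `token` init parameter accepts a `Secret`, as other Hugging Face integrations do:

```python
from haystack.utils import Secret
from haystack_integrations.components.embedders.optimum import OptimumTextEmbedder

# Read the token from the HF_API_TOKEN environment variable (the default behavior)
embedder = OptimumTextEmbedder(token=Secret.from_env_var("HF_API_TOKEN"))

# Or wrap a token value directly; from_token secrets are never serialized
embedder = OptimumTextEmbedder(token=Secret.from_token("<your-hf-api-token>"))
```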
Usage
To start using this integration with Haystack, install it with:
```shell
pip install optimum-haystack
```
On its own
```python
from haystack_integrations.components.embedders.optimum import OptimumTextEmbedder

text_to_embed = "I love pizza!"

text_embedder = OptimumTextEmbedder(model="sentence-transformers/all-mpnet-base-v2")
text_embedder.warm_up()

print(text_embedder.run(text_to_embed))

# {'embedding': [-0.07804739475250244, 0.1498992145061493, ...]}
```
In a pipeline
Note that this example requires GPU support to execute.
```python
from haystack import Pipeline
from haystack_integrations.components.embedders.optimum import (
    OptimumTextEmbedder,
    OptimumEmbedderPooling,
    OptimumEmbedderOptimizationConfig,
    OptimumEmbedderOptimizationMode,
)

pipeline = Pipeline()
embedder = OptimumTextEmbedder(
    model="intfloat/e5-base-v2",
    normalize_embeddings=True,
    onnx_execution_provider="CUDAExecutionProvider",
    optimizer_settings=OptimumEmbedderOptimizationConfig(
        mode=OptimumEmbedderOptimizationMode.O4,
        for_gpu=True,
    ),
    working_dir="/tmp/optimum",
    pooling_mode=OptimumEmbedderPooling.MEAN,
)
pipeline.add_component("embedder", embedder)

results = pipeline.run(
    {
        "embedder": {
            "text": "Ex profunditate antiquae doctrinae, Ad caelos supra semper, Hoc incantamentum evoco, draco apparet, Incantamentum iam transactum est"
        },
    }
)

print(results["embedder"]["embedding"])
```
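As the table above notes, the typical placement is in front of an embedding Retriever in a query pipeline. A short sketch using Haystack's built-in `InMemoryEmbeddingRetriever`, assuming the document store already holds documents embedded with a matching Optimum Document Embedder:

```python
from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.optimum import OptimumTextEmbedder

document_store = InMemoryDocumentStore()  # assumed to be pre-populated with embedded documents

query_pipeline = Pipeline()
query_pipeline.add_component("embedder", OptimumTextEmbedder())
query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
# The embedder's output vector becomes the retriever's query embedding
query_pipeline.connect("embedder.embedding", "retriever.query_embedding")

results = query_pipeline.run({"embedder": {"text": "Who loves pizza?"}})
print(results["retriever"]["documents"])
```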