Optimum integration for Haystack
Module haystack_integrations.components.embedders.optimum.optimum_document_embedder
OptimumDocumentEmbedder
A component for computing Document embeddings using models loaded with the HuggingFace Optimum library, leveraging the ONNX runtime for high-speed inference.
The embedding of each Document is stored in the embedding field of the Document.
Usage example:
from haystack.dataclasses import Document
from haystack_integrations.components.embedders.optimum import OptimumDocumentEmbedder
doc = Document(content="I love pizza!")
document_embedder = OptimumDocumentEmbedder(model="sentence-transformers/all-mpnet-base-v2")
document_embedder.warm_up()
result = document_embedder.run([doc])
print(result["documents"][0].embedding)
# [0.017020374536514282, -0.023255806416273117, ...]
OptimumDocumentEmbedder.__init__
def __init__(
model: str = "sentence-transformers/all-mpnet-base-v2",
token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
strict=False),
prefix: str = "",
suffix: str = "",
normalize_embeddings: bool = True,
onnx_execution_provider: str = "CPUExecutionProvider",
pooling_mode: Optional[Union[str, OptimumEmbedderPooling]] = None,
model_kwargs: Optional[Dict[str, Any]] = None,
working_dir: Optional[str] = None,
optimizer_settings: Optional[OptimumEmbedderOptimizationConfig] = None,
quantizer_settings: Optional[OptimumEmbedderQuantizationConfig] = None,
batch_size: int = 32,
progress_bar: bool = True,
meta_fields_to_embed: Optional[List[str]] = None,
embedding_separator: str = "\n")
Create an OptimumDocumentEmbedder component.
Arguments:
model
: A string representing the model id on HF Hub.
token
: The HuggingFace token to use as HTTP bearer authorization.
prefix
: A string to add to the beginning of each text.
suffix
: A string to add to the end of each text.
normalize_embeddings
: Whether to normalize the embeddings to unit length.
onnx_execution_provider
: The execution provider to use for ONNX models.
Note: Using the TensorRT execution provider
TensorRT needs to build its inference engine ahead of inference, which takes some time due to model optimization and node fusion.
To avoid rebuilding the engine every time the model is loaded, ONNX Runtime provides a pair of options to save the engine: trt_engine_cache_enable and trt_engine_cache_path. We recommend setting these two provider options via the model_kwargs parameter when using the TensorRT execution provider.
The usage is as follows:
embedder = OptimumDocumentEmbedder(
model="sentence-transformers/all-mpnet-base-v2",
onnx_execution_provider="TensorrtExecutionProvider",
model_kwargs={
"provider_options": {
"trt_engine_cache_enable": True,
"trt_engine_cache_path": "tmp/trt_cache",
}
},
)
pooling_mode
: The pooling mode to use. When None, the pooling mode is inferred from the model config.
model_kwargs
: Dictionary containing additional keyword arguments to pass to the model. In case of duplication, these kwargs override the model, onnx_execution_provider, and token initialization parameters.
working_dir
: The directory to use for storing intermediate files generated during model optimization/quantization. Required for optimization and quantization. See the example below.
optimizer_settings
: Configuration for Optimum Embedder Optimization. If None, no additional optimization is applied.
quantizer_settings
: Configuration for Optimum Embedder Quantization. If None, no quantization is applied.
batch_size
: Number of Documents to encode at once.
progress_bar
: Whether to show a progress bar or not.
meta_fields_to_embed
: List of meta fields that should be embedded along with the Document text.
embedding_separator
: Separator used to concatenate the meta fields to the Document text.
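For instance, the sketch below enables graph optimization and dynamic quantization together. It assumes the configuration classes are re-exported from the package root (as the embedders are) and take the documented fields as keyword arguments; the working directory is illustrative.
from haystack.dataclasses import Document
from haystack_integrations.components.embedders.optimum import (
    OptimumDocumentEmbedder,
    OptimumEmbedderOptimizationConfig,
    OptimumEmbedderOptimizationMode,
    OptimumEmbedderQuantizationConfig,
    OptimumEmbedderQuantizationMode,
)

embedder = OptimumDocumentEmbedder(
    model="sentence-transformers/all-mpnet-base-v2",
    working_dir="onnx_artifacts",  # illustrative; any writable directory works
    optimizer_settings=OptimumEmbedderOptimizationConfig(
        mode=OptimumEmbedderOptimizationMode.O2, for_gpu=False
    ),
    quantizer_settings=OptimumEmbedderQuantizationConfig(
        mode=OptimumEmbedderQuantizationMode.AVX2, per_channel=False
    ),
)
embedder.warm_up()
result = embedder.run([Document(content="I love pizza!")])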
OptimumDocumentEmbedder.warm_up
def warm_up()
Initializes the component.
OptimumDocumentEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
OptimumDocumentEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OptimumDocumentEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: The dictionary to deserialize from.
Returns:
The deserialized component.
OptimumDocumentEmbedder.run
@component.output_types(documents=List[Document])
def run(documents: List[Document])
Embed a list of Documents.
The embedding of each Document is stored in the embedding
field of the Document.
Arguments:
documents
: A list of Documents to embed.
Raises:
RuntimeError
: If the component was not initialized.TypeError
: If the input is not a list of Documents.
Returns:
The updated Documents with their embeddings.
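As a sketch of a batch run that also embeds a meta field alongside the Document text (the documents and the cuisine meta field are illustrative):
from haystack.dataclasses import Document
from haystack_integrations.components.embedders.optimum import OptimumDocumentEmbedder

docs = [
    Document(content="Margherita is a classic pizza.", meta={"cuisine": "Italian"}),
    Document(content="Sushi is made with vinegared rice.", meta={"cuisine": "Japanese"}),
]

embedder = OptimumDocumentEmbedder(
    model="sentence-transformers/all-mpnet-base-v2",
    meta_fields_to_embed=["cuisine"],  # meta values are concatenated with the content using embedding_separator
    embedding_separator="\n",
)
embedder.warm_up()
embedded_docs = embedder.run(docs)["documents"]
# each returned Document now carries a vector in its embedding field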
Module haystack_integrations.components.embedders.optimum.optimum_text_embedder
OptimumTextEmbedder
A component to embed text using models loaded with the HuggingFace Optimum library, leveraging the ONNX runtime for high-speed inference.
Usage example:
from haystack_integrations.components.embedders.optimum import OptimumTextEmbedder
text_to_embed = "I love pizza!"
text_embedder = OptimumTextEmbedder(model="sentence-transformers/all-mpnet-base-v2")
text_embedder.warm_up()
print(text_embedder.run(text_to_embed))
# {'embedding': [-0.07804739475250244, 0.1498992145061493, ...]}
OptimumTextEmbedder.__init__
def __init__(
model: str = "sentence-transformers/all-mpnet-base-v2",
token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
strict=False),
prefix: str = "",
suffix: str = "",
normalize_embeddings: bool = True,
onnx_execution_provider: str = "CPUExecutionProvider",
pooling_mode: Optional[Union[str, OptimumEmbedderPooling]] = None,
model_kwargs: Optional[Dict[str, Any]] = None,
working_dir: Optional[str] = None,
optimizer_settings: Optional[OptimumEmbedderOptimizationConfig] = None,
quantizer_settings: Optional[OptimumEmbedderQuantizationConfig] = None
)
Create an OptimumTextEmbedder component.
Arguments:
model
: A string representing the model id on HF Hub.
token
: The HuggingFace token to use as HTTP bearer authorization.
prefix
: A string to add to the beginning of each text.
suffix
: A string to add to the end of each text.
normalize_embeddings
: Whether to normalize the embeddings to unit length.
onnx_execution_provider
: The execution provider to use for ONNX models.
Note: Using the TensorRT execution provider
TensorRT needs to build its inference engine ahead of inference, which takes some time due to model optimization and node fusion.
To avoid rebuilding the engine every time the model is loaded, ONNX Runtime provides a pair of options to save the engine: trt_engine_cache_enable and trt_engine_cache_path. We recommend setting these two provider options via the model_kwargs parameter when using the TensorRT execution provider.
The usage is as follows:
embedder = OptimumTextEmbedder(
model="sentence-transformers/all-mpnet-base-v2",
onnx_execution_provider="TensorrtExecutionProvider",
model_kwargs={
"provider_options": {
"trt_engine_cache_enable": True,
"trt_engine_cache_path": "tmp/trt_cache",
}
},
)
pooling_mode
: The pooling mode to use. When None, the pooling mode is inferred from the model config. See the example below.
model_kwargs
: Dictionary containing additional keyword arguments to pass to the model. In case of duplication, these kwargs override the model, onnx_execution_provider, and token initialization parameters.
working_dir
: The directory to use for storing intermediate files generated during model optimization/quantization. Required for optimization and quantization.
optimizer_settings
: Configuration for Optimum Embedder Optimization. If None, no additional optimization is applied.
quantizer_settings
: Configuration for Optimum Embedder Quantization. If None, no quantization is applied.
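For example, a minimal sketch that sets the pooling mode explicitly instead of relying on the model config (assuming the OptimumEmbedderPooling enum is re-exported from the package root, as the embedders are):
from haystack_integrations.components.embedders.optimum import (
    OptimumEmbedderPooling,
    OptimumTextEmbedder,
)

text_embedder = OptimumTextEmbedder(
    model="sentence-transformers/all-mpnet-base-v2",
    pooling_mode=OptimumEmbedderPooling.MEAN,  # or the string form, assumed to be the lowercase name "mean"
    normalize_embeddings=True,
)
text_embedder.warm_up()
embedding = text_embedder.run("I love pizza!")["embedding"]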
OptimumTextEmbedder.warm_up
def warm_up()
Initializes the component.
OptimumTextEmbedder.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
OptimumTextEmbedder.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OptimumTextEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: The dictionary to deserialize from.
Returns:
The deserialized component.
OptimumTextEmbedder.run
@component.output_types(embedding=List[float])
def run(text: str)
Embed a string.
Arguments:
text
: The text to embed.
Raises:
RuntimeError
: If the component was not initialized.TypeError
: If the input is not a string.
Returns:
The embeddings of the text.
Module haystack_integrations.components.embedders.optimum.pooling
OptimumEmbedderPooling
Pooling modes supported by the Optimum Embedders.
CLS
Perform CLS Pooling on the output of the embedding model using the first token (CLS token).
MEAN
Perform Mean Pooling on the output of the embedding model.
MAX
Perform Max Pooling on the output of the embedding model using the maximum value in each dimension over all the tokens.
MEAN_SQRT_LEN
Perform mean-pooling on the output of the embedding model but divide by the square root of the sequence length.
WEIGHTED_MEAN
Perform weighted (position) mean pooling on the output of the embedding model.
LAST_TOKEN
Perform Last Token Pooling on the output of the embedding model.
OptimumEmbedderPooling.from_str
@classmethod
def from_str(cls, string: str) -> "OptimumEmbedderPooling"
Create a pooling mode from a string.
Arguments:
string
: String to convert.
Returns:
Pooling mode.
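A short sketch of converting a string to a pooling mode, assuming the string values are the lowercase mode names:
from haystack_integrations.components.embedders.optimum import OptimumEmbedderPooling

mode = OptimumEmbedderPooling.from_str("mean")  # "mean" is the assumed string value of the MEAN mode
assert mode is OptimumEmbedderPooling.MEAN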
Module haystack_integrations.components.embedders.optimum.optimization
OptimumEmbedderOptimizationMode
ONNX optimization modes supported by the Optimum Embedders.
O1
Basic general optimizations.
O2
Basic and extended general optimizations, transformers-specific fusions.
O3
Same as O2 with Gelu approximation.
O4
Same as O3 with mixed precision.
OptimumEmbedderOptimizationMode.from_str
@classmethod
def from_str(cls, string: str) -> "OptimumEmbedderOptimizationMode"
Create an optimization mode from a string.
Arguments:
string
: String to convert.
Returns:
Optimization mode.
OptimumEmbedderOptimizationConfig
Configuration for Optimum Embedder Optimization.
Arguments:
mode
: Optimization mode.
for_gpu
: Whether to optimize for GPUs.
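A minimal construction sketch, assuming the class is re-exported from the package root and takes the documented fields as keyword arguments:
from haystack_integrations.components.embedders.optimum import (
    OptimumEmbedderOptimizationConfig,
    OptimumEmbedderOptimizationMode,
)

# O3: extended fusions plus Gelu approximation, optimized for GPU execution
opt_config = OptimumEmbedderOptimizationConfig(
    mode=OptimumEmbedderOptimizationMode.O3,
    for_gpu=True,
)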
OptimumEmbedderOptimizationConfig.to_optimum_config
def to_optimum_config() -> OptimizationConfig
Convert the configuration to an Optimum configuration.
Returns:
Optimum configuration.
OptimumEmbedderOptimizationConfig.to_dict
def to_dict() -> Dict[str, Any]
Convert the configuration to a dictionary.
Returns:
Dictionary with serialized data.
OptimumEmbedderOptimizationConfig.from_dict
@classmethod
def from_dict(cls, data: Dict[str,
Any]) -> "OptimumEmbedderOptimizationConfig"
Create an optimization configuration from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Optimization configuration.
Module haystack_integrations.components.embedders.optimum.quantization
OptimumEmbedderQuantizationMode
Dynamic quantization modes supported by the Optimum Embedders.
ARM64
Quantization for the ARM64 architecture.
AVX2
Quantization with AVX-2 instructions.
AVX512
Quantization with AVX-512 instructions.
AVX512_VNNI
Quantization with AVX-512 and VNNI instructions.
OptimumEmbedderQuantizationMode.from_str
@classmethod
def from_str(cls, string: str) -> "OptimumEmbedderQuantizationMode"
Create a quantization mode from a string.
Arguments:
string
: String to convert.
Returns:
Quantization mode.
OptimumEmbedderQuantizationConfig
Configuration for Optimum Embedder Quantization.
Arguments:
mode
: Quantization mode.
per_channel
: Whether to apply per-channel quantization.
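A minimal construction sketch, assuming the class is re-exported from the package root and takes the documented fields as keyword arguments:
from haystack_integrations.components.embedders.optimum import (
    OptimumEmbedderQuantizationConfig,
    OptimumEmbedderQuantizationMode,
)

# dynamic quantization targeting CPUs with AVX-512 and VNNI instructions
quant_config = OptimumEmbedderQuantizationConfig(
    mode=OptimumEmbedderQuantizationMode.AVX512_VNNI,
    per_channel=True,
)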
OptimumEmbedderQuantizationConfig.to_optimum_config
def to_optimum_config() -> QuantizationConfig
Convert the configuration to an Optimum configuration.
Returns:
Optimum configuration.
OptimumEmbedderQuantizationConfig.to_dict
def to_dict() -> Dict[str, Any]
Convert the configuration to a dictionary.
Returns:
Dictionary with serialized data.
OptimumEmbedderQuantizationConfig.from_dict
@classmethod
def from_dict(cls, data: Dict[str,
Any]) -> "OptimumEmbedderQuantizationConfig"
Create a configuration from a dictionary.
Arguments:
data
: Dictionary to deserialize from.
Returns:
Quantization configuration.