Version: 2.23

Optimum

Module haystack_integrations.components.embedders.optimum.optimization

OptimumEmbedderOptimizationMode

ONNX optimization modes supported by the Optimum Embedders.

O1

Basic general optimizations.

O2

Basic and extended general optimizations, transformers-specific fusions.

O3

Same as O2 with Gelu approximation.

O4

Same as O3 with mixed precision.
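To make the "Gelu approximation" in O3 concrete, the sketch below compares the exact GELU activation with the widely used tanh-based approximation. It assumes this is the kind of approximation meant here; the exact formula the optimizer applies is not stated in this reference, and the function names are illustrative only.

```python
import math


def gelu_exact(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh-based GELU approximation (assumed form, not taken from the library)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Largest absolute difference between the two on a grid over [-5, 5].
grid = [i / 100.0 for i in range(-500, 501)]
max_abs_error = max(abs(gelu_exact(x) - gelu_tanh(x)) for x in grid)
print(f"max |exact - approx| on [-5, 5]: {max_abs_error:.2e}")
```

The approximation trades a small numerical difference for a cheaper computation, which is why embeddings produced with O3 or O4 may differ slightly from those of the unoptimized model.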

OptimumEmbedderOptimizationMode.from_str

python

@classmethod
def from_str(cls, string: str) -> "OptimumEmbedderOptimizationMode"

Create an optimization mode from a string.

Arguments:

string: String to convert.

Returns:

Optimization mode.

OptimumEmbedderOptimizationConfig

Configuration for Optimum Embedder Optimization.

Arguments:

mode: Optimization mode.
for_gpu: Whether to optimize for GPUs.

OptimumEmbedderOptimizationConfig.to_optimum_config

python

def to_optimum_config() -> OptimizationConfig

Convert the configuration to an Optimum configuration.

Returns:

Optimum configuration.

OptimumEmbedderOptimizationConfig.to_dict

python

def to_dict() -> dict[str, Any]

Convert the configuration to a dictionary.

Returns:

Dictionary with serialized data.

OptimumEmbedderOptimizationConfig.from_dict

python

@classmethod
def from_dict(cls, data: dict[str,
                              Any]) -> "OptimumEmbedderOptimizationConfig"

Create an optimization configuration from a dictionary.

Arguments:

data: Dictionary to deserialize from.

Returns:

Optimization configuration.
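The from_str, to_dict and from_dict methods together allow a configuration to survive a serialization round trip. The sketch below shows that pattern with stand-in classes so that it runs without the integration installed. The class names, the lowercase string values and the dictionary keys are assumptions made for illustration, not the library's actual serialized format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class OptimizationModeSketch(Enum):
    # Hypothetical string values; the real enum's values may differ.
    O1 = "o1"
    O2 = "o2"
    O3 = "o3"
    O4 = "o4"

    @classmethod
    def from_str(cls, string: str) -> "OptimizationModeSketch":
        """Look up a member by its string value, failing loudly on unknown input."""
        by_value = {member.value: member for member in cls}
        mode = by_value.get(string)
        if mode is None:
            raise ValueError(f"Unknown mode '{string}'. Supported: {list(by_value)}")
        return mode


@dataclass(frozen=True)
class OptimizationConfigSketch:
    mode: OptimizationModeSketch
    for_gpu: bool

    def to_dict(self) -> dict[str, Any]:
        # Store the enum as a plain string so the result is JSON/YAML friendly.
        return {"mode": self.mode.value, "for_gpu": self.for_gpu}

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "OptimizationConfigSketch":
        return cls(mode=OptimizationModeSketch.from_str(data["mode"]), for_gpu=data["for_gpu"])


config = OptimizationConfigSketch(mode=OptimizationModeSketch.O2, for_gpu=False)
restored = OptimizationConfigSketch.from_dict(config.to_dict())
print(restored == config)
```

Storing the mode as a string rather than as the enum object keeps the dictionary serializable by standard tools.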

Module haystack_integrations.components.embedders.optimum.optimum_document_embedder

OptimumDocumentEmbedder

A component for computing Document embeddings using models loaded with the HuggingFace Optimum library, leveraging the ONNX runtime for high-speed inference.

The embedding of each Document is stored in the embedding field of the Document.

Usage example:

python

from haystack.dataclasses import Document
from haystack_integrations.components.embedders.optimum import OptimumDocumentEmbedder

doc = Document(content="I love pizza!")

document_embedder = OptimumDocumentEmbedder(model="sentence-transformers/all-mpnet-base-v2")
document_embedder.warm_up()

result = document_embedder.run([doc])
print(result["documents"][0].embedding)

# [0.017020374536514282, -0.023255806416273117, ...]

OptimumDocumentEmbedder.__init__

python

def __init__(model: str = "sentence-transformers/all-mpnet-base-v2",
             token: Secret | None = Secret.from_env_var("HF_API_TOKEN",
                                                        strict=False),
             prefix: str = "",
             suffix: str = "",
             normalize_embeddings: bool = True,
             onnx_execution_provider: str = "CPUExecutionProvider",
             pooling_mode: str | OptimumEmbedderPooling | None = None,
             model_kwargs: dict[str, Any] | None = None,
             working_dir: str | None = None,
             optimizer_settings: OptimumEmbedderOptimizationConfig
             | None = None,
             quantizer_settings: OptimumEmbedderQuantizationConfig
             | None = None,
             batch_size: int = 32,
             progress_bar: bool = True,
             meta_fields_to_embed: list[str] | None = None,
             embedding_separator: str = "\n") -> None

Create an OptimumDocumentEmbedder component.

Arguments:

model: A string representing the model id on HF Hub.
token: The HuggingFace token to use as HTTP bearer authorization.
prefix: A string to add to the beginning of each text.
suffix: A string to add to the end of each text.
normalize_embeddings: Whether to normalize the embeddings to unit length.
onnx_execution_provider: The execution provider to use for ONNX models.

Note: The TensorRT execution provider must build its inference engine before inference, which takes some time because of model optimization and node fusion. To avoid rebuilding the engine every time the model is loaded, ONNX Runtime provides two options to save the engine: trt_engine_cache_enable and trt_engine_cache_path. We recommend setting both provider options through the model_kwargs parameter when using the TensorRT execution provider:

python

embedder = OptimumDocumentEmbedder(
    model="sentence-transformers/all-mpnet-base-v2",
    onnx_execution_provider="TensorrtExecutionProvider",
    model_kwargs={
        "provider_options": {
            "trt_engine_cache_enable": True,
            "trt_engine_cache_path": "tmp/trt_cache",
        }
    },
)

pooling_mode: The pooling mode to use. When None, pooling mode will be inferred from the model config.
model_kwargs: Dictionary containing additional keyword arguments to pass to the model. In case of duplication, these kwargs override model, onnx_execution_provider and token initialization parameters.
working_dir: The directory to use for storing intermediate files generated during model optimization/quantization. Required for optimization and quantization.
optimizer_settings: Configuration for Optimum Embedder Optimization. If None, no additional optimization is applied.
quantizer_settings: Configuration for Optimum Embedder Quantization. If None, no quantization is applied.
batch_size: Number of Documents to encode at once.
progress_bar: Whether to show a progress bar or not.
meta_fields_to_embed: List of meta fields that should be embedded along with the Document text.
embedding_separator: Separator used to concatenate the meta fields to the Document text.
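The parameters prefix, suffix, meta_fields_to_embed, embedding_separator and batch_size all shape the text that reaches the model. The sketch below shows one plausible way they combine, using a stand-in DocSketch class rather than the real Document. The helper names and the exact ordering (meta values first, then content) are assumptions for illustration, not the component's confirmed implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass
class DocSketch:
    """Minimal stand-in for a Document: only content and meta."""
    content: str
    meta: dict[str, Any] = field(default_factory=dict)


def prepare_texts(
    docs: list[DocSketch],
    meta_fields_to_embed: list[str],
    embedding_separator: str = "\n",
    prefix: str = "",
    suffix: str = "",
) -> list[str]:
    """Join selected meta values and the content, then wrap with prefix and suffix."""
    texts = []
    for doc in docs:
        # Skip meta fields that are missing or None for this document.
        meta_values = [
            str(doc.meta[key]) for key in meta_fields_to_embed if doc.meta.get(key) is not None
        ]
        texts.append(prefix + embedding_separator.join([*meta_values, doc.content]) + suffix)
    return texts


def batched(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


docs = [DocSketch("I love pizza!", {"title": "Food"}), DocSketch("No metadata here.")]
texts = prepare_texts(docs, ["title"], embedding_separator="\n", prefix="passage: ")
batches = list(batched(texts, batch_size=1))
print(texts)
```

A larger batch_size generally means fewer model calls at the cost of more memory per call.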

OptimumDocumentEmbedder.warm_up

python

def warm_up() -> None

Initializes the component.

OptimumDocumentEmbedder.to_dict

python

def to_dict() -> dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

OptimumDocumentEmbedder.from_dict

python

@classmethod
def from_dict(cls, data: dict[str, Any]) -> "OptimumDocumentEmbedder"

Deserializes the component from a dictionary.

Arguments:

data: The dictionary to deserialize from.

Returns:

The deserialized component.

OptimumDocumentEmbedder.run

python

@component.output_types(documents=list[Document])
def run(documents: list[Document]) -> dict[str, list[Document]]

Embed a list of Documents.

The embedding of each Document is stored in the embedding field of the Document.

Arguments:

documents: A list of Documents to embed.

Raises:

TypeError: If the input is not a list of Documents.

Returns:

The updated Documents with their embeddings.

Module haystack_integrations.components.embedders.optimum.optimum_text_embedder

OptimumTextEmbedder

A component to embed text using models loaded with the HuggingFace Optimum library, leveraging the ONNX runtime for high-speed inference.

Usage example:

python

from haystack_integrations.components.embedders.optimum import OptimumTextEmbedder

text_to_embed = "I love pizza!"

text_embedder = OptimumTextEmbedder(model="sentence-transformers/all-mpnet-base-v2")
text_embedder.warm_up()

print(text_embedder.run(text_to_embed))

# {'embedding': [-0.07804739475250244, 0.1498992145061493, ...]}

OptimumTextEmbedder.__init__

python

def __init__(
        model: str = "sentence-transformers/all-mpnet-base-v2",
        token: Secret | None = Secret.from_env_var("HF_API_TOKEN",
                                                   strict=False),
        prefix: str = "",
        suffix: str = "",
        normalize_embeddings: bool = True,
        onnx_execution_provider: str = "CPUExecutionProvider",
        pooling_mode: str | OptimumEmbedderPooling | None = None,
        model_kwargs: dict[str, Any] | None = None,
        working_dir: str | None = None,
        optimizer_settings: OptimumEmbedderOptimizationConfig | None = None,
        quantizer_settings: OptimumEmbedderQuantizationConfig | None = None)

Create an OptimumTextEmbedder component.

Arguments:

model: A string representing the model id on HF Hub.
token: The HuggingFace token to use as HTTP bearer authorization.
prefix: A string to add to the beginning of each text.
suffix: A string to add to the end of each text.
normalize_embeddings: Whether to normalize the embeddings to unit length.
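onnx_execution_provider: The execution provider to use for ONNX models.

Note: The TensorRT execution provider must build its inference engine before inference, which takes some time. To avoid rebuilding the engine every time the model is loaded, we recommend setting the trt_engine_cache_enable and trt_engine_cache_path provider options through the model_kwargs parameter:

python

embedder = OptimumTextEmbedder(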
    model="sentence-transformers/all-mpnet-base-v2",
    onnx_execution_provider="TensorrtExecutionProvider",
    model_kwargs={
        "provider_options": {
            "trt_engine_cache_enable": True,
            "trt_engine_cache_path": "tmp/trt_cache",
        }
    },
)

pooling_mode: The pooling mode to use. When None, pooling mode will be inferred from the model config.
model_kwargs: Dictionary containing additional keyword arguments to pass to the model. In case of duplication, these kwargs override model, onnx_execution_provider and token initialization parameters.
working_dir: The directory to use for storing intermediate files generated during model optimization/quantization. Required for optimization and quantization.
optimizer_settings: Configuration for Optimum Embedder Optimization. If None, no additional optimization is applied.
quantizer_settings: Configuration for Optimum Embedder Quantization. If None, no quantization is applied.
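As an illustration of what normalize_embeddings implies, the sketch below scales a vector to unit length and checks that the dot product of two normalized vectors equals their cosine similarity. It assumes "unit length" refers to the L2 (Euclidean) norm; the helper names are illustrative and not part of the component.

```python
import math


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector so that its Euclidean norm is 1 (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [1.0, 0.0]
unit_a, unit_b = l2_normalize(a), l2_normalize(b)

# For unit-length vectors, a plain dot product already is the cosine similarity.
print(dot(unit_a, unit_b), cosine_similarity(a, b))
```

This is why normalized embeddings pair well with retrievers or stores that rank by dot product.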

OptimumTextEmbedder.warm_up

python

def warm_up()

Initializes the component.

OptimumTextEmbedder.to_dict

python

def to_dict() -> dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

OptimumTextEmbedder.from_dict

python

@classmethod
def from_dict(cls, data: dict[str, Any]) -> "OptimumTextEmbedder"

Deserializes the component from a dictionary.

Arguments:

data: The dictionary to deserialize from.

Returns:

The deserialized component.

OptimumTextEmbedder.run

python

@component.output_types(embedding=list[float])
def run(text: str) -> dict[str, list[float]]

Embed a string.

Arguments:

text: The text to embed.

Raises:

TypeError: If the input is not a string.

Returns:

The embeddings of the text.

Module haystack_integrations.components.embedders.optimum.pooling

OptimumEmbedderPooling

Pooling modes supported by the Optimum Embedders.

CLS

Perform CLS Pooling on the output of the embedding model using the first token (CLS token).

MEAN

Perform Mean Pooling on the output of the embedding model.

MAX

Perform Max Pooling on the output of the embedding model using the maximum value in each dimension over all the tokens.

MEAN_SQRT_LEN

Perform mean-pooling on the output of the embedding model but divide by the square root of the sequence length.

WEIGHTED_MEAN

Perform weighted (position) mean pooling on the output of the embedding model.

LAST_TOKEN

Perform Last Token Pooling on the output of the embedding model.
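The pooling modes above can be sketched in plain Python over a small matrix of token embeddings (one row per token) and an attention mask that marks real tokens with 1 and padding with 0. This is an illustrative sketch under those assumptions, not the library's implementation. In particular, the 1-based position weights for WEIGHTED_MEAN and the right-padding assumption for LAST_TOKEN are choices made here.

```python
import math

Matrix = list[list[float]]


def _active(tokens: Matrix, mask: list[int]) -> Matrix:
    """Keep only the rows whose attention mask entry is 1."""
    return [vec for vec, keep in zip(tokens, mask) if keep]


def cls_pooling(tokens: Matrix, mask: list[int]) -> list[float]:
    return list(tokens[0])  # first token is assumed to be the CLS token


def mean_pooling(tokens: Matrix, mask: list[int]) -> list[float]:
    active = _active(tokens, mask)
    return [sum(col) / len(active) for col in zip(*active)]


def max_pooling(tokens: Matrix, mask: list[int]) -> list[float]:
    return [max(col) for col in zip(*_active(tokens, mask))]


def mean_sqrt_len_pooling(tokens: Matrix, mask: list[int]) -> list[float]:
    active = _active(tokens, mask)
    return [sum(col) / math.sqrt(len(active)) for col in zip(*active)]


def weighted_mean_pooling(tokens: Matrix, mask: list[int]) -> list[float]:
    # Later positions get larger weights (1, 2, 3, ...); padding gets weight 0.
    weights = [(pos + 1) * keep for pos, keep in enumerate(mask)]
    total = sum(weights)
    dim = len(tokens[0])
    return [sum(w * vec[d] for w, vec in zip(weights, tokens)) / total for d in range(dim)]


def last_token_pooling(tokens: Matrix, mask: list[int]) -> list[float]:
    last = max(i for i, keep in enumerate(mask) if keep)  # assumes right padding
    return list(tokens[last])


tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]  # the fourth row is padding and must not affect the result
for pool in (cls_pooling, mean_pooling, max_pooling, last_token_pooling):
    print(pool.__name__, pool(tokens, mask))
```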

OptimumEmbedderPooling.from_str

python

@classmethod
def from_str(cls, string: str) -> "OptimumEmbedderPooling"

Create a pooling mode from a string.

Arguments:

string: String to convert.

Returns:

Pooling mode.

Module haystack_integrations.components.embedders.optimum.quantization

OptimumEmbedderQuantizationMode

Dynamic quantization modes supported by the Optimum Embedders.

ARM64

Quantization for the ARM64 architecture.

AVX2

Quantization with AVX-2 instructions.

AVX512

Quantization with AVX-512 instructions.

AVX512_VNNI

Quantization with AVX-512 and VNNI instructions.

OptimumEmbedderQuantizationMode.from_str

python

@classmethod
def from_str(cls, string: str) -> "OptimumEmbedderQuantizationMode"

Create a quantization mode from a string.

Arguments:

string: String to convert.

Returns:

Quantization mode.

OptimumEmbedderQuantizationConfig

Configuration for Optimum Embedder Quantization.

Arguments:

mode: Quantization mode.
per_channel: Whether to apply per-channel quantization.
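To illustrate what per_channel changes, the sketch below quantizes a small weight matrix to int8 twice: once with a single scale shared by the whole tensor, and once with a separate scale per row (each row treated as one channel). The symmetric scheme and the helper names are simplifying assumptions for illustration; the scheme ONNX Runtime actually applies may differ.

```python
def symmetric_scale(values: list[float]) -> float:
    """Scale that maps the largest magnitude in `values` to the int8 limit 127."""
    max_abs = max(abs(v) for v in values)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def roundtrip_errors(weights: list[list[float]], per_channel: bool) -> list[float]:
    """Quantize and dequantize each row; return the largest absolute error per row."""
    shared_scale = symmetric_scale([v for row in weights for v in row])
    errors = []
    for row in weights:
        scale = symmetric_scale(row) if per_channel else shared_scale
        quantized = [max(-127, min(127, round(v / scale))) for v in row]
        restored = [q * scale for q in quantized]
        errors.append(max(abs(a - b) for a, b in zip(row, restored)))
    return errors


# Row 0 holds small weights, row 1 holds large ones.
weights = [[0.01, -0.02, 0.015], [5.0, -3.0, 4.0]]
per_tensor = roundtrip_errors(weights, per_channel=False)
per_channel = roundtrip_errors(weights, per_channel=True)
print("per-tensor errors: ", per_tensor)
print("per-channel errors:", per_channel)
```

With a shared scale, the large values in row 1 dictate the step size and the small weights in row 0 are rounded coarsely; a per-row scale avoids that at the cost of storing one scale per channel.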

OptimumEmbedderQuantizationConfig.to_optimum_config

python

def to_optimum_config() -> QuantizationConfig

Convert the configuration to an Optimum configuration.

Returns:

Optimum configuration.

OptimumEmbedderQuantizationConfig.to_dict

python

def to_dict() -> dict[str, Any]

Convert the configuration to a dictionary.

Returns:

Dictionary with serialized data.

OptimumEmbedderQuantizationConfig.from_dict

python

@classmethod
def from_dict(cls, data: dict[str,
                              Any]) -> "OptimumEmbedderQuantizationConfig"

Create a configuration from a dictionary.

Arguments:

data: Dictionary to deserialize from.

Returns:

Quantization configuration.
