
Optimum integration for Haystack

Module haystack_integrations.components.embedders.optimum.optimum_document_embedder

OptimumDocumentEmbedder

A component for computing Document embeddings using models loaded with the HuggingFace Optimum library, leveraging the ONNX runtime for high-speed inference.

The embedding of each Document is stored in the embedding field of the Document.

Usage example:

from haystack.dataclasses import Document
from haystack_integrations.components.embedders.optimum import OptimumDocumentEmbedder

doc = Document(content="I love pizza!")

document_embedder = OptimumDocumentEmbedder(model="sentence-transformers/all-mpnet-base-v2")
document_embedder.warm_up()

result = document_embedder.run([doc])
print(result["documents"][0].embedding)

# [0.017020374536514282, -0.023255806416273117, ...]

OptimumDocumentEmbedder.__init__

def __init__(
        model: str = "sentence-transformers/all-mpnet-base-v2",
        token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
                                                      strict=False),
        prefix: str = "",
        suffix: str = "",
        normalize_embeddings: bool = True,
        onnx_execution_provider: str = "CPUExecutionProvider",
        pooling_mode: Optional[Union[str, OptimumEmbedderPooling]] = None,
        model_kwargs: Optional[Dict[str, Any]] = None,
        working_dir: Optional[str] = None,
        optimizer_settings: Optional[OptimumEmbedderOptimizationConfig] = None,
        quantizer_settings: Optional[OptimumEmbedderQuantizationConfig] = None,
        batch_size: int = 32,
        progress_bar: bool = True,
        meta_fields_to_embed: Optional[List[str]] = None,
        embedding_separator: str = "\n")

Create an OptimumDocumentEmbedder component.

Arguments:

  • model: A string representing the model id on HF Hub.
  • token: The HuggingFace token to use as HTTP bearer authorization.
  • prefix: A string to add to the beginning of each text.
  • suffix: A string to add to the end of each text.
  • normalize_embeddings: Whether to normalize the embeddings to unit length.
  • onnx_execution_provider: The execution provider to use for ONNX models.

Note: TensorRT must build its inference engine before the first inference, which takes some time because of model optimization and node fusion. To avoid rebuilding the engine every time the model is loaded, ONNX Runtime provides two options for caching it: trt_engine_cache_enable and trt_engine_cache_path. When using the TensorRT execution provider, we recommend setting both provider options through the model_kwargs parameter, as follows:

embedder = OptimumDocumentEmbedder(
    model="sentence-transformers/all-mpnet-base-v2",
    onnx_execution_provider="TensorrtExecutionProvider",
    model_kwargs={
        "provider_options": {
            "trt_engine_cache_enable": True,
            "trt_engine_cache_path": "tmp/trt_cache",
        }
    },
)
  • pooling_mode: The pooling mode to use. When None, pooling mode will be inferred from the model config.
  • model_kwargs: Dictionary containing additional keyword arguments to pass to the model. In case of duplication, these kwargs override model, onnx_execution_provider and token initialization parameters.
  • working_dir: The directory to use for storing intermediate files generated during model optimization/quantization. Required for optimization and quantization.
  • optimizer_settings: Configuration for Optimum Embedder Optimization. If None, no additional optimization is applied.
  • quantizer_settings: Configuration for Optimum Embedder Quantization. If None, no quantization is applied.
  • batch_size: Number of Documents to encode at once.
  • progress_bar: Whether to show a progress bar or not.
  • meta_fields_to_embed: List of meta fields that should be embedded along with the Document text.
  • embedding_separator: Separator used to concatenate the meta fields to the Document text.
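
The interaction between prefix, suffix, meta_fields_to_embed and embedding_separator can be illustrated with a minimal sketch. It assumes that the selected meta values are joined with the Document content using the separator, and that the prefix and suffix wrap the result. build_text_to_embed is a hypothetical helper written for this illustration, not a function of the integration, and the component itself may differ in details.

```python
from typing import Any, Dict, List, Optional


def build_text_to_embed(
    content: str,
    meta: Dict[str, Any],
    meta_fields_to_embed: Optional[List[str]] = None,
    embedding_separator: str = "\n",
    prefix: str = "",
    suffix: str = "",
) -> str:
    """Hypothetical helper: assemble the string that would be passed to the model."""
    # Keep only the requested meta fields that are present and not None.
    meta_values = [
        str(meta[key])
        for key in (meta_fields_to_embed or [])
        if meta.get(key) is not None
    ]
    # Assumption: meta values come first, then the content, joined by the separator.
    joined = embedding_separator.join(meta_values + [content])
    return prefix + joined + suffix


text = build_text_to_embed(
    content="I love pizza!",
    meta={"title": "Food opinions", "author": None},
    meta_fields_to_embed=["title", "author"],
    prefix="passage: ",
)
print(repr(text))
```

In this sketch the missing author value is skipped, so only the title is prepended to the content.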

OptimumDocumentEmbedder.warm_up

def warm_up()

Initializes the component.

OptimumDocumentEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

OptimumDocumentEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OptimumDocumentEmbedder"

Deserializes the component from a dictionary.

Arguments:

  • data: The dictionary to deserialize from.

Returns:

The deserialized component.

OptimumDocumentEmbedder.run

@component.output_types(documents=List[Document])
def run(documents: List[Document])

Embed a list of Documents.

The embedding of each Document is stored in the embedding field of the Document.

Arguments:

  • documents: A list of Documents to embed.

Raises:

  • RuntimeError: If the component was not initialized.
  • TypeError: If the input is not a list of Documents.

Returns:

The updated Documents with their embeddings.
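
The batch_size argument of the constructor determines how many Documents are encoded at once during run. The following is a minimal sketch of such batching, assuming the input is simply split into consecutive slices. iter_batches is a hypothetical helper used only for illustration.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 32) -> Iterator[List[T]]:
    """Hypothetical helper: yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


texts = [f"document {i}" for i in range(70)]
batch_lengths = [len(batch) for batch in iter_batches(texts, batch_size=32)]
print(batch_lengths)
```

With 70 inputs and a batch size of 32, the last batch is smaller than the others.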

Module haystack_integrations.components.embedders.optimum.optimum_text_embedder

OptimumTextEmbedder

A component to embed text using models loaded with the HuggingFace Optimum library, leveraging the ONNX runtime for high-speed inference.

Usage example:

from haystack_integrations.components.embedders.optimum import OptimumTextEmbedder

text_to_embed = "I love pizza!"

text_embedder = OptimumTextEmbedder(model="sentence-transformers/all-mpnet-base-v2")
text_embedder.warm_up()

print(text_embedder.run(text_to_embed))

# {'embedding': [-0.07804739475250244, 0.1498992145061493, ...]}

OptimumTextEmbedder.__init__

def __init__(
        model: str = "sentence-transformers/all-mpnet-base-v2",
        token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
                                                      strict=False),
        prefix: str = "",
        suffix: str = "",
        normalize_embeddings: bool = True,
        onnx_execution_provider: str = "CPUExecutionProvider",
        pooling_mode: Optional[Union[str, OptimumEmbedderPooling]] = None,
        model_kwargs: Optional[Dict[str, Any]] = None,
        working_dir: Optional[str] = None,
        optimizer_settings: Optional[OptimumEmbedderOptimizationConfig] = None,
        quantizer_settings: Optional[OptimumEmbedderQuantizationConfig] = None
)

Create an OptimumTextEmbedder component.

Arguments:

  • model: A string representing the model id on HF Hub.
  • token: The HuggingFace token to use as HTTP bearer authorization.
  • prefix: A string to add to the beginning of each text.
  • suffix: A string to add to the end of each text.
  • normalize_embeddings: Whether to normalize the embeddings to unit length.
  • onnx_execution_provider: The execution provider to use for ONNX models.

Note: TensorRT must build its inference engine before the first inference, which takes some time because of model optimization and node fusion. To avoid rebuilding the engine every time the model is loaded, ONNX Runtime provides two options for caching it: trt_engine_cache_enable and trt_engine_cache_path. When using the TensorRT execution provider, we recommend setting both provider options through the model_kwargs parameter, as follows:

embedder = OptimumTextEmbedder(
    model="sentence-transformers/all-mpnet-base-v2",
    onnx_execution_provider="TensorrtExecutionProvider",
    model_kwargs={
        "provider_options": {
            "trt_engine_cache_enable": True,
            "trt_engine_cache_path": "tmp/trt_cache",
        }
    },
)
  • pooling_mode: The pooling mode to use. When None, pooling mode will be inferred from the model config.
  • model_kwargs: Dictionary containing additional keyword arguments to pass to the model. In case of duplication, these kwargs override model, onnx_execution_provider and token initialization parameters.
  • working_dir: The directory to use for storing intermediate files generated during model optimization/quantization. Required for optimization and quantization.
  • optimizer_settings: Configuration for Optimum Embedder Optimization. If None, no additional optimization is applied.
  • quantizer_settings: Configuration for Optimum Embedder Quantization. If None, no quantization is applied.
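
The effect of normalize_embeddings can be sketched in a few lines. This example assumes that "unit length" refers to the L2 (Euclidean) norm. l2_normalize is a hypothetical helper for illustration and not part of the integration.

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Hypothetical helper: scale a vector so that its Euclidean norm is 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # A zero vector has no direction, so it is returned unchanged.
        return list(vector)
    return [x / norm for x in vector]


embedding = l2_normalize([3.0, 4.0])
length = math.sqrt(sum(x * x for x in embedding))
print(embedding, length)
```

For unit-length vectors, the dot product of two embeddings equals their cosine similarity.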

OptimumTextEmbedder.warm_up

def warm_up()

Initializes the component.

OptimumTextEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

OptimumTextEmbedder.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OptimumTextEmbedder"

Deserializes the component from a dictionary.

Arguments:

  • data: The dictionary to deserialize from.

Returns:

The deserialized component.

OptimumTextEmbedder.run

@component.output_types(embedding=List[float])
def run(text: str)

Embed a string.

Arguments:

  • text: The text to embed.

Raises:

  • RuntimeError: If the component was not initialized.
  • TypeError: If the input is not a string.

Returns:

The embedding of the text.

Module haystack_integrations.components.embedders.optimum.pooling

OptimumEmbedderPooling

Pooling modes supported by the Optimum Embedders.

CLS

Perform CLS Pooling on the output of the embedding model using the first token (CLS token).

MEAN

Perform Mean Pooling on the output of the embedding model.

MAX

Perform Max Pooling on the output of the embedding model using the maximum value in each dimension over all the tokens.

MEAN_SQRT_LEN

Perform mean-pooling on the output of the embedding model but divide by the square root of the sequence length.

WEIGHTED_MEAN

Perform weighted (position) mean pooling on the output of the embedding model.

LAST_TOKEN

Perform Last Token Pooling on the output of the embedding model.
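
The pooling modes above can be compared on a small example. The following sketch uses plain Python lists and makes two assumptions: the sequence contains no padding tokens (real implementations typically take the attention mask into account), and weighted mean pooling weights each token linearly by its 1-based position. The function names are hypothetical and do not belong to the integration.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row per token, one column per embedding dimension


def cls_pooling(tokens: Matrix) -> List[float]:
    # Use the embedding of the first token (the CLS token).
    return list(tokens[0])


def mean_pooling(tokens: Matrix) -> List[float]:
    # Average every dimension over all tokens.
    return [sum(column) / len(tokens) for column in zip(*tokens)]


def max_pooling(tokens: Matrix) -> List[float]:
    # Take the maximum value of every dimension over all tokens.
    return [max(column) for column in zip(*tokens)]


def mean_sqrt_len_pooling(tokens: Matrix) -> List[float]:
    # Sum every dimension, then divide by the square root of the sequence length.
    scale = math.sqrt(len(tokens))
    return [sum(column) / scale for column in zip(*tokens)]


def weighted_mean_pooling(tokens: Matrix) -> List[float]:
    # Assumption: token i (1-based) gets weight i, so later tokens count more.
    weights = range(1, len(tokens) + 1)
    total = sum(weights)
    return [sum(w * v for w, v in zip(weights, column)) / total for column in zip(*tokens)]


def last_token_pooling(tokens: Matrix) -> List[float]:
    # Use the embedding of the last token.
    return list(tokens[-1])


tokens = [[1.0, 0.0], [3.0, 4.0], [2.0, 8.0], [6.0, 4.0]]
for pool in (cls_pooling, mean_pooling, max_pooling,
             mean_sqrt_len_pooling, weighted_mean_pooling, last_token_pooling):
    print(pool.__name__, pool(tokens))
```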

OptimumEmbedderPooling.from_str

@classmethod
def from_str(cls, string: str) -> "OptimumEmbedderPooling"

Create a pooling mode from a string.

Arguments:

  • string: String to convert.

Returns:

Pooling mode.
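
A from_str class method of this kind can be sketched with a standard Enum. PoolingModeSketch is a hypothetical stand-in for OptimumEmbedderPooling. Its lowercase string values and its error handling are assumptions made for illustration, not the documented behavior of the integration.

```python
from enum import Enum


class PoolingModeSketch(Enum):
    """Hypothetical stand-in; the string values are assumed, not documented."""

    CLS = "cls"
    MEAN = "mean"
    MAX = "max"
    MEAN_SQRT_LEN = "mean_sqrt_len"
    WEIGHTED_MEAN = "weighted_mean"
    LAST_TOKEN = "last_token"

    @classmethod
    def from_str(cls, string: str) -> "PoolingModeSketch":
        # Map each member's string value back to the member itself.
        lookup = {member.value: member for member in cls}
        mode = lookup.get(string)
        if mode is None:
            raise ValueError(
                f"Unknown pooling mode '{string}'. Supported modes: {sorted(lookup)}"
            )
        return mode


mode = PoolingModeSketch.from_str("mean")
print(mode)
```

Raising an error for unknown strings, as assumed here, surfaces configuration typos early instead of silently falling back to a default.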

Module haystack_integrations.components.embedders.optimum.optimization

OptimumEmbedderOptimizationMode

ONNX optimization modes supported by the Optimum Embedders.

O1

Basic general optimizations.

O2

Basic and extended general optimizations, transformers-specific fusions.

O3

Same as O2 with Gelu approximation.

O4

Same as O3 with mixed precision.

OptimumEmbedderOptimizationMode.from_str

@classmethod
def from_str(cls, string: str) -> "OptimumEmbedderOptimizationMode"

Create an optimization mode from a string.

Arguments:

  • string: String to convert.

Returns:

Optimization mode.

OptimumEmbedderOptimizationConfig

Configuration for Optimum Embedder Optimization.

Arguments:

  • mode: Optimization mode.
  • for_gpu: Whether to optimize for GPUs.

OptimumEmbedderOptimizationConfig.to_optimum_config

def to_optimum_config() -> OptimizationConfig

Convert the configuration to an Optimum configuration.

Returns:

Optimum configuration.

OptimumEmbedderOptimizationConfig.to_dict

def to_dict() -> Dict[str, Any]

Convert the configuration to a dictionary.

Returns:

Dictionary with serialized data.

OptimumEmbedderOptimizationConfig.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OptimumEmbedderOptimizationConfig"

Create an optimization configuration from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Optimization configuration.
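
The to_dict and from_dict pair is meant to round-trip a configuration through a plain dictionary. The sketch below illustrates this pattern with hypothetical stand-in classes. The dictionary keys, the string values of the modes, and the class names are assumptions for illustration and may not match the format the integration actually produces.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict


class OptimizationModeSketch(Enum):
    """Hypothetical stand-in for OptimumEmbedderOptimizationMode."""

    O1 = "o1"
    O2 = "o2"
    O3 = "o3"
    O4 = "o4"


@dataclass(frozen=True)
class OptimizationConfigSketch:
    """Hypothetical stand-in for OptimumEmbedderOptimizationConfig."""

    mode: OptimizationModeSketch
    for_gpu: bool

    def to_dict(self) -> Dict[str, Any]:
        # Store the enum as a plain string so the result is JSON/YAML friendly.
        return {"mode": self.mode.value, "for_gpu": self.for_gpu}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "OptimizationConfigSketch":
        # Rebuild the enum member from its string value.
        return cls(mode=OptimizationModeSketch(data["mode"]), for_gpu=data["for_gpu"])


config = OptimizationConfigSketch(mode=OptimizationModeSketch.O2, for_gpu=False)
serialized = config.to_dict()
restored = OptimizationConfigSketch.from_dict(serialized)
print(serialized, restored == config)
```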

Module haystack_integrations.components.embedders.optimum.quantization

OptimumEmbedderQuantizationMode

Dynamic quantization modes supported by the Optimum Embedders.

ARM64

Quantization for the ARM64 architecture.

AVX2

Quantization with AVX-2 instructions.

AVX512

Quantization with AVX-512 instructions.

AVX512_VNNI

Quantization with AVX-512 and VNNI instructions.

OptimumEmbedderQuantizationMode.from_str

@classmethod
def from_str(cls, string: str) -> "OptimumEmbedderQuantizationMode"

Create a quantization mode from a string.

Arguments:

  • string: String to convert.

Returns:

Quantization mode.

OptimumEmbedderQuantizationConfig

Configuration for Optimum Embedder Quantization.

Arguments:

  • mode: Quantization mode.
  • per_channel: Whether to apply per-channel quantization.
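
The difference that per_channel makes can be illustrated conceptually. The sketch below assumes symmetric int8 quantization and treats each row of a weight matrix as one channel. It compares a single scale for the whole matrix (per-tensor) with one scale per row (per-channel). It is a simplified illustration with hypothetical helper names, not the scheme that ONNX Runtime or Optimum applies internally.

```python
from typing import List


def scale_for(values: List[float]) -> float:
    """Hypothetical helper: symmetric int8 scale for a group of values."""
    largest = max(abs(v) for v in values)
    return largest / 127 if largest > 0 else 1.0


def int8_roundtrip(values: List[float], scale: float) -> List[float]:
    # Quantize to integers in [-127, 127], then map back to floats.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return [q * scale for q in quantized]


def total_abs_error(weights: List[List[float]], scales: List[float]) -> float:
    # Sum of absolute differences between original and restored weights.
    error = 0.0
    for row, scale in zip(weights, scales):
        restored = int8_roundtrip(row, scale)
        error += sum(abs(a - b) for a, b in zip(row, restored))
    return error


weights = [[0.8, -1.0], [0.02, 0.005]]

# Per-tensor: one scale derived from the largest magnitude in the whole matrix.
tensor_scale = scale_for([v for row in weights for v in row])
per_tensor_error = total_abs_error(weights, [tensor_scale] * len(weights))

# Per-channel: each row gets its own scale, so small rows keep more resolution.
channel_scales = [scale_for(row) for row in weights]
per_channel_error = total_abs_error(weights, channel_scales)

print(per_channel_error < per_tensor_error)
```

In this example the second row contains much smaller values than the first, which is the situation where a shared scale loses the most precision.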

OptimumEmbedderQuantizationConfig.to_optimum_config

def to_optimum_config() -> QuantizationConfig

Convert the configuration to an Optimum configuration.

Returns:

Optimum configuration.

OptimumEmbedderQuantizationConfig.to_dict

def to_dict() -> Dict[str, Any]

Convert the configuration to a dictionary.

Returns:

Dictionary with serialized data.

OptimumEmbedderQuantizationConfig.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OptimumEmbedderQuantizationConfig"

Create a quantization configuration from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Quantization configuration.