Module haystack_experimental.components.embedders.image.sentence_transformers_doc_image_embedder

SentenceTransformersDocumentImageEmbedder

A component for computing Document embeddings based on images using Sentence Transformers models.

The embedding of each Document is stored in the embedding field of the Document.

Usage example

from haystack import Document
from haystack_experimental.components.embedders.image import SentenceTransformersDocumentImageEmbedder

embedder = SentenceTransformersDocumentImageEmbedder(model="sentence-transformers/clip-ViT-B-32")
embedder.warm_up()

documents = [
    Document(content="A photo of a cat", meta={"file_path": "cat.jpg"}),
    Document(content="A photo of a dog", meta={"file_path": "dog.jpg"}),
]

result = embedder.run(documents=documents)
documents_with_embeddings = result["documents"]
print(documents_with_embeddings)

# [Document(id=...,
#           content='A photo of a cat',
#           meta={'file_path': 'cat.jpg',
#                 'embedding_source': {'type': 'image', 'file_path_meta_field': 'file_path'}},
#           embedding=vector of size 512),
#  ...]

SentenceTransformersDocumentImageEmbedder.__init__

def __init__(*,
             file_path_meta_field: str = "file_path",
             root_path: Optional[str] = None,
             model: str = "sentence-transformers/clip-ViT-B-32",
             device: Optional[ComponentDevice] = None,
             token: Optional[Secret] = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             batch_size: int = 32,
             progress_bar: bool = True,
             normalize_embeddings: bool = False,
             trust_remote_code: bool = False,
             local_files_only: bool = False,
             model_kwargs: Optional[Dict[str, Any]] = None,
             tokenizer_kwargs: Optional[Dict[str, Any]] = None,
             config_kwargs: Optional[Dict[str, Any]] = None,
             precision: Literal["float32", "int8", "uint8", "binary",
                                "ubinary"] = "float32",
             encode_kwargs: Optional[Dict[str, Any]] = None,
             backend: Literal["torch", "onnx", "openvino"] = "torch") -> None

Creates a SentenceTransformersDocumentImageEmbedder component.

Arguments:

file_path_meta_field: The metadata field in the Document that contains the file path to the image or PDF.
root_path: The root directory path where document files are located. If provided, file paths in document metadata will be resolved relative to this path. If None, file paths are treated as absolute paths.
model: The Sentence Transformers model to use for calculating embeddings. Pass a local path or ID of the model on Hugging Face. To be used with this component, the model must be able to embed images and text into the same vector space. Compatible models include:
"sentence-transformers/clip-ViT-B-32"
"sentence-transformers/clip-ViT-L-14"
"sentence-transformers/clip-ViT-B-16"
"sentence-transformers/clip-ViT-B-32-multilingual-v1"
"jinaai/jina-embeddings-v4"
"jinaai/jina-clip-v1"
"jinaai/jina-clip-v2".
device: The device to use for loading the model. Overrides the default device.
token: The API token to download private models from Hugging Face.
batch_size: Number of documents to embed at once.
progress_bar: If True, shows a progress bar when embedding documents.
normalize_embeddings: If True, the embeddings are normalized using L2 normalization, so that each embedding has a norm of 1.
trust_remote_code: If False, allows only Hugging Face verified model architectures. If True, allows custom models and scripts.
local_files_only: If True, does not attempt to download the model from Hugging Face Hub and only looks at local files.
model_kwargs: Additional keyword arguments for AutoModelForSequenceClassification.from_pretrained when loading the model. Refer to specific model documentation for available kwargs.
tokenizer_kwargs: Additional keyword arguments for AutoTokenizer.from_pretrained when loading the tokenizer. Refer to specific model documentation for available kwargs.
config_kwargs: Additional keyword arguments for AutoConfig.from_pretrained when loading the model configuration.
precision: The precision to use for the embeddings. All non-float32 precisions are quantized embeddings. Quantized embeddings are smaller and faster to compute, but may have a lower accuracy. They are useful for reducing the size of the embeddings of a corpus for semantic search, among other tasks.
encode_kwargs: Additional keyword arguments for SentenceTransformer.encode when embedding documents. This parameter is provided for fine customization. Be careful not to clash with already set parameters and avoid passing parameters that change the output type.
backend: The backend to use for the Sentence Transformers model. Choose from "torch", "onnx", or "openvino". Refer to the Sentence Transformers documentation for more information on acceleration and quantization options.
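The interaction between file_path_meta_field and root_path can be pictured with a small sketch. The helper resolve_image_path below is hypothetical and not part of the component's API. It assumes resolution is a plain path join, which may differ in detail from what the component actually does.

```python
from pathlib import Path
from typing import Optional


def resolve_image_path(file_path: str, root_path: Optional[str] = None) -> Path:
    """Hypothetical helper: combine root_path with a path taken from Document metadata."""
    if root_path is None:
        # No root directory given: use the metadata value as-is.
        return Path(file_path)
    # Root directory given: interpret the metadata value relative to it.
    return Path(root_path) / file_path


# Metadata shaped like the usage example, with the default file_path_meta_field.
meta = {"file_path": "cat.jpg"}
resolved = resolve_image_path(meta["file_path"], root_path="/data/images")
print(resolved.as_posix())
```

Under this assumption, storing short relative paths in metadata and passing root_path once keeps Documents portable between machines that keep the images in different directories.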

SentenceTransformersDocumentImageEmbedder.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

SentenceTransformersDocumentImageEmbedder.from_dict

@classmethod
def from_dict(
        cls, data: Dict[str,
                        Any]) -> "SentenceTransformersDocumentImageEmbedder"

Deserializes the component from a dictionary.

Arguments:

data: Dictionary to deserialize from.

Returns:

Deserialized component.

SentenceTransformersDocumentImageEmbedder.warm_up

def warm_up() -> None

Initializes the component by loading the embedding model. Call it before run, as in the usage example.

SentenceTransformersDocumentImageEmbedder.run

@component.output_types(documents=List[Document])
def run(documents: List[Document]) -> Dict[str, List[Document]]

Embeds a list of Documents based on the images they reference.

Arguments:

documents: Documents to embed.

Returns:

A dictionary with the following keys:

documents: Documents with their embedding field set.
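Once Documents carry embeddings, a common next step is to compare them. The following is a minimal sketch in plain Python, using toy 3-dimensional vectors in place of real model output. The helpers l2_normalize and cosine_similarity are illustrative names, not part of the component. It also shows why normalize_embeddings=True is convenient: for unit-length vectors the dot product is already the cosine similarity.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, computed as the dot product of the normalized vectors."""
    a_unit, b_unit = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_unit, b_unit))


# Toy vectors standing in for Document.embedding values (assumed data).
cat = [0.9, 0.1, 0.0]
dog = [0.8, 0.3, 0.1]
car = [0.0, 0.2, 0.9]

# Rank the other embeddings by similarity to the "cat" embedding.
scores = {"dog": cosine_similarity(cat, dog), "car": cosine_similarity(cat, car)}
best_match = max(scores, key=scores.get)
print(best_match, scores)
```

With real output, the vectors would come from the embedding field of each Document in result["documents"].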
