CohereDocumentImageEmbedder
CohereDocumentImageEmbedder computes the image embeddings of a list of documents and stores the resulting vectors in the embedding field of each document. It uses Cohere embedding models, which can embed text and images into the same vector space.
Most common position in a pipeline | Before a DocumentWriter in an indexing pipeline |
Mandatory init variables | "api_key": The Cohere API key. Can be set with COHERE_API_KEY or CO_API_KEY env var. |
Mandatory run variables | "documents": A list of documents, with a meta field containing an image file path |
Output variables | "documents": A list of documents (enriched with embeddings) |
API reference | Cohere |
GitHub link | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/cohere |
Overview
CohereDocumentImageEmbedder expects a list of documents containing an image or a PDF file path in a meta field. The meta field can be specified with the file_path_meta_field init parameter of this component.
The embedder loads the images, computes the embeddings using a Cohere model, and stores each embedding in the embedding field of the corresponding document.
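Since every document must carry a file path in the configured meta field, it can help to check the inputs before calling the embedder. The following is a minimal sketch and not part of the integration: `find_missing_image_paths` is a hypothetical helper, `root` is an assumed base directory, and plain meta dictionaries stand in for Haystack documents so the snippet runs with the standard library alone.

```python
from pathlib import Path


def find_missing_image_paths(metas, file_path_meta_field="file_path", root="."):
    """Return (index, reason) pairs for meta dicts that cannot be embedded.

    Hypothetical helper: `metas` is a list of dicts standing in for
    `Document.meta`; `root` is an assumed base directory for relative paths.
    """
    problems = []
    for index, meta in enumerate(metas):
        raw_path = meta.get(file_path_meta_field)
        if not raw_path:
            # The meta field is absent or empty, so there is nothing to load.
            problems.append((index, f"missing meta field '{file_path_meta_field}'"))
        elif not (Path(root) / raw_path).is_file():
            # The field is set, but it does not point to an existing file.
            problems.append((index, f"file not found: {raw_path}"))
    return problems


if __name__ == "__main__":
    metas = [{"file_path": "cat.jpg"}, {"source": "dog.jpg"}]
    for index, reason in find_missing_image_paths(metas):
        print(f"document {index}: {reason}")
```

Running a check like this first surfaces path problems as a readable list instead of as an error partway through an embedding run.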
CohereDocumentImageEmbedder is commonly used in indexing pipelines. At retrieval time, you need to use the same model with a CohereTextEmbedder to embed the query, before using an Embedding Retriever.
This component is compatible with Cohere Embed models v3 and later. For a complete list of supported models, see the Cohere documentation.
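Because text and images are embedded into the same vector space, a query embedding can be compared directly with stored image embeddings. The sketch below illustrates the idea with cosine similarity on made-up three-dimensional vectors. Real embeddings come from the Cohere model and are much larger, and the scoring function an actual retriever uses may differ. `cosine_similarity` and `rank_by_similarity` are hypothetical helpers, not part of Haystack.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_embedding, image_embeddings, top_k=2):
    """Return the top_k (name, score) pairs, highest similarity first."""
    scored = [
        (name, cosine_similarity(query_embedding, vector))
        for name, vector in image_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy vectors for illustration only; they are not real model outputs.
    image_embeddings = {
        "dog.jpg": [0.9, 0.1, 0.0],
        "hyena.jpeg": [0.6, 0.4, 0.1],
        "car.png": [0.0, 0.1, 0.9],
    }
    query_embedding = [1.0, 0.0, 0.0]
    for name, score in rank_by_similarity(query_embedding, image_embeddings):
        print(name, round(score, 3))
```

The comparison is only meaningful when the query and the images were embedded with the same model, which is why the retrieval side must reuse the model chosen at indexing time.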
Installation
To start using this integration with Haystack, install the package with:
pip install cohere-haystack
Authentication
The component reads the API key from the COHERE_API_KEY or CO_API_KEY environment variable by default. Alternatively, you can pass an API key at initialization as a Secret created with the Secret.from_token method:
embedder = CohereDocumentImageEmbedder(api_key=Secret.from_token("<your-api-key>"))
To get a Cohere API key, head over to https://cohere.com/.
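The environment variable lookup described above can be pictured as follows. This is only an illustrative sketch using the standard library: `resolve_cohere_api_key` is a hypothetical helper, and the order in which the two variables are checked is an assumption, not confirmed behavior of the component.

```python
import os


def resolve_cohere_api_key(env=None):
    """Return the first non-empty key found in the given environment mapping.

    Hypothetical helper: checks COHERE_API_KEY first, then CO_API_KEY
    (the order is an assumption made for this sketch).
    """
    env = os.environ if env is None else env
    for name in ("COHERE_API_KEY", "CO_API_KEY"):
        value = env.get(name)
        if value:
            return value
    raise ValueError("Set COHERE_API_KEY or CO_API_KEY, or pass an API key explicitly.")


if __name__ == "__main__":
    try:
        resolve_cohere_api_key()
        print("Cohere API key found in the environment.")
    except ValueError as error:
        print(error)
```

Failing early with a clear message is usually preferable to discovering a missing key on the first embedding request.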
Usage
On its own
Remember to set COHERE_API_KEY as an environment variable first.
from haystack import Document
from haystack_integrations.components.embedders.cohere import CohereDocumentImageEmbedder
embedder = CohereDocumentImageEmbedder(model="embed-v4.0")
embedder.warm_up()
documents = [
    Document(content="A photo of a cat", meta={"file_path": "cat.jpg"}),
    Document(content="A photo of a dog", meta={"file_path": "dog.jpg"}),
]
result = embedder.run(documents=documents)
documents_with_embeddings = result["documents"]
print(documents_with_embeddings)
# [Document(id=...,
#           content='A photo of a cat',
#           meta={'file_path': 'cat.jpg',
#                 'embedding_source': {'type': 'image', 'file_path_meta_field': 'file_path'}},
#           embedding=vector of size 1536),
#  ...]
In a pipeline
In this example, we can see an indexing pipeline with three components: an ImageFileToDocument converter that creates empty documents with a reference to an image in the meta.file_path field; a CohereDocumentImageEmbedder that loads the images, computes the embeddings, and stores them in the documents; and a DocumentWriter that writes the documents to the InMemoryDocumentStore.
There is also a multimodal retrieval pipeline, composed of a CohereTextEmbedder (using the same model as before) and an InMemoryEmbeddingRetriever.
from haystack import Pipeline
from haystack.components.converters.image import ImageFileToDocument
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.cohere import CohereDocumentImageEmbedder, CohereTextEmbedder
document_store = InMemoryDocumentStore()
# Indexing pipeline
indexing_pipeline = Pipeline()
indexing_pipeline.add_component("image_converter", ImageFileToDocument())
indexing_pipeline.add_component(
    "embedder",
    CohereDocumentImageEmbedder(model="embed-v4.0")
)
indexing_pipeline.add_component(
    "writer", DocumentWriter(document_store=document_store)
)
indexing_pipeline.connect("image_converter", "embedder")
indexing_pipeline.connect("embedder", "writer")
indexing_pipeline.run(data={"image_converter": {"sources": ["dog.jpg", "hyena.jpeg"]}})
# Multimodal retrieval pipeline
retrieval_pipeline = Pipeline()
retrieval_pipeline.add_component(
    "embedder",
    CohereTextEmbedder(model="embed-v4.0")
)
retrieval_pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store, top_k=2)
)
retrieval_pipeline.connect("embedder.embedding", "retriever.query_embedding")
result = retrieval_pipeline.run(data={"text": "man's best friend"})
print(result)
# {
#     'retriever': {
#         'documents': [
#             Document(
#                 id=0c96...,
#                 meta={
#                     'file_path': 'dog.jpg',
#                     'embedding_source': {
#                         'type': 'image',
#                         'file_path_meta_field': 'file_path'
#                     }
#                 },
#                 score=0.288
#             ),
#             Document(
#                 id=5e76...,
#                 meta={
#                     'file_path': 'hyena.jpeg',
#                     'embedding_source': {
#                         'type': 'image',
#                         'file_path_meta_field': 'file_path'
#                     }
#                 },
#                 score=0.248
#             )
#         ]
#     }
# }
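To inspect the retrieved images without printing whole documents, the result can be reduced to file paths and scores. This is a minimal sketch, assuming each retrieved document exposes `meta` and `score` attributes as in the output above. `summarize_hits` is a hypothetical helper, and `SimpleNamespace` objects stand in for Haystack documents so the snippet runs on its own.

```python
from types import SimpleNamespace


def summarize_hits(result, component="retriever", file_path_meta_field="file_path"):
    """Return (file_path, score) pairs from a pipeline result dictionary.

    Hypothetical helper: `component` is the name the retriever was given
    when it was added to the pipeline.
    """
    documents = result.get(component, {}).get("documents", [])
    return [(doc.meta.get(file_path_meta_field), doc.score) for doc in documents]


if __name__ == "__main__":
    # Stand-in objects mirroring the printed output above.
    fake_result = {
        "retriever": {
            "documents": [
                SimpleNamespace(meta={"file_path": "dog.jpg"}, score=0.288),
                SimpleNamespace(meta={"file_path": "hyena.jpeg"}, score=0.248),
            ]
        }
    }
    for file_path, score in summarize_hits(fake_result):
        print(f"{file_path}: {score}")
```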
Additional References
📓 Tutorial: Creating Vision+Text RAG Pipelines