
VertexAIDocumentEmbedder

This component computes embeddings for documents using models available through the Vertex AI Embeddings API.

Most common position in a pipeline | Before a DocumentWriter in an indexing pipeline
Mandatory init variables | "model": The model used through the Vertex AI Embeddings API
Mandatory run variables | "documents": A list of documents to be embedded
Output variables | "documents": A list of documents enriched with embeddings
API reference | Google Vertex
GitHub link | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/google_vertex

VertexAIDocumentEmbedder enriches each document with an embedding of its content, stored in the document's embedding field. To embed a string, such as a query, use the VertexAITextEmbedder.

To use the VertexAIDocumentEmbedder, initialize it with:

  • model: The supported models are:
    • "text-embedding-004"
    • "text-embedding-005"
    • "textembedding-gecko-multilingual@001"
    • "text-multilingual-embedding-002"
    • "text-embedding-large-exp-03-07"
  • task_type: The default is "RETRIEVAL_DOCUMENT". You can find all task types in the official Google documentation.
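
As a minimal sketch of the two init parameters above, the snippet below checks a model name against the list on this page before building the component. The helpers embedder_kwargs and make_document_embedder are hypothetical names introduced for illustration and are not part of the integration. The import of VertexAIDocumentEmbedder is deferred so the name check can run without the package installed.

```python
# Model names copied from the list of supported models above.
SUPPORTED_MODELS = (
    "text-embedding-004",
    "text-embedding-005",
    "textembedding-gecko-multilingual@001",
    "text-multilingual-embedding-002",
    "text-embedding-large-exp-03-07",
)


def embedder_kwargs(model: str, task_type: str = "RETRIEVAL_DOCUMENT") -> dict:
    """Hypothetical helper: validate the model name and return init keyword arguments."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"Unsupported model {model!r}; expected one of {SUPPORTED_MODELS}")
    return {"model": model, "task_type": task_type}


def make_document_embedder(model: str, task_type: str = "RETRIEVAL_DOCUMENT"):
    """Hypothetical helper: build the embedder from validated arguments."""
    # Deferred import: only needed when the embedder is actually constructed.
    from haystack_integrations.components.embedders.google_vertex import VertexAIDocumentEmbedder

    return VertexAIDocumentEmbedder(**embedder_kwargs(model, task_type))
```

Calling make_document_embedder("text-embedding-005") is then equivalent to passing the same model and the default task type to the constructor directly. A misspelled model name fails locally with a ValueError rather than at request time.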

Authentication

VertexAIDocumentEmbedder uses Google Cloud Application Default Credentials (ADC) for authentication. For more information on how to set up ADC, see the official documentation.

The account you use must have access to a project that is authorized to use Google Vertex AI endpoints.

You can find your project ID in the GCP resource manager or locally by running gcloud projects list in your terminal. For more info on the gcloud CLI, see its official documentation.
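
Before constructing the embedder, it can help to check locally whether a credential file is present. The sketch below is a simplified approximation and not the full ADC lookup: it looks only at the GOOGLE_APPLICATION_CREDENTIALS environment variable and at the file that gcloud auth application-default login writes on Linux and macOS. It does not cover service accounts attached to a cloud resource or the Windows file location. The function name find_adc_file is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_adc_file(env: Mapping[str, str] = os.environ, home: Optional[Path] = None) -> Optional[Path]:
    """Hypothetical helper: return the first ADC credential file found, or None."""
    # 1. An explicit key file set through GOOGLE_APPLICATION_CREDENTIALS is checked first.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit and Path(explicit).is_file():
        return Path(explicit)

    # 2. File written by `gcloud auth application-default login` (Linux/macOS location).
    home = home or Path.home()
    well_known = home / ".config" / "gcloud" / "application_default_credentials.json"
    return well_known if well_known.is_file() else None
```

A return value of None does not prove that authentication will fail, since ADC can also resolve credentials from the environment the code runs in. For the complete lookup, the google-auth library's google.auth.default() function is the authoritative route.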

Usage

Install the google-vertex-haystack package to use this Embedder:

pip install google-vertex-haystack

On its own

from haystack import Document
from haystack_integrations.components.embedders.google_vertex import VertexAIDocumentEmbedder

doc = Document(content="I love pizza!")

document_embedder = VertexAIDocumentEmbedder(model="text-embedding-005")

result = document_embedder.run([doc])
print(result['documents'][0].embedding)
# [-0.044606007635593414, 0.02857724390923977, -0.03549133986234665, ...]

In a pipeline

from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.google_vertex import (
    VertexAIDocumentEmbedder,
    VertexAITextEmbedder,
)

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [Document(content="My name is Wolfgang and I live in Berlin"),
             Document(content="I saw a black horse running"),
             Document(content="Germany has many big cities")]

# Indexing step: embed the documents and write them to the document store.
document_embedder = VertexAIDocumentEmbedder(model="text-embedding-005")
documents_with_embeddings = document_embedder.run(documents)['documents']
document_store.write_documents(documents_with_embeddings)

# Query step: embed the query with the same model, then retrieve by embedding similarity.
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", VertexAITextEmbedder(model="text-embedding-005"))
query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "Who lives in Berlin?"

result = query_pipeline.run({"text_embedder":{"text": query}})

print(result['retriever']['documents'][0])

# Document(id=..., content: 'My name is Wolfgang and I live in Berlin')
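
The example above calls the embedder directly and writes the results by hand. The overview at the top of this page lists the most common position as before a DocumentWriter in an indexing pipeline, so here is a minimal sketch of that arrangement. The function build_indexing_pipeline and the component names "document_embedder" and "document_writer" are illustrative choices, not names the integration prescribes. Imports are deferred inside the function so that defining it does not require the packages.

```python
def build_indexing_pipeline(document_store, model: str = "text-embedding-005"):
    """Illustrative helper: return a pipeline that embeds documents and writes them to the store."""
    # Deferred imports: these packages are only needed when the pipeline is built.
    from haystack import Pipeline
    from haystack.components.writers import DocumentWriter
    from haystack_integrations.components.embedders.google_vertex import VertexAIDocumentEmbedder

    pipeline = Pipeline()
    pipeline.add_component("document_embedder", VertexAIDocumentEmbedder(model=model))
    pipeline.add_component("document_writer", DocumentWriter(document_store=document_store))
    # The embedder's "documents" output feeds the writer's "documents" input.
    pipeline.connect("document_embedder.documents", "document_writer.documents")
    return pipeline
```

With the document_store and documents from the example above, build_indexing_pipeline(document_store).run({"document_embedder": {"documents": documents}}) would replace the manual run and write_documents calls, assuming valid credentials are available.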