
VertexAIImageCaptioner

VertexAIImageCaptioner generates text captions for images using the Google Vertex AI imagetext generative model.

Name: VertexAIImageCaptioner
Source: https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/google_vertex
Mandatory input variables: “image”: A ByteStream object storing an image
Output variables: “captions”: A list of strings generated by the model

Parameters Overview

VertexAIImageCaptioner uses Google Cloud Application Default Credentials (ADC) for authentication. For more information on how to set up ADC, see the official documentation.

The account you authenticate with must have access to a project that is authorized to use Google Vertex AI endpoints.

You pass the ID of that project to the component as project_id. You can find your project ID in the GCP Resource Manager, or locally by running gcloud projects list in your terminal. For more information on the gcloud CLI, see its official documentation.
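A minimal sketch of resolving the project ID before constructing the component, assuming you either pass it explicitly or export it in the GOOGLE_CLOUD_PROJECT environment variable. The helper name resolve_project_id and its order of precedence are hypothetical, not part of the integration; the final fallback uses google.auth.default() from the google-auth library, which returns the ADC credentials together with the project ID it could infer (which may be None).

```python
import os
from typing import Optional


def resolve_project_id(explicit: Optional[str] = None) -> str:
    """Hypothetical helper: choose the GCP project ID to pass as project_id.

    Assumed order of precedence:
    1. an explicitly passed ID,
    2. the GOOGLE_CLOUD_PROJECT environment variable,
    3. the project inferred from Application Default Credentials.
    """
    if explicit:
        return explicit
    env_project = os.environ.get("GOOGLE_CLOUD_PROJECT")
    if env_project:
        return env_project
    # Imported lazily so the helper works without google-auth when an ID is available
    import google.auth

    _credentials, adc_project = google.auth.default()
    if not adc_project:
        raise RuntimeError(
            "No project ID found; pass one explicitly or look it up with `gcloud projects list`."
        )
    return adc_project
```

With the integration installed, you could then write captioner = VertexAIImageCaptioner(project_id=resolve_project_id()).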

Usage

You need to install the google-vertex-haystack package to use VertexAIImageCaptioner:

pip install google-vertex-haystack

On its own

Basic usage:

import requests

from haystack.dataclasses.byte_stream import ByteStream
from haystack_integrations.components.generators.google_vertex import VertexAIImageCaptioner

# Replace with the ID of a GCP project that has access to Vertex AI
project_id = "your-gcp-project-id"

captioner = VertexAIImageCaptioner(project_id=project_id)

# Download a sample image and wrap its raw bytes in a ByteStream
response = requests.get("https://raw.githubusercontent.com/silvanocerza/robots/main/robot1.jpg")
response.raise_for_status()
image = ByteStream(data=response.content)

result = captioner.run(image=image)

for caption in result["captions"]:
    print(caption)

# Example output:
# two gold robots are standing next to each other in the desert
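Because run() takes a single image and returns a dictionary with a "captions" list, captioning several images comes down to calling it in a loop. The sketch below assumes a hypothetical helper, caption_images, that accepts any object exposing run(image=...) with that output shape. StubCaptioner is a made-up stand-in so the sketch runs offline; in practice you would pass the VertexAIImageCaptioner instance and ByteStream images instead.

```python
from typing import Any, Dict, List, Protocol


class Captioner(Protocol):
    """Anything with a run(image=...) method returning {"captions": [...]}."""

    def run(self, image: Any) -> Dict[str, List[str]]: ...


def caption_images(captioner: Captioner, images: Dict[str, Any]) -> Dict[str, List[str]]:
    """Hypothetical helper: caption each named image and collect the captions by name."""
    captions: Dict[str, List[str]] = {}
    for name, image in images.items():
        result = captioner.run(image=image)
        captions[name] = result["captions"]
    return captions


class StubCaptioner:
    """Hypothetical offline stand-in that mimics the component's output shape."""

    def run(self, image: Any) -> Dict[str, List[str]]:
        return {"captions": [f"an image of {len(image)} bytes"]}


print(caption_images(StubCaptioner(), {"robot1": b"\x89PNG", "robot2": b"abc"}))
# {'robot1': ['an image of 4 bytes'], 'robot2': ['an image of 3 bytes']}
```

With the real component, caption_images(captioner, {"robot1": image}) returns the same shape, with the captions produced by Vertex AI.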

You can also set the caption language and the number of results:

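import requests

from haystack.dataclasses.byte_stream import ByteStream
from haystack_integrations.components.generators.google_vertex import VertexAIImageCaptioner

# Replace with the ID of a GCP project that has access to Vertex AI
project_id = "your-gcp-project-id"

captioner = VertexAIImageCaptioner(
    project_id=project_id,
    number_of_results=3,  # Can't be greater than 3
    language="it",  # Generate the captions in Italian
)

response = requests.get("https://raw.githubusercontent.com/silvanocerza/robots/main/robot1.jpg")
response.raise_for_status()
image = ByteStream(data=response.content)

result = captioner.run(image=image)

for caption in result["captions"]:
    print(caption)

# Example output (English translations in parentheses):
# due robot dorati sono in piedi uno accanto all'altro in un deserto
#   (two golden robots are standing next to each other in a desert)
# un c3p0 e un r2d2 stanno in piedi uno accanto all'altro in un deserto
#   (a c3p0 and an r2d2 are standing next to each other in a desert)
# due robot dorati sono in piedi uno accanto all'altro
#   (two golden robots are standing next to each other)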
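Since number_of_results can't be greater than 3, it may help to validate the options before creating the component, so that a bad value fails early instead of at request time. A minimal sketch: the helper captioner_options is hypothetical, and its defaults, the lower bound of 1, and the non-empty language check are assumptions made for illustration.

```python
from typing import Any, Dict

MAX_RESULTS = 3  # The documented upper limit for number_of_results


def captioner_options(number_of_results: int = 1, language: str = "en") -> Dict[str, Any]:
    """Hypothetical helper: validate keyword arguments for VertexAIImageCaptioner."""
    if not 1 <= number_of_results <= MAX_RESULTS:
        raise ValueError(
            f"number_of_results must be between 1 and {MAX_RESULTS}, got {number_of_results}"
        )
    if not language:
        raise ValueError("language must be a non-empty language code, for example 'it'")
    return {"number_of_results": number_of_results, "language": language}


print(captioner_options(number_of_results=3, language="it"))
# {'number_of_results': 3, 'language': 'it'}
```

The result can be unpacked into the constructor, for example VertexAIImageCaptioner(project_id=project_id, **captioner_options(3, "it")).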

Related Links

Check out the API reference in the GitHub repo or in our docs: