TransformersImageToText

Use this node to generate captions for images. It takes paths to images as input and outputs text Documents containing the image captions.

TransformersImageToText uses an image-to-text transformers model to generate captions for images. By default, it uses the nlpconnect/vit-gpt2-image-captioning model, but you can replace it with any other image-to-text model. For the list of the latest models, see image-to-text models on Hugging Face.
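
For example, you can pass the Hugging Face model ID of another captioning model when initializing the node. This is a minimal sketch; Salesforce/blip-image-captioning-base is only an example ID, assuming the model is compatible with the transformers image-to-text pipeline:

from haystack.nodes import TransformersImageToText

# Sketch: load an alternative captioning model by its Hugging Face model ID.
# "Salesforce/blip-image-captioning-base" is an example, not a tested recommendation.
image_to_text = TransformersImageToText(
    model_name_or_path="Salesforce/blip-image-captioning-base"
)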

Position in a Pipeline: At the beginning of an indexing pipeline
Input: Image URL
Output: Document with the image URL in the metadata
Classes: TransformersImageToText

Usage

You can use the generate_captions method if you just want to generate captions for a list of images:

from haystack.nodes import TransformersImageToText

# Initialize the node with the default captioning model
image_to_text = TransformersImageToText()

image_file_paths = ["/path/to/images/apple.jpg", "/path/to/images/cat.jpg"]

# Generate captions
documents = image_to_text.generate_captions(image_file_paths=image_file_paths)

# Show results (list of Documents containing caption and image file path)
print(documents)

[
    {
        "content": "a red apple is sitting on a pile of hay",
        ...
        "meta": {
            "image_path": "/path/to/images/apple.jpg",
            ...
        },
        ...
    },
    ...
]

To initialize TransformersImageToText, run:

from haystack.nodes import TransformersImageToText

image_to_text = TransformersImageToText(
    model_name_or_path="nlpconnect/vit-gpt2-image-captioning",
    use_gpu=True,
    batch_size=16,
    progress_bar=True
)

To use the node stand-alone to generate captions for a list of images, run:

# Initialize the node
from haystack.nodes import TransformersImageToText

image_to_text = TransformersImageToText(
    model_name_or_path="nlpconnect/vit-gpt2-image-captioning",
    use_gpu=True,
    batch_size=16,
    progress_bar=True
)

# Specify the paths to the images for which you want to generate captions:
image_file_paths = ["/path/to/images/apple.jpg", "/path/to/images/cat.jpg"]

# Generate captions
documents = image_to_text.generate_captions(image_file_paths=image_file_paths)

# Show results (list of Documents containing caption and image file path)
print(documents)

[
    {
        "content": "a red apple is sitting on a pile of hay",
        "meta": {
            "image_path": "/path/to/images/apple.jpg"
        }
    },
    {
        "content": "a cat is drinking milk",
        "meta": {
            "image_path": "/path/to/images/cat.jpg"
        }
    }
]
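
Each item in the results is a Haystack Document, so you can read the caption and the source image path directly from its attributes, following the output structure shown above:

# Print each caption next to the image it describes
for doc in documents:
    print(f"{doc.meta['image_path']}: {doc.content}")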

To use TransformersImageToText in a pipeline, run:

import os
from haystack import Pipeline
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes.image_to_text import TransformersImageToText

indexing_pipeline = Pipeline()
image_captioning = TransformersImageToText()
document_store = InMemoryDocumentStore()

indexing_pipeline.add_node(component=image_captioning, name="image_captioning", inputs=["File"])
indexing_pipeline.add_node(component=document_store, name="DocumentStore", inputs=["image_captioning"])

# Specify the directory containing the images to caption
doc_dir = "/path/to/images"
images_to_caption = [os.path.join(doc_dir, f) for f in os.listdir(doc_dir)]

indexing_pipeline.run(file_paths=images_to_caption)
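
To verify the run, you can query the document store afterwards. This is a quick sanity check using the InMemoryDocumentStore API; get_all_documents loads every Document into memory, so it's only suitable for small test sets:

# Confirm that the captions were written to the document store
print(f"Indexed {document_store.get_document_count()} documents")
for doc in document_store.get_all_documents():
    print(doc.meta.get("image_path"), "->", doc.content)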