
TransformersImageToText

Use this node to generate captions for images. TransformersImageToText takes image file paths as input and outputs text Documents containing the image captions.

TransformersImageToText uses an image-to-text Transformers model to generate captions for images. By default, it uses the nlpconnect/vit-gpt2-image-captioning model, but you can replace it with any other image-to-text model. For a list of the latest models, see image-to-text models on Hugging Face.
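
For example, here is a minimal sketch of swapping in a different captioning model. The model name Salesforce/blip-image-captioning-base is an arbitrary pick from the Hub list, not a recommendation; check that your Haystack and transformers versions support the model you choose.

from haystack.nodes import TransformersImageToText

# Assumption: this Hub model is supported by your transformers version's
# image-to-text pipeline; any other captioning model is swapped in the same way.
image_to_text = TransformersImageToText(
    model_name_or_path="Salesforce/blip-image-captioning-base"
)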

Position in a Pipeline: At the beginning of an indexing pipeline
Input: Image file paths
Output: Documents with the image caption as content and the image path in the metadata
Classes: TransformersImageToText

Usage

If you just want to caption a list of images without building a pipeline, initialize the node and call its generate_captions method, as the stand-alone example below shows.

To initialize TransformersImageToText, run:

from haystack.nodes import TransformersImageToText

image_to_text = TransformersImageToText(
    model_name_or_path="nlpconnect/vit-gpt2-image-captioning",
    use_gpu=True,
    batch_size=16,
    progress_bar=True
)
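
batch_size sets how many images are captioned in one model pass; lower it if you run out of memory. use_gpu=True runs the model on a GPU when one is available; set it to False to force CPU.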

To use the node stand-alone to generate captions for a list of images, run:

# Initialize the node
from haystack.nodes import TransformersImageToText

image_to_text = TransformersImageToText(
    model_name_or_path="nlpconnect/vit-gpt2-image-captioning",
    use_gpu=True,
    batch_size=16,
    progress_bar=True
)

# Specify the paths to the images you want to caption
image_file_paths = ["/path/to/images/apple.jpg", "/path/to/images/cat.jpg"]

# Generate captions
documents = image_to_text.generate_captions(image_file_paths=image_file_paths)

# Show results (list of Documents containing caption and image file path)
print(documents)

[
    {
        "content": "a red apple is sitting on a pile of hay",
        ...
        "meta": {
            "image_path": "/path/to/images/apple.jpg",
            ...
        },
        ...
    },
    {
        "content": "a cat is drinking milk",
        ...
        "meta": {
            "image_path": "/path/to/images/cat.jpg",
            ...
        },
        ...
    }
]
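
Each returned Document pairs one caption with the path of the image it describes. A minimal sketch of reading the results (assuming documents is the list returned above):

for doc in documents:
    # doc.content holds the caption, doc.meta["image_path"] the source image
    print(doc.content, "->", doc.meta["image_path"])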

To use TransformersImageToText in a pipeline, run:

import os
from haystack import Pipeline
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes.image_to_text import TransformersImageToText

indexing_pipeline = Pipeline()
image_captioning = TransformersImageToText()
document_store = InMemoryDocumentStore()

indexing_pipeline.add_node(component=image_captioning, name="image_captioning", inputs=["File"])
indexing_pipeline.add_node(component=document_store, name="DocumentStore", inputs=["image_captioning"])

# Directory containing the images to caption
doc_dir = "/path/to/images"
images_to_caption = [os.path.join(doc_dir, f) for f in os.listdir(doc_dir)]

indexing_pipeline.run(file_paths=images_to_caption)
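
After the pipeline runs, the captions are stored as Documents in the document store. As a quick check (a sketch, assuming the in-memory store initialized above), you can list what was indexed:

# Inspect the indexed captions and their source images
for doc in document_store.get_all_documents():
    print(doc.content, "->", doc.meta.get("image_path"))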