TransformersImageToText
Use this node to generate captions for images. ImageToText takes paths to images as input and outputs text documents containing the image captions.
TransformersImageToText uses an image-to-text transformers model to generate captions for images. By default, it uses the nlpconnect/vit-gpt2-image-captioning model, but you can replace it with any other image-to-text model. For the list of the latest models, see image-to-text models on Hugging Face.
Position in a Pipeline | At the beginning of an indexing pipeline |
Input | Image URL |
Output | Document with the image URL in the metadata |
Classes | TransformersImageToText |
Usage
You can use the generate_captions
method if you just want to generate captions to a list of images:
from haystack.nodes import TransformersImageToText
image_file_paths = ["/path/to/images/apple.jpg","/path/to/images/cat.jpg", ]
# Generate captions
documents = image_to_text.generate_captions(image_file_paths=image_file_paths)
# Show results (list of Documents containing caption and image file path)
print(documents)
[
{
"content": "a red apple is sitting on a pile of hay",
...
"meta": {
"image_path": "/path/to/images/apple.jpg",
...
},
...
},
...
]
To initialize TransformersImageToText, run:
from haystack.nodes import TransformersImageToText
image_to_text = TransformersImageToText(
model_name_or_path="nlpconnect/vit-gpt2-image-captioning",
use_gpu=True,
batch_size=16,
progress_bar=True
)
To use the node stand-alone to generate captions for a list of images, run:
# Initialize the node
from haystack.nodes import TransformersImageToText
image_to_text = TransformersImageToText(
model_name_or_path="nlpconnect/vit-gpt2-image-captioning",
use_gpu=True,
batch_size=16,
progress_bar=True
)
# Specify the paths to the images for which you want to generate captions:
image_file_paths = ["/path/to/images/apple.jpg","/path/to/images/cat.jpg"]
# Generate captions
documents = image_to_text.generate_captions(image_file_paths=image_file_paths)
# Show results (list of Documents containing caption and image file path)
print(documents)
[
{ #would it be like this or separate content and meta pair for each img?
"content": "a red apple is sitting on a pile of hay",
"a cat is drinking milk",
"meta": {
"image_path": "/path/to/images/apple.jpg",
"image_path": "/path/to/images/cat/jpg"
}
}
]
To use TransformersImageToText in a pipeline, run:
import os
from haystack import Pipeline
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes.image_to_text import TransformersImageToText
indexing_pipeline = Pipeline()
image_captioning = TransformersImageToText()
document_store = InMemoryDocumentStore()
indexing_pipeline.add_node(component=image_captioning, name="image_captioning", inputs=["File"])
indexing_pipeline.add_node(component=document_store, name="DocumentStore", inputs=["image_captioning"])
images_to_caption = [doc_dir + "/" + f for f in os.listdir(doc_dir)]
indexing_pipeline.run(file_paths=images_to_caption)
Updated almost 2 years ago
Related Links