
PDFToImageContent

PDFToImageContent reads local PDF files and converts them into ImageContent objects. These are ready for multimodal AI pipelines, including tasks like image captioning, visual QA, or prompt-based generation.

  • Most common position in a pipeline: Before a ChatPromptBuilder in a query pipeline
  • Mandatory run variables: "sources": A list of PDF file paths or ByteStreams
  • Output variables: "image_contents": A list of ImageContent objects
  • API reference: Image Converters
  • GitHub link: https://github.com/deepset-ai/haystack/blob/main/haystack/components/converters/image/pdf_to_image.py

Overview

PDFToImageContent processes a list of PDF sources and converts them into ImageContent objects, one for each page of the PDF. These can be used in multimodal pipelines that require base64-encoded image input.

Each source can be:

  • A file path (string or Path), or
  • A ByteStream object.

Optionally, you can provide metadata using the meta parameter. This can be a single dictionary (applied to all images) or a list matching the length of sources.
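The meta-handling semantics described above can be sketched in plain Python (illustrative only, not Haystack's actual implementation): a single dict is copied onto every source, while a list must match the sources one-to-one.

```python
def normalize_meta(meta, sources):
    # None -> empty metadata per source; dict -> copied to every source;
    # list -> must pair with sources one-to-one.
    if meta is None:
        return [{} for _ in sources]
    if isinstance(meta, dict):
        return [dict(meta) for _ in sources]
    if len(meta) != len(sources):
        raise ValueError("The length of the metadata list must match the number of sources.")
    return [dict(m) for m in meta]

print(normalize_meta({"lang": "en"}, ["a.pdf", "b.pdf"]))
# [{'lang': 'en'}, {'lang': 'en'}]
```

Note that the component also adds per-image entries such as the file path and page number, as shown in the output examples below.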

Use the size parameter to resize images while preserving aspect ratio. This reduces memory usage and transmission size, which is helpful when working with remote models or limited-resource environments.
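Aspect-ratio-preserving resizing works as sketched below (illustrative only; the component performs the resize internally). Capping only the width leaves the height to scale proportionally:

```python
def fit_within(width, height, max_w=None, max_h=None):
    # Pick the smallest scale factor that satisfies both caps; never upscale.
    scale = min(
        max_w / width if max_w else 1.0,
        max_h / height if max_h else 1.0,
        1.0,
    )
    return round(width * scale), round(height * scale)

print(fit_within(1654, 2339, max_w=800))  # an A4 page rendered at ~200 DPI
# (800, 1131)
```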

This component is often used in query pipelines just before a ChatPromptBuilder.

Usage

On its own

from haystack.components.converters.image import PDFToImageContent

converter = PDFToImageContent()

sources = ["file.pdf", "another_file.pdf"]

image_contents = converter.run(sources=sources)["image_contents"]
print(image_contents)

# [ImageContent(base64_image='...',
#               mime_type='image/jpeg',
#               detail=None,
#               meta={'file_path': 'file.pdf', 'page_number': 1}),
#  ...]
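For context, the base64 payload in each ImageContent is what vision models ultimately consume, typically wrapped in a data URL inside a chat message part. A minimal sketch of that wire format (the helper name and exact structure follow the OpenAI-style chat API and are illustrative, not part of Haystack's API):

```python
def to_image_part(base64_image, mime_type="image/jpeg", detail=None):
    # Wrap a base64 payload in the data-URL structure vision chat APIs expect.
    part = {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{base64_image}"},
    }
    if detail is not None:
        part["image_url"]["detail"] = detail
    return part

print(to_image_part("abc123", detail="auto"))
```

In practice, ChatPromptBuilder and the chat generators handle this conversion for you, as shown in the pipeline example below.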

In a pipeline

Use PDFToImageContent to supply image data to a ChatPromptBuilder for multimodal QA or captioning with an LLM.

from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.converters.image import PDFToImageContent

# Query pipeline
pipeline = Pipeline()
pipeline.add_component("image_converter", PDFToImageContent(detail="auto"))
pipeline.add_component(
    "chat_prompt_builder",
    ChatPromptBuilder(
        required_variables=["question"],
        template="""{% message role="system" %}
You are a helpful assistant that answers questions using the provided images.
{% endmessage %}

{% message role="user" %}
Question: {{ question }}

{% for img in image_contents %}
{{ img | templatize_part }}
{% endfor %}
{% endmessage %}
"""
    )
)
pipeline.add_component("llm", OpenAIChatGenerator(model="gpt-4o-mini"))

pipeline.connect("image_converter", "chat_prompt_builder.image_contents")
pipeline.connect("chat_prompt_builder", "llm")

result = pipeline.run(
    data={
        "image_converter": {"sources": ["flan_paper.pdf"], "page_range": "9"},
        "chat_prompt_builder": {"question": "What is the main takeaway of Figure 6?"}
    }
)
print(result["llm"]["replies"][0].text)

# ('The main takeaway of Figure 6 is that Flan-PaLM demonstrates improved '
# 'performance in zero-shot reasoning tasks when utilizing chain-of-thought '
# '(CoT) reasoning, as indicated by higher accuracy across different model '
# 'sizes compared to PaLM without finetuning. This highlights the importance of '
# 'instruction finetuning combined with CoT for enhancing reasoning '
# 'capabilities in models.')
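The run above restricts conversion to page 9 via page_range, which also accepts ranges. As a sketch of how such a spec can expand into concrete page numbers (illustrative only; Haystack ships its own parser):

```python
def expand_page_range(spec):
    # "1-3,9" -> [1, 2, 3, 9]: comma-separated pages and inclusive ranges.
    pages = []
    for chunk in str(spec).split(","):
        chunk = chunk.strip()
        if "-" in chunk:
            start, end = chunk.split("-", 1)
            pages.extend(range(int(start), int(end) + 1))
        else:
            pages.append(int(chunk))
    return pages

print(expand_page_range("1-3,9"))  # [1, 2, 3, 9]
```

Converting only the pages you need keeps the prompt small, which matters because each page becomes one base64-encoded image in the LLM request.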

Additional References

🧑‍🍳 Cookbook: Introduction to Multimodality