MetaLlamaChatGenerator
This component enables chat completion with any model available through the Meta Llama API.
Most common position in a pipeline | After a ChatPromptBuilder |
Mandatory init variables | “api_key”: A Meta Llama API key. Can be set with the LLAMA_API_KEY env variable or passed to the init() method. |
Mandatory run variables | “messages”: A list of ChatMessage objects |
Output variables | “replies”: A list of ChatMessage objects |
API reference | Meta Llama API |
GitHub link | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/meta_llama |
Overview
The MetaLlamaChatGenerator enables you to use multiple Meta Llama models by making chat completion calls to the Meta Llama API. The default model is Llama-4-Scout-17B-16E-Instruct-FP8.
Currently available models are:
Model ID | Input context length | Output context length | Input Modalities | Output Modalities |
---|---|---|---|---|
Llama-4-Scout-17B-16E-Instruct-FP8 | 128k | 4028 | Text, Image | Text |
Llama-4-Maverick-17B-128E-Instruct-FP8 | 128k | 4028 | Text, Image | Text |
Llama-3.3-70B-Instruct | 128k | 4028 | Text | Text |
Llama-3.3-8B-Instruct | 128k | 4028 | Text | Text |
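To use one of these models instead of the default, pass its ID to the model init parameter, as in this short sketch (the same parameter appears in the streaming example below):

```python
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator

# Select a specific Llama model instead of the default Llama-4-Scout-17B-16E-Instruct-FP8
llm = MetaLlamaChatGenerator(model="Llama-3.3-70B-Instruct")
```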
This component uses the same ChatMessage format as other Haystack Chat Generators for structured input and output. For more information, see the ChatMessage documentation.
It is also fully compatible with Haystack Tools and Toolsets, which enable function calling with supported models.
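The sketch below illustrates tool use; it assumes the LLAMA_API_KEY environment variable is set and that MetaLlamaChatGenerator accepts the same tools parameter as other Haystack Chat Generators. The get_weather function and its schema are purely illustrative.

```python
from haystack.dataclasses import ChatMessage
from haystack.tools import Tool
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator

# Illustrative function the model may call
def get_weather(city: str) -> str:
    return f"The weather in {city} is sunny."

weather_tool = Tool(
    name="get_weather",
    description="Get the current weather for a city.",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
    function=get_weather,
)

llm = MetaLlamaChatGenerator(tools=[weather_tool])
response = llm.run([ChatMessage.from_user("What is the weather in Paris?")])

# If the model decides to call the tool, the reply carries tool calls instead of plain text
print(response["replies"][0].tool_calls)
```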
Initialization
To use this integration, you must have a Meta Llama API key. You can provide it with the LLAMA_API_KEY environment variable or by using a Secret.
Then, install the meta-llama-haystack integration:
pip install meta-llama-haystack
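If you prefer to pass the key explicitly, you can wrap it in a Secret, as the pipeline example below also does. This is a minimal sketch that assumes the LLAMA_API_KEY environment variable is set:

```python
from haystack.utils import Secret
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator

# Reads the key from the LLAMA_API_KEY environment variable
llm = MetaLlamaChatGenerator(api_key=Secret.from_env_var("LLAMA_API_KEY"))
```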
Streaming
MetaLlamaChatGenerator supports streaming responses from the LLM, allowing tokens to be emitted as they are generated. To enable streaming, pass a callable to the streaming_callback parameter during initialization.
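The callback receives a StreamingChunk for each piece of the response. The sketch below, assuming the LLAMA_API_KEY environment variable is set, prints each chunk's content as it arrives; the callback name is just an example:

```python
from haystack.dataclasses import StreamingChunk
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator

def print_chunk(chunk: StreamingChunk) -> None:
    # Print tokens as they are generated, without waiting for the full reply
    print(chunk.content, end="", flush=True)

llm = MetaLlamaChatGenerator(streaming_callback=print_chunk)
```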
Usage
On its own
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator
llm = MetaLlamaChatGenerator()
response = llm.run(
    [ChatMessage.from_user("What are Agentic Pipelines? Be brief.")]
)
print(response["replies"][0].text)
With streaming and model routing:
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator
llm = MetaLlamaChatGenerator(
    model="Llama-3.3-8B-Instruct",
    streaming_callback=lambda chunk: print(chunk.content, end="", flush=True),
)

response = llm.run(
    [ChatMessage.from_user("What are Agentic Pipelines? Be brief.")]
)
# check the model used for the response
print("\n\n Model used: ", response["replies"][0].meta["model"])
In a pipeline
# To run this example, you will need to set a `LLAMA_API_KEY` environment variable.
from haystack import Document, Pipeline
from haystack.components.builders.chat_prompt_builder import ChatPromptBuilder
from haystack.components.generators.utils import print_streaming_chunk
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.dataclasses import ChatMessage
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.utils import Secret
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator
# Write documents to InMemoryDocumentStore
document_store = InMemoryDocumentStore()
document_store.write_documents(
    [
        Document(content="My name is Jean and I live in Paris."),
        Document(content="My name is Mark and I live in Berlin."),
        Document(content="My name is Giorgio and I live in Rome."),
    ]
)
# Build a RAG pipeline
prompt_template = [
    ChatMessage.from_user(
        "Given these documents, answer the question.\n"
        "Documents:\n{% for doc in documents %}{{ doc.content }}{% endfor %}\n"
        "Question: {{question}}\n"
        "Answer:"
    )
]
# Define required variables explicitly
prompt_builder = ChatPromptBuilder(template=prompt_template, required_variables={"question", "documents"})
retriever = InMemoryBM25Retriever(document_store=document_store)
llm = MetaLlamaChatGenerator(
    api_key=Secret.from_env_var("LLAMA_API_KEY"),
    streaming_callback=print_streaming_chunk,
)
rag_pipeline = Pipeline()
rag_pipeline.add_component("retriever", retriever)
rag_pipeline.add_component("prompt_builder", prompt_builder)
rag_pipeline.add_component("llm", llm)
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm.messages")
# Ask a question
question = "Who lives in Paris?"
rag_pipeline.run(
    {
        "retriever": {"query": question},
        "prompt_builder": {"question": question},
    }
)