MetaLlamaChatGenerator
This component enables chat completion with any model available through the Meta Llama API.
Most common position in a pipeline | After a ChatPromptBuilder |
Mandatory init variables | “api_key”: A Meta Llama API key. Can be set with the LLAMA_API_KEY env variable or passed to the init() method. |
Mandatory run variables | “messages”: A list of ChatMessage objects |
Output variables | “replies”: A list of ChatMessage objects |
API reference | Meta Llama API |
GitHub link | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/meta_llama |
Overview
The MetaLlamaChatGenerator enables you to use multiple Meta Llama models by making chat completion calls to the Meta Llama API. The default model is Llama-4-Scout-17B-16E-Instruct-FP8.
Currently available models are:
Model ID | Input context length | Output context length | Input Modalities | Output Modalities |
---|---|---|---|---|
Llama-4-Scout-17B-16E-Instruct-FP8 | 128k | 4028 | Text, Image | Text |
Llama-4-Maverick-17B-128E-Instruct-FP8 | 128k | 4028 | Text, Image | Text |
Llama-3.3-70B-Instruct | 128k | 4028 | Text | Text |
Llama-3.3-8B-Instruct | 128k | 4028 | Text | Text |
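To use one of these models instead of the default, pass its ID to the model init parameter, as in this short sketch (the same parameter appears in the streaming example below):

```python
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator

# Select a specific Llama model instead of the default Llama-4-Scout-17B-16E-Instruct-FP8
llm = MetaLlamaChatGenerator(model="Llama-3.3-70B-Instruct")
```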
This component uses the same ChatMessage format as other Haystack Chat Generators for structured input and output. For more information, see the ChatMessage documentation.
It is also fully compatible with Haystack Tools and Toolsets, which enable function calling with supported models.
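The sketch below illustrates tool use; it assumes the LLAMA_API_KEY environment variable is set and that MetaLlamaChatGenerator accepts the same tools parameter as other Haystack Chat Generators. The get_weather function and its schema are purely illustrative.

```python
from haystack.dataclasses import ChatMessage
from haystack.tools import Tool
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator

# Illustrative function the model may call
def get_weather(city: str) -> str:
    return f"The weather in {city} is sunny."

weather_tool = Tool(
    name="get_weather",
    description="Get the current weather for a city.",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
    function=get_weather,
)

llm = MetaLlamaChatGenerator(tools=[weather_tool])
response = llm.run([ChatMessage.from_user("What is the weather in Paris?")])

# If the model decides to call the tool, the reply carries tool calls instead of plain text
print(response["replies"][0].tool_calls)
```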
Initialization
To use this integration, you must have a Meta Llama API key. You can provide it with the LLAMA_API_KEY environment variable or by using a Secret.
Then, install the meta-llama-haystack integration:
pip install meta-llama-haystack
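If you prefer to pass the key explicitly, you can wrap it in a Secret, as the pipeline example below also does. This is a minimal sketch that assumes the LLAMA_API_KEY environment variable is set:

```python
from haystack.utils import Secret
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator

# Reads the key from the LLAMA_API_KEY environment variable
llm = MetaLlamaChatGenerator(api_key=Secret.from_env_var("LLAMA_API_KEY"))
```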
Streaming
MetaLlamaChatGenerator supports streaming responses from the LLM, allowing tokens to be emitted as they are generated. To enable streaming, pass a callable to the streaming_callback parameter during initialization.
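The callback receives a StreamingChunk for each piece of the response. The sketch below, assuming the LLAMA_API_KEY environment variable is set, prints each chunk's content as it arrives; the callback name is just an example:

```python
from haystack.dataclasses import StreamingChunk
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator

def print_chunk(chunk: StreamingChunk) -> None:
    # Print tokens as they are generated, without waiting for the full reply
    print(chunk.content, end="", flush=True)

llm = MetaLlamaChatGenerator(streaming_callback=print_chunk)
```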
Usage
On its own
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator
llm = MetaLlamaChatGenerator()
response = llm.run(
    [ChatMessage.from_user("What are Agentic Pipelines? Be brief.")]
)
print(response["replies"][0].text)
With streaming and model routing:
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator
llm = MetaLlamaChatGenerator(
    model="Llama-3.3-8B-Instruct",
    streaming_callback=lambda chunk: print(chunk.content, end="", flush=True),
)

response = llm.run(
    [ChatMessage.from_user("What are Agentic Pipelines? Be brief.")]
)
# check the model used for the response
print("\n\n Model used: ", response["replies"][0].meta["model"])
In a pipeline
# To run this example, you will need to set a `LLAMA_API_KEY` environment variable.
from haystack import Document, Pipeline
from haystack.components.builders.chat_prompt_builder import ChatPromptBuilder
from haystack.components.generators.utils import print_streaming_chunk
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.dataclasses import ChatMessage
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.utils import Secret
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator
# Write documents to InMemoryDocumentStore
document_store = InMemoryDocumentStore()
document_store.write_documents(
    [
        Document(content="My name is Jean and I live in Paris."),
        Document(content="My name is Mark and I live in Berlin."),
        Document(content="My name is Giorgio and I live in Rome."),
    ]
)
# Build a RAG pipeline
prompt_template = [
    ChatMessage.from_user(
        "Given these documents, answer the question.\n"
        "Documents:\n{% for doc in documents %}{{ doc.content }}{% endfor %}\n"
        "Question: {{question}}\n"
        "Answer:"
    )
]
# Define required variables explicitly
prompt_builder = ChatPromptBuilder(template=prompt_template, required_variables={"question", "documents"})
retriever = InMemoryBM25Retriever(document_store=document_store)
llm = MetaLlamaChatGenerator(
    api_key=Secret.from_env_var("LLAMA_API_KEY"),
    streaming_callback=print_streaming_chunk,
)
rag_pipeline = Pipeline()
rag_pipeline.add_component("retriever", retriever)
rag_pipeline.add_component("prompt_builder", prompt_builder)
rag_pipeline.add_component("llm", llm)
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm.messages")
# Ask a question
question = "Who lives in Paris?"
rag_pipeline.run(
    {
        "retriever": {"query": question},
        "prompt_builder": {"question": question},
    }
)