
NvidiaChatGenerator

This Generator enables chat completion using Nvidia-hosted models.

Most common position in a pipeline: After a ChatPromptBuilder
Mandatory init variables: "api_key": API key for the NVIDIA NIM. Can be set with the NVIDIA_API_KEY env var.
Mandatory run variables: "messages": A list of ChatMessage objects
Output variables: "replies": A list of ChatMessage objects
API reference: NVIDIA API
GitHub link: https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/nvidia

Overview

NvidiaChatGenerator enables chat completions using NVIDIA's generative models via the NVIDIA API. It is compatible with the ChatMessage format for both input and output, ensuring seamless integration in chat-based pipelines.

You can use LLMs self-hosted with NVIDIA NIM or models hosted on the NVIDIA API catalog. The default model for this component is meta/llama-3.1-8b-instruct.

To use this integration, you must have an NVIDIA API key. You can provide it with the NVIDIA_API_KEY environment variable or by using a Secret.

This generator supports streaming responses from the LLM. To enable streaming, pass a callable to the streaming_callback parameter during initialization.

Usage

To start using NvidiaChatGenerator, first install the nvidia-haystack package:

pip install nvidia-haystack

You can use the NvidiaChatGenerator with all the LLMs available in the NVIDIA API catalog or a model deployed with NVIDIA NIM. Follow the NVIDIA NIM for LLMs Playbook to learn how to deploy your desired model on your infrastructure.

On its own

To use LLMs from the NVIDIA API catalog, provide your API key and, if needed, override api_url (the default is https://integrate.api.nvidia.com/v1). You can get your API key directly from the catalog website.

from haystack_integrations.components.generators.nvidia import NvidiaChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

generator = NvidiaChatGenerator(
    model="meta/llama-3.1-8b-instruct",  # or any supported NVIDIA model
    api_key=Secret.from_env_var("NVIDIA_API_KEY")
)

messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
result = generator.run(messages=messages)
print(result["replies"])
print(result["replies"][0].meta)

In a Pipeline

from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.nvidia import NvidiaChatGenerator
from haystack.utils import Secret

pipe = Pipeline()
pipe.add_component("prompt_builder", ChatPromptBuilder())
pipe.add_component("llm", NvidiaChatGenerator(
    model="meta/llama-3.1-8b-instruct",
    api_key=Secret.from_env_var("NVIDIA_API_KEY")
))
pipe.connect("prompt_builder", "llm")

country = "Germany"
system_message = ChatMessage.from_system("You are an assistant giving out valuable information to language learners.")
messages = [system_message, ChatMessage.from_user("What's the official language of {{ country }}?")]

res = pipe.run(data={"prompt_builder": {"template_variables": {"country": country}, "template": messages}})
print(res)