
NvidiaGenerator

This Generator enables text generation using Nvidia-hosted models.

Name: NvidiaGenerator
Folder path: https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/nvidia
Most common position in a pipeline: Towards the end of the pipeline
Mandatory input variables: "prompt": a string containing the prompt for the LLM
Output variables: "replies": a list of strings with all the replies generated by the LLM; "meta": a list of dictionaries with the metadata associated with each reply, such as token count

Overview

The NvidiaGenerator provides an interface for generating text using LLMs self-hosted with NVIDIA NIM or models hosted on the NVIDIA API catalog.

Usage

To start using NvidiaGenerator, first, install the nvidia-haystack package:

pip install nvidia-haystack

You can use the NvidiaGenerator with all the LLMs available in the NVIDIA API catalog or a model deployed with NVIDIA NIM. Follow the NVIDIA NIM for LLMs Playbook to learn how to deploy your desired model on your infrastructure.

On its own

To use LLMs from the NVIDIA API catalog, you need to specify the correct api_url and your API key. You can get your API key directly from the catalog website.

The NvidiaGenerator needs an NVIDIA API key to work. By default, it reads the key from the NVIDIA_API_KEY environment variable; alternatively, you can pass an API key at initialization with api_key, as in the following example.

from haystack.utils.auth import Secret
from haystack_integrations.components.generators.nvidia import NvidiaGenerator

generator = NvidiaGenerator(
    model="meta/llama3-70b-instruct",
    api_url="https://integrate.api.nvidia.com/v1",
    api_key=Secret.from_token("<your-api-key>"),
    model_arguments={
        "temperature": 0.2,
        "top_p": 0.7,
        "max_tokens": 1024,
    },
)
generator.warm_up()

result = generator.run(prompt="What is the answer?")
print(result["replies"])
print(result["meta"])
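If the key is exported as an environment variable instead, you can omit the api_key argument entirely. The sketch below mimics the default lookup; the key value is a placeholder for illustration, not a real token:

```python
import os

# NvidiaGenerator falls back to the NVIDIA_API_KEY environment variable when
# no api_key is passed. The value below is a placeholder for illustration.
os.environ["NVIDIA_API_KEY"] = "nvapi-your-key-here"

# Equivalent of the default environment-variable resolution:
api_key = os.environ.get("NVIDIA_API_KEY")
assert api_key is not None
```

With the variable set, `NvidiaGenerator(model=..., api_url=...)` picks up the key without any explicit api_key argument.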

To use a locally deployed model, point api_url at your local endpoint and pass api_key=None.

from haystack_integrations.components.generators.nvidia import NvidiaGenerator

generator = NvidiaGenerator(
    model="llama-2-7b",
    api_url="http://0.0.0.0:9999/v1",
    api_key=None,
    model_arguments={
        "temperature": 0.2,
    },
)
generator.warm_up()

result = generator.run(prompt="What is the answer?")
print(result["replies"])
print(result["meta"])

In a Pipeline

Here's an example of a RAG pipeline:

from haystack import Pipeline, Document
from haystack.utils.auth import Secret
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.generators.nvidia import NvidiaGenerator

docstore = InMemoryDocumentStore()
docstore.write_documents([Document(content="Rome is the capital of Italy"), Document(content="Paris is the capital of France")])

query = "What is the capital of France?"

template = """
Given the following information, answer the question.

Context: 
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{ query }}
"""
pipe = Pipeline()

pipe.add_component("retriever", InMemoryBM25Retriever(document_store=docstore))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", NvidiaGenerator(
    model="meta/llama3-70b-instruct",
    api_url="https://integrate.api.nvidia.com/v1",
    api_key=Secret.from_token("<your-api-key>"),
    model_arguments={
        "temperature": 0.2,
        "top_p": 0.7,
        "max_tokens": 1024,
    },
))
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

res = pipe.run({
    "prompt_builder": {"query": query},
    "retriever": {"query": query},
})

print(res)
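pipe.run returns a dictionary keyed by component name, so the generated text lives under the "llm" key. A minimal sketch of reading it, assuming a result shape like the one below (the reply text and metadata are illustrative, not actual model output):

```python
# Assumed shape of the dict returned by pipe.run; values are made up.
res = {
    "llm": {
        "replies": ["Paris is the capital of France."],
        "meta": [{"finish_reason": "stop"}],
    }
}

answer = res["llm"]["replies"][0]
print(answer)  # -> Paris is the capital of France.
```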

Related Links

Check out the API reference in the GitHub repo or in our docs.