NvidiaGenerator
This Generator enables text generation using models hosted on the NVIDIA API catalog or self-hosted with NVIDIA NIM.
Most common position in a pipeline | After a PromptBuilder |
Mandatory init variables | "api_key": API key for the NVIDIA NIM. Can be set with NVIDIA_API_KEY env var. |
Mandatory run variables | "prompt": A string containing the prompt for the LLM |
Output variables | "replies": A list of strings with all the replies generated by the LLM; "meta": A list of dictionaries with the metadata associated with each reply, such as token count |
API reference | Nvidia |
GitHub link | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/nvidia |
Overview
The NvidiaGenerator provides an interface for generating text using LLMs self-hosted with NVIDIA NIM or models hosted on the NVIDIA API catalog.
Usage
To start using NvidiaGenerator, first install the nvidia-haystack package:
pip install nvidia-haystack
You can use the NvidiaGenerator with all the LLMs available in the NVIDIA API catalog or with a model deployed with NVIDIA NIM. Follow the NVIDIA NIM for LLMs Playbook to learn how to deploy your desired model on your infrastructure.
On its own
To use LLMs from the NVIDIA API catalog, you need to specify the correct api_url and your API key. You can get your API key directly from the catalog website.
The NvidiaGenerator needs an NVIDIA API key to work. By default, it reads the key from the NVIDIA_API_KEY environment variable. Alternatively, you can pass an API key at initialization with api_key, as in the following example.
from haystack.utils.auth import Secret
from haystack_integrations.components.generators.nvidia import NvidiaGenerator

generator = NvidiaGenerator(
    model="meta/llama3-70b-instruct",
    api_url="https://integrate.api.nvidia.com/v1",
    api_key=Secret.from_token("<your-api-key>"),
    model_arguments={
        "temperature": 0.2,
        "top_p": 0.7,
        "max_tokens": 1024,
    },
)
generator.warm_up()

result = generator.run(prompt="What is the answer?")
print(result["replies"])
print(result["meta"])
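The key lookup order described above (an explicit api_key first, otherwise the NVIDIA_API_KEY environment variable) can be sketched in plain Python. This is only an illustration of that order under stated assumptions, not the integration's actual code: resolve_api_key is a hypothetical helper, and the placeholder key values are invented.

```python
import os
from typing import Optional

ENV_VAR = "NVIDIA_API_KEY"  # environment variable named in the text above


def resolve_api_key(explicit_key: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: prefer an explicit key, else read the environment."""
    if explicit_key is not None:
        return explicit_key
    # Returns None when the variable is not set.
    return os.environ.get(ENV_VAR)


# Placeholder value standing in for a key exported in the shell.
os.environ[ENV_VAR] = "<your-api-key>"

print(resolve_api_key())                 # falls back to the environment variable
print(resolve_api_key("<another-key>"))  # an explicit key takes precedence
```

Keeping the key in the environment avoids hard-coding it in scripts that may end up in version control.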
To use a locally deployed model, set api_url to the address of your local deployment and set api_key to None.
from haystack_integrations.components.generators.nvidia import NvidiaGenerator

generator = NvidiaGenerator(
    model="llama-2-7b",
    api_url="http://0.0.0.0:9999/v1",
    api_key=None,
    model_arguments={
        "temperature": 0.2,
    },
)
generator.warm_up()

result = generator.run(prompt="What is the answer?")
print(result["replies"])
print(result["meta"])
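Since "replies" and "meta" are both lists with one entry per reply, they can be read side by side. The following is a minimal sketch that uses a hand-written stand-in for the run output rather than a real API response. The reply text, the meta keys ("finish_reason", "usage"), and the pair_replies_with_meta helper are assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple

# Hand-written stand-in for the dictionary returned by generator.run();
# the reply text and the meta keys below are invented for illustration.
sample_result: Dict[str, List[Any]] = {
    "replies": ["The answer is 42."],
    "meta": [{"finish_reason": "stop", "usage": {"completion_tokens": 6}}],
}


def pair_replies_with_meta(
    result: Dict[str, List[Any]],
) -> List[Tuple[str, Dict[str, Any]]]:
    """Match each reply with the metadata dictionary at the same index."""
    replies = result["replies"]
    meta = result["meta"]
    if len(replies) != len(meta):
        raise ValueError("expected one metadata dictionary per reply")
    return list(zip(replies, meta))


for reply, reply_meta in pair_replies_with_meta(sample_result):
    print(reply)
    print(reply_meta)
```

The length check is a defensive choice for this sketch: it fails early if the two lists ever get out of step.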
In a Pipeline
Here's an example of a RAG pipeline:
from haystack import Pipeline, Document
from haystack.utils.auth import Secret
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.generators.nvidia import NvidiaGenerator

docstore = InMemoryDocumentStore()
docstore.write_documents([
    Document(content="Rome is the capital of Italy"),
    Document(content="Paris is the capital of France"),
])

query = "What is the capital of France?"
template = """
Given the following information, answer the question.
Context:
{% for document in documents %}
{{ document.content }}
{% endfor %}
Question: {{ query }}
"""
pipe = Pipeline()
pipe.add_component("retriever", InMemoryBM25Retriever(document_store=docstore))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", NvidiaGenerator(
    model="meta/llama3-70b-instruct",
    api_url="https://integrate.api.nvidia.com/v1",
    api_key=Secret.from_token("<your-api-key>"),
    model_arguments={
        "temperature": 0.2,
        "top_p": 0.7,
        "max_tokens": 1024,
    },
))
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

res = pipe.run({
    "prompt_builder": {
        "query": query
    },
    "retriever": {
        "query": query
    }
})
print(res)
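Printing res shows the whole output dictionary. To pull out only the generated answer, a small helper along these lines may be useful. It is a sketch that assumes the pipeline output is keyed by component name ("llm" in the example above) and holds the generator's "replies" list under that key. The first_reply helper and the sample data are hypothetical.

```python
from typing import Any, Dict, Optional


def first_reply(
    pipeline_output: Dict[str, Dict[str, Any]],
    component: str = "llm",
) -> Optional[str]:
    """Return the first reply of the named component, or None if there is none."""
    replies = pipeline_output.get(component, {}).get("replies", [])
    return replies[0] if replies else None


# Hand-written stand-in for `res`; the reply text is invented for illustration.
sample_res = {"llm": {"replies": ["Paris"], "meta": [{}]}}
print(first_reply(sample_res))
```

Returning None instead of raising keeps the helper safe to call when the component produced no replies or is missing from the output.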
Additional References
🧑🍳 Cookbook: Haystack RAG Pipeline with Self-Deployed AI models using NVIDIA NIMs