Nvidia
Module haystack_integrations.components.embedders.nvidia.document_embedder
NvidiaDocumentEmbedder
A component for embedding documents using embedding models provided by NVIDIA NIMs.
Usage example:
from haystack import Document
from haystack_integrations.components.embedders.nvidia import NvidiaDocumentEmbedder

doc = Document(content="I love pizza!")

document_embedder = NvidiaDocumentEmbedder(model="nvidia/nv-embedqa-e5-v5", api_url="https://integrate.api.nvidia.com/v1")
document_embedder.warm_up()

result = document_embedder.run([doc])
print(result["documents"][0].embedding)
NvidiaDocumentEmbedder.__init__
def __init__(model: str | None = None,
api_key: Secret | None = Secret.from_env_var("NVIDIA_API_KEY"),
api_url: str = os.getenv("NVIDIA_API_URL", DEFAULT_API_URL),
prefix: str = "",
suffix: str = "",
batch_size: int = 32,
progress_bar: bool = True,
meta_fields_to_embed: list[str] | None = None,
embedding_separator: str = "\n",
truncate: EmbeddingTruncateMode | str | None = None,
timeout: float | None = None) -> None
Create a NvidiaDocumentEmbedder component.
Arguments:
model: Embedding model to use. If no model is specified with a locally hosted API URL, the component defaults to the first available model found via the /models API.
api_key: API key for the NVIDIA NIM.
api_url: Custom API URL for the NVIDIA NIM. The expected format is http://host:port.
prefix: A string to add to the beginning of each text.
suffix: A string to add to the end of each text.
batch_size: Number of Documents to encode at once. Cannot be greater than 50.
progress_bar: Whether to show a progress bar or not.
meta_fields_to_embed: List of meta fields that should be embedded along with the Document text.
embedding_separator: Separator used to concatenate the meta fields to the Document text.
truncate: Specifies how inputs longer than the maximum token length should be truncated. If None, the behavior is model-dependent; see the official documentation for more information.
timeout: Timeout for request calls. If not set, it is inferred from the NVIDIA_TIMEOUT environment variable or defaults to 60.
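As a sketch of how these options combine, the snippet below embeds a hypothetical "title" meta field together with the document content. The meta field name and batch size are illustrative assumptions, and NVIDIA_API_KEY is assumed to be set in the environment:

from haystack import Document
from haystack_integrations.components.embedders.nvidia import NvidiaDocumentEmbedder

# Embed the hypothetical "title" meta field alongside the content, 16 documents per request.
embedder = NvidiaDocumentEmbedder(
    model="nvidia/nv-embedqa-e5-v5",
    batch_size=16,
    meta_fields_to_embed=["title"],
    embedding_separator="\n",
)
embedder.warm_up()
docs = [Document(content="Berlin is the capital of Germany.", meta={"title": "Geography"})]
result = embedder.run(docs)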
NvidiaDocumentEmbedder.default_model
Set default model in local NIM mode.
NvidiaDocumentEmbedder.warm_up
Initializes the component.
NvidiaDocumentEmbedder.to_dict
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
NvidiaDocumentEmbedder.available_models
Get a list of available models that work with NvidiaDocumentEmbedder.
NvidiaDocumentEmbedder.from_dict
Deserializes the component from a dictionary.
Arguments:
data: The dictionary to deserialize from.
Returns:
The deserialized component.
NvidiaDocumentEmbedder.run
@component.output_types(documents=list[Document], meta=dict[str, Any])
def run(documents: list[Document]) -> dict[str, list[Document] | dict[str, Any]]
Embed a list of Documents.
The embedding of each Document is stored in the embedding field of the Document.
Arguments:
documents: A list of Documents to embed.
Raises:
TypeError: If the input is not a list of Documents.
Returns:
A dictionary with the following keys and values:
documents: List of processed Documents with embeddings.
meta: Metadata on usage statistics, etc.
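In practice the output typically feeds a document store. Below is a minimal indexing-pipeline sketch; the in-memory store and DocumentWriter are assumptions about the surrounding setup, not requirements of this component:

from haystack import Document, Pipeline
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.nvidia import NvidiaDocumentEmbedder

document_store = InMemoryDocumentStore()
indexing = Pipeline()
indexing.add_component("embedder", NvidiaDocumentEmbedder(model="nvidia/nv-embedqa-e5-v5"))
indexing.add_component("writer", DocumentWriter(document_store=document_store))
indexing.connect("embedder.documents", "writer.documents")
# Running the pipeline warms up the embedder and writes the embedded documents to the store.
indexing.run({"embedder": {"documents": [Document(content="I love pizza!")]}})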
Module haystack_integrations.components.embedders.nvidia.text_embedder
NvidiaTextEmbedder
A component for embedding strings using embedding models provided by NVIDIA NIMs.
For models that differentiate between query and document inputs, this component embeds the input string as a query.
Usage example:
from haystack_integrations.components.embedders.nvidia import NvidiaTextEmbedder
text_to_embed = "I love pizza!"
text_embedder = NvidiaTextEmbedder(model="nvidia/nv-embedqa-e5-v5", api_url="https://integrate.api.nvidia.com/v1")
text_embedder.warm_up()
print(text_embedder.run(text_to_embed))
NvidiaTextEmbedder.__init__
def __init__(model: str | None = None,
api_key: Secret | None = Secret.from_env_var("NVIDIA_API_KEY"),
api_url: str = os.getenv("NVIDIA_API_URL", DEFAULT_API_URL),
prefix: str = "",
suffix: str = "",
truncate: EmbeddingTruncateMode | str | None = None,
timeout: float | None = None)
Create a NvidiaTextEmbedder component.
Arguments:
model: Embedding model to use. If no model is specified with a locally hosted API URL, the component defaults to the first available model found via the /models API.
api_key: API key for the NVIDIA NIM.
api_url: Custom API URL for the NVIDIA NIM. The expected format is http://host:port.
prefix: A string to add to the beginning of each text.
suffix: A string to add to the end of each text.
truncate: Specifies how inputs longer than the maximum token length should be truncated. If None, the behavior is model-dependent; see the official documentation for more information.
timeout: Timeout for request calls. If not set, it is inferred from the NVIDIA_TIMEOUT environment variable or defaults to 60.
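A minimal sketch combining these options; the truncate value and timeout are illustrative choices:

from haystack.utils import Secret
from haystack_integrations.components.embedders.nvidia import NvidiaTextEmbedder

embedder = NvidiaTextEmbedder(
    model="nvidia/nv-embedqa-e5-v5",
    api_key=Secret.from_env_var("NVIDIA_API_KEY"),
    truncate="END",  # truncate over-long inputs from the end instead of raising an error
    timeout=30.0,
)
embedder.warm_up()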
NvidiaTextEmbedder.default_model
Set default model in local NIM mode.
NvidiaTextEmbedder.warm_up
Initializes the component.
NvidiaTextEmbedder.to_dict
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
NvidiaTextEmbedder.available_models
Get a list of available models that work with NvidiaTextEmbedder.
NvidiaTextEmbedder.from_dict
Deserializes the component from a dictionary.
Arguments:
data: The dictionary to deserialize from.
Returns:
The deserialized component.
NvidiaTextEmbedder.run
Embed a string.
Arguments:
text: The text to embed.
Raises:
TypeError: If the input is not a string.
ValueError: If the input string is empty.
Returns:
A dictionary with the following keys and values:
embedding: Embedding of the text.
meta: Metadata on usage statistics, etc.
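Because the input is embedded as a query, this component usually sits in front of an embedding retriever. A sketch assuming documents were previously indexed with NvidiaDocumentEmbedder into an in-memory store:

from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.nvidia import NvidiaTextEmbedder

document_store = InMemoryDocumentStore()  # assumed to already contain embedded documents
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", NvidiaTextEmbedder(model="nvidia/nv-embedqa-e5-v5"))
query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
result = query_pipeline.run({"text_embedder": {"text": "What is the capital of Germany?"}})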
Module haystack_integrations.components.embedders.nvidia.truncate
EmbeddingTruncateMode
Specifies how inputs to the NVIDIA embedding components are truncated. If START, the input is truncated from the start. If END, the input is truncated from the end. If NONE, the input is not truncated and an error is returned if it is too long.
EmbeddingTruncateMode.from_str
Create a truncate mode from a string.
Arguments:
string: String to convert.
Returns:
Truncate mode.
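A short sketch of the conversion, under the assumption that the accepted strings mirror the member names ("START", "END", "NONE"):

from haystack_integrations.components.embedders.nvidia.truncate import EmbeddingTruncateMode

mode = EmbeddingTruncateMode.from_str("END")
assert mode is EmbeddingTruncateMode.END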
Module haystack_integrations.components.generators.nvidia.chat.chat_generator
NvidiaChatGenerator
Enables text generation using NVIDIA generative models. For supported models, see NVIDIA Docs.
Users can pass any text generation parameters valid for the NVIDIA Chat Completion API
directly to this component via the generation_kwargs parameter in __init__ or the generation_kwargs
parameter in run method.
This component uses the ChatMessage format for structuring both input and output, ensuring coherent and contextually relevant responses in chat-based text generation scenarios. Details on the ChatMessage format can be found in the Haystack docs.
For more details on the parameters supported by the NVIDIA API, refer to the NVIDIA Docs.
Usage example:
from haystack_integrations.components.generators.nvidia import NvidiaChatGenerator
from haystack.dataclasses import ChatMessage
messages = [ChatMessage.from_user("What's Natural Language Processing?")]
client = NvidiaChatGenerator()
response = client.run(messages)
print(response)
NvidiaChatGenerator.__init__
def __init__(*,
api_key: Secret = Secret.from_env_var("NVIDIA_API_KEY"),
model: str = "meta/llama-3.1-8b-instruct",
streaming_callback: StreamingCallbackT | None = None,
api_base_url: str | None = os.getenv("NVIDIA_API_URL",
DEFAULT_API_URL),
generation_kwargs: dict[str, Any] | None = None,
tools: ToolsType | None = None,
timeout: float | None = None,
max_retries: int | None = None,
http_client_kwargs: dict[str, Any] | None = None) -> None
Creates an instance of NvidiaChatGenerator.
Arguments:
api_key: The NVIDIA API key.
model: The name of the NVIDIA chat completion model to use.
streaming_callback: A callback function that is called when a new token is received from the stream. The callback function accepts StreamingChunk as an argument.
api_base_url: The NVIDIA API base URL.
generation_kwargs: Other parameters to use for the model. These parameters are all sent directly to the NVIDIA API endpoint. See NVIDIA API docs for more details. Some of the supported parameters:
    max_tokens: The maximum number of tokens the output text can have.
    temperature: What sampling temperature to use. Higher values mean the model will take more risks. Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
    top_p: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
    stream: Whether to stream back partial progress. If set, tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.
    response_format: For NVIDIA NIM servers, this parameter has limited support. The basic JSON mode with {"type": "json_object"} is supported by compatible models to produce valid JSON output. To pass a JSON schema to the model, use the guided_json parameter in extra_body. For example:

        generation_kwargs={
            "extra_body": {
                "nvext": {
                    "guided_json": {
                        json_schema
                    }
                }
            }
        }

    For more details, see the NVIDIA NIM documentation.
tools: A list of tools or a Toolset for which the model can prepare calls. This parameter can accept either a list of Tool objects or a Toolset instance.
timeout: The timeout for the NVIDIA API call.
max_retries: Maximum number of retries to contact NVIDIA after an internal error. If not set, it defaults to the NVIDIA_MAX_RETRIES environment variable or to 5.
http_client_kwargs: A dictionary of keyword arguments to configure a custom httpx.Client or httpx.AsyncClient. For more information, see the HTTPX documentation.
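As an illustration of streaming_callback, here is a sketch that prints tokens as they arrive; print_streaming_chunk is the convenience callback shipped with Haystack, and the generation parameters are illustrative:

from haystack.components.generators.utils import print_streaming_chunk
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.nvidia import NvidiaChatGenerator

client = NvidiaChatGenerator(
    model="meta/llama-3.1-8b-instruct",
    streaming_callback=print_streaming_chunk,  # called once per StreamingChunk
    generation_kwargs={"temperature": 0.2, "max_tokens": 512},
)
response = client.run([ChatMessage.from_user("What's Natural Language Processing?")])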
NvidiaChatGenerator.to_dict
Serialize this component to a dictionary.
Returns:
The serialized component as a dictionary.
Module haystack_integrations.components.generators.nvidia.generator
NvidiaGenerator
Generates text using generative models hosted with NVIDIA NIM on the NVIDIA API Catalog.
Usage example:
from haystack_integrations.components.generators.nvidia import NvidiaGenerator
generator = NvidiaGenerator(
model="meta/llama3-8b-instruct",
model_arguments={
"temperature": 0.2,
"top_p": 0.7,
"max_tokens": 1024,
},
)
generator.warm_up()
result = generator.run(prompt="What is the answer?")
print(result["replies"])
print(result["meta"])
print(result["usage"])
You need an NVIDIA API key for this component to work.
NvidiaGenerator.__init__
def __init__(model: str | None = None,
api_url: str = os.getenv("NVIDIA_API_URL", DEFAULT_API_URL),
api_key: Secret | None = Secret.from_env_var("NVIDIA_API_KEY"),
model_arguments: dict[str, Any] | None = None,
timeout: float | None = None) -> None
Create a NvidiaGenerator component.
Arguments:
model: Name of the model to use for text generation. See the NVIDIA NIMs for more information on the supported models. Note: If no model is specified with a locally hosted API URL, the component defaults to the first available model found via the /models API. Check supported models at NVIDIA NIM.
api_key: API key for the NVIDIA NIM. Set it as the NVIDIA_API_KEY environment variable or pass it here.
api_url: Custom API URL for the NVIDIA NIM.
model_arguments: Additional arguments to pass to the model provider. These arguments are specific to a model. Search for your model in the NVIDIA NIM catalog to find the arguments it accepts.
timeout: Timeout for request calls. If not set, it is inferred from the NVIDIA_TIMEOUT environment variable or defaults to 60.
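Since api_url also accepts a locally hosted NIM, here is a sketch of that mode. The URL is a hypothetical local deployment, and passing api_key=None assumes the deployment does not enforce authentication; with no model given, the component falls back to the first model reported by /models:

from haystack_integrations.components.generators.nvidia import NvidiaGenerator

# Hypothetical self-hosted NIM endpoint; no API key assumed for local use.
generator = NvidiaGenerator(
    api_url="http://localhost:8000/v1",
    api_key=None,
    model_arguments={"max_tokens": 256},
)
generator.warm_up()  # resolves the default model from the /models endpoint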
NvidiaGenerator.default_model
Set default model in local NIM mode.
NvidiaGenerator.warm_up
Initializes the component.
NvidiaGenerator.to_dict
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
NvidiaGenerator.available_models
Get a list of available models that work with NvidiaGenerator.
NvidiaGenerator.from_dict
Deserializes the component from a dictionary.
Arguments:
data: Dictionary to deserialize from.
Returns:
Deserialized component.
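A sketch of the serialization round trip using the two methods above:

from haystack_integrations.components.generators.nvidia import NvidiaGenerator

generator = NvidiaGenerator(model="meta/llama3-8b-instruct")
data = generator.to_dict()  # e.g. when saving a pipeline to YAML/JSON
restored = NvidiaGenerator.from_dict(data)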
NvidiaGenerator.run
Queries the model with the provided prompt.
Arguments:
prompt: Text to be sent to the generative model.
Returns:
A dictionary with the following keys:
replies: Replies generated by the model.
meta: Metadata for each reply.
Module haystack_integrations.components.rankers.nvidia.ranker
NvidiaRanker
A component for ranking documents using ranking models provided by NVIDIA NIMs.
Usage example:
from haystack_integrations.components.rankers.nvidia import NvidiaRanker
from haystack import Document
from haystack.utils import Secret
ranker = NvidiaRanker(
model="nvidia/nv-rerankqa-mistral-4b-v3",
api_key=Secret.from_env_var("NVIDIA_API_KEY"),
)
ranker.warm_up()
query = "What is the capital of Germany?"
documents = [
Document(content="Berlin is the capital of Germany."),
Document(content="The capital of Germany is Berlin."),
Document(content="Germany's capital is Berlin."),
]
result = ranker.run(query, documents, top_k=2)
print(result["documents"])
NvidiaRanker.__init__
def __init__(model: str | None = None,
truncate: RankerTruncateMode | str | None = None,
api_url: str = os.getenv("NVIDIA_API_URL", DEFAULT_API_URL),
api_key: Secret | None = Secret.from_env_var("NVIDIA_API_KEY"),
top_k: int = 5,
query_prefix: str = "",
document_prefix: str = "",
meta_fields_to_embed: list[str] | None = None,
embedding_separator: str = "\n",
timeout: float | None = None) -> None
Create a NvidiaRanker component.
Arguments:
model: Ranking model to use.
truncate: Truncation strategy to use. Can be "NONE", "END", or a RankerTruncateMode. Defaults to the NIM's default.
api_key: API key for the NVIDIA NIM.
api_url: Custom API URL for the NVIDIA NIM.
top_k: Number of documents to return.
query_prefix: A string to add at the beginning of the query text before ranking. Use it to prepend the text with an instruction, as required by reranking models like bge.
document_prefix: A string to add at the beginning of each document before ranking. You can use it to prepend the document with an instruction, as required by embedding models like bge.
meta_fields_to_embed: List of metadata fields to embed with the document.
embedding_separator: Separator used to concatenate metadata fields to the document.
timeout: Timeout for request calls. If not set, it is inferred from the NVIDIA_TIMEOUT environment variable or defaults to 60.
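A sketch of the prefix options just described; the instruction string is a hypothetical example of the bge-style prompt mentioned above, not a documented requirement:

from haystack.utils import Secret
from haystack_integrations.components.rankers.nvidia import NvidiaRanker

ranker = NvidiaRanker(
    model="nvidia/nv-rerankqa-mistral-4b-v3",
    api_key=Secret.from_env_var("NVIDIA_API_KEY"),
    query_prefix="Represent this query for retrieval: ",  # hypothetical instruction prefix
    top_k=3,
    truncate="END",
)
ranker.warm_up()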
NvidiaRanker.to_dict
Serialize the ranker to a dictionary.
Returns:
A dictionary containing the ranker's attributes.
NvidiaRanker.from_dict
Deserialize the ranker from a dictionary.
Arguments:
data: A dictionary containing the ranker's attributes.
Returns:
The deserialized ranker.
NvidiaRanker.warm_up
Initialize the ranker.
Raises:
ValueError: If an API key is required (hosted NVIDIA NIMs) but was not provided.
NvidiaRanker.run
@component.output_types(documents=list[Document])
def run(query: str,
documents: list[Document],
top_k: int | None = None) -> dict[str, list[Document]]
Rank a list of documents based on a given query.
Arguments:
query: The query to rank the documents against.
documents: The list of documents to rank.
top_k: The number of documents to return.
Raises:
TypeError: If the arguments are of the wrong type.
Returns:
A dictionary containing the ranked documents.
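In a pipeline, the ranker typically reorders a retriever's output. A sketch assuming a BM25 retriever and a pre-filled in-memory store; note that the query is passed to both components:

from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.rankers.nvidia import NvidiaRanker

document_store = InMemoryDocumentStore()  # assumed to already contain documents
pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store=document_store, top_k=20))
pipeline.add_component("ranker", NvidiaRanker(model="nvidia/nv-rerankqa-mistral-4b-v3", top_k=5))
pipeline.connect("retriever.documents", "ranker.documents")
query = "What is the capital of Germany?"
result = pipeline.run({"retriever": {"query": query}, "ranker": {"query": query}})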
Module haystack_integrations.components.rankers.nvidia.truncate
RankerTruncateMode
Specifies how inputs to the NVIDIA ranker components are truncated. If NONE, the input is not truncated and an error is returned if it is too long. If END, the input is truncated from the end.
RankerTruncateMode.from_str
Create a truncate mode from a string.
Arguments:
string: String to convert.
Returns:
Truncate mode.
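A short sketch of the conversion, under the assumption that the accepted strings mirror the member names ("NONE", "END"):

from haystack_integrations.components.rankers.nvidia.truncate import RankerTruncateMode

mode = RankerTruncateMode.from_str("END")
assert mode is RankerTruncateMode.END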