Name	HuggingFaceTGIChatGenerator
Folder Path	/generators/chat/
Most common Position in a Pipeline	After the `DynamicChatPromptBuilder`
Mandatory Input variables	“messages”: a list of `ChatMessage` objects representing the chat
Output variables	“replies”: a list of alternative replies of the LLM to the input chat

Overview

This component is designed to seamlessly utilize chat-based models deployed on the Text Generation Inference (TGI) backend.

This component’s main input is a List of ChatMessage objects. ChatMessage is a data class that contains a message, a role (who generated the message, such as user, assistant, system, function), and optional metadata. See the usage section for an example.

Using Hugging Face Inference API

The component uses a HF_API_TOKEN environment variable by default. Otherwise, you can pass a Hugging Face API token at initialization with token – see code examples below.

You can use this component for chat LLMs hosted on Hugging Face Inference endpoints, the rate-limited Inference API tier:

from haystack.components.generators.chat import HuggingFaceTGIChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
ChatMessage.from_user("What's Natural Language Processing?")]
client = HuggingFaceTGIChatGenerator(model="meta-llama/Llama-2-70b-chat-hf", token=Secret.from_token("<your-api-key>"))
client.warm_up()
response = client.run(messages, generation_kwargs={"max_new_tokens": 120})

print(response)

For chat LLMs hosted on paid Inference endpoints or your own custom TGI endpoint, you'll need to provide the URL link of the endpoint as well as a valid token:

from haystack.components.generators.chat import HuggingFaceTGIChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
ChatMessage.from_user("What's Natural Language Processing?")]
client = HuggingFaceTGIChatGenerator(model="meta-llama/Llama-2-70b-chat-hf", url="<your-tgi-endpoint-url>", token=Secret.from_token("<your-api-key>"))
client.warm_up()
response = client.run(messages, generation_kwargs={"max_new_tokens": 120})

print(response)

Key Features

Hugging Face Inference Endpoints. Supports usage of TGI chat LLMs deployed on Hugging Face Inference endpoints.
Inference API Support. Supports usage of TGI chat LLMs hosted on the rate-limited Inference API tier. Discover available chat models using the following command: wget -qO- https://api-inference.huggingface.co/framework/text-generation-inference | grep chatand simply use the model ID as the model parameter for this component. You'll also need to provide a valid Hugging Face API token as the token parameter.
Custom TGI Endpoints. Supports usage of TGI chat LLMs deployed on custom TGI endpoints. Anyone can deploy their own TGI endpoint using the TGI framework.

📘
For more information on TGI, visit https://github.com/huggingface/text-generation-inference.
Learn more about the Inference API at https://huggingface.co/inference-api.

📘
This component is designed for chat completion, so it expects a list of messages, not a single string. If you want to use these LLMs for text generation (such as translation or summarization tasks) or don’t want to use the ChatMessage object, use HuggingFaceTGIGenerator instead.

Usage

On its own

from haystack.components.generators.chat import HuggingFaceTGIChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
ChatMessage.from_user("What's Natural Language Processing?")]
client = HuggingFaceTGIChatGenerator(model="meta-llama/Llama-2-70b-chat-hf", token=Secret.from_token("<your-api-key>"))
client.warm_up()
response = client.run(messages, generation_kwargs={"max_new_tokens": 120})

print(response)

In a Pipeline

from haystack.components.builders import DynamicChatPromptBuilder
from haystack.components.generators.chat import HuggingFaceTGIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack import Pipeline

# no parameter init, we don't use any runtime template variables
prompt_builder = DynamicChatPromptBuilder()
llm = HuggingFaceTGIChatGenerator(model="meta-llama/Llama-2-70b-chat-hf", token=Secret.from_token("<your-api-key>"))

pipe = Pipeline()
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)
pipe.connect("prompt_builder.prompt", "llm.messages")
location = "Berlin"
messages = [ChatMessage.from_system("Always respond in German even if some input data is in other languages."),
            ChatMessage.from_user("Tell me about {{location}}")]
pipe.run(data={"prompt_builder": {"template_variables":{"location": location}, "prompt_source": messages}})

>> {'llm': {'replies': [ChatMessage(content='Berlin ist die Hauptstadt Deutschlands und die größte Stadt des Landes.
>> Es ist eine lebhafte Metropole, die für ihre Geschichte, Kultur und einzigartigen Sehenswürdigkeiten bekannt ist.
>> Berlin bietet eine vielfältige Kulturszene, beeindruckende architektonische Meisterwerke wie den Berliner Dom
>> und das Brandenburger Tor, sowie weltberühmte Museen wie das Pergamonmuseum. Die Stadt hat auch eine pulsierende
>> Clubszene und ist für ihr aufregendes Nachtleben berühmt. Berlin ist ein Schmelztiegel verschiedener Kulturen und
>> zieht jedes Jahr Millionen von Touristen an.', role=<ChatRole.ASSISTANT: 'assistant'>, name=None}}