HuggingFaceTGIChatGenerator
HuggingFaceTGIChatGenerator enables chat completion using chat-based LLMs hosted on the Hugging Face Hub.
Name | HuggingFaceTGIChatGenerator |
Folder path | /generators/chat/ |
Most common position in a pipeline | After the DynamicChatPromptBuilder |
Mandatory input variables | “messages”: A list of ChatMessage objects representing the chat |
Output variables | “replies”: A list of alternative replies of the LLM to the input chat |
Deprecation Warning
This component is deprecated and will be removed in Haystack 2.3.0. We suggest using HuggingFaceAPIChatGenerator instead.
Overview
This component works with chat-based models deployed on the Text Generation Inference (TGI) backend.
This component’s main input is a list of ChatMessage objects. ChatMessage is a data class that contains a message, a role (who generated the message, such as user, assistant, system, or function), and optional metadata. See the usage section for an example.
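To make the shape of that input concrete, here is a minimal standard-library sketch of a message record with the parts described above. SketchMessage and Role are hypothetical stand-ins written for illustration only; they are not Haystack's ChatMessage or ChatRole classes, and the field names are assumptions based on the description and the sample output on this page.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Optional


class Role(str, Enum):
    """Who produced a message (illustrative stand-in, not Haystack's ChatRole)."""
    USER = "user"
    ASSISTANT = "assistant"
    SYSTEM = "system"
    FUNCTION = "function"


@dataclass
class SketchMessage:
    """Simplified stand-in for a chat message: text, role, optional metadata."""
    content: str
    role: Role
    name: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


# A chat is an ordered list of such messages.
chat = [
    SketchMessage("You are a helpful, respectful and honest assistant", Role.SYSTEM),
    SketchMessage("What's Natural Language Processing?", Role.USER),
]
roles = [message.role.value for message in chat]
```

In real code, build the list with ChatMessage.from_system and ChatMessage.from_user, as the usage examples on this page do.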
Using Hugging Face Inference API
The component uses a HF_API_TOKEN environment variable by default. Otherwise, you can pass a Hugging Face API token at initialization with token. See the code examples below.
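The lookup order described above can be sketched with the standard library alone. This is an illustration of the fallback idea, not the component's actual implementation; resolve_hf_token is a hypothetical helper name, and giving an explicit token precedence over the environment variable is an assumption.

```python
import os
from typing import Optional


def resolve_hf_token(explicit_token: Optional[str] = None) -> Optional[str]:
    """Return the explicit token if one is given, else the HF_API_TOKEN env var.

    Hypothetical helper: it only illustrates the fallback order and returns
    None when neither source provides a token.
    """
    if explicit_token:
        return explicit_token
    return os.environ.get("HF_API_TOKEN")
```

Keeping the token in the environment avoids hard-coding credentials in scripts or notebooks.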
You can use this component for chat LLMs hosted on the rate-limited Hugging Face Inference API tier:
from haystack.components.generators.chat import HuggingFaceTGIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret  # needed for Secret.from_token below

# The chat is a list of ChatMessage objects: a system prompt, then the user question
messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]
client = HuggingFaceTGIChatGenerator(model="meta-llama/Llama-2-70b-chat-hf", token=Secret.from_token("<your-api-key>"))
client.warm_up()
response = client.run(messages, generation_kwargs={"max_new_tokens": 120})
print(response)
For chat LLMs hosted on paid Inference endpoints or your own custom TGI endpoint, you'll need to provide the URL of the endpoint as well as a valid token:
from haystack.components.generators.chat import HuggingFaceTGIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret  # needed for Secret.from_token below

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]
# url points the component at a paid Inference endpoint or a self-hosted TGI server
client = HuggingFaceTGIChatGenerator(model="meta-llama/Llama-2-70b-chat-hf", url="<your-tgi-endpoint-url>", token=Secret.from_token("<your-api-key>"))
client.warm_up()
response = client.run(messages, generation_kwargs={"max_new_tokens": 120})
print(response)
Key Features
- Hugging Face Inference Endpoints. Supports usage of TGI chat LLMs deployed on Hugging Face Inference endpoints.
- Inference API Support. Supports usage of TGI chat LLMs hosted on the rate-limited Inference API tier. Discover available chat models using the following command:
wget -qO- https://api-inference.huggingface.co/framework/text-generation-inference | grep chat
and use the model ID as the model parameter for this component. You'll also need to provide a valid Hugging Face API token as the token parameter.
- Custom TGI Endpoints. Supports usage of TGI chat LLMs deployed on custom TGI endpoints. Anyone can deploy their own TGI endpoint using the TGI framework.
For more information on TGI, visit https://github.com/huggingface/text-generation-inference.
Learn more about the Inference API at https://huggingface.co/inference-api.
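If wget and grep are not available, the model discovery command from the list above can be approximated in Python. This is a rough sketch under assumptions: grep_chat and fetch_chat_lines are hypothetical helper names, the URL is copied from that command, and the endpoint is not guaranteed to still be served or to keep its response format, so the filter works on raw lines of text just as grep does.

```python
import urllib.request
from typing import List

# URL copied from the wget command on this page; availability is not guaranteed.
TGI_MODELS_URL = "https://api-inference.huggingface.co/framework/text-generation-inference"


def grep_chat(text: str, needle: str = "chat") -> List[str]:
    """Return the lines of `text` that contain `needle`, like `grep chat`."""
    return [line for line in text.splitlines() if needle in line]


def fetch_chat_lines(url: str = TGI_MODELS_URL, timeout: float = 10.0) -> List[str]:
    """Download the listing and keep only lines mentioning 'chat' (makes a network call)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
    return grep_chat(body)
```

Calling fetch_chat_lines() requires network access; grep_chat can be tried offline on any saved copy of the listing.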
This component is designed for chat completion, so it expects a list of messages, not a single string. If you want to use these LLMs for text generation (such as translation or summarization tasks) or don’t want to use the ChatMessage object, use HuggingFaceTGIGenerator instead.
Usage
On its own
from haystack.components.generators.chat import HuggingFaceTGIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret  # needed for Secret.from_token below

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]
client = HuggingFaceTGIChatGenerator(model="meta-llama/Llama-2-70b-chat-hf", token=Secret.from_token("<your-api-key>"))
client.warm_up()
# run() returns a dict; the model's answers are under the "replies" key
response = client.run(messages, generation_kwargs={"max_new_tokens": 120})
print(response)
In a Pipeline
from haystack.components.builders import DynamicChatPromptBuilder
from haystack.components.generators.chat import HuggingFaceTGIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret  # needed for Secret.from_token below
from haystack import Pipeline

# no parameter init, we don't use any runtime template variables
prompt_builder = DynamicChatPromptBuilder()
llm = HuggingFaceTGIChatGenerator(model="meta-llama/Llama-2-70b-chat-hf", token=Secret.from_token("<your-api-key>"))

pipe = Pipeline()
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)
# the rendered chat messages from the prompt builder feed the generator
pipe.connect("prompt_builder.prompt", "llm.messages")

location = "Berlin"
messages = [ChatMessage.from_system("Always respond in German even if some input data is in other languages."),
            ChatMessage.from_user("Tell me about {{location}}")]
pipe.run(data={"prompt_builder": {"template_variables":{"location": location}, "prompt_source": messages}})
>> {'llm': {'replies': [ChatMessage(content='Berlin ist die Hauptstadt Deutschlands und die größte Stadt des Landes.
>> Es ist eine lebhafte Metropole, die für ihre Geschichte, Kultur und einzigartigen Sehenswürdigkeiten bekannt ist.
>> Berlin bietet eine vielfältige Kulturszene, beeindruckende architektonische Meisterwerke wie den Berliner Dom
>> und das Brandenburger Tor, sowie weltberühmte Museen wie das Pergamonmuseum. Die Stadt hat auch eine pulsierende
>> Clubszene und ist für ihr aufregendes Nachtleben berühmt. Berlin ist ein Schmelztiegel verschiedener Kulturen und
>> zieht jedes Jahr Millionen von Touristen an.', role=<ChatRole.ASSISTANT: 'assistant'>, name=None)]}}