HuggingFaceTGIChatGenerator
HuggingFaceTGIChatGenerator enables chat completion using chat-based LLMs hosted on the Hugging Face Hub.
Name | HuggingFaceTGIChatGenerator |
Folder Path | /generators/chat/ |
Most common Position in a Pipeline | After the DynamicChatPromptBuilder |
Mandatory Input variables | "messages": a list of ChatMessage objects representing the chat |
Output variables | "replies": a list of alternative replies of the LLM to the input chat |
Overview
This component works with chat-based models deployed on the Text Generation Inference (TGI) backend.
Its main input is a list of ChatMessage objects. ChatMessage is a data class that contains a message, a role (who generated the message, such as user, assistant, system, or function), and optional metadata. See the usage section for an example.
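To make the shape of a chat message concrete, here is a minimal sketch of the three parts described above (message text, role, optional metadata). SketchRole and SketchMessage are hypothetical stand-ins written for illustration only; they are not the Haystack ChatMessage class, whose exact fields may differ.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict


class SketchRole(str, Enum):
    # The four roles named in the text above.
    USER = "user"
    ASSISTANT = "assistant"
    SYSTEM = "system"
    FUNCTION = "function"


@dataclass
class SketchMessage:
    # Hypothetical stand-in for illustration; not the Haystack ChatMessage class.
    content: str
    role: SketchRole
    # Each message gets its own empty metadata dict by default.
    metadata: Dict[str, Any] = field(default_factory=dict)


# A chat is simply an ordered list of messages.
chat = [
    SketchMessage("You are a helpful, respectful and honest assistant", SketchRole.SYSTEM),
    SketchMessage("What's Natural Language Processing?", SketchRole.USER),
]

for message in chat:
    print(f"{message.role.value}: {message.content}")
```

In real code you would build the list with the ChatMessage factory methods used in the examples on this page, such as ChatMessage.from_system and ChatMessage.from_user.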
Using Hugging Face Inference API
The component uses the HF_API_TOKEN environment variable by default. Alternatively, you can pass a Hugging Face API token at initialization with the token parameter (see the code examples below).
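The lookup order described above can be sketched in plain Python. This is an illustration of the idea only, assuming an explicitly passed token takes precedence over the environment variable; resolve_hf_token is a hypothetical helper, not part of Haystack, and the component's internal logic may differ.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(
    explicit_token: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Hypothetical helper: prefer an explicit token, else fall back to HF_API_TOKEN."""
    if explicit_token:
        return explicit_token
    # Returns None when the variable is not set.
    return env.get("HF_API_TOKEN")


# With no explicit token, the value comes from the environment (or is None).
token = resolve_hf_token()
print("token found" if token else "no token configured")
```

Passing the environment as a parameter keeps the helper easy to test without modifying the real process environment.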
You can use this component for chat LLMs hosted on the rate-limited Hugging Face Inference API tier:
from haystack.components.generators.chat import HuggingFaceTGIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

# The chat is a list of messages: a system instruction followed by the user question
messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]
client = HuggingFaceTGIChatGenerator(model="meta-llama/Llama-2-70b-chat-hf", token=Secret.from_token("<your-api-key>"))
# Prepare the component before the first run
client.warm_up()
response = client.run(messages, generation_kwargs={"max_new_tokens": 120})
print(response)
For chat LLMs hosted on paid Inference endpoints or your own custom TGI endpoint, you'll need to provide the URL of the endpoint as well as a valid token:
from haystack.components.generators.chat import HuggingFaceTGIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]
# url points the component at your own or a paid endpoint
client = HuggingFaceTGIChatGenerator(model="meta-llama/Llama-2-70b-chat-hf", url="<your-tgi-endpoint-url>", token=Secret.from_token("<your-api-key>"))
# Prepare the component before the first run
client.warm_up()
response = client.run(messages, generation_kwargs={"max_new_tokens": 120})
print(response)
Key Features
- Hugging Face Inference Endpoints. Supports usage of TGI chat LLMs deployed on Hugging Face Inference endpoints.
- Inference API Support. Supports usage of TGI chat LLMs hosted on the rate-limited Inference API tier. Discover available chat models using the following command:
wget -qO- https://api-inference.huggingface.co/framework/text-generation-inference | grep chat
Then use the model ID as the model parameter for this component. You'll also need to provide a valid Hugging Face API token as the token parameter.
- Custom TGI Endpoints. Supports usage of TGI chat LLMs deployed on custom TGI endpoints. Anyone can deploy their own TGI endpoint using the TGI framework.
For more information on TGI, visit https://github.com/huggingface/text-generation-inference.
Learn more about the Inference API at https://huggingface.co/inference-api.
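If you prefer to do the filtering step of the discovery command in Python, the sketch below mirrors what `grep chat` does: it keeps only the lines that contain the substring "chat" (case-sensitive). filter_chat_lines is a hypothetical helper, and the sample listing is made up for illustration (only the Llama 2 model ID comes from this page); the real response format of the endpoint is not specified here.

```python
from typing import List


def filter_chat_lines(listing: str, keyword: str = "chat") -> List[str]:
    """Hypothetical helper: keep lines containing the keyword, like `grep chat`."""
    return [line.strip() for line in listing.splitlines() if keyword in line]


# Made-up sample text standing in for the downloaded listing.
sample_listing = """meta-llama/Llama-2-70b-chat-hf
example-org/example-base-model
example-org/example-chat-model"""

for model_line in filter_chat_lines(sample_listing):
    print(model_line)
```

To apply this to real data, you would first download the listing text (for example with urllib.request from the standard library) and pass it to the helper.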
This component is designed for chat completion, so it expects a list of messages, not a single string. If you want to use these LLMs for text generation (such as translation or summarization tasks) or don't want to use the ChatMessage object, use HuggingFaceTGIGenerator instead.
Usage
On its own
from haystack.components.generators.chat import HuggingFaceTGIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]
client = HuggingFaceTGIChatGenerator(model="meta-llama/Llama-2-70b-chat-hf", token=Secret.from_token("<your-api-key>"))
# Prepare the component before the first run
client.warm_up()
# generation_kwargs limits the reply to at most 120 new tokens
response = client.run(messages, generation_kwargs={"max_new_tokens": 120})
print(response)
In a Pipeline
from haystack.components.builders import DynamicChatPromptBuilder
from haystack.components.generators.chat import HuggingFaceTGIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
from haystack import Pipeline
# no parameter init, we don't use any runtime template variables
prompt_builder = DynamicChatPromptBuilder()
llm = HuggingFaceTGIChatGenerator(model="meta-llama/Llama-2-70b-chat-hf", token=Secret.from_token("<your-api-key>"))
pipe = Pipeline()
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)
pipe.connect("prompt_builder.prompt", "llm.messages")
location = "Berlin"
messages = [ChatMessage.from_system("Always respond in German even if some input data is in other languages."),
ChatMessage.from_user("Tell me about {{location}}")]
pipe.run(data={"prompt_builder": {"template_variables":{"location": location}, "prompt_source": messages}})
>> {'llm': {'replies': [ChatMessage(content='Berlin ist die Hauptstadt Deutschlands und die größte Stadt des Landes.
>> Es ist eine lebhafte Metropole, die für ihre Geschichte, Kultur und einzigartigen Sehenswürdigkeiten bekannt ist.
>> Berlin bietet eine vielfältige Kulturszene, beeindruckende architektonische Meisterwerke wie den Berliner Dom
>> und das Brandenburger Tor, sowie weltberühmte Museen wie das Pergamonmuseum. Die Stadt hat auch eine pulsierende
>> Clubszene und ist für ihr aufregendes Nachtleben berühmt. Berlin ist ein Schmelztiegel verschiedener Kulturen und
>> zieht jedes Jahr Millionen von Touristen an.', role=<ChatRole.ASSISTANT: 'assistant'>, name=None)]}}
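The pipeline result is a nested dictionary keyed by component name, so reading the reply text takes a couple of lookups. The sketch below shows one way to do it. It uses a SimpleNamespace stand-in instead of a real pipeline result so that it runs without a model or token; the content attribute is assumed from the printed output above, and first_reply_text is a hypothetical helper.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional

# Stand-in for the dictionary returned by pipe.run(); a real result holds ChatMessage objects.
result: Dict[str, Any] = {
    "llm": {
        "replies": [
            SimpleNamespace(
                content="Berlin ist die Hauptstadt Deutschlands.",
                role="assistant",
                name=None,
            )
        ]
    }
}


def first_reply_text(pipeline_result: Dict[str, Any], component_name: str = "llm") -> Optional[str]:
    """Hypothetical helper: return the text of the first reply, or None if there is none."""
    replies = pipeline_result[component_name]["replies"]
    return replies[0].content if replies else None


print(first_reply_text(result))
```

The component name passed to the helper must match the name used in pipe.add_component, which is "llm" in the example above.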