HuggingFaceTGIGenerator
HuggingFaceTGIGenerator
enables text generation using Hugging Face Hub-hosted non-chat LLMs.
Name | HuggingFaceTGIGenerator |
Folder path | /generators/ |
Most common position in a pipeline | After a PromptBuilder |
Mandatory input variables | “prompt”: A string containing the prompt for the LLM |
Output variables | “replies”: A list of strings with all the replies generated by the LLM ”meta”: A list of dictionaries with the metadata associated with each reply, such as token count, finish reason, and others |
Deprecation Warning
This component is deprecated and will be removed in Haystack 2.3.0.
We suggest using
HuggingFaceAPIGenerator
instead.
Overview
This component is designed to seamlessly utilize models deployed on the Text Generation Inference (TGI) backend.
For an example of this component being used, check out this 🧑🍳 Cookbook
Using Hugging Face Inference API
The component uses a HF_API_TOKEN
environment variable by default. Otherwise, you can pass a Hugging Face API token at initialization with token
– see code examples below.
You can use this component for LLMs hosted on Hugging Face Inference endpoints, the rate-limited Inference API tier:
from haystack.components.generators import HuggingFaceTGIGenerator
client = HuggingFaceTGIGenerator(model="HuggingFaceH4/zephyr-7b-beta", token=Secret.from_token("<your-api-key>"))
client.warm_up()
response = client.run("What's Natural Language Processing?")
print(response)
For LLMs hosted on a paid endpoint or your own custom TGI endpoint, you'll need to provide the URL of the endpoint as well as a valid token:
from haystack.components.generators import HuggingFaceTGIGenerator
client = HuggingFaceTGIGenerator(model="HuggingFaceH4/zephyr-7b-beta", url="<your-tgi-endpoint-url>", token=Secret.from_token("<your-api-key>"))
client.warm_up()
response = client.run("What's Natural Language Processing?")
print(response)
Key Features
- Hugging Face Inference Endpoints. Supports usage of TGI chat LLMs deployed on Hugging Face Inference endpoints.
- Inference API Support. Supports usage of TGI LLMs hosted on the rate-limited Inference API tier. Discover available LLMs using the following command:
wget -qO- https://api-inference.huggingface.co/framework/text-generation-inference
and simply use the model ID as the model parameter for this component. You'll also need to provide a valid Hugging Face API token as the token parameter. - Custom TGI Endpoints. Supports usage of LLMs deployed on custom TGI endpoints. Anyone can deploy their own TGI endpoint using the TGI framework.
For more information on TGI, visit https://github.com/huggingface/text-generation-inference.
Learn more about the Inference API at https://huggingface.co/inference-api.
This component is designed for text generation, not for chat. If you want to use these LLMs for chat, use
HuggingFaceTGIChatGenerator
instead.
Usage
On its own
from haystack.components.generators import HuggingFaceTGIGenerator
client = HuggingFaceTGIGenerator(model="HuggingFaceH4/zephyr-7b-beta", token=Secret.from_token("<your-api-key>"))
client.warm_up()
response = client.run("What's Natural Language Processing?")
print(response)
In a Pipeline
from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import HuggingFaceTGIGenerator
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack import Document
docstore = InMemoryDocumentStore()
docstore.write_documents([Document(content="Rome is the capital of Italy"), Document(content="Paris is the capital of France")])
query = "What is the capital of France?"
template = """
Given the following information, answer the question.
Context:
{% for document in documents %}
{{ document.content }}
{% endfor %}
Question: {{ query }}?
"""
pipe = Pipeline()
pipe.add_component("retriever", InMemoryBM25Retriever(document_store=docstore))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", HuggingFaceTGIGenerator(model="HuggingFaceH4/zephyr-7b-beta", token=Secret.from_token("<your-api-key>")))
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")
res=pipe.run({
"prompt_builder": {
"query": query
},
"retriever": {
"query": query
}
})
print(res)
Updated 7 months ago
See parameters details in our API reference: