
HuggingFaceTGIGenerator

HuggingFaceTGIGenerator enables text generation using Hugging Face Hub-hosted non-chat LLMs.

Name: HuggingFaceTGIGenerator
Folder path: /generators/
Most common position in a pipeline: After a PromptBuilder
Mandatory input variables: "prompt": A string containing the prompt for the LLM
Output variables: "replies": A list of strings with all the replies generated by the LLM; "meta": A list of dictionaries with the metadata associated with each reply, such as token count, finish reason, and others

🚧

Deprecation Warning

This component is deprecated and will be removed in Haystack 2.3.0.

We suggest using HuggingFaceAPIGenerator instead.
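If you are migrating, a minimal sketch of the equivalent setup with HuggingFaceAPIGenerator might look like this (assuming the Haystack 2.x serverless Inference API configuration; the model name is illustrative):

from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret

# Serverless Inference API replacement for HuggingFaceTGIGenerator
generator = HuggingFaceAPIGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
    token=Secret.from_env_var("HF_API_TOKEN"),
)
result = generator.run("What's Natural Language Processing?")
print(result["replies"])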

Overview

This component is designed to work with models deployed on the Text Generation Inference (TGI) backend.

📘

For an example of this component being used, check out this 🧑‍🍳 Cookbook

Using Hugging Face Inference API

The component uses an HF_API_TOKEN environment variable by default. Alternatively, you can pass a Hugging Face API token at initialization via the token parameter – see the code examples below.
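For instance, both initializations below are equivalent ways to authenticate (a minimal sketch; Secret lives in haystack.utils):

from haystack.components.generators import HuggingFaceTGIGenerator
from haystack.utils import Secret

# Option 1: read the token from the HF_API_TOKEN environment variable (the default behavior)
client = HuggingFaceTGIGenerator(
    model="HuggingFaceH4/zephyr-7b-beta",
    token=Secret.from_env_var("HF_API_TOKEN"),
)

# Option 2: pass the token value directly (avoid hardcoding real tokens in source code)
client = HuggingFaceTGIGenerator(
    model="HuggingFaceH4/zephyr-7b-beta",
    token=Secret.from_token("<your-api-key>"),
)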

You can use this component with LLMs hosted on the rate-limited Inference API tier:

from haystack.components.generators import HuggingFaceTGIGenerator
from haystack.utils import Secret

client = HuggingFaceTGIGenerator(model="HuggingFaceH4/zephyr-7b-beta", token=Secret.from_token("<your-api-key>"))
client.warm_up()
response = client.run("What's Natural Language Processing?")

print(response)
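The returned dictionary contains the two output variables described above. A minimal sketch of reading them (the exact meta keys depend on the model and backend):

# "replies" is a list of generated strings; "meta" is a parallel list of dictionaries
print(response["replies"][0])
print(response["meta"][0])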

For LLMs hosted on a paid endpoint or your own custom TGI endpoint, you'll need to provide the URL of the endpoint as well as a valid token:

from haystack.components.generators import HuggingFaceTGIGenerator
from haystack.utils import Secret

client = HuggingFaceTGIGenerator(model="HuggingFaceH4/zephyr-7b-beta", url="<your-tgi-endpoint-url>", token=Secret.from_token("<your-api-key>"))
client.warm_up()
response = client.run("What's Natural Language Processing?")

print(response)

Key Features

  • Hugging Face Inference Endpoints. Supports usage of TGI LLMs deployed on Hugging Face Inference Endpoints.
  • Inference API Support. Supports usage of TGI LLMs hosted on the rate-limited Inference API tier. Discover available LLMs with the following command: wget -qO- https://api-inference.huggingface.co/framework/text-generation-inference (see the Python sketch after this list) and simply use the model ID as the model parameter for this component. You'll also need to provide a valid Hugging Face API token as the token parameter.
  • Custom TGI Endpoints. Supports usage of LLMs deployed on custom TGI endpoints. Anyone can deploy their own TGI endpoint using the TGI framework.
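If you prefer Python over wget, the same listing can be fetched with the standard library (a sketch; it assumes each returned entry carries a "model_id" field that you can pass as this component's model parameter):

import json
import urllib.request

# Ask the Inference API which models are served via text-generation-inference
url = "https://api-inference.huggingface.co/framework/text-generation-inference"
with urllib.request.urlopen(url) as resp:
    models = json.load(resp)

# Print the first few model IDs
print([m["model_id"] for m in models[:5]])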

📘

For more information on TGI, visit https://github.com/huggingface/text-generation-inference.
Learn more about the Inference API at https://huggingface.co/inference-api.

📘

This component is designed for text generation, not for chat. If you want to use these LLMs for chat, use HuggingFaceTGIChatGenerator instead.
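For reference, a minimal sketch of the chat variant (assuming the HuggingFaceTGIChatGenerator API from the same Haystack releases, which takes a list of ChatMessage objects instead of a prompt string):

from haystack.components.generators.chat import HuggingFaceTGIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

client = HuggingFaceTGIChatGenerator(
    model="HuggingFaceH4/zephyr-7b-beta",
    token=Secret.from_token("<your-api-key>"),
)
client.warm_up()
response = client.run([ChatMessage.from_user("What's Natural Language Processing?")])
print(response["replies"][0].content)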

Usage

On its own

from haystack.components.generators import HuggingFaceTGIGenerator
from haystack.utils import Secret

client = HuggingFaceTGIGenerator(model="HuggingFaceH4/zephyr-7b-beta", token=Secret.from_token("<your-api-key>"))
client.warm_up()
response = client.run("What's Natural Language Processing?")

print(response)

In a Pipeline

from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import HuggingFaceTGIGenerator
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.utils import Secret
from haystack import Document

docstore = InMemoryDocumentStore()
docstore.write_documents([Document(content="Rome is the capital of Italy"), Document(content="Paris is the capital of France")])

query = "What is the capital of France?"

template = """
Given the following information, answer the question.

Context: 
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{ query }}
"""
pipe = Pipeline()

pipe.add_component("retriever", InMemoryBM25Retriever(document_store=docstore))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", HuggingFaceTGIGenerator(model="HuggingFaceH4/zephyr-7b-beta", token=Secret.from_token("<your-api-key>")))
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

res = pipe.run({
    "prompt_builder": {
        "query": query
    },
    "retriever": {
        "query": query
    }
})

print(res)
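The pipeline result is keyed by component name, so the generated answer can be read like this (a short sketch; the meta contents vary by model):

# The "llm" key holds this generator's output variables
print(res["llm"]["replies"][0])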

Related Links

See parameter details in our API reference.