HuggingFaceAPIGenerator
This generator enables text generation using various Hugging Face APIs.
Name | HuggingFaceAPIGenerator |
Folder path | /generators/ |
Most common position in a pipeline | After a PromptBuilder |
Mandatory input variables | “prompt”: A string containing the prompt for the LLM |
Output variables | “replies”: A list of strings with all the replies generated by the LLM ”meta”: A list of dictionaries with the metadata associated with each reply, such as token count, finish reason, and others |
Overview
HuggingFaceAPIGenerator
can be used to generate text using different Hugging Face APIs:
This component is designed for text generation, not for chat. If you want to use these LLMs for chat, use
HuggingFaceAPIChatGenerator
instead.
The component uses a HF_API_TOKEN
environment variable by default. Otherwise, you can pass a Hugging Face API token at initialization with token
– see code examples below.
The token is needed:
- If you use the Serverless Inference API, or
- If you use the Inference Endpoints.
Streaming
This Generator supports streaming the tokens from the LLM directly in output. To do so, pass a function to the streaming_callback
init parameter.
Usage
On its own
Using Free Serverless Inference API
Formerly known as (free) Hugging Face Inference API, this API allows you to quickly experiment with many models hosted on the Hugging Face Hub, offloading the inference to Hugging Face servers. It’s rate-limited and not meant for production.
To use this API, you need a free Hugging Face token.
The Generator expects the model
in api_params
.
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret
generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api",
api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
token=Secret.from_token("<your-api-key>"))
result = generator.run(prompt="What's Natural Language Processing?")
print(result)
Using Paid Inference Endpoints
In this case, a private instance of the model is deployed by Hugging Face, and you typically pay per hour.
To understand how to spin up an Inference Endpoint, visit Hugging Face documentation.
Additionally, in this case, you need to provide your Hugging Face token.
The Generator expects the url
of your endpoint in api_params
.
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret
generator = HuggingFaceAPIGenerator(api_type="inference_endpoints",
api_params={"url": "<your-inference-endpoint-url>"},
token=Secret.from_token("<your-api-key>"))
result = generator.run(prompt="What's Natural Language Processing?")
print(result)
Using Self-Hosted Text Generation Inference (TGI)
Hugging Face Text Generation Inference is a toolkit for efficiently deploying and serving LLMs.
While it powers the most recent versions of Serverless Inference API and Inference Endpoints, it can be used easily on-premise through Docker.
For example, you can run a TGI container as follows:
model=mistralai/Mistral-7B-v0.1
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.4 --model-id $model
For more information, refer to the official TGI repository.
The Generator expects the url
of your TGI instance in api_params
.
from haystack.components.generators import HuggingFaceAPIGenerator
generator = HuggingFaceAPIGenerator(api_type="text_generation_inference",
api_params={"url": "http://localhost:8080"})
result = generator.run(prompt="What's Natural Language Processing?")
print(result)
In a pipeline
from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack import Document
docstore = InMemoryDocumentStore()
docstore.write_documents([Document(content="Rome is the capital of Italy"), Document(content="Paris is the capital of France")])
query = "What is the capital of France?"
template = """
Given the following information, answer the question.
Context:
{% for document in documents %}
{{ document.content }}
{% endfor %}
Question: {{ query }}?
"""
generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api",
api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
token=Secret.from_token("<your-api-key>"))
pipe = Pipeline()
pipe.add_component("retriever", InMemoryBM25Retriever(document_store=docstore))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", generator)
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")
res=pipe.run({
"prompt_builder": {
"query": query
},
"retriever": {
"query": query
}
})
print(res)
Updated 12 days ago