HuggingFaceLocalGenerator
HuggingFaceLocalGenerator provides an interface to generate text using a Hugging Face model that runs locally.
| | |
| --- | --- |
| Most common position in a pipeline | After a PromptBuilder |
| Mandatory init variables | token: The Hugging Face API token. Can be set with the HF_API_TOKEN or HF_TOKEN environment variable. |
| Mandatory run variables | prompt: A string containing the prompt for the LLM |
| Output variables | replies: A list of strings with all the replies generated by the LLM |
| API reference | Generators |
| GitHub link | https://github.com/deepset-ai/haystack/blob/main/haystack/components/generators/hugging_face_local.py |
Overview
Running an LLM locally may require a powerful machine. How powerful depends strongly on the model you select and its parameter count.
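As a rough illustration of how parameter count drives hardware needs, the sketch below estimates the memory needed just to hold a model's weights. The helper name `estimate_weight_memory_gb` is hypothetical, and the figure of roughly 780 million parameters for google/flan-t5-large is an approximation. The estimate ignores activations and framework overhead, so treat it as a lower bound.

```python
def estimate_weight_memory_gb(num_parameters: int, bytes_per_parameter: int = 4) -> float:
    """Estimate the memory needed to hold model weights, in gigabytes (10**9 bytes).

    Covers weights only; activations and framework overhead come on top.
    """
    return num_parameters * bytes_per_parameter / 1e9


# Assumption: google/flan-t5-large has roughly 780 million parameters.
flan_t5_large_params = 780_000_000

# 4 bytes per parameter for float32, 2 bytes for float16.
fp32_gb = estimate_weight_memory_gb(flan_t5_large_params, bytes_per_parameter=4)
fp16_gb = estimate_weight_memory_gb(flan_t5_large_params, bytes_per_parameter=2)

print(f"float32 weights: ~{fp32_gb:.1f} GB")
print(f"float16 weights: ~{fp16_gb:.1f} GB")
```

Halving the bytes per parameter halves the weight footprint, which is why lower-precision loading is a common way to fit a larger model on the same machine.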
:::info Looking for chat completion?
This component is designed for text generation, not for chat. If you want to use Hugging Face LLMs for chat, consider using HuggingFaceLocalChatGenerator instead.
:::
To authorize access to remote files, this component reads the Hugging Face API token from the HF_API_TOKEN or HF_TOKEN environment variable by default. Alternatively, you can pass the token at initialization with the token parameter.
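A minimal sketch of passing the token explicitly, assuming Haystack 2.x, where `Secret` from `haystack.utils` wraps credentials. The helper `build_generator` and the `<your-api-key>` placeholder are illustrative only. The Haystack imports sit inside the function so that nothing is loaded until you call it.

```python
def build_generator(api_token: str, model: str = "google/flan-t5-large"):
    """Create a HuggingFaceLocalGenerator that authenticates with an explicit token.

    Hypothetical helper for illustration; requires haystack-ai and transformers.
    """
    from haystack.components.generators import HuggingFaceLocalGenerator
    from haystack.utils import Secret

    # Secret.from_token wraps a raw token string in Haystack's Secret type.
    return HuggingFaceLocalGenerator(model=model, token=Secret.from_token(api_token))


# Example call (replace the placeholder with a real token):
# generator = build_generator("<your-api-key>")
# generator.warm_up()
```

For anything beyond local experiments, reading the token from an environment variable is usually preferable to hard-coding it in source.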
Streaming
This generator can stream tokens from the LLM as they are produced. To enable streaming, pass a function to the streaming_callback init parameter.
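A minimal sketch of a streaming callback, assuming Haystack 2.x, where the callback receives a chunk object whose `content` attribute holds the newly generated text. The names `print_chunk` and `build_streaming_generator` are hypothetical. The final loop only simulates chunks with stand-in objects so that the callback can be tried without loading a model.

```python
from types import SimpleNamespace


def print_chunk(chunk) -> None:
    """Print each piece of generated text as soon as it arrives.

    Only the chunk's `content` attribute (a string) is used here.
    """
    print(chunk.content, end="", flush=True)


def build_streaming_generator(model: str = "google/flan-t5-large"):
    """Hypothetical helper: create a generator that streams through print_chunk.

    Requires haystack-ai and transformers; imported lazily on first call.
    """
    from haystack.components.generators import HuggingFaceLocalGenerator

    return HuggingFaceLocalGenerator(
        model=model,
        task="text2text-generation",
        generation_kwargs={"max_new_tokens": 100},
        streaming_callback=print_chunk,
    )


# Dry run of the callback without a model: any object with a `content`
# attribute stands in for a real streaming chunk.
for piece in ("Paris", " is", " the", " capital"):
    print_chunk(SimpleNamespace(content=piece))
print()
```

With a real generator, you would call `warm_up()` and then `run(...)` as usual, and the callback fires once per streamed chunk.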
Usage
On its own
from haystack.components.generators import HuggingFaceLocalGenerator

generator = HuggingFaceLocalGenerator(
    model="google/flan-t5-large",
    task="text2text-generation",
    generation_kwargs={
        "max_new_tokens": 100,
        "temperature": 0.9,
    },
)

# Load the model before the first call to run().
generator.warm_up()

print(generator.run("Who is the best American actor?"))
# Example output: {'replies': ['john wayne']}
In a Pipeline
from haystack import Document, Pipeline
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import HuggingFaceLocalGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Index a couple of documents for the retriever to search.
docstore = InMemoryDocumentStore()
docstore.write_documents(
    [
        Document(content="Rome is the capital of Italy"),
        Document(content="Paris is the capital of France"),
    ],
)

generator = HuggingFaceLocalGenerator(
    model="google/flan-t5-large",
    task="text2text-generation",
    generation_kwargs={
        "max_new_tokens": 100,
        "temperature": 0.9,
    },
)

query = "What is the capital of France?"

# The query already ends with a question mark, so the template does not add one.
template = """
Given the following information, answer the question.
Context:
{% for document in documents %}
{{ document.content }}
{% endfor %}
Question: {{ query }}
"""

pipe = Pipeline()
pipe.add_component("retriever", InMemoryBM25Retriever(document_store=docstore))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", generator)

# Retrieved documents fill the prompt; the rendered prompt feeds the generator.
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

res = pipe.run({"prompt_builder": {"query": query}, "retriever": {"query": query}})
print(res)
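The pipeline result is a dictionary keyed by component name, so the generator's output sits under "llm" and its replies are a list of strings. A minimal sketch of pulling out the first reply follows; `first_reply` is a hypothetical helper, and `sample_result` is made-up data that only mirrors the expected shape.

```python
from typing import Any, Dict, Optional


def first_reply(result: Dict[str, Any], component: str = "llm") -> Optional[str]:
    """Return the first reply produced by `component`, or None if there is none."""
    replies = result.get(component, {}).get("replies", [])
    return replies[0] if replies else None


# Made-up data shaped like a pipeline result; a real one comes from pipe.run(...).
sample_result = {"llm": {"replies": ["Paris"]}}
print(first_reply(sample_result))
```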
Additional References
🧑🍳 Cookbooks: