DocumentationAPI Reference📓 Tutorials🧑‍🍳 Cookbook🤝 Integrations💜 Discord🎨 Studio (Waitlist)
Documentation

HuggingFaceLocalGenerator

HuggingFaceLocalGenerator provides an interface to generate text using a Hugging Face model that runs locally.

Most common position in a pipelineAfter a PromptBuilder
Mandatory init variables"token": The Hugging Face API token. Can be set with HF_API_TOKEN or HF_TOKEN env var.
Mandatory run variables“prompt”: A string containing the prompt for the LLM
Output variables“replies”: A list of strings with all the replies generated by the LLM
API referenceGenerators
GitHub linkhttps://github.com/deepset-ai/haystack/blob/main/haystack/components/generators/hugging_face_local.py

Overview

Keep in mind that if LLMs run locally, you may need a powerful machine to run them. This depends strongly on the model you select and its parameter count.

📘

Looking for chat completion?

This component is designed for text generation, not for chat. If you want to use Hugging Face LLMs for chat, consider using HuggingFaceLocalChatGenerator instead.

For remote files authorization, this component uses a HF_API_TOKEN environment variable by default. Otherwise, you can pass a Hugging Face API token at initialization with token:

local_generator = HuggingFaceLocalGenerator(token=Secret.from_token("<your-api-key>"))

Streaming

This Generator supports streaming the tokens from the LLM directly in output. To do so, pass a function to the streaming_callback init parameter.

Usage

On its own

from haystack.components.generators import HuggingFaceLocalGenerator

generator = HuggingFaceLocalGenerator(model="google/flan-t5-large",
                                      task="text2text-generation",
                                      generation_kwargs={
                                        "max_new_tokens": 100,
                                        "temperature": 0.9,
                                        })

generator.warm_up()
print(generator.run("Who is the best American actor?"))
# {'replies': ['john wayne']}

In a Pipeline

from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import HuggingFaceLocalGenerator
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack import Document

docstore = InMemoryDocumentStore()
docstore.write_documents([Document(content="Rome is the capital of Italy"), Document(content="Paris is the capital of France")])

generator = HuggingFaceLocalGenerator(model="google/flan-t5-large",
                                      task="text2text-generation",
                                      generation_kwargs={
                                        "max_new_tokens": 100,
                                        "temperature": 0.9,
                                        })

query = "What is the capital of France?"

template = """
Given the following information, answer the question.

Context: 
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{ query }}?
"""
pipe = Pipeline()

pipe.add_component("retriever", InMemoryBM25Retriever(document_store=docstore))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", generator)
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

res=pipe.run({
    "prompt_builder": {
        "query": query
    },
    "retriever": {
        "query": query
    }
})

print(res)

Additional References

🧑‍🍳 Cookbooks: