Name	HuggingFaceLocalGenerator
Folder path	/generators/
Most common position in a pipeline	After a `PromptBuilder`
Mandatory input variables	“prompt”: A string containing the prompt for the LLM
Output variables	“replies”: A list of strings with all the replies generated by the LLM

Overview

Keep in mind that if LLMs run locally, you may need a powerful machine to run them. This depends strongly on the model you select and its parameter count.

📘
Looking for chat completion?
This component is designed for text generation, not for chat. If you want to use Hugging Face LLMs for chat, consider using HuggingFaceLocalChatGenerator instead.

👍
Practical example
To see an example of this component being used, check out this 🧑‍🍳 Cookbook.

For remote files authorization, this component uses a HF_API_TOKEN environment variable by default. Otherwise, you can pass a Hugging Face API token at initialization with token:

local_generator = HuggingFaceLocalGenerator(token=Secret.from_token("<your-api-key>"))

Streaming

This Generator supports streaming the tokens from the LLM directly in output. To do so, pass a function to the streaming_callback init parameter.

Usage

On its own

from haystack.components.generators import HuggingFaceLocalGenerator

generator = HuggingFaceLocalGenerator(model="google/flan-t5-large",
                                      task="text2text-generation",
                                      generation_kwargs={
                                        "max_new_tokens": 100,
                                        "temperature": 0.9,
                                        })

generator.warm_up()
print(generator.run("Who is the best American actor?"))
# {'replies': ['john wayne']}

In a Pipeline

from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import HuggingFaceLocalGenerator
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack import Document

docstore = InMemoryDocumentStore()
docstore.write_documents([Document(content="Rome is the capital of Italy"), Document(content="Paris is the capital of France")])

generator = HuggingFaceLocalGenerator(model="google/flan-t5-large",
                                      task="text2text-generation",
                                      generation_kwargs={
                                        "max_new_tokens": 100,
                                        "temperature": 0.9,
                                        })

query = "What is the capital of France?"

template = """
Given the following information, answer the question.

Context: 
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{ query }}?
"""
pipe = Pipeline()

pipe.add_component("retriever", InMemoryBM25Retriever(document_store=docstore))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", generator)
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

res=pipe.run({
    "prompt_builder": {
        "query": query
    },
    "retriever": {
        "query": query
    }
})

print(res)

HuggingFaceLocalGenerator

Overview

📘
Looking for chat completion?

👍
Practical example

Streaming

Usage

On its own

In a Pipeline

Overview

📘Looking for chat completion?

👍Practical example

Streaming

Usage

On its own

In a Pipeline

📘
Looking for chat completion?

👍
Practical example