

PromptNode brings you the power of large language models. It's an easy-to-use, customizable node that you can run on its own or in your pipelines for various NLP tasks.

With PromptNode, you can use large language models directly or in your pipelines.

What are large language models?

Large language models are huge models trained on enormous amounts of data. Interacting with such a model resembles talking to another person. These models have general knowledge of the world. You can ask them anything, and they'll be able to answer.

Large language models are trained to perform many NLP tasks with little training data. What's astonishing about them is that a single model can perform various NLP tasks with good accuracy.

Some examples of large language models include flan-t5-base, Flan-PaLM, Chinchilla, and GPT-3 variants, such as text-davinci-003.

PromptNode is a very versatile node. It's used in query pipelines, but its position depends on what you want it to do. You can pass a template to specify the NLP task the PromptNode should perform and a model to use. For more information, see the Usage section.

Position in a Pipeline | Used in query pipelines. The position depends on the NLP task you want it to do.
Input | Depends on the NLP task it performs. Some examples are query, documents, output of the preceding node.
Output | Depends on the NLP task it performs. Some examples are answer, query, document summary.


You can use PromptNode as a stand-alone node or in a pipeline. If you don't specify the model you want to use for the node, it uses google/flan-t5-base.


Maximum Token Limits

Each LLM has an overall maximum token limit that it can process. This limit includes both the prompt (input) and the response (output).

The max_length parameter in PromptNode only sets the maximum number of tokens for the generated text output. Therefore, take the potential token length of the prompt into account when setting the parameter. The token length of the prompt plus the specified max_length together must not be larger than the overall number of tokens the LLM can process.

For example, OpenAI's text-davinci-003 overall limit is 4096 tokens and for gpt-4-32k the limit is 32768 tokens.
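
The budget arithmetic described above can be sketched in a few lines. This is a minimal illustration, not part of Haystack: the helper name max_output_tokens and the prompt length of 3,500 tokens are assumptions made up for the example; only the two model limits come from the text.

```python
# Hypothetical helper (not a Haystack API): computes how many tokens are
# left for the generated output once the prompt has been accounted for.
def max_output_tokens(model_limit: int, prompt_tokens: int) -> int:
    remaining = model_limit - prompt_tokens
    if remaining <= 0:
        raise ValueError("The prompt alone reaches or exceeds the model's token limit.")
    return remaining


# Overall limits quoted above
MODEL_LIMITS = {"text-davinci-003": 4096, "gpt-4-32k": 32768}

# Assumed prompt length, for illustration only
prompt_tokens = 3500

for model, limit in MODEL_LIMITS.items():
    print(model, max_output_tokens(limit, prompt_tokens))
```

With a 3,500-token prompt, max_length could be at most 596 for text-davinci-003 and 29268 for gpt-4-32k.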

Stand Alone

Just initialize the node and ask a question. The model has general knowledge about the world, so you can ask it anything:

from haystack.nodes import PromptNode

# Initialize the node:
prompt_node = PromptNode()

# Run a prompt
prompt_node("What is the capital of Germany?")

# Here's the output:

With a Template

PromptNode can use a PromptTemplate that contains the prompt for the model.

For better results, use a task-specific PromptTemplate. You can pass additional variables like documents or questions to the node. The template combines all inputs into one or more prompts:

from haystack.nodes import PromptNode, PromptTemplate
from haystack import Document

# Initialize the node
prompt_node = PromptNode()

# You can check what out-of-the-box PromptTemplates are available:

# Here's the output:

# Specify the template using the `prompt` method 
# and pass your documents and questions:
prompt_node.prompt(prompt_template="question-answering",
          documents=[Document("Berlin is the capital of Germany."), Document("Paris is the capital of France.")],
          query="What is the capital of Germany?")

# Here's the output:
[<Answer {'answer': 'Berlin', 'type': 'generative', 'score': None, 'context': None, 'offsets_in_document': None, 'offsets_in_context': None, 'document_ids': ['1a7644ef76698b7a1c6ed23c357fa598', 'f225a94f83349e8776d6fb89ebfb41b8'], 'meta': {'prompt': 'Given the context please answer the question. Context: Berlin is the capital of Germany. Paris is the capital of France.; Question: What is the capital of Germany?; Answer:'}}>]

You can also create your own templates and pass variables and functions to them. For more information, see the Prompt Templates section.

With a Model Specified

By default, the PromptNode uses the google/flan-t5-base model. But you can change it to any of these models:

  • Hugging Face transformers (all text and text2text-generation models)
  • OpenAI InstructGPT models, including ChatGPT and GPT-4
  • Azure OpenAI InstructGPT models

Here's how you set the model:

from haystack.nodes import PromptNode

# Initialize the node passing the model:
prompt_node = PromptNode(model_name_or_path="google/flan-t5-xl")

# Go ahead and ask a question:
prompt_node("What is the best city in Europe to live in?")


You can enable streaming in PromptNode. Streaming will output LLM responses word by word rather than waiting for the entire response to be generated before outputting everything at once.

To enable streaming, you can specify one or both of the following parameters:

  • stream is a boolean switch that simply enables streaming.
  • stream_handler needs to be a subclass of TokenStreamingHandler.

Both parameters are specified in the PromptNode constructor. You can override them per PromptNode request by providing them as kwargs.

Here's how to quickly enable streaming:

from haystack.nodes.prompt import PromptNode

pn = PromptNode("gpt-3.5-turbo", api_key="<api_key_goes_here>", model_kwargs={"stream": True})
prompt = "What are the three most interesting things about Berlin? Be elaborate and use a numbered list."

# Run the prompt; the response is streamed as it is generated
pn(prompt)

When stream is on and stream_handler is not specified, the default streaming handler is used.
Providing a stream_handler also enables streaming. You can register a custom handler to process the streamed tokens with your own logic.

Here's how you register a custom handler:

from haystack.nodes.prompt import PromptNode
from haystack.nodes.prompt.invocation_layer.handlers import TokenStreamingHandler

class MyCustomTokenStreamingHandler(TokenStreamingHandler):
    def __call__(self, token_received, **kwargs) -> str:
        # here is your custom logic for each token
        return token_received

custom_handler = MyCustomTokenStreamingHandler()
pn = PromptNode("gpt-3.5-turbo", api_key="<api_key_goes_here>", model_kwargs={"stream_handler": custom_handler})
prompt = "What are the three most interesting things about Berlin? Be elaborate and use a numbered list."

# Run the prompt; each token is passed to the custom handler
pn(prompt)

You can use your implementation of the TokenStreamingHandler for all invocation layers that support streaming. For example, if you switch from OpenAI to Hugging Face transformers, you can use the same custom TokenStreamingHandler for both.
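
To make the handler contract more concrete, here is a sketch that mimics it without calling a model. It assumes, as the example above suggests, that a handler is invoked once per token and returns that token. The CollectingHandler class and the hard-coded token list are illustrative stand-ins; in real use, the class would subclass TokenStreamingHandler and the invocation layer would supply the tokens.

```python
from typing import List


class CollectingHandler:
    """Stand-in with the same __call__ shape as the custom handler above."""

    def __init__(self) -> None:
        self.tokens: List[str] = []

    def __call__(self, token_received: str, **kwargs) -> str:
        # Custom logic per token: here we just remember it
        self.tokens.append(token_received)
        return token_received


handler = CollectingHandler()

# Simulated stream; a real invocation layer would call the handler instead
for token in ["Berlin", " is", " the", " capital", "."]:
    handler(token)

full_text = "".join(handler.tokens)
print(full_text)
```

Keeping the per-token logic inside the handler object is what lets the same handler be reused across invocation layers.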

In a Pipeline

The real power of PromptNode shows when you use it in a pipeline. Look at the example to get an idea of what's possible.


Long-Form Question Answering

Long-form QA is one use of the PromptNode, but certainly not the only one. In this QA type, PromptNode handles complex questions by synthesizing information from various documents to retrieve an answer.

from haystack.pipelines import Pipeline
from haystack.nodes import AnswerParser, PromptNode, PromptTemplate
from haystack.schema import Document

# Let's create a custom LFQA prompt using PromptTemplate
lfqa_prompt = PromptTemplate(name="lfqa",
                             prompt_text="""Synthesize a comprehensive answer from the following topk most relevant paragraphs and the given question. 
                             Provide a clear and concise response that summarizes the key points and information presented in the paragraphs. 
                             Your answer should be in your own words and be no longer than 50 words. 
                             \n\n Paragraphs: {join(documents)} \n\n Question: {query} \n\n Answer:""",
                             # AnswerParser turns the model output into Answer objects,
                             # which the code below reads from output["answers"]
                             output_parser=AnswerParser())

# These docs could also come from a retriever
# Here we explicitly specify them to avoid the setup steps for Retriever and DocumentStore
doc_1 = "Contrails are a manmade type of cirrus cloud formed when water vapor from the exhaust of a jet engine condenses on particles, which come from either the surrounding air or the exhaust itself, and freezes, leaving behind a visible trail. The exhaust can also trigger the formation of cirrus by providing ice nuclei when there is an insufficient naturally-occurring supply in the atmosphere. One of the environmental impacts of aviation is that persistent contrails can form into large mats of cirrus, and increased air traffic has been implicated as one possible cause of the increasing frequency and amount of cirrus in Earth's atmosphere."
doc_2 = "Because the aviation industry is especially sensitive to the weather, accurate weather forecasting is essential. Fog or exceptionally low ceilings can prevent many aircraft from landing and taking off. Turbulence and icing are also significant in-flight hazards. Thunderstorms are a problem for all aircraft because of severe turbulence due to their updrafts and outflow boundaries, icing due to the heavy precipitation, as well as large hail, strong winds, and lightning, all of which can cause severe damage to an aircraft in flight. Volcanic ash is also a significant problem for aviation, as aircraft can lose engine power within ash clouds. On a day-to-day basis airliners are routed to take advantage of the jet stream tailwind to improve fuel efficiency. Aircrews are briefed prior to takeoff on the conditions to expect en route and at their destination. Additionally, airports often change which runway is being used to take advantage of a headwind. This reduces the distance required for takeoff, and eliminates potential crosswinds."

# Let's initialize the PromptNode
# `api_key` is assumed to hold your OpenAI API key
node = PromptNode("text-davinci-003", default_prompt_template=lfqa_prompt, api_key=api_key)

pipe = Pipeline()
pipe.add_node(component=node, name="prompt_node", inputs=["Query"])

output = pipe.run(query="Why do airplanes leave contrails in the sky?", documents=[Document(doc_1), Document(doc_2)])
[a.answer for a in output["answers"]]

# Here's the answer:
["Contrails are manmade clouds formed when water vapor from the exhaust of a jet engine condenses on particles, which come from either the surrounding air or the exhaust itself, and freezes, creating a visible trail. Increased air traffic has been linked to the greater frequency and amount of these cirrus clouds in Earth's atmosphere."]

Multiple PromptNodes and a Retriever

You can have multiple PromptNodes in your pipeline that reuse the PromptModel to save resources:

from haystack import Pipeline
from haystack.nodes.prompt import PromptNode, PromptModel
# You'd also need to import a Retriever and a DocumentStore, 
# we're skipping this in this example

top_k = 10
query = "Who is Paul Atreides' father?"

prompt_model = PromptModel()
node = PromptNode(prompt_model, default_prompt_template="question-generation", output_variable="query")
node2 = PromptNode(prompt_model, default_prompt_template="question-answering-per-document")

# You'd also need to initialize a Retriever with a DocumentStore
# We're skipping this step in this example to simplify it
pipe = Pipeline()
pipe.add_node(component=retriever, name="retriever", inputs=["Query"])
pipe.add_node(component=node, name="prompt_node", inputs=["retriever"])
pipe.add_node(component=node2, name="prompt_node_2", inputs=["prompt_node"])
output = pipe.run(query=query, params={"retriever": {"top_k": top_k}})

dict(zip(output["query"], output["answers"]))


PromptNode comes with out-of-the-box prompt templates ready for you to use. A template corresponds to an NLP task and contains instructions for the model. Here are some of the templates you can specify for the PromptNode:

  • question-answering
    This template joins all documents into one document and passes one prompt to the model, instructing it to perform question answering on the joined document.
  • question-answering-per-document
    This template performs question answering by passing one prompt per document. This may improve the output but is more resource intensive than the question-answering template.
  • question-answering-with-references
    This is the same as question answering, but instructs the model to cite documents by adding references. It joins all documents into one and passes one prompt to the model. The model outputs answers together with references to the documents that contain them.
  • question-answering-with-document-scores
    Performs question answering, taking into account document scores stored in metadata. This is the template used by the PromptNode in the WebQAPipeline.
  • question-generation
    Generates a question based on your documents.
  • conditioned-question-generation
    Based on your documents, generates a question for the answer you provide.
  • summarization
    Summarizes documents.
  • question-answering-check
    Checks if the documents contain the answer to a question.
  • sentiment-analysis
    Analyzes the sentiment of the documents.
  • multiple-choice-question-answering
    From a set of options, chooses the one that best answers the question.
  • topic-classification
    Categorizes documents by their topic.
  • language-detection
    Returns the language of the documents.
  • translation
    Translates documents.

For a full list of ready-to-use templates, see PromptTemplate in the GitHub repo. You can also call get_predefined_prompt_templates() to get this list.

from haystack.nodes.prompt.prompt_template import get_predefined_prompt_templates

get_predefined_prompt_templates()


If you don't specify the template, the node tries to guess what task you want it to perform. By indicating the template, you ensure it performs the right task.

PromptTemplate Structure

Here's an example of the question-answering template:

PromptTemplate(name="question-answering",
            prompt_text="Given the context please answer the question. Context: {join(documents)}; Question: "
            "{query}; Answer:",
            output_parser=AnswerParser())
  • prompt_text contains the prompt for the task you want the model to do. It also specifies input variables: documents and query. The variables are either primitives or lists of primitives.
    At runtime, these variables must be present in the execution context of the node. You can apply functions to those variables. For example, you can combine the list of documents into a string by applying the join function. By doing this, only one prompt instead of len(documents) prompts is executed.
  • output_parser converts the output of the model to a Haystack Document, Answer, or Label object. There's a ready-to-use AnswerParser which converts the output to the Haystack Answer object. Have a look at the API documentation for more information.

Functions in Prompts

You can add functions to your template to control how the documents, the query, or any other variable are rendered. A simplified version of the question-answering template looks like this:

            prompt_text="Please answer the question. "
            "Context: {' - '.join([d.meta['name']+': '+d.content for d in documents])}; Question: {query}; Answer: ",

Function Format

The functions use the Python f-string format, so you can use any list comprehensions inside a function:

' '.join([d.meta['name']+': '+d.content for d in documents])
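
To see what such an expression produces, you can evaluate it in plain Python. The sketch below uses a small stand-in class, FakeDocument, with only the content and meta attributes the expression touches; it is not the Haystack Document class, and the document names are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeDocument:
    # Stand-in for a Haystack Document: only the fields used below
    content: str
    meta: Dict[str, str] = field(default_factory=dict)


documents: List[FakeDocument] = [
    FakeDocument("Berlin is the capital of Germany.", {"name": "doc_a"}),
    FakeDocument("Paris is the capital of France.", {"name": "doc_b"}),
]

# The same expression as above, evaluated outside a template
rendered = ' '.join([d.meta['name']+': '+d.content for d in documents])
print(rendered)
```

The result is a single string, so a template that uses it sends one prompt to the model regardless of how many documents are passed in.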

Other than strict f-string syntax, you can safely use the following backslash characters in the text parts of the prompt text: \n, \t, \r. To use them in f-string expressions, pick the corresponding PromptTemplate variable from the table below.
Double quotes (") are automatically replaced with single quotes (') in the prompt text. To use double quotes in the prompt text, use {double_quote} instead.

Special characters not allowed in prompt_text expressions | PromptTemplate variable to use instead
\n | new_line
\t | tab
" | double_quote
\r | carriage_return

Some of the ready-made templates also contain functions, for example:

            prompt_text="Create a concise and informative answer (no more than 50 words) for a given question "
            "based solely on the given documents. You must only use information from the given documents. "
            "Use an unbiased and journalistic tone. Do not repeat text. Cite the documents using Document[number] notation. "
            "If multiple documents contain the answer, cite those documents like ‘as stated in Document[number], Document[number], etc.’. "
            "If the documents do not contain the answer to the question, say that ‘answering is not possible given the available information.’\n"
            "{join(documents, delimiter=new_line, pattern=new_line+'Document[$idx]: $content', str_replace={new_line: ' ', '[': '(', ']': ')'})} \n Question: {query}; Answer: ",

Note that in this example, we're not using the str.join Python function but our own convenience function join.

join and to_strings Functions

Two functions that you may find useful are join and to_strings.

The join function joins all documents into a single string, where the content of each document is separated by the delimiter you specify. In the example below, it prefixes each document with Document[$idx] and replaces new lines and square brackets in the content:

"{join(documents, delimiter=new_line, pattern=new_line+'Document[$idx]: $content', str_replace={new_line: ' ', '[': '(', ']': ')'})}"

The to_strings function extracts the content field of documents and returns a list of strings. In the example below, it renders each document by its name (document.meta["name"]) followed by a new line and the contents of the document:

"{to_strings(documents, pattern='$name'+new_line+'$content', str_replace={new_line: ' ', '[': '(', ']': ')'})}"

Function Parameters

Parameter | Type | Description
documents | List | The documents whose rendering you want to format. Mandatory.
pattern | String | The regex pattern used for parsing. Optional. You can use the following placeholders in pattern:
- $content: The content of the document.
- $idx: The index of the document in the list.
- $id: The ID of the document.
- $META_FIELD: The values of the metadata field called META_FIELD.
delimiter | String | Default: " " (single space). Specifies the delimiter you want to use to separate documents. Used in the join function. Mandatory.
str_replace | Dictionary of strings | Specifies the characters you want to replace. Use the format str_replace={"r":"R"}. Optional.
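
The following sketch approximates how pattern, delimiter, and str_replace interact. It is a simplified re-implementation for illustration, not Haystack's code: the function names render_documents and join_documents are hypothetical, documents are plain dictionaries, placeholders are filled with Python's string.Template, str_replace is applied to the document content only, and $idx is assumed to start at 1.

```python
from string import Template
from typing import Dict, List, Optional


def render_documents(
    documents: List[Dict[str, str]],
    pattern: str = "$content",
    str_replace: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Render each document with the pattern (roughly what to_strings does)."""
    rendered = []
    for idx, doc in enumerate(documents, start=1):  # assumption: 1-based index
        content = doc["content"]
        for old, new in (str_replace or {}).items():
            content = content.replace(old, new)
        values = {"content": content, "idx": idx, "id": doc.get("id", "")}
        rendered.append(Template(pattern).safe_substitute(values))
    return rendered


def join_documents(documents, delimiter=" ", pattern="$content", str_replace=None) -> str:
    """Join the rendered documents with the delimiter (roughly what join does)."""
    return delimiter.join(render_documents(documents, pattern, str_replace))


docs = [
    {"id": "a1", "content": "Berlin [Germany]\nis a city."},
    {"id": "b2", "content": "Paris is in France."},
]
new_line = "\n"
result = join_documents(
    docs,
    delimiter=new_line,
    pattern=new_line + "Document[$idx]: $content",
    str_replace={new_line: " ", "[": "(", "]": ")"},
)
print(result)
```

Replacing square brackets in the content keeps them from being confused with the Document[number] markers that the pattern adds.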

Output Parsers


With AnswerParser, you can convert the plain string model output into proper Answer objects. It takes care of populating the Answer's fields, such as adding the prompt to meta or referencing source document_ids. Using AnswerParser makes PromptNode publish its results in the answers key. This way, you can use PromptNode as a drop-in replacement for any answer-returning node, such as Reader or OpenAIAnswerGenerator.

Parameter | Type | Description
pattern | String | The regex pattern to use for parsing the answer. Examples:
- [^\n]+$ will find "this is an answer" in string "this is an argument.\nthis is an answer".
- Answer: (.*) will find "this is an answer" in string "this is an argument. Answer: this is an answer".
If None, the whole string is used as the answer. If specified, the first group of the regex is used as the answer. If there is no group, the whole match is used as the answer.
reference_pattern | String | The regex pattern to use for parsing the document references. Example:
- \[(\d+)\] will find "1" in string "this is an answer[1]".
If None, no parsing is done and all documents are referenced.
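
The matching rules above can be tried out with Python's re module. The sketch below is an illustration of those rules, not AnswerParser's implementation: extract_answer and extract_references are hypothetical helpers, and returning an empty string when nothing matches is an assumption of this sketch.

```python
import re
from typing import List, Optional


def extract_answer(text: str, pattern: Optional[str] = None) -> str:
    """No pattern: whole string. Pattern with a group: first group.
    Pattern without a group: whole match."""
    if pattern is None:
        return text
    match = re.search(pattern, text)
    if match is None:
        return ""  # assumption of this sketch
    return match.group(1) if match.groups() else match.group(0)


def extract_references(text: str, reference_pattern: str) -> List[str]:
    """Collect every document reference the pattern captures."""
    return re.findall(reference_pattern, text)


print(extract_answer("this is an argument.\nthis is an answer", r"[^\n]+$"))
print(extract_answer("this is an argument. Answer: this is an answer", r"Answer: (.*)"))
print(extract_references("this is an answer[1]", r"\[(\d+)\]"))
```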

Adding Your Prompt

You can easily add your own template:

from haystack.nodes import PromptTemplate, PromptNode

# In `prompt_text`, tell the model what you want it to do.
PromptNode.add_prompt_template(PromptTemplate(name="sentiment-analysis-new",
                                              prompt_text="Indicate the sentiment. Answer with positive, negative, or neutral. Context: {documents}; Answer:"))

For guidelines on how to construct the most efficient prompts, see Prompt Engineering Guidelines.

Setting a Default PromptTemplate

You can set a default template for a PromptNode instance. This way, you can reuse the same PromptNode in your pipeline for different tasks:

from haystack.nodes import PromptTemplate, PromptNode
from haystack.schema import Document

prompt_node = PromptNode()
sa = prompt_node.set_default_prompt_template("sentiment-analysis-new")
sa(documents=[Document("I am in love and I feel great!")])

# Node output:

# You can then switch to another template to reuse the same PromptNode
# for a different task:
summarizer = sa.set_default_prompt_template("summarization")


The default model for PromptModel and PromptNode is google/flan-t5-base but you can use other LLMs that we specified earlier. To do this, specify the model's name and the API key.

Using OpenAI Models

You can replace the default model with a Flan-T5 model of a different size or a model by OpenAI.
This example uses a version of the GPT-3 model:

from haystack.nodes import PromptModel, PromptNode

openai_api_key = "<type your OpenAI API key>"

# Specify the model you want to use:
prompt_open_ai = PromptModel(model_name_or_path="text-davinci-003", api_key=openai_api_key)

# Make PromptNode use the model:
pn_open_ai = PromptNode(prompt_open_ai)

pn_open_ai("What's the coolest city to live in Germany?")

Using ChatGPT and GPT-4

You can also use the gpt-3.5-turbo, gpt-4, and gpt-4-32k models from OpenAI to build your own chat functionality. The API for these models includes three role types: system, assistant, and user. To use ChatGPT, initialize the PromptNode with the gpt-3.5-turbo model:

from haystack.nodes import PromptNode

openai_api_key = "<type your OpenAI API key>"

# Specify "gpt-3.5-turbo" as the model for PromptNode
prompt_node = PromptNode(model_name_or_path="gpt-3.5-turbo", api_key=openai_api_key)

Here's an example of how you can build a chat function that makes use of each role and keeps track of the chat flow:

messages = [{"role": "system", "content": "You are a helpful assistant"}]

def build_chat(user_input: str = "", assistant_input: str = ""):
  if user_input != "":
    messages.append({"role": "user", "content": user_input})
  if assistant_input != "":
    messages.append({"role": "assistant", "content": assistant_input})

def chat(input: str):
  # Add the user's message to the history before calling the model
  build_chat(user_input=input)
  chat_gpt_answer = prompt_node(messages)
  # Store the reply so that follow-up questions keep their context
  build_chat(assistant_input=chat_gpt_answer[0])
  return chat_gpt_answer

Now you can use your chat() function:

chat("Who is Barack Obama married to?")
chat("And what year was she born?")
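
To check the history bookkeeping without calling the OpenAI API, the chat flow can be simulated with a stub in place of the PromptNode. The sketch below assumes that the model call receives the full message list and returns a list of strings; fake_prompt_node and its canned replies are hypothetical stand-ins.

```python
from typing import Dict, List

messages: List[Dict[str, str]] = [{"role": "system", "content": "You are a helpful assistant"}]


def fake_prompt_node(history: List[Dict[str, str]]) -> List[str]:
    # Stand-in for a PromptNode: reports how many user turns it has seen
    user_turns = sum(1 for m in history if m["role"] == "user")
    return [f"canned reply {user_turns}"]


def chat(user_input: str) -> str:
    # Record the user turn, call the model on the whole history,
    # then record the reply so that the next turn has context
    messages.append({"role": "user", "content": user_input})
    reply = fake_prompt_node(messages)[0]
    messages.append({"role": "assistant", "content": reply})
    return reply


chat("Who is Barack Obama married to?")
chat("And what year was she born?")
print([m["role"] for m in messages])
```

A follow-up question such as "And what year was she born?" only makes sense to the model if the earlier user and assistant turns are part of the messages it receives.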

Using Azure OpenAI Service

In addition to working with APIs directly from OpenAI, you can use PromptModel with Azure OpenAI APIs. For available models and versions for the service, check Azure documentation.

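from haystack.nodes import PromptModel, PromptNode

# Placeholder values: replace them with your own model, key, and Azure details
prompt_azure_open_ai = PromptModel(
    model_name_or_path="<your-model-name>",
    api_key="<your-azure-openai-key>",
    model_kwargs={
        "api_version": "2022-12-01",
        "azure_base_url": "<your-azure-base-url>",
        "azure_deployment_name": "<your-deployment-name>",
    },
)

pn_azure_open_ai = PromptNode(prompt_azure_open_ai)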

Using ChatGPT on Azure

You can use ChatGPT API on Azure. Here's an example of how you could do that:

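import os
from haystack.nodes import PromptModel

api_key = os.environ.get("AZURE_API_KEY")
deployment_name = os.environ.get("AZURE_DEPLOYMENT_NAME")
base_url = os.environ.get("AZURE_BASE_URL")

azure_chat = PromptModel(
    # Assumed model name; use the chat model your Azure deployment serves
    model_name_or_path="gpt-35-turbo",
    api_key=api_key,
    model_kwargs={"azure_deployment_name": deployment_name, "azure_base_url": base_url},
)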

There are two parameters that you pass as model_kwargs:

  • azure_deployment_name - the name of your Azure deployment.
  • azure_base_url - the URL of the Azure OpenAI endpoint.

Using Hugging Face Models

You can specify parameters for Hugging Face models using model_kwargs. Check out all the available parameters in Hugging Face documentation.

Here's an example of how to set the temperature of the model:

from haystack.nodes import PromptNode
from transformers import GenerationConfig

# Using a dictionary
node = PromptNode(model_kwargs={"generation_kwargs": {"do_sample": True, "temperature": 0.6}})

# Using a GenerationConfig object from Hugging Face
node = PromptNode(model_kwargs={"generation_kwargs": GenerationConfig(do_sample=True, top_p=0.9, temperature=0.6)})

Using Hugging Face Inference API

To see the models that can be used with Hugging Face Inference API, use this command:

curl -s https://api-inference.huggingface.co/framework/text-generation-inference

To use the selected model, define a PromptNode with your Hugging Face token as the api_key and the selected model as model_name_or_path.

Using Different Models in One Pipeline

You can also specify a different LLM for each PromptNode in your pipeline. PromptNode instances that should use the same LLM can share a single PromptModel, which saves computational resources.

from haystack.nodes import PromptTemplate, PromptNode, PromptModel
from haystack.pipelines import Pipeline
from haystack.schema import Document

# This is to set up the OpenAI model:
from getpass import getpass

api_key_prompt = "Enter OpenAI API key:" 
api_key = getpass(api_key_prompt)

# Specify the model you want to use:
prompt_open_ai = PromptModel(model_name_or_path="text-davinci-003", api_key=api_key)

# This sets up the default model:
prompt_model = PromptModel()

# Now let's make one PromptNode use the default model and the other one the OpenAI model:
node_default_model = PromptNode(prompt_model, default_prompt_template="question-generation", output_variable="questions")
node_openai = PromptNode(prompt_open_ai, default_prompt_template="question-answering")

pipe = Pipeline()
pipe.add_node(component=node_default_model, name="prompt_node1", inputs=["Query"])
pipe.add_node(component=node_openai, name="prompt_node_2", inputs=["prompt_node1"])
output = pipe.run(query="not relevant", documents=[Document("Berlin is the capital of Germany")])

Migrating from Previous PromptNode Versions

If you created PromptTemplates for previous PromptNode versions and you want to migrate to 1.15, here's what you need to do:

  1. Convert any $var into {var}.
  2. Convert any Shaper you used to modify PromptNode input (Shapers used before PromptNodes) into an f-string function:
    1. For Shapers with the join_documents function, use the join function.
    2. For Shapers with the value_to_list function, remove the Shapers altogether.
  3. Convert Shapers used to modify PromptNode output (Shapers after PromptNode) into output_parser. For any Shapers with the strings_to_answers function, use output_parser=AnswerParser.
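
Step 1 can be automated for simple templates. The sketch below is a hypothetical helper, not a Haystack utility: convert_placeholders is a made-up name, and the regular expression only handles plain $var placeholders made of letters, digits, and underscores. Review the converted text by hand, especially where a Shaper has to become a function such as join.

```python
import re


def convert_placeholders(old_prompt_text: str) -> str:
    """Rewrite $var placeholders as {var}."""
    return re.sub(r"\$(\w+)", r"{\1}", old_prompt_text)


old = "Given the context please answer the question. Context: $documents; Question: $query; Answer:"
print(convert_placeholders(old))
```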