

PromptNode brings you the power of large language models. It's an easy-to-use, customizable node that you can run on its own or in your pipelines for various NLP tasks.

With PromptNode, you can use large language models directly or in your pipelines.

What are large language models?

Large language models are huge models trained on enormous amounts of data. Interacting with such a model resembles talking to another person. These models have general knowledge of the world. You can ask them anything, and they'll be able to answer.

Large language models are trained to perform many NLP tasks with little training data. What's astonishing about them is that a single model can perform various NLP tasks with good accuracy.

Some examples of large language models include flan-t5-base, Flan-PaLM, Chinchilla, and GPT-3 variants, such as gpt-3.5-turbo-instruct.

PromptNode is a very versatile node. It's used in query pipelines, but its position depends on what you want it to do. You can pass a template to specify the NLP task the PromptNode should perform and a model to use. For more information, see the Usage section.

Position in a Pipeline: Used in query pipelines. The position depends on the NLP task you want it to do.
Input: Depends on the NLP task it performs. Some examples are query, documents, output of the preceding node.
Output: Depends on the NLP task it performs. Some examples are answer, query, document summary.
Supported models:
- flan t5 base (default)
- Hugging Face transformers (all text2text-generation models)
- OpenAI InstructGPT models, including ChatGPT and GPT-4
- Azure OpenAI InstructGPT models
- Cohere's command models
- Anthropic's Claude models
- Open source models hosted on Amazon SageMaker
- Models on Amazon Bedrock


You can use PromptNode as a stand-alone node or in a pipeline. If you don't specify the model you want to use for the node, it uses flan t5 base.


Maximum Token Limits

Each LLM has an overall maximum token limit that it can process. This limit includes both the prompt (input) and the response (output).

The max_length parameter in PromptNode only sets the maximum number of tokens for the generated text output. Therefore, take the potential token length of the prompt into account when setting the parameter. The token length of the prompt plus the specified max_length together must not be larger than the overall number of tokens the LLM can process.

For example, OpenAI's gpt-3.5-turbo-instruct overall limit is 4097 tokens and for gpt-4-32k the limit is 32768 tokens.
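
The constraint above can be checked with simple arithmetic before you set max_length. The following is a minimal sketch: the helper name max_length_budget is hypothetical and not part of Haystack, and the prompt token count is assumed to come from whichever tokenizer matches your model.

```python
def max_length_budget(model_token_limit: int, prompt_tokens: int) -> int:
    """Return the largest max_length that still fits the model's overall limit."""
    if prompt_tokens >= model_token_limit:
        raise ValueError("The prompt alone already exceeds the model's token limit.")
    return model_token_limit - prompt_tokens


# gpt-3.5-turbo-instruct has an overall limit of 4097 tokens.
# A prompt of 3500 tokens leaves the remainder for the generated output.
print(max_length_budget(4097, 3500))
```

If the budget comes out smaller than the answer length you need, shorten the prompt (for example, pass fewer documents) rather than raising max_length.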

Stand Alone

Just initialize the node and ask a question. The model has general knowledge about the world, so you can ask it anything:

from haystack.nodes import PromptNode

# Initialize the node:
prompt_node = PromptNode()

# Run a prompt
prompt_node("What is the capital of Germany?")

# Here's the output:

With a Template

PromptNode can use a PromptTemplate that contains the prompt for the model.

For better results, use a task-specific PromptTemplate. You can pass additional variables like documents or questions to the node. The template combines all inputs into one or more prompts:

from haystack.nodes import PromptNode, PromptTemplate
from haystack import Document

# Initialize the node
prompt_node = PromptNode()

# Specify the template using the `prompt` method 
# and pass your documents and questions:
prompt_node.prompt(prompt_template="deepset/question-answering",
          documents=[Document("Berlin is the capital of Germany."), Document("Paris is the capital of France.")],
          query="What is the capital of Germany?")

# Here's the output:
[<Answer {'answer': 'Berlin', 'type': 'generative', 'score': None, 'context': None, 'offsets_in_document': None, 'offsets_in_context': None, 'document_ids': ['1a7644ef76698b7a1c6ed23c357fa598', 'f225a94f83349e8776d6fb89ebfb41b8'], 'meta': {'prompt': 'Given the context please answer the question. Context: Berlin is the capital of Germany. Paris is the capital of France.; Question: What is the capital of Germany?; Answer:'}}>]

You can also create your own templates and pass variables and functions to them. For more information, see the Prompt Templates section.

With a Model Specified

By default, the PromptNode uses the flan t5 base model. But you can change it to any of these models:

  • Hugging Face transformers (all text and text2text-generation models)
  • OpenAI InstructGPT models, including ChatGPT and GPT-4
  • Azure OpenAI InstructGPT models
  • Cohere Command and Generation models
  • Anthropic Claude

Here's how you set the model:

from haystack.nodes import PromptNode

# Initialize the node passing the model:
prompt_node = PromptNode(model_name_or_path="google/flan-t5-xl")

# Go ahead and ask a question:
prompt_node("What is the best city in Europe to live in?")

With Streaming

You can enable streaming in PromptNode. Streaming will output LLM responses word by word rather than waiting for the entire response to be generated before outputting everything at once.

To enable streaming, you can specify one or both of the following parameters:

  • stream is a boolean switch that simply enables streaming.
  • stream_handler needs to be a subclass of TokenStreamingHandler.

You set both parameters when you initialize the PromptNode (the examples here pass them through model_kwargs). You can override them per PromptNode request by providing them as kwargs.

Here's how to quickly enable streaming:

from haystack.nodes.prompt import PromptNode

pn = PromptNode("gpt-3.5-turbo", api_key="<api_key_goes_here>", model_kwargs={"stream":True})
prompt = "What are the three most interesting things about Berlin? Be elaborate and use a numbered list."
pn(prompt)

The default streaming is used when stream is on, and stream_handler is not specified.
When you provide the stream_handler, streaming is enabled. You can register your custom handler to output responses within a custom execution.

Here's how you register a custom handler:

from haystack.nodes.prompt import PromptNode
from haystack.nodes.prompt.invocation_layer.handlers import TokenStreamingHandler

class MyCustomTokenStreamingHandler(TokenStreamingHandler):
    def __call__(self, token_received, **kwargs) -> str:
        # here is your custom logic for each token
        return token_received

custom_handler = MyCustomTokenStreamingHandler()
pn = PromptNode("gpt-3.5-turbo", api_key="<api_key_goes_here>", model_kwargs={"stream_handler": custom_handler})
prompt = "What are the three most interesting things about Berlin? Be elaborate and use a numbered list."
pn(prompt)

You can use your implementation of the TokenStreamingHandler for all invocation layers that support streaming. For example, if you switch from OpenAI to Hugging Face transformers, you can use the same custom TokenStreamingHandler for both.
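
As an illustration of custom per-token logic, here is a minimal sketch of a handler that collects the streamed tokens so that the full response can be reassembled afterwards. To keep the snippet runnable without Haystack installed, the class (a hypothetical name) does not inherit from TokenStreamingHandler; in real use, you would subclass it as in the example above and keep the same __call__ signature.

```python
class CollectingStreamHandler:
    """Stores every streamed token; mirrors the __call__ signature used above."""

    def __init__(self):
        self.tokens = []

    def __call__(self, token_received, **kwargs) -> str:
        # Custom logic: remember the token, then pass it on unchanged
        self.tokens.append(token_received)
        return token_received

    def full_text(self) -> str:
        return "".join(self.tokens)


handler = CollectingStreamHandler()
# Simulate an invocation layer that calls the handler once per token
for token in ["Berlin", " is", " lively", "."]:
    handler(token)
print(handler.full_text())
```

Because the handler only depends on being called once per token, the same logic works regardless of which model provider produces the tokens.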

In a Pipeline

The real power of PromptNode shows when you use it in a pipeline. Look at the example to get an idea of what's possible.


Long-Form Question Answering

Long-form QA is one use of the PromptNode, but certainly not the only one. In this QA type, PromptNode handles complex questions by synthesizing information from various documents to generate an answer.

from haystack.pipelines import Pipeline
from haystack.nodes import AnswerParser, PromptNode, PromptTemplate
from haystack.schema import Document

# Let's create a custom LFQA prompt using PromptTemplate
lfqa_prompt = PromptTemplate(prompt="""Synthesize a comprehensive answer from the following topk most relevant paragraphs and the given question. 
                             Provide a clear and concise response that summarizes the key points and information presented in the paragraphs. 
                             Your answer should be in your own words and be no longer than 50 words. 
                             \n\n Paragraphs: {join(documents)} \n\n Question: {query} \n\n Answer:""",
                             output_parser=AnswerParser())

# These docs could also come from a retriever
# Here we explicitly specify them to avoid the setup steps for Retriever and DocumentStore
doc_1 = "Contrails are a manmade type of cirrus cloud formed when water vapor from the exhaust of a jet engine condenses on particles, which come from either the surrounding air or the exhaust itself, and freezes, leaving behind a visible trail. The exhaust can also trigger the formation of cirrus by providing ice nuclei when there is an insufficient naturally-occurring supply in the atmosphere. One of the environmental impacts of aviation is that persistent contrails can form into large mats of cirrus, and increased air traffic has been implicated as one possible cause of the increasing frequency and amount of cirrus in Earth's atmosphere."
doc_2 = "Because the aviation industry is especially sensitive to the weather, accurate weather forecasting is essential. Fog or exceptionally low ceilings can prevent many aircraft from landing and taking off. Turbulence and icing are also significant in-flight hazards. Thunderstorms are a problem for all aircraft because of severe turbulence due to their updrafts and outflow boundaries, icing due to the heavy precipitation, as well as large hail, strong winds, and lightning, all of which can cause severe damage to an aircraft in flight. Volcanic ash is also a significant problem for aviation, as aircraft can lose engine power within ash clouds. On a day-to-day basis airliners are routed to take advantage of the jet stream tailwind to improve fuel efficiency. Aircrews are briefed prior to takeoff on the conditions to expect en route and at their destination. Additionally, airports often change which runway is being used to take advantage of a headwind. This reduces the distance required for takeoff, and eliminates potential crosswinds."

# Let's initialize the PromptNode (api_key must hold your OpenAI API key)
node = PromptNode("gpt-3.5-turbo-instruct", default_prompt_template=lfqa_prompt, api_key=api_key)

pipe = Pipeline()
pipe.add_node(component=node, name="prompt_node", inputs=["Query"])

output = pipe.run(query="Why do airplanes leave contrails in the sky?", documents=[Document(doc_1), Document(doc_2)])
[a.answer for a in output["answers"]]

# Here's the answer:
["Contrails are manmade clouds formed when water vapor from the exhaust of a jet engine condenses on particles, which come from either the surrounding air or the exhaust itself, and freezes, creating a visible trail. Increased air traffic has been linked to the greater frequency and amount of these cirrus clouds in Earth's atmosphere."]

Multiple PromptNodes and a Retriever

You can have multiple PromptNodes in your pipeline that reuse the PromptModel to save resources:

from haystack import Pipeline
from haystack.nodes.prompt import PromptNode, PromptModel
# You'd also need to import a Retriever and a DocumentStore, 
# we're skipping this in this example

top_k = 10
query = "Who is Paul Atreides' father?"

prompt_model = PromptModel()
node = PromptNode(prompt_model, default_prompt_template="deepset/question-generation", output_variable="query")
node2 = PromptNode(prompt_model, default_prompt_template="deepset/question-answering-per-document")

# You'd also need to initialize a Retriever with a DocumentStore
# We're skipping this step in this example to simplify it
pipe = Pipeline()
pipe.add_node(component=retriever, name="retriever", inputs=["Query"])
pipe.add_node(component=node, name="prompt_node", inputs=["retriever"])
pipe.add_node(component=node2, name="prompt_node_2", inputs=["prompt_node"])
output = pipe.run(query=query, params={"retriever": {"top_k": top_k}})

dict(zip(output["query"], output["answers"]))

Setting Request Timeout

There are cases where you might want to change the request timeout for the remote APIs. You can easily do this by setting the environment variable HAYSTACK_REMOTE_API_TIMEOUT_SEC.

The default timeout is 30 seconds.
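
For example, you could set the variable from Python before Haystack sends any remote requests. This is a minimal sketch: the value 60 is arbitrary, and exactly when Haystack reads the variable is an assumption to verify for your version, so setting it before importing haystack is the safer choice.

```python
import os

# Raise the remote API timeout from the default 30 seconds to 60 seconds.
# Environment variables are strings, so the number is passed as text.
os.environ["HAYSTACK_REMOTE_API_TIMEOUT_SEC"] = "60"

timeout_sec = float(os.environ["HAYSTACK_REMOTE_API_TIMEOUT_SEC"])
print(timeout_sec)
```

You can achieve the same effect by exporting the variable in your shell before starting the Python process.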



PromptNode comes with out-of-the-box prompt templates, which are stored on Haystack PromptHub, ready for you to use. A prompt corresponds to an NLP task and contains instructions for the model.

To use a template from the PromptHub, all you need to do is pass the name of the prompt to PromptTemplate.

Here is an example with a deepset/topic-classification prompt:

import os

from haystack.nodes import PromptNode, PromptTemplate

topic_classifier_template = PromptTemplate("deepset/topic-classification")
prompt_node = PromptNode(model_name_or_path="gpt-3.5-turbo-instruct", api_key=os.environ.get("OPENAI_API_KEY"))
prompt_node.prompt(prompt_template=topic_classifier_template, documents="YOUR_DOCUMENTS", options=["A LIST OF TOPICS"])

Keep in mind that you need an internet connection to be able to use a prompt from PromptHub for the first time. After the first run, the template is cached locally.

Accessing or changing cache path

This is where you can find your cached prompts according to your operating system:

  • Linux:
  • MacOS:
    /Users/<user>/Library/Application Support/haystack/prompthub_cache/deepset
  • Windows:

You can also find the default cache path on your machine by running the following code:

from haystack.nodes.prompt.prompt_template import PROMPTHUB_CACHE_PATH
print(PROMPTHUB_CACHE_PATH)

You can set the PROMPTHUB_CACHE_PATH environment variable to change the default folder in which the prompts will be saved. If you set a custom PROMPTHUB_CACHE_PATH environment variable, remember to set it to the same value in your console before running Haystack.
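
As a minimal sketch of changing the cache folder from within Python, the snippet below points the variable at a custom directory. The directory name is hypothetical, and the snippet assumes Haystack reads the variable when it is imported, which is why the variable is set first.

```python
import os
import tempfile

# Hypothetical custom location; any writable directory works
custom_cache_dir = os.path.join(tempfile.gettempdir(), "my_prompthub_cache")
os.makedirs(custom_cache_dir, exist_ok=True)

# Set the variable before importing Haystack so the new path is picked up
os.environ["PROMPTHUB_CACHE_PATH"] = custom_cache_dir
print(os.environ["PROMPTHUB_CACHE_PATH"])
```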

If you want to save a prompt even before you run it, you can easily do so with the Haystack CLI. All you need to do is run the following command to save one or more prompts at once:

haystack prompt fetch [PROMPT_NAME]

Find out more about Haystack CLI in the Haystack GitHub repo.

List of legacy templates

These are the legacy Haystack PromptTemplates that are now replaced by PromptHub templates. We recommend you use PromptHub templates starting from Haystack v1.18.

  • question-answering
    This template joins all documents into one document and passes one prompt to the model, instructing it to perform question answering on the joined document.
  • question-answering-per-document
    This template performs question answering by passing one prompt per document. This may improve the output but is more resource intensive than the question-answering template.
  • question-answering-with-references
    This is the same as question answering, but instructs the model to cite documents by adding references. It joins all documents into one and passes one prompt to the model. The model outputs answers together with references to the documents that contain them.
  • question-answering-with-document-scores
    Performs question answering, taking into account document scores stored in metadata. This is the template used by the PromptNode in the WebQAPipeline.
  • question-generation
    Generates a question based on your documents.
  • conditioned-question-generation
    Based on your documents, generates a question for the answer you provide.
  • summarization
    Summarizes documents.
  • question-answering-check
    Checks if the documents contain the answer to a question.
  • sentiment-analysis
    Analyzes the sentiment of the documents.
  • multiple-choice-question-answering
    From a set of options, chooses the one that best answers the question.
  • topic-classification
    Categorizes documents by their topic.
  • language-detection
    Returns the language of the documents.
  • translation
    Translates documents.

PromptTemplate Structure

Here's an example of a template:

PromptTemplate(prompt="Given the context please answer the question. Context: {join(documents)}; Question: "
            "{query}; Answer:",
            output_parser=AnswerParser())

  • prompt contains the prompt for the task you want the model to do. It also specifies input variables: documents and query. The variables are either primitives or lists of primitives.
    At runtime, these variables must be present in the execution context of the node. You can apply functions to those variables. For example, you can combine the list of documents into a string by applying the join function. By doing this, only one prompt instead of len(documents) prompts is executed.
  • output_parser converts the output of the model to Haystack Document, Answer, or Label object. There's a ready-to-use AnswerParser which converts the output to the Haystack Answer object. Have a look at the API documentation for more information.

Functions in Prompts

You can add functions to your template to control how the documents, the query, or any other variable are rendered. A simplified version of the question-answering template looks like this:

PromptTemplate(prompt="Please answer the question. "
            "Context: {' - '.join([d.meta['name']+': '+d.content for d in documents])}; Question: {query}; Answer: ")

Function Format

The functions use the Python f-string format, so you can use any list comprehensions inside a function:

' '.join([d.meta['name']+': '+d.content for d in documents])
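
To see what such an expression produces, you can evaluate it as plain Python. The sketch below uses a hypothetical stand-in class instead of the Haystack Document, with only the two attributes the expression reads (content and meta), so that it runs without Haystack installed.

```python
from dataclasses import dataclass, field


@dataclass
class StandInDocument:
    """Stand-in with the two attributes the expression reads: content and meta."""
    content: str
    meta: dict = field(default_factory=dict)


documents = [
    StandInDocument("Berlin is the capital of Germany.", {"name": "germany.txt"}),
    StandInDocument("Paris is the capital of France.", {"name": "france.txt"}),
]

# The same expression as in the template, evaluated as plain Python
rendered = ' '.join([d.meta['name']+': '+d.content for d in documents])
print(rendered)
```

Note that the expression raises a KeyError if any document lacks the name metadata field, so make sure every document carries it.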

Unlike strict f-string syntax, the prompt text lets you safely use the following backslash characters in its text parts: \n, \t, \r. To use them in f-string expressions, pick the corresponding PromptTemplate variable from the table below.
Double quotes (") are automatically replaced with single quotes (') in the prompt text. To use double quotes in the prompt text, use {double_quote} instead.

Special characters not allowed in prompt expressions, and the PromptTemplate variable to use instead:

  • \n: new_line
  • \t: tab
  • \r: carriage_return
  • ": double_quote

Some of the ready-made templates also contain functions, for example:

PromptTemplate(prompt="Create a concise and informative answer (no more than 50 words) for a given question "
            "based solely on the given documents. You must only use information from the given documents. "
            "Use an unbiased and journalistic tone. Do not repeat text. Cite the documents using Document[number] notation. "
            "If multiple documents contain the answer, cite those documents like ‘as stated in Document[number], Document[number], etc.’. "
            "If the documents do not contain the answer to the question, say that ‘answering is not possible given the available information.’\n"
            "{join(documents, delimiter=new_line, pattern=new_line+'Document[$idx]: $content', str_replace={new_line: ' ', '[': '(', ']': ')'})} \n Question: {query}; Answer: ")

Note that in this example, we're not using the str.join Python function but our own convenience function join.

Here are the functions allowed in PromptTemplates:

  • join: Joins all documents into a single string, where the content of each document is separated by the delimiter you specify. Example: {join(documents, delimiter=new_line)}
  • to_strings: Extracts the content field of documents and returns a list of strings. Example: {to_strings(documents)}
  • replace: Replaces a character. Example: {query.replace('how', 'what').replace('?', '!')}
  • enumerate: Python function that counts and returns the number of objects. Example: You have {enumerate(documents)} documents available to help you answer.
  • str: Python class that converts objects into strings. Example: str(b'Hello!')
  • current_datetime: Prints current date and/or time. Example: Today is the {current_datetime(dd/MM/YY)}.

join and to_strings Functions

Two functions that you may find most useful are join and to_strings.

The join function joins all documents into a single string, where the content of each document is separated by the delimiter you specify.


"{join(documents, delimiter=new_line, pattern=new_line+'Document[$idx]: $content', str_replace={new_line: ' ', '[': '(', ']': ')'})}

The to_strings function extracts the content field of documents and returns a list of strings. In the example below, it renders each document by its name (document.meta["name"]) followed by a new line and the contents of the document:

"{to_strings(documents, pattern='$name'+new_line+'$content', str_replace={new_line: ' ', '[': '(', ']': ')'})}

Function Parameters

  • documents (List): The documents whose rendering you want to format. Mandatory.
  • pattern (String): The regex pattern used for parsing. Optional.
    You can use the following placeholders in pattern:
    - $content: The content of the document.
    - $idx: The index of the document in the list.
    - $id: The ID of the document.
    - $META_FIELD: The values of the metadata field called META_FIELD.
  • delimiter (String): Specifies the delimiter you want to use to separate documents. Used in the join function. Mandatory.
    Default: " " (single space)
  • str_replace (Dictionary of strings): Specifies the characters you want to replace. Use the format str_replace={"r":"R"}. Optional.
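
To make the interplay of pattern, delimiter, and str_replace concrete, here is a plain-Python emulation of the rendering. It is an illustrative sketch, not Haystack's implementation: the function names are hypothetical, documents are represented by their content strings only, and $idx is assumed to start at 1 to match the Document[1] citation style.

```python
from string import Template


def render_documents(contents, pattern="$content", str_replace=None):
    """Render each document's content with a pattern (illustrative only)."""
    rendered = []
    # Assumption: $idx starts at 1, matching the Document[1] citation style
    for idx, content in enumerate(contents, start=1):
        # Apply the character replacements to the content first
        for old, new in (str_replace or {}).items():
            content = content.replace(old, new)
        rendered.append(Template(pattern).substitute(idx=idx, content=content))
    return rendered


def join_documents(contents, delimiter=" ", pattern="$content", str_replace=None):
    """Join the rendered documents into one string using the delimiter."""
    return delimiter.join(render_documents(contents, pattern, str_replace))


new_line = "\n"
docs = ["Berlin [capital]\nof Germany.", "Paris is the capital of France."]
result = join_documents(
    docs,
    delimiter=new_line,
    pattern=new_line + "Document[$idx]: $content",
    str_replace={new_line: " ", "[": "(", "]": ")"},
)
print(result)
```

In this sketch, replacing square brackets inside the content keeps them from being confused with the Document[number] markers that the pattern adds.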

Output Parsers


With AnswerParser, you can convert the plain string model output into proper Answer objects. It takes care of populating the Answer's fields, such as adding the prompt to meta or referencing source document_ids. Using AnswerParser makes PromptNode publish its results in the answers key. This way, you can use PromptNode as a drop-in replacement for any answer-returning node, such as Reader.

  • pattern (String): The regex pattern to use for parsing the answer. Examples:
    [^\n]+$ will find "this is an answer" in string "this is an argument.\nthis is an answer".
    Answer: (.*) will find "this is an answer" in string "this is an argument. Answer: this is an answer".
    If None, the whole string is used as the answer. If specified, the first group of the regex is used as the answer. If there is no group, the whole match is used as the answer.
  • reference_pattern (String): The regex pattern to use for parsing the document references. Example:
    \[(\d+)\] will find "1" in string "this is an answer[1]".
    If None, no parsing is done and all documents are referenced.
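
You can try these patterns with Python's re module before passing them to AnswerParser. The sketch below only demonstrates the regular expressions themselves and makes no claim about how AnswerParser applies them internally. The input string with the [1] citation marker is an illustrative assumption.

```python
import re

# Answer patterns: without a group, the whole match is the answer;
# with a group, the first group is the answer
last_line = re.search(r"[^\n]+$", "this is an argument.\nthis is an answer")
after_label = re.search(r"Answer: (.*)", "this is an argument. Answer: this is an answer")
print(last_line.group(0))
print(after_label.group(1))

# Reference pattern: collect the document numbers cited in an answer
references = re.findall(r"\[(\d+)\]", "this is an answer[1]")
print(references)
```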

Writing Your Prompt

You can easily write your own template:

from haystack.nodes import PromptTemplate, PromptNode

# In `prompt`, tell the model what you want it to do.
PromptNode.add_prompt_template(PromptTemplate(prompt="Indicate the sentiment. Answer with positive, negative, or neutral. Context: {documents}; Answer:"))

For guidelines on how to construct the most efficient prompts, see Prompt Engineering Guidelines.

Prompt File Structure

To save your prompt to a template and be able to use it locally, you need to follow a specific format.

Here is an example of a deepset/conversational-agent YAML file:

name: deepset/conversational-agent
text: |
  The following is a conversation between a human and an AI.\n{history}\nHuman: {query}\nAI:
description: Conversational agent which holds the history of the conversation.
tags:
  - agent
  - conversational
meta:
  authors:
    - deepset-ai
version: '0.1.0'

All fields are mandatory.

  • name is the title of your template.
  • text is the text of the prompt itself.
  • description is a short explanation of what your prompt does.
  • tags are the labels for your prompt, keywords that would simplify the search.
  • meta field:
    • authors can be your name, your GitHub handle, or another identifier.
  • version is a numbered iteration of your prompt.

PromptTemplate Usage Examples

There are four different types of PromptTemplates that you can use:

  1. PromptHub
    prompt_node = PromptNode(model_name_or_path="google/flan-t5-xl", default_prompt_template="deepset/question-answering-per-document")
  2. Your own prompt
    prompt_node = PromptNode(model_name_or_path="google/flan-t5-xl", default_prompt_template="Indicate the sentiment. Answer with positive, negative, or neutral. Context: {documents}; Answer:")
  3. Prompt saved locally
    prompt_node = PromptNode(model_name_or_path="google/flan-t5-xl", default_prompt_template="local_path_to_prompt")
  4. Legacy prompt (not recommended)
    prompt_node = PromptNode(model_name_or_path="google/flan-t5-xl", default_prompt_template="question-answering-per-document")


The default model for PromptModel and PromptNode is google/flan-t5-base but you can use other LLMs that we specified earlier. To do this, specify the model's name and the API key.

Using OpenAI Models

You can replace the default model with a flan t5 model of a different size or a model by OpenAI.
This example uses a version of the GPT-3 model:

from haystack.nodes import PromptModel, PromptNode

openai_api_key = "<type your OpenAI API key>"

# Specify the model you want to use:
prompt_open_ai = PromptModel(model_name_or_path="gpt-3.5-turbo-instruct", api_key=openai_api_key)

# Make PromptNode use the model:
pn_open_ai = PromptNode(prompt_open_ai)

pn_open_ai("What's the coolest city to live in Germany?")

Using ChatGPT and GPT-4

You can also use the gpt-3.5-turbo, gpt-4, and gpt-4-32k models from OpenAI to build your own chat functionality. The API for these models includes three types of roles: system, assistant, and user. To use ChatGPT, you simply initialize the PromptNode with the gpt-3.5-turbo model:

from haystack.nodes import PromptNode

openai_api_key = "<type your OpenAI API key>"

# Specify "gpt-3.5-turbo" as the model for PromptNode
prompt_node = PromptNode(model_name_or_path="gpt-3.5-turbo", api_key=openai_api_key)

Here's an example of how you can build a chat function that makes use of each role and keeps track of the chat flow:

messages = [{"role": "system", "content": "You are a helpful assistant"}]

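def build_chat(user_input: str = "", assistant_input: str = ""):
  if user_input != "":
    messages.append({"role": "user", "content": user_input})
  if assistant_input != "":
    messages.append({"role": "assistant", "content": assistant_input})

def chat(input: str):
  # Record the user's message, query the model with the full history,
  # then record the model's reply so the next turn has context
  build_chat(user_input=input)
  chat_gpt_answer = prompt_node(messages)
  build_chat(assistant_input=chat_gpt_answer[0])
  return chat_gpt_answer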

Now you can use your chat() function:

chat("Who is Barack Obama married to?")
chat("And what year was she born?")
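
To see how the message list grows over a conversation without calling the OpenAI API, here is a self-contained sketch. A stub function (hypothetical, named fake_prompt_node) stands in for prompt_node and returns a canned reply inside a list, on the assumption that PromptNode returns its results as a list.

```python
messages = [{"role": "system", "content": "You are a helpful assistant"}]


def fake_prompt_node(history):
    # Stand-in for prompt_node(messages): returns a list with one reply
    return [f"(reply to: {history[-1]['content']})"]


def chat(user_input: str):
    # Add the user turn, query the model with the full history,
    # then store the reply so the next question has context
    messages.append({"role": "user", "content": user_input})
    answer = fake_prompt_node(messages)
    messages.append({"role": "assistant", "content": answer[0]})
    return answer


chat("Who is Barack Obama married to?")
chat("And what year was she born?")
print([m["role"] for m in messages])
```

The second question only makes sense to the model because the first question and its answer are still in the message list when it is sent.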

Using Azure OpenAI Service

In addition to working with APIs directly from OpenAI, you can use PromptModel with Azure OpenAI APIs. For available models and versions for the service, check Azure documentation.

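from haystack.nodes import PromptModel, PromptNode

prompt_azure_open_ai = PromptModel(
    model_name_or_path="<your-model-name>",
    api_key="<your-azure-openai-key>",
    model_kwargs={
        "api_version": "2022-12-01",
        "azure_base_url": "<your-azure-openai-endpoint-url>",
        "azure_deployment_name": "<your-deployment-name>",
    },
)

pn_azure_open_ai = PromptNode(prompt_azure_open_ai)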

Using ChatGPT on Azure

You can use ChatGPT API on Azure. Here's an example of how you could do that:

import os

api_key = os.environ.get("AZURE_API_KEY")
deployment_name = os.environ.get("AZURE_DEPLOYMENT_NAME")
base_url = os.environ.get("AZURE_BASE_URL")

azure_chat = PromptModel(
    model_name_or_path="<your-chat-model-name>",
    api_key=api_key,
    model_kwargs={
        "azure_deployment_name": deployment_name,
        "azure_base_url": base_url,
    },
)

There are two parameters that you pass as model_kwargs:

  • azure_deployment_name - the name of your Azure deployment.
  • azure_base_url - the URL of the Azure OpenAI endpoint.

Using Cohere Generative Models

You can use generative models from Cohere, like Command, with the PromptNode by simply specifying the model name and providing your Cohere token:

from haystack.nodes import PromptNode

pn = PromptNode(model_name_or_path="command", api_key=your_cohere_api_key)

Haystack supports Cohere's command, command-light, base and base-light models.

Using Anthropic Generative Models

Using any of the generative models by Anthropic is easy with PromptNode as well. It requires the model name, the maximum length of the output text, and your Anthropic API key. Optionally, you can add Anthropic's relevant keyword arguments as model_kwargs:

from haystack.nodes import PromptNode

pn = PromptNode(model_name_or_path="claude-2", api_key=your_anthropic_api_key, max_length=200, model_kwargs={"stream":True})

Using Hugging Face Models

You can specify parameters for Hugging Face models using model_kwargs. Check out all the available parameters in Hugging Face documentation.

Here's an example of how to set the temperature of the model:

from haystack.nodes import PromptNode
from transformers import GenerationConfig

# Using a dictionary
node = PromptNode(model_kwargs={"generation_kwargs": {"do_sample": True, "temperature": 0.6}})

# Using a GenerationConfig object from HuggingFace
node = PromptNode(model_kwargs={"generation_kwargs": GenerationConfig(do_sample=True, top_p=0.9, temperature=0.6)})

Using Hugging Face Inference API

To see the models that can be used with Hugging Face Inference API, use this command:

curl -s

To use the selected model, define a PromptNode with your Hugging Face token as the api_key and the selected model as model_name_or_path.

Using Local Models

To use a local model, initialize your PromptNode using model_kwargs, where you need to pass any additional parameters (such as task_name, tokenizer, etc.) required by specific models.

Here’s an example of how you would initialize your local model with a path:

from haystack.nodes import PromptNode

prompt_node = PromptNode(model_name_or_path=local_path, model_kwargs={'task_name':'text2text-generation'})

Additionally, for local loading, you don't necessarily need to use a cache path. You can load the model using Hugging Face transformers classes/functions and pass the model directly in model_kwargs.

Using the Latest Hugging Face Hub Text Generation Models

There is a simple approach to incorporate newer LLMs with a custom setup into Haystack.

To do that, initialize your PromptNode using the model_kwargs value, where you need to pass any additional parameters (such as task_name, tokenizer, etc.) required by specific models.

This is an example of how you would initialize an MPT-7B-Instruct model, considering the requirements in its model card:

from haystack.nodes import PromptNode
from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("mosaicml/mpt-7b-instruct", trust_remote_code=True)

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

prompt_node = PromptNode("mosaicml/mpt-7b-instruct", model_kwargs={"model":model, "tokenizer": tokenizer})

Using Open Source LLMs Hosted with AWS SageMaker

If you need to deploy an open source LLM without hosting it yourself or want to safeguard sensitive data, you might consider hosting it with AWS SageMaker.



Consider using AWS CLI as a more straightforward tool to manage your AWS services. With AWS CLI, you can quickly configure your boto3 credentials. This way, you won't need to provide detailed authentication parameters when initializing PromptNode in Haystack.

To begin, you need to deploy your chosen Hugging Face text generation model to SageMaker. You can do this quickly and easily with the SageMaker Studio JumpStart, where you select a model and click on Deploy. For more information, refer to the AWS documentation.

To initialize PromptNode, provide the inference endpoint name and your aws_profile_name and aws_region_name. Other authentication parameters are optional if you have already configured them with AWS CLI.

Here’s an example of how to initialize PromptNode:

from haystack.nodes import PromptNode

pn = PromptNode(model_name_or_path="sagemaker-model-endpoint-name", model_kwargs={"aws_profile_name": "my_aws_profile_name","aws_region_name": "your-aws-region"})

Keep in mind that streaming is not yet supported by SageMaker endpoints.

List of tested models

Here are the SageMaker-hosted models that we tested with Haystack:

  • Falcon models
  • MPT
  • Dolly V2
  • Flan-UL2
  • Flan-T5
  • RedPajama
  • Open Llama
  • GPT-J-6B
  • BloomZ

Using Models on Amazon Bedrock

Amazon Bedrock is a fully managed service that makes high-performing foundation models from leading AI startups (AI21 Labs, Anthropic, Cohere, Meta) and Amazon available for your use through a unified API. You can choose from a wide range of foundation models to find the one that is best suited for your use case.

To initialize PromptNode, provide the model name, as well as aws_access_key_id, aws_secret_access_key and aws_region_name as model_kwargs. Other parameters are optional.

Here’s an example of how to initialize PromptNode with Amazon Bedrock models:

from haystack.nodes import PromptNode

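prompt_node = PromptNode(model_name_or_path="anthropic.claude-v2",
                         model_kwargs={"aws_access_key_id": "<your-aws-access-key-id>",
                                       "aws_secret_access_key": "<your-aws-secret-access-key>",
                                       "aws_region_name": "<your-aws-region>"})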

Using Different Models in One Pipeline

You can also specify different LLMs for each PromptNode in your pipeline. To do so, create a separate PromptModel for each LLM and pass it to the PromptNode that should use it.

from haystack import Document
from haystack.nodes import PromptTemplate, PromptNode, PromptModel
from haystack.pipelines import Pipeline

# This is to set up the OpenAI model:
from getpass import getpass

api_key_prompt = "Enter OpenAI API key:" 
api_key = getpass(api_key_prompt)

# Specify the model you want to use:
prompt_open_ai = PromptModel(model_name_or_path="gpt-3.5-turbo-instruct", api_key=api_key)

# This sets up the default model:
prompt_model = PromptModel()

# Now let's make one PromptNode use the default model and the other one the OpenAI model:
node_default_model = PromptNode(prompt_model, default_prompt_template="deepset/question-generation", output_variable="questions")
node_openai = PromptNode(prompt_open_ai, default_prompt_template="deepset/question-answering")

pipe = Pipeline()
pipe.add_node(component=node_default_model, name="prompt_node1", inputs=["Query"])
pipe.add_node(component=node_openai, name="prompt_node_2", inputs=["prompt_node1"])
output = pipe.run(query="not relevant", documents=[Document("Berlin is the capital of Germany")])