PromptNode
PromptNode brings you the power of large language models. It's an easy-to-use, customizable node that you can run on its own or in your pipelines for various NLP tasks.
What are large language models?
Large language models are huge models trained on enormous amounts of data. Interacting with such a model resembles talking to another person. These models have general knowledge of the world. You can ask them anything, and they'll be able to answer.
Large language models can perform many NLP tasks with little or no task-specific training data. What's astonishing about them is that a single model can perform various NLP tasks with good accuracy.
Some examples of large language models include flan-t5-base, Flan-PaLM, Chinchilla, and GPT-3 variants, such as gpt-3.5-turbo-instruct.
PromptNode is a very versatile node. It's used in query pipelines, but its position depends on what you want it to do. You can pass a template to specify the NLP task the PromptNode should perform and a model to use. For more information, see the Usage section.
Position in a Pipeline | Used in query pipelines. The position depends on the NLP task you want it to do. |
Input | Depends on the NLP task it performs. Some examples are query, documents, output of the preceding node. |
Output | Depends on the NLP task it performs. Some examples are answer, query, document summary. |
Classes | PromptNode |
Supported models | - flan t5 base (default) - Hugging Face transformers (all text2text-generation models) - OpenAI InstructGPT models, including ChatGPT and GPT-4 - Azure OpenAI InstructGPT models - Cohere's command models - Anthropic's Claude models - Open source models hosted on Amazon SageMaker - Models on Amazon Bedrock |
Usage
You can use PromptNode as a stand-alone node or in a pipeline. If you don't specify the model you want to use for the node, it uses flan t5 base.
Maximum Token Limits
Each LLM has an overall maximum token limit that it can process. This limit includes both the prompt (input) and the response (output).
The max_length parameter in PromptNode only sets the maximum number of tokens for the generated text output. Therefore, take the potential token length of the prompt into account when setting the parameter. The token length of the prompt plus the specified max_length together must not be larger than the overall number of tokens the LLM can process.
For example, OpenAI's gpt-3.5-turbo-instruct overall limit is 4097 tokens and for gpt-4-32k the limit is 32768 tokens.
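The budget arithmetic described above can be sketched in a few lines. This is only an illustration: the helper name max_prompt_tokens and the lookup table are hypothetical and not part of Haystack, and the limits are the two quoted in this section.

```python
# Overall token limits quoted in this section (prompt + generated output).
MODEL_TOKEN_LIMITS = {
    "gpt-3.5-turbo-instruct": 4097,
    "gpt-4-32k": 32768,
}


def max_prompt_tokens(model_name: str, max_length: int) -> int:
    """Return how many tokens are left for the prompt once max_length is reserved for the output."""
    limit = MODEL_TOKEN_LIMITS[model_name]
    if max_length >= limit:
        raise ValueError(f"max_length={max_length} leaves no room for a prompt (limit is {limit})")
    return limit - max_length


# Reserving 500 output tokens leaves 4097 - 500 tokens for the prompt.
print(max_prompt_tokens("gpt-3.5-turbo-instruct", max_length=500))  # 3597
```

If the prompt you plan to send is longer than the returned value, shorten the prompt or lower max_length.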
Stand Alone
Just initialize the node and ask a question. The model has general knowledge about the world, so you can ask it anything:
from haystack.nodes import PromptNode
# Initialize the node:
prompt_node = PromptNode()
# Run a prompt
prompt_node("What is the capital of Germany?")
# Here's the output:
['berlin']
With a Template
PromptNode can use a PromptTemplate that contains the prompt for the model.
For better results, use a task-specific PromptTemplate. You can pass additional variables like documents or questions to the node. The template combines all inputs into one or more prompts:
from haystack.nodes import PromptNode, PromptTemplate
from haystack import Document
# Initialize the node
prompt_node = PromptNode()
# Specify the template using the `prompt` method
# and pass your documents and questions:
prompt_node.prompt(
    prompt_template="deepset/question-answering",
    documents=[Document("Berlin is the capital of Germany."), Document("Paris is the capital of France.")],
    query="What is the capital of Germany?",
)
# Here's the output:
[<Answer {'answer': 'Berlin', 'type': 'generative', 'score': None, 'context': None, 'offsets_in_document': None, 'offsets_in_context': None, 'document_ids': ['1a7644ef76698b7a1c6ed23c357fa598', 'f225a94f83349e8776d6fb89ebfb41b8'], 'meta': {'prompt': 'Given the context please answer the question. Context: Berlin is the capital of Germany. Paris is the capital of France.; Question: What is the capital of Germany?; Answer:'}}>]
You can also create your own templates and pass variables and functions to them. For more information, see the Prompt Templates section.
With a Model Specified
By default, the PromptNode uses the flan t5 base model. But you can change it to any of these models:
- Hugging Face transformers (all text and text2text-generation models)
- OpenAI InstructGPT models, including ChatGPT and GPT-4
- Azure OpenAI InstructGPT models
- Cohere Command and Generation models
- Anthropic Claude
Here's how you set the model:
from haystack.nodes import PromptNode
# Initialize the node passing the model:
prompt_node = PromptNode(model_name_or_path="google/flan-t5-xl")
# Go ahead and ask a question:
prompt_node("What is the best city in Europe to live in?")
With Streaming
You can enable streaming in PromptNode. Streaming will output LLM responses word by word rather than waiting for the entire response to be generated before outputting everything at once.
To enable streaming, you can specify one or both of the following parameters:
- stream is a boolean switch that simply enables streaming.
- stream_handler needs to be a subclass of TokenStreamingHandler.
Both parameters are specified in the PromptNode init constructor. You can override them per PromptNode request by providing them as kwargs.
Here's how to quickly enable streaming:
from haystack.nodes.prompt import PromptNode
pn = PromptNode("gpt-3.5-turbo", api_key="<api_key_goes_here>", model_kwargs={"stream":True})
prompt = "What are the three most interesting things about Berlin? Be elaborate and use numbered list"
pn(prompt)
The default streaming is used when stream is on and stream_handler is not specified.
When you provide the stream_handler, streaming is enabled. You can register your custom handler to output responses within a custom execution.
Here's how you register a custom handler:
from haystack.nodes.prompt import PromptNode
from haystack.nodes.prompt.invocation_layer.handlers import TokenStreamingHandler
class MyCustomTokenStreamingHandler(TokenStreamingHandler):
    def __call__(self, token_received, **kwargs) -> str:
        # here is your custom logic for each token
        return token_received
custom_handler = MyCustomTokenStreamingHandler()
pn = PromptNode("gpt-3.5-turbo", api_key="<api_key_goes_here>", model_kwargs={"stream_handler": custom_handler})
prompt = "What are the three most interesting things about Berlin? Be elaborate and use a numbered list."
pn(prompt)
You can use your implementation of the TokenStreamingHandler for all invocation layers that support streaming. For example, if you switch from OpenAI to Hugging Face transformers, you can use the same custom TokenStreamingHandler for both.
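As an illustration of what the custom logic inside `__call__` might look like, the sketch below collects every token so the full response can be read back once streaming ends. To keep it runnable without Haystack installed, CollectingHandler is a plain class with a hypothetical name; in a real setup it would subclass TokenStreamingHandler as shown in the custom handler example, and the hard-coded token list stands in for what the model would stream.

```python
class CollectingHandler:
    """Stand-in for a TokenStreamingHandler subclass: stores each token it receives."""

    def __init__(self):
        self.tokens = []

    def __call__(self, token_received: str, **kwargs) -> str:
        # Custom per-token logic: remember the token, then pass it through unchanged.
        self.tokens.append(token_received)
        return token_received

    def text(self) -> str:
        # Concatenate everything received so far.
        return "".join(self.tokens)


handler = CollectingHandler()
# Simulated stream; a real model would call the handler once per generated token.
for token in ["Berlin ", "is ", "lively."]:
    handler(token)
print(handler.text())  # Berlin is lively.
```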
In a Pipeline
The real power of PromptNode shows when you use it in a pipeline. Look at the example to get an idea of what's possible.
Examples
Long-Form Question Answering
Long-form QA is one use of the PromptNode, but certainly not the only one. In this QA type, PromptNode handles complex questions by synthesizing information from various documents to retrieve an answer.
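import os
from haystack.pipelines import Pipeline
from haystack.nodes import AnswerParser, PromptNode, PromptTemplate
from haystack.schema import Document
# The OpenAI API key is read from the environment here; adjust this to your setup
api_key = os.environ.get("OPENAI_API_KEY")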
# Let's create a custom LFQA prompt using PromptTemplate
lfqa_prompt = PromptTemplate(prompt="""Synthesize a comprehensive answer from the following topk most relevant paragraphs and the given question.
Provide a clear and concise response that summarizes the key points and information presented in the paragraphs.
Your answer should be in your own words and be no longer than 50 words.
\n\n Paragraphs: {join(documents)} \n\n Question: {query} \n\n Answer:""",
output_parser=AnswerParser(),)
# These docs could also come from a retriever
# Here we explicitly specify them to avoid the setup steps for Retriever and DocumentStore
doc_1 = "Contrails are a manmade type of cirrus cloud formed when water vapor from the exhaust of a jet engine condenses on particles, which come from either the surrounding air or the exhaust itself, and freezes, leaving behind a visible trail. The exhaust can also trigger the formation of cirrus by providing ice nuclei when there is an insufficient naturally-occurring supply in the atmosphere. One of the environmental impacts of aviation is that persistent contrails can form into large mats of cirrus, and increased air traffic has been implicated as one possible cause of the increasing frequency and amount of cirrus in Earth's atmosphere."
doc_2 = "Because the aviation industry is especially sensitive to the weather, accurate weather forecasting is essential. Fog or exceptionally low ceilings can prevent many aircraft from landing and taking off. Turbulence and icing are also significant in-flight hazards. Thunderstorms are a problem for all aircraft because of severe turbulence due to their updrafts and outflow boundaries, icing due to the heavy precipitation, as well as large hail, strong winds, and lightning, all of which can cause severe damage to an aircraft in flight. Volcanic ash is also a significant problem for aviation, as aircraft can lose engine power within ash clouds. On a day-to-day basis airliners are routed to take advantage of the jet stream tailwind to improve fuel efficiency. Aircrews are briefed prior to takeoff on the conditions to expect en route and at their destination. Additionally, airports often change which runway is being used to take advantage of a headwind. This reduces the distance required for takeoff, and eliminates potential crosswinds."
# Let's initiate the PromptNode
node = PromptNode("gpt-3.5-turbo-instruct", default_prompt_template=lfqa_prompt, api_key=api_key)
pipe = Pipeline()
pipe.add_node(component=node, name="prompt_node", inputs=["Query"])
output = pipe.run(query="Why do airplanes leave contrails in the sky?", documents=[Document(doc_1), Document(doc_2)])
[a.answer for a in output["answers"]]
# Here's the answer:
["Contrails are manmade clouds formed when water vapor from the exhaust of a jet engine condenses on particles, which come from either the surrounding air or the exhaust itself, and freezes, creating a visible trail. Increased air traffic has been linked to the greater frequency and amount of these cirrus clouds in Earth's atmosphere."]
Multiple PromptNodes and a Retriever
You can have multiple PromptNodes in your pipeline that reuse the PromptModel to save resources:
from haystack import Pipeline
from haystack.nodes.prompt import PromptNode, PromptModel
# You'd also need to import a Retriever and a DocumentStore,
# we're skipping this in this example
top_k = 10
query = "Who is Paul Atreides' father?"
prompt_model = PromptModel()
node = PromptNode(prompt_model, default_prompt_template="deepset/question-generation", output_variable="query")
node2 = PromptNode(prompt_model, default_prompt_template="deepset/question-answering-per-document")
# You'd also need to initialize a Retriever with a DocumentStore
# We're skipping this step in this example to simplify it
pipe = Pipeline()
pipe.add_node(component=retriever, name="retriever", inputs=["Query"])
pipe.add_node(component=node, name="prompt_node", inputs=["retriever"])
pipe.add_node(component=node2, name="prompt_node_2", inputs=["prompt_node"])
output = pipe.run(query=query, params={"retriever": {"top_k": 10}})
dict(zip(output["query"], output["answers"]))
Setting Request Timeout
There are cases where you might want to change the request timeout for the remote APIs. You can easily do this by setting the environment variable HAYSTACK_REMOTE_API_TIMEOUT_SEC.
The default timeout is 30 seconds.
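One way to set the variable from Python is shown below. This is a sketch under an assumption: the value is likely read when Haystack's remote invocation layers are imported, so setting it before importing Haystack is the safer order. The value 60 is only an example.

```python
import os

# Assumption: set this before importing haystack so the new value is picked up.
os.environ["HAYSTACK_REMOTE_API_TIMEOUT_SEC"] = "60"

# Environment variables are strings; convert when you need the number.
timeout_sec = float(os.environ["HAYSTACK_REMOTE_API_TIMEOUT_SEC"])
print(timeout_sec)  # 60.0
```

You can also export the variable in your shell before starting the process, which avoids any ordering concerns.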
PromptTemplates
PromptHub
PromptNode comes with out-of-the-box prompt templates, which are stored on Haystack PromptHub, ready for you to use. A prompt corresponds to an NLP task and contains instructions for the model.
To use a template from the PromptHub, all you need is to pass the name of the prompt to the PromptTemplate parameter.
Here is an example with a deepset/topic-classification prompt:
import os
from haystack.nodes import PromptNode, PromptTemplate
topic_classifier_template = PromptTemplate("deepset/topic-classification")
prompt_node = PromptNode(model_name_or_path="gpt-3.5-turbo-instruct", api_key=os.environ.get("OPENAI_API_KEY"))
prompt_node.prompt(prompt_template=topic_classifier_template, documents="YOUR_DOCUMENTS", options=["A LIST OF TOPICS"])
Keep in mind that you need an internet connection to be able to use a prompt from PromptHub for the first time. After the first run, the template is cached locally.
Accessing or changing cache path
This is where you can find your cached prompts according to your operating system:
- Linux: /home/<user>/.local/share/haystack/prompthub_cache/deepset
- macOS: /Users/<user>/Library/Application Support/haystack/prompthub_cache/deepset
- Windows: C:\Users\<user>\AppData\Local\Acme\SuperApp\haystack\prompthub_cache\deepset
You can also find the default cache path on your machine by running the following code:
from haystack.nodes.prompt.prompt_template import PROMPTHUB_CACHE_PATH
print(PROMPTHUB_CACHE_PATH)
You can set the PROMPTHUB_CACHE_PATH environment variable to change the default folder in which the prompts will be saved. If you set a custom PROMPTHUB_CACHE_PATH environment variable, remember to set it to the same value in your console before running Haystack.
If you want to save a prompt even before you run it, you can easily do so with the Haystack CLI. All you need to do is run the following command to save one or more prompts at once:
haystack prompt fetch [PROMPT_NAME]
Find out more about Haystack CLI in the Haystack GitHub repo.
List of legacy templates
These are the legacy Haystack PromptTemplates that are now replaced by PromptHub templates. We recommend you use PromptHub templates starting from Haystack v1.18.
- question-answering: This template joins all documents into one document and passes one prompt to the model, instructing it to perform question answering on the joined document.
- question-answering-per-document: This template performs question answering by passing one prompt per document. This may improve the output but is more resource intensive than the question-answering template.
- question-answering-with-references: This is the same as question-answering, but instructs the model to cite documents by adding references. It joins all documents into one and passes one prompt to the model. The model outputs answers together with references to the documents that contain them.
- question-answering-with-document-scores: Performs question answering, taking into account document scores stored in metadata. This is the template used by the PromptNode in the WebQAPipeline.
- question-generation: Generates a question based on your documents.
- conditioned-question-generation: Based on your documents, generates a question for the answer you provide.
- summarization: Summarizes documents.
- question-answering-check: Checks if the documents contain the answer to a question.
- sentiment-analysis: Analyzes the sentiment of the documents.
- multiple-choice-question-answering: From a set of options, chooses the one that best answers the question.
- topic-classification: Categorizes documents by their topic.
- language-detection: Returns the language of the documents.
- translation: Translates documents.
PromptTemplate Structure
Here's an example of a template:
PromptTemplate(prompt="Given the context please answer the question. Context: {join(documents)}; Question: "
"{query}; Answer:",
output_parser=AnswerParser(),
),
prompt contains the prompt for the task you want the model to do. It also specifies input variables: documents and query. The variables are either primitives or lists of primitives.
At runtime, these variables must be present in the execution context of the node. You can apply functions to those variables. For example, you can combine the list of documents into a string by applying the join function. By doing this, only one prompt instead of len(documents) prompts is executed.
output_parser converts the output of the model to a Haystack Document, Answer, or Label object. There's a ready-to-use AnswerParser which converts the output to the Haystack Answer object. Have a look at the API documentation for more information.
Functions in Prompts
You can add functions to your template to control how the documents, the query, or any other variable are rendered. A simplified version of the question-answering template looks like this:
PromptTemplate(prompt="Please answer the question. "
"Context: {' - '.join([d.meta['name']+': '+d.content for d in documents])}; Question: {query}; Answer: ",
output_parser=AnswerParser(),
),
Function Format
The functions use the Python f-string format, so you can use any list comprehensions inside a function:
' '.join([d.meta['name']+': '+d.content for d in documents])
Other than strict f-string syntax, you can safely use the following backslash characters in the text parts of the prompt text: \n, \t, \r. To use them in f-string expressions, pick the corresponding PromptTemplate variable from the table below.
Double quotes (") are automatically replaced with single quotes (') in the prompt text. To use double quotes in the prompt text, use {double_quote} instead.
Special characters not allowed in prompt expressions | PromptTemplate variable to use instead |
---|---|
\n | new_line |
\t | tab |
\r | carriage_return |
" | double_quote |
Some of the ready-made templates also contain functions, for example:
PromptTemplate(prompt="Create a concise and informative answer (no more than 50 words) for a given question "
"based solely on the given documents. You must only use information from the given documents. "
"Use an unbiased and journalistic tone. Do not repeat text. Cite the documents using Document[number] notation. "
"If multiple documents contain the answer, cite those documents like ‘as stated in Document[number], Document[number], etc.’. "
"If the documents do not contain the answer to the question, say that ‘answering is not possible given the available information.’\n"
"{join(documents, delimiter=new_line, pattern=new_line+'Document[$idx]: $content', str_replace={new_line: ' ', '[': '(', ']': ')'})} \n Question: {query}; Answer: ",
output_parser=AnswerParser(reference_pattern=r"Document\[(\d+)\]"),
),
Note that in this example, we're not using the str.join
Python function but our own convenience function join
.
Here are the functions allowed in PromptTemplates:
Function | Description | Example |
---|---|---|
join | Joins all documents into a single string, where the content of each document is separated by the delimiter you specify. | {join(documents, delimiter=new_line)} |
to_strings | Extracts the content field of documents and returns a list of strings. | {to_strings(documents)} |
replace | Replaces a character. | {query.replace('how', 'what').replace('?', '!')} |
enumerate | Python function that counts and returns the number of objects. | You have {enumerate(documents)} documents available to help you answer. |
str | Python class that converts objects into strings. | str(b'Hello!') |
current_datetime | Prints current date and/or time. | Today is the {current_datetime(dd/MM/YY)}. |
join and to_strings Functions
Two functions that you may find most useful are join and to_strings.
The join function joins all documents into a single string, where the content of each document is separated by the delimiter you specify.
Example:
{join(documents, delimiter=new_line, pattern=new_line+'Document[$idx]: $content', str_replace={new_line: ' ', '[': '(', ']': ')'})}
The to_strings function extracts the content field of documents and returns a list of strings. In the example below, it renders each document by its name (document.meta["name"]) followed by a new line and the contents of the document:
{to_strings(documents, pattern='$name'+new_line+'$content', str_replace={new_line: ' ', '[': '(', ']': ')'})}
Function Parameters
Parameter | Type | Description |
---|---|---|
documents | List | The documents whose rendering you want to format. Mandatory. |
pattern | String | The regex pattern used for parsing. Optional. You can use the following placeholders in pattern: - $content : The content of the document.- $idx : The index of the document in the list.- $id : The ID of the document.- $META_FIELD : The values of the metadata field called META_FIELD . |
delimiter | String Default: " " (single space) | Specifies the delimiter you want to use to separate documents. Used in the join function. Mandatory. |
str_replace | Dictionary of strings | Specifies the characters you want to replace. Use the format str_replace={"r":"R"} . Optional. |
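The interplay of pattern, delimiter, and str_replace can be pictured with a small stand-in written with Python's string.Template. It is not Haystack's implementation. The function name render_documents is hypothetical, and two details are assumptions: that $idx starts at 1, and that str_replace is applied to the document content before the pattern is filled in.

```python
from string import Template


def render_documents(contents, delimiter=" ", pattern="$content", str_replace=None):
    """Render each document content with the pattern, then join the results with the delimiter."""
    str_replace = str_replace or {}
    rendered = []
    for idx, content in enumerate(contents, start=1):  # assumption: indices start at 1
        # Apply the replacements to the document content only, not to the pattern.
        for old, new in str_replace.items():
            content = content.replace(old, new)
        rendered.append(Template(pattern).substitute(idx=idx, content=content))
    return delimiter.join(rendered)


docs = ["Berlin is the\ncapital of Germany.", "See [1] for details."]
rendered_text = render_documents(
    docs,
    delimiter="\n",
    pattern="Document[$idx]: $content",
    str_replace={"\n": " ", "[": "(", "]": ")"},
)
print(rendered_text)
```

Replacing the square brackets inside the content keeps them from being confused with the Document[number] markers that the pattern adds.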
Output Parsers
AnswerParser
With AnswerParser, you can convert the plain string model output into proper Answer objects. It takes care of populating the Answer's fields like adding the prompt to meta or referencing source document_ids. Using AnswerParser makes PromptNode publish its results in the answers key. This way, you can use PromptNode as a plug-in replacement for any answer-returning node, such as Reader.
Parameter | Type | Description |
---|---|---|
pattern | String | The regex pattern to use for parsing the answer. Examples: [^\n]+$ will find "this is an answer" in string "this is an argument.\nthis is an answer". Answer: (.*) will find "this is an answer" in string "this is an argument. Answer: this is an answer". If None, the whole string is used as the answer. If specified, the first group of the regex is used as the answer. If there is no group, the whole match is used as the answer. |
reference_pattern | String | The regex pattern to use for parsing the document references. Example: \[(\d+)\] will find "1" in string "this is an answer[1]". If None, no parsing is done and all documents are referenced. |
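The pattern rules in the table can be tried out with Python's re module alone. The helper parse_answer below is a hypothetical stand-in that follows the rules as described (first group if there is one, otherwise the whole match); returning an empty string when nothing matches is an assumption of this sketch.

```python
import re


def parse_answer(text, pattern=None):
    """Apply the answer-pattern rules from the table to a raw model output."""
    if pattern is None:
        return text  # no pattern: the whole string is the answer
    match = re.search(pattern, text)
    if match is None:
        return ""  # assumption: no match yields an empty answer
    # Use the first group if the pattern defines one, otherwise the whole match.
    return match.group(1) if match.groups() else match.group(0)


answer_1 = parse_answer("this is an argument.\nthis is an answer", r"[^\n]+$")
answer_2 = parse_answer("this is an argument. Answer: this is an answer", r"Answer: (.*)")
references = re.findall(r"\[(\d+)\]", "this is an answer[1]")

print(answer_1)    # this is an answer
print(answer_2)    # this is an answer
print(references)  # ['1']
```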
Writing Your Prompt
You can easily write your own template:
from haystack.nodes import PromptTemplate, PromptNode
# In `prompt`, tell the model what you want it to do.
PromptNode.add_prompt_template(PromptTemplate(prompt="Indicate the sentiment. Answer with positive, negative, or neutral. Context: {documents}; Answer:"))
For guidelines on how to construct the most efficient prompts, see Prompt Engineering Guidelines.
Prompt File Structure
To save your prompt to a template and be able to use it locally, you need to follow a specific format.
Here is an example of a deepset/conversational-agent YAML file:
name: deepset/conversational-agent
text: |
  The following is a conversation between a human and an AI.\n{history}\nHuman: {query}\nAI:
description: Conversational agent which holds the history of the conversation.
tags:
  - agent
  - conversational
meta:
  authors:
    - deepset-ai
version: '0.1.0'
All fields are mandatory:
- name is the title of your template.
- text is the text of the prompt itself.
- description is a short explanation of what your prompt does.
- tags are the labels for your prompt, keywords that would simplify the search.
- meta contains authors, which can be your name, your GitHub handle, or another identifier.
- version is a numbered iteration of your prompt.
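Since every field is mandatory, a quick check before saving a prompt file can catch omissions. The sketch below works on a dictionary that mirrors the YAML structure described in this section. The helper missing_fields is hypothetical and not part of Haystack, and loading the YAML file itself is left out.

```python
REQUIRED_FIELDS = ("name", "text", "description", "tags", "meta", "version")


def missing_fields(prompt_data: dict) -> list:
    """Return the mandatory fields that are absent or empty in a prompt definition."""
    return [field for field in REQUIRED_FIELDS if not prompt_data.get(field)]


prompt_data = {
    "name": "deepset/conversational-agent",
    "text": "The following is a conversation between a human and an AI.\\n{history}\\nHuman: {query}\\nAI:",
    "description": "Conversational agent which holds the history of the conversation.",
    "tags": ["agent", "conversational"],
    "meta": {"authors": ["deepset-ai"]},
    "version": "0.1.0",
}

print(missing_fields(prompt_data))  # []
```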
PromptTemplate Usage Examples
There are four different types of PromptTemplates that you can use:
- PromptHub
prompt_node = PromptNode(model_name_or_path="google/flan-t5-xl", default_prompt_template="deepset/question-answering-per-document")
- Your own prompt
prompt_node = PromptNode(model_name_or_path="google/flan-t5-xl", default_prompt_template="Indicate the sentiment. Answer with positive, negative, or neutral. Context: {documents}; Answer:")
- Prompt saved locally
prompt_node = PromptNode(model_name_or_path="google/flan-t5-xl", default_prompt_template="local_path_to_prompt")
- Legacy prompt (not recommended)
prompt_node = PromptNode(model_name_or_path="google/flan-t5-xl", default_prompt_template="question-answering-per-document")
Models
The default model for PromptModel and PromptNode is google/flan-t5-base, but you can use other LLMs that we specified earlier. To do this, specify the model's name and the API key.
Using OpenAI Models
You can replace the default model with a flan t5 model of a different size or a model by OpenAI.
This example uses a version of the GPT-3 model:
from haystack.nodes import PromptModel, PromptNode
openai_api_key = "<type your OpenAI API key>"
# Specify the model you want to use:
prompt_open_ai = PromptModel(model_name_or_path="gpt-3.5-turbo-instruct", api_key=openai_api_key)
# Make PromptNode use the model:
pn_open_ai = PromptNode(prompt_open_ai)
pn_open_ai("What's the coolest city to live in Germany?")
Using ChatGPT and GPT-4
You can also use the gpt-3.5-turbo, gpt-4, and gpt-4-32k models from OpenAI to build your own chat functionality. The API for these models includes three types of role: system, assistant, and user. To use ChatGPT, you simply initialize the PromptNode with the gpt-3.5-turbo model:
from haystack.nodes import PromptNode
openai_api_key = "<type your OpenAI API key>"
# Specify "gpt-3.5-turbo" as the model for PromptNode
prompt_node = PromptNode(model_name_or_path="gpt-3.5-turbo", api_key=openai_api_key)
Here's an example of how you can build a chat function that makes use of each role and keeps track of the chat flow:
messages = [{"role": "system", "content": "You are a helpful assistant"}]

def build_chat(user_input: str = "", assistant_input: str = ""):
    # Append the latest user and/or assistant message to the shared history
    if user_input != "":
        messages.append({"role": "user", "content": user_input})
    if assistant_input != "":
        messages.append({"role": "assistant", "content": assistant_input})

def chat(input: str):
    build_chat(user_input=input)
    # Send the whole history so the model sees the earlier turns
    chat_gpt_answer = prompt_node(messages)
    build_chat(assistant_input=chat_gpt_answer[0])
    return chat_gpt_answer
Now you can use your chat() function:
chat("Who is Barack Obama married to?")
chat("And what year was she born?")
Using Azure OpenAI Service
In addition to working with APIs directly from OpenAI, you can use PromptModel with Azure OpenAI APIs. For available models and versions for the service, check Azure documentation.
from haystack.nodes import PromptModel, PromptNode
prompt_azure_open_ai = PromptModel(
    model_name_or_path="gpt-3.5-turbo-instruct",
    api_key="<your-azure-openai-key>",
    model_kwargs={
        "api_version": "2022-12-01",
        "azure_base_url": "https://<your-endpoint>.openai.azure.com",
        "azure_deployment_name": "<your-deployment-name>",
    },
)
pn_azure_open_ai = PromptNode(prompt_azure_open_ai)
Using ChatGPT on Azure
You can use ChatGPT API on Azure. Here's an example of how you could do that:
import os
from haystack.nodes import PromptModel
api_key = os.environ.get("AZURE_API_KEY")
deployment_name = os.environ.get("AZURE_DEPLOYMENT_NAME")
base_url = os.environ.get("AZURE_BASE_URL")
azure_chat = PromptModel(
    model_name_or_path="gpt-35-turbo",
    api_key=api_key,
    model_kwargs={
        "azure_deployment_name": deployment_name,
        "azure_base_url": base_url,
    },
)
There are two parameters that you pass as model_kwargs:
- azure_deployment_name - the name of your Azure deployment.
- azure_base_url - the URL of the Azure OpenAI endpoint.
Using Cohere Generative Models
You can use generative models from Cohere, like Command, with the PromptNode by simply specifying the model name and providing your Cohere token:
from haystack.nodes import PromptNode
pn = PromptNode(model_name_or_path="command", api_key=your_cohere_api_key)
Haystack supports Cohere's command, command-light, base, and base-light models.
Using Anthropic Generative Models
Using any of the generative models by Anthropic is easy with PromptNode as well. It requires the model name, the maximum length of the output text, and your Anthropic API key. Optionally, you can add Anthropic's relevant keyword arguments as model_kwargs:
from haystack.nodes import PromptNode
pn = PromptNode(model_name_or_path="claude-2", api_key=your_anthropic_api_key, max_length=200, model_kwargs={"stream":True})
Using Hugging Face Models
You can specify parameters for Hugging Face models using model_kwargs. Check out all the available parameters in Hugging Face documentation.
Here's an example of how to set the temperature of the model:
from haystack.nodes import PromptNode
from transformers import GenerationConfig
# Using a dictionary
node = PromptNode(model_kwargs={"generation_kwargs": {"do_sample": True, "temperature": 0.6}})
# Using a GenerationConfig object from HuggingFace
node = PromptNode(model_kwargs={"generation_kwargs": GenerationConfig(do_sample=True, top_p=0.9, temperature=0.6)})
Using Hugging Face Inference API
To see the models that can be used with Hugging Face Inference API, use this command:
curl -s https://api-inference.huggingface.co/framework/text-generation-inference
To use the selected model, simply define a PromptNode with your Hugging Face token as the api_key and the selected model in model_name_or_path.
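A minimal sketch of that setup follows. The function name build_inference_api_node is hypothetical, the model name and token are passed in rather than invented here, and the Haystack import sits inside the function so the snippet can be loaded even where Haystack is not installed.

```python
def build_inference_api_node(model_name: str, hf_token: str):
    """Create a PromptNode that calls a model hosted on the Hugging Face Inference API."""
    # Imported lazily so that defining this helper does not require Haystack.
    from haystack.nodes import PromptNode

    # The Hugging Face token goes into api_key, the chosen model into model_name_or_path.
    return PromptNode(model_name_or_path=model_name, api_key=hf_token)


# Usage, with placeholders you need to fill in yourself:
# node = build_inference_api_node("<model-from-the-list>", "<your-hugging-face-token>")
# node("What is the capital of Germany?")
```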
Using Local Models
To use a local model, initialize your PromptNode using model_kwargs, where you need to pass any additional parameters (such as task_name, tokenizer, etc.) required by specific models.
Here’s an example of how you would initialize your local model with a path:
from haystack.nodes import PromptNode
prompt_node = PromptNode(model_name_or_path=local_path, model_kwargs={'task_name':'text2text-generation'})
Additionally, for local loading, you don't necessarily need to use a cache path. You can load the model using Hugging Face transformers classes/functions and pass the model directly in model_kwargs.
Using the Latest Hugging Face Hub Text Generation Models
There is a simple approach to incorporate newer LLMs with a custom setup into Haystack.
To do that, initialize your PromptNode using the model_kwargs value, where you need to pass any additional parameters (such as task_name, tokenizer, etc.) required by specific models.
This is an example of how you would initialize an MPT-7B-Instruct model, considering the requirements in its model card:
from haystack.nodes import PromptNode
from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b-instruct',
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
prompt_node = PromptNode("mosaicml/mpt-7b-instruct", model_kwargs={"model":model, "tokenizer": tokenizer})
Using Open Source LLMs Hosted with AWS SageMaker
If you need to deploy an open source LLM without hosting it yourself or want to safeguard sensitive data, you might consider hosting it with AWS SageMaker.
Using AWS CLI
Consider using AWS CLI as a more straightforward tool to manage your AWS services. With AWS CLI, you can quickly configure your boto3 credentials. This way, you won't need to provide detailed authentication parameters when initializing PromptNode in Haystack.
To begin, you need to deploy your chosen Hugging Face text generation model to SageMaker. You can do this quickly and easily with the SageMaker Studio JumpStart, where you select a model and click on Deploy. For more information, refer to the AWS documentation.
To initialize PromptNode, provide the inference endpoint name and your aws_profile_name and aws_region_name. Other authentication parameters are optional if you have already configured them with AWS CLI.
Here’s an example of how to initialize PromptNode:
from haystack.nodes import PromptNode
pn = PromptNode(model_name_or_path="sagemaker-model-endpoint-name", model_kwargs={"aws_profile_name": "my_aws_profile_name","aws_region_name": "your-aws-region"})
Keep in mind that streaming is not yet supported by SageMaker endpoints.
List of tested models
Here are the SageMaker-hosted models that we tested with Haystack:
- Falcon models
- MPT
- Dolly V2
- Flan-UL2
- Flan-T5
- RedPajama
- Open Llama
- GPT-J-6B
- GPT NEO
- BloomZ
Using Models on Amazon Bedrock
Amazon Bedrock is a fully managed service that makes high-performing foundation models from leading AI startups (AI21 Labs, Anthropic, Cohere, Meta, Stability.ai) and Amazon available for your use through a unified API. You can choose from a wide range of foundation models to find the one that is best suited for your use case.
To initialize PromptNode, provide the model name, as well as aws_access_key_id, aws_secret_access_key, and aws_region_name as model_kwargs. Other parameters are optional.
Here’s an example of how to initialize PromptNode with Amazon Bedrock models:
from haystack.nodes import PromptNode
prompt_node = PromptNode(
    model_name_or_path="anthropic.claude-v2",
    model_kwargs={
        "aws_access_key_id": aws_access_key_id,
        "aws_secret_access_key": aws_secret_access_key,
        "aws_region_name": aws_region_name,
    },
)
Using Different Models in One Pipeline
You can also specify a different LLM for each PromptNode in your pipeline. To do this, create one PromptModel per LLM and pass it to the PromptNode that should use it.
from haystack import Document
from haystack.nodes import PromptTemplate, PromptNode, PromptModel
from haystack.pipelines import Pipeline
# This is to set up the OpenAI model:
from getpass import getpass
api_key_prompt = "Enter OpenAI API key:"
api_key = getpass(api_key_prompt)
# Specify the model you want to use:
prompt_open_ai = PromptModel(model_name_or_path="gpt-3.5-turbo-instruct", api_key=api_key)
# This sets up the default model:
prompt_model = PromptModel()
# Now let's make one PromptNode use the default model and the other one the OpenAI model:
node_default_model = PromptNode(prompt_model, default_prompt_template="deepset/question-generation", output_variable="questions")
node_openai = PromptNode(prompt_open_ai, default_prompt_template="deepset/question-answering")
pipe = Pipeline()
pipe.add_node(component=node_default_model, name="prompt_node1", inputs=["Query"])
pipe.add_node(component=node_openai, name="prompt_node_2", inputs=["prompt_node1"])
output = pipe.run(query="not relevant", documents=[Document("Berlin is the capital of Germany")])
# The question-answering template parses its output into Answer objects, published under "answers"
output["answers"]