Enables text generation using LLMs.
Module azure
AzureOpenAIGenerator
A Generator component that uses OpenAI's large language models (LLMs) on Azure to generate text.
It supports gpt-4 and gpt-3.5-turbo family of models.
Users can pass any text generation parameters valid for the openai.ChatCompletion.create
method
directly to this component via the **generation_kwargs
parameter in init or the **generation_kwargs
parameter in run
method.
For more details on OpenAI models deployed on Azure, refer to the Microsoft documentation.
Usage example:
from haystack.components.generators import AzureOpenAIGenerator
from haystack.utils import Secret
client = AzureOpenAIGenerator(
azure_endpoint="<Your Azure endpoint e.g. `https://your-company.azure.openai.com/>",
api_key=Secret.from_token("<your-api-key>"),
azure_deployment="<this a model name, e.g. gpt-35-turbo>")
response = client.run("What's Natural Language Processing? Be brief.")
print(response)
>> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
>> the interaction between computers and human language. It involves enabling computers to understand, interpret,
>> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
>> 'gpt-3.5-turbo-0613', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
>> 'completion_tokens': 49, 'total_tokens': 65}}]}
AzureOpenAIGenerator.__init__
def __init__(
azure_endpoint: Optional[str] = None,
api_version: Optional[str] = "2023-05-15",
azure_deployment: Optional[str] = "gpt-35-turbo",
api_key: Optional[Secret] = Secret.from_env_var("AZURE_OPENAI_API_KEY",
strict=False),
azure_ad_token: Optional[Secret] = Secret.from_env_var(
"AZURE_OPENAI_AD_TOKEN", strict=False),
organization: Optional[str] = None,
streaming_callback: Optional[Callable[[StreamingChunk], None]] = None,
system_prompt: Optional[str] = None,
timeout: Optional[float] = None,
generation_kwargs: Optional[Dict[str, Any]] = None)
Initialize the Azure OpenAI Generator.
Arguments:
azure_endpoint
: The endpoint of the deployed model, e.g.https://example-resource.azure.openai.com/
api_version
: The version of the API to use. Defaults to 2023-05-15azure_deployment
: The deployment of the model, usually the model name.api_key
: The API key to use for authentication.azure_ad_token
: Azure Active Directory tokenorganization
: The Organization ID, defaults toNone
. See production best practices.streaming_callback
: A callback function that is called when a new token is received from the stream. The callback function accepts StreamingChunk as an argument.system_prompt
: The prompt to use for the system. If not provided, the system prompt will betimeout
: The timeout to be passed to the underlyingAzureOpenAI
client.generation_kwargs
: Other parameters to use for the model. These parameters are all sent directly to the OpenAI endpoint. See OpenAI documentation for more details. Some of the supported parameters:max_tokens
: The maximum number of tokens the output text can have.temperature
: What sampling temperature to use. Higher values mean the model will take more risks. Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.top_p
: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.n
: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2, it will generate two completions for each of the three prompts, ending up with 6 completions in total.stop
: One or more sequences after which the LLM should stop generating tokens.presence_penalty
: What penalty to apply if a token is already present at all. Bigger values mean the model will be less likely to repeat the same token in the text.frequency_penalty
: What penalty to apply if a token has already been generated in the text. Bigger values mean the model will be less likely to repeat the same token in the text.logit_bias
: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the values are the bias to add to that token.
AzureOpenAIGenerator.to_dict
def to_dict() -> Dict[str, Any]
Serialize this component to a dictionary.
Returns:
The serialized component as a dictionary.
AzureOpenAIGenerator.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "AzureOpenAIGenerator"
Deserialize this component from a dictionary.
Arguments:
data
: The dictionary representation of this component.
Returns:
The deserialized component instance.
AzureOpenAIGenerator.run
@component.output_types(replies=List[str], meta=List[Dict[str, Any]])
def run(prompt: str, generation_kwargs: Optional[Dict[str, Any]] = None)
Invoke the text generation inference based on the provided messages and generation parameters.
Arguments:
prompt
: The string prompt to use for text generation.generation_kwargs
: Additional keyword arguments for text generation. These parameters will potentially override the parameters passed in the__init__
method. For more details on the parameters supported by the OpenAI API, refer to the OpenAI documentation.
Returns:
A list of strings containing the generated responses and a list of dictionaries containing the metadata for each response.
Module hugging_face_local
HuggingFaceLocalGenerator
Generator based on a Hugging Face model.
This component provides an interface to generate text using a Hugging Face model that runs locally.
Usage example:
from haystack.components.generators import HuggingFaceLocalGenerator
generator = HuggingFaceLocalGenerator(
model="google/flan-t5-large",
task="text2text-generation",
generation_kwargs={"max_new_tokens": 100, "temperature": 0.9})
generator.warm_up()
print(generator.run("Who is the best American actor?"))
# {'replies': ['John Cusack']}
HuggingFaceLocalGenerator.__init__
def __init__(model: str = "google/flan-t5-base",
task: Optional[Literal["text-generation",
"text2text-generation"]] = None,
device: Optional[ComponentDevice] = None,
token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
strict=False),
generation_kwargs: Optional[Dict[str, Any]] = None,
huggingface_pipeline_kwargs: Optional[Dict[str, Any]] = None,
stop_words: Optional[List[str]] = None,
streaming_callback: Optional[Callable[[StreamingChunk],
None]] = None)
Creates an instance of a HuggingFaceLocalGenerator.
Arguments:
model
: The name or path of a Hugging Face model for text generation,task
: The task for the Hugging Face pipeline. Possible values are "text-generation" and "text2text-generation". Generally, decoder-only models like GPT support "text-generation", while encoder-decoder models like T5 support "text2text-generation". If the task is also specified in thehuggingface_pipeline_kwargs
, this parameter will be ignored. If not specified, the component will attempt to infer the task from the model name, calling the Hugging Face Hub API.device
: The device on which the model is loaded. IfNone
, the default device is automatically selected. If a device/device map is specified inhuggingface_pipeline_kwargs
, it overrides this parameter.token
: The token to use as HTTP bearer authorization for remote files. If the token is also specified in thehuggingface_pipeline_kwargs
, this parameter will be ignored.generation_kwargs
: A dictionary containing keyword arguments to customize text generation. Some examples:max_length
,max_new_tokens
,temperature
,top_k
,top_p
,... See Hugging Face's documentation for more information:- customize-text-generation
- transformers.GenerationConfig
huggingface_pipeline_kwargs
: Dictionary containing keyword arguments used to initialize the Hugging Face pipeline for text generation. These keyword arguments provide fine-grained control over the Hugging Face pipeline. In case of duplication, these kwargs overridemodel
,task
,device
, andtoken
init parameters. See Hugging Face's documentation for more information on the available kwargs. In this dictionary, you can also includemodel_kwargs
to specify the kwargs for model initialization: transformers.PreTrainedModel.from_pretrainedstop_words
: A list of stop words. If any one of the stop words is generated, the generation is stopped. If you provide this parameter, you should not specify thestopping_criteria
ingeneration_kwargs
. For some chat models, the output includes both the new text and the original prompt. In these cases, it's important to make sure your prompt has no stop words.streaming_callback
: An optional callable for handling streaming responses.
HuggingFaceLocalGenerator.warm_up
def warm_up()
Initializes the component.
HuggingFaceLocalGenerator.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
HuggingFaceLocalGenerator.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "HuggingFaceLocalGenerator"
Deserializes the component from a dictionary.
Arguments:
data
: The dictionary to deserialize from.
Returns:
The deserialized component.
HuggingFaceLocalGenerator.run
@component.output_types(replies=List[str])
def run(prompt: str, generation_kwargs: Optional[Dict[str, Any]] = None)
Run the text generation model on the given prompt.
Arguments:
prompt
: A string representing the prompt.generation_kwargs
: Additional keyword arguments for text generation.
Returns:
A dictionary containing the generated replies.
- replies: A list of strings representing the generated replies.
Module hugging_face_tgi
HuggingFaceTGIGenerator
Enables text generation using HuggingFace Hub hosted non-chat LLMs.
This component is designed to seamlessly inference models deployed on the Text Generation Inference (TGI) backend. You can use this component for LLMs hosted on Hugging Face inference endpoints, the rate-limited Inference API tier.
Key Features and Compatibility:
-
Primary Compatibility: designed to work seamlessly with any non-based model deployed using the TGI framework. For more information on TGI, visit text-generation-inference
-
Hugging Face Inference Endpoints: Supports inference of TGI chat LLMs deployed on Hugging Face inference endpoints. For more details, refer to inference-endpoints
-
Inference API Support: supports inference of TGI LLMs hosted on the rate-limited Inference API tier. Learn more about the Inference API at inference-api. Discover available chat models using the following command:
wget -qO- https://api-inference.huggingface.co/framework/text-generation-inference | grep chat
and simply use the model ID as the model parameter for this component. You'll also need to provide a valid Hugging Face API token as the token parameter. -
Custom TGI Endpoints: supports inference of TGI chat LLMs deployed on custom TGI endpoints. Anyone can deploy their own TGI endpoint using the TGI framework. For more details, refer to inference-endpoints
Input and Output Format:
- String Format: This component uses the str format for structuring both input and output, ensuring coherent and contextually relevant responses in text generation scenarios.
from haystack.components.generators import HuggingFaceTGIGenerator
from haystack.utils import Secret
client = HuggingFaceTGIGenerator(model="mistralai/Mistral-7B-v0.1", token=Secret.from_token("<your-api-key>"))
client.warm_up()
response = client.run("What's Natural Language Processing?", generation_kwargs={"max_new_tokens": 120})
print(response)
Or for LLMs hosted on paid https://huggingface.co/inference-endpoints endpoint, and/or your own custom TGI endpoint. In these two cases, you'll need to provide the URL of the endpoint as well as a valid token:
from haystack.components.generators import HuggingFaceTGIGenerator
client = HuggingFaceTGIGenerator(model="mistralai/Mistral-7B-v0.1",
url="<your-tgi-endpoint-url>",
token=Secret.from_token("<your-api-key>"))
client.warm_up()
response = client.run("What's Natural Language Processing?")
print(response)
HuggingFaceTGIGenerator.__init__
def __init__(model: str = "mistralai/Mistral-7B-v0.1",
url: Optional[str] = None,
token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
strict=False),
generation_kwargs: Optional[Dict[str, Any]] = None,
stop_words: Optional[List[str]] = None,
streaming_callback: Optional[Callable[[StreamingChunk],
None]] = None)
Initialize the HuggingFaceTGIGenerator instance.
Arguments:
model
: A string representing the model id on HF Hub. Default is "mistralai/Mistral-7B-v0.1".url
: An optional string representing the URL of the TGI endpoint. If the url is not provided, check if the model is deployed on the free tier of the HF inference API.token
: The HuggingFace token to use as HTTP bearer authorization You can find your HF token in your account settingsgeneration_kwargs
: A dictionary containing keyword arguments to customize text generation. Some examples:max_new_tokens
,temperature
,top_k
,top_p
,... See Hugging Face's documentation for more information at: [TextGenerationParameters](https://huggingface.co/docs/huggingface_hub/v0.18.0.rc0/en/package_reference/inference_client#huggingface_hub.inference._text_generation.TextGenerationParametersstop_words
: An optional list of strings representing the stop words.streaming_callback
: An optional callable for handling streaming responses.
HuggingFaceTGIGenerator.warm_up
def warm_up() -> None
Initializes the component.
HuggingFaceTGIGenerator.to_dict
def to_dict() -> Dict[str, Any]
Serialize this component to a dictionary.
Returns:
A dictionary containing the serialized component.
HuggingFaceTGIGenerator.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "HuggingFaceTGIGenerator"
Deserialize this component from a dictionary.
HuggingFaceTGIGenerator.run
@component.output_types(replies=List[str], meta=List[Dict[str, Any]])
def run(prompt: str, generation_kwargs: Optional[Dict[str, Any]] = None)
Invoke the text generation inference for the given prompt and generation parameters.
Arguments:
prompt
: A string representing the prompt.generation_kwargs
: Additional keyword arguments for text generation.
Returns:
A dictionary containing the generated replies and metadata. Both are lists of length n.
- replies: A list of strings representing the generated replies.
Module hugging_face_api
HuggingFaceAPIGenerator
A Generator component that uses Hugging Face APIs to generate text.
This component can be used to generate text using different Hugging Face APIs:
- [Free Serverless Inference API]((https://huggingface.co/inference-api)
- Paid Inference Endpoints
- Self-hosted Text Generation Inference
Example usage with the free Serverless Inference API:
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret
generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api",
api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
token=Secret.from_token("<your-api-key>"))
result = generator.run(prompt="What's Natural Language Processing?")
print(result)
Example usage with paid Inference Endpoints:
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret
generator = HuggingFaceAPIGenerator(api_type="inference_endpoints",
api_params={"url": "<your-inference-endpoint-url>"},
token=Secret.from_token("<your-api-key>"))
result = generator.run(prompt="What's Natural Language Processing?")
print(result)
Example usage with self-hosted Text Generation Inference:
```python
from haystack.components.generators import HuggingFaceAPIGenerator
generator = HuggingFaceAPIGenerator(api_type="text_generation_inference",
api_params={"url": "http://localhost:8080"})
result = generator.run(prompt="What's Natural Language Processing?")
print(result)
HuggingFaceAPIGenerator.__init__
def __init__(api_type: Union[HFGenerationAPIType, str],
api_params: Dict[str, str],
token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
strict=False),
generation_kwargs: Optional[Dict[str, Any]] = None,
stop_words: Optional[List[str]] = None,
streaming_callback: Optional[Callable[[StreamingChunk],
None]] = None)
Initialize the HuggingFaceAPIGenerator instance.
Arguments:
api_type
: The type of Hugging Face API to use.api_params
: A dictionary containing the following keys:model
: model ID on the Hugging Face Hub. Required whenapi_type
isSERVERLESS_INFERENCE_API
.url
: URL of the inference endpoint. Required whenapi_type
isINFERENCE_ENDPOINTS
orTEXT_GENERATION_INFERENCE
.token
: The HuggingFace token to use as HTTP bearer authorization. You can find your HF token in your account settings.generation_kwargs
: A dictionary containing keyword arguments to customize text generation. Some examples:max_new_tokens
,temperature
,top_k
,top_p
,... See Hugging Face's documentation for more information.stop_words
: An optional list of strings representing the stop words.streaming_callback
: An optional callable for handling streaming responses.
HuggingFaceAPIGenerator.to_dict
def to_dict() -> Dict[str, Any]
Serialize this component to a dictionary.
Returns:
A dictionary containing the serialized component.
HuggingFaceAPIGenerator.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "HuggingFaceAPIGenerator"
Deserialize this component from a dictionary.
HuggingFaceAPIGenerator.run
@component.output_types(replies=List[str], meta=List[Dict[str, Any]])
def run(prompt: str, generation_kwargs: Optional[Dict[str, Any]] = None)
Invoke the text generation inference for the given prompt and generation parameters.
Arguments:
prompt
: A string representing the prompt.generation_kwargs
: Additional keyword arguments for text generation.
Returns:
A dictionary containing the generated replies and metadata. Both are lists of length n.
- replies: A list of strings representing the generated replies.
Module openai
OpenAIGenerator
Text generation component using OpenAI's large language models (LLMs).
Enables text generation using OpenAI's large language models (LLMs). It supports gpt-4 and gpt-3.5-turbo family of models.
Users can pass any text generation parameters valid for the openai.ChatCompletion.create
method
directly to this component via the **generation_kwargs
parameter in init or the **generation_kwargs
parameter in run
method.
For more details on the parameters supported by the OpenAI API, refer to the OpenAI documentation.
Key Features and Compatibility:
- Primary Compatibility: Designed to work seamlessly with gpt-4, gpt-3.5-turbo family of models.
- Streaming Support: Supports streaming responses from the OpenAI API.
- Customizability: Supports all parameters supported by the OpenAI API.
Input and Output Format:
- String Format: This component uses the strings for both input and output.
from haystack.components.generators import OpenAIGenerator
client = OpenAIGenerator()
response = client.run("What's Natural Language Processing? Be brief.")
print(response)
>> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
>> the interaction between computers and human language. It involves enabling computers to understand, interpret,
>> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
>> 'gpt-3.5-turbo-0613', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
>> 'completion_tokens': 49, 'total_tokens': 65}}]}
OpenAIGenerator.__init__
def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
model: str = "gpt-3.5-turbo",
streaming_callback: Optional[Callable[[StreamingChunk],
None]] = None,
api_base_url: Optional[str] = None,
organization: Optional[str] = None,
system_prompt: Optional[str] = None,
generation_kwargs: Optional[Dict[str, Any]] = None,
timeout: Optional[float] = None,
max_retries: Optional[int] = None)
Creates an instance of OpenAIGenerator. Unless specified otherwise in the model
, this is for OpenAI's GPT-3.5 model.
By setting the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES' you can change the timeout and max_retries parameters in the OpenAI client.
Arguments:
api_key
: The OpenAI API key.model
: The name of the model to use.streaming_callback
: A callback function that is called when a new token is received from the stream. The callback function accepts StreamingChunk as an argument.api_base_url
: An optional base URL.organization
: The Organization ID, defaults toNone
. See production best practices.system_prompt
: The system prompt to use for text generation. If not provided, the system prompt is omitted, and the default system prompt of the model is used.generation_kwargs
: Other parameters to use for the model. These parameters are all sent directly to the OpenAI endpoint. See OpenAI documentation for more details. Some of the supported parameters:max_tokens
: The maximum number of tokens the output text can have.temperature
: What sampling temperature to use. Higher values mean the model will take more risks. Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.top_p
: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So, 0.1 means only the tokens comprising the top 10% probability mass are considered.n
: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2, it will generate two completions for each of the three prompts, ending up with 6 completions in total.stop
: One or more sequences after which the LLM should stop generating tokens.presence_penalty
: What penalty to apply if a token is already present at all. Bigger values mean the model will be less likely to repeat the same token in the text.frequency_penalty
: What penalty to apply if a token has already been generated in the text. Bigger values mean the model will be less likely to repeat the same token in the text.logit_bias
: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the values are the bias to add to that token.timeout
: Timeout for OpenAI Client calls, if not set it is inferred from theOPENAI_TIMEOUT
environment variable or set to 30.max_retries
: Maximum retries to establish contact with OpenAI if it returns an internal error, if not set it is inferred from theOPENAI_MAX_RETRIES
environment variable or set to 5.
OpenAIGenerator.to_dict
def to_dict() -> Dict[str, Any]
Serialize this component to a dictionary.
Returns:
The serialized component as a dictionary.
OpenAIGenerator.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OpenAIGenerator"
Deserialize this component from a dictionary.
Arguments:
data
: The dictionary representation of this component.
Returns:
The deserialized component instance.
OpenAIGenerator.run
@component.output_types(replies=List[str], meta=List[Dict[str, Any]])
def run(prompt: str, generation_kwargs: Optional[Dict[str, Any]] = None)
Invoke the text generation inference based on the provided messages and generation parameters.
Arguments:
prompt
: The string prompt to use for text generation.generation_kwargs
: Additional keyword arguments for text generation. These parameters will potentially override the parameters passed in the__init__
method. For more details on the parameters supported by the OpenAI API, refer to the OpenAI documentation.
Returns:
A list of strings containing the generated responses and a list of dictionaries containing the metadata for each response.
Module chat/azure
AzureOpenAIChatGenerator
A Chat Generator component that uses the Azure OpenAI API to generate text.
Enables text generation using OpenAI's large language models (LLMs) on Azure. It supports gpt-4
and gpt-3.5-turbo
family of models accessed through the chat completions API endpoint.
Users can pass any text generation parameters valid for the openai.ChatCompletion.create
method
directly to this component via the generation_kwargs
parameter in __init__
or the generation_kwargs
parameter in run
method.
For more details on OpenAI models deployed on Azure, refer to the Microsoft documentation.
Key Features and Compatibility:
- Primary Compatibility: Designed to work seamlessly with the OpenAI API Chat Completion endpoint.
- Streaming Support: Supports streaming responses from the OpenAI API Chat Completion endpoint.
- Customizability: Supports all parameters supported by the OpenAI API Chat Completion endpoint.
Input and Output Format:
- ChatMessage Format: This component uses the ChatMessage format for structuring both input and output, ensuring coherent and contextually relevant responses in chat-based text generation scenarios.
- Details on the ChatMessage format can be found here.
Usage example:
from haystack.components.generators.chat import AzureOpenAIGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
messages = [ChatMessage.from_user("What's Natural Language Processing?")]
client = AzureOpenAIGenerator(
azure_endpoint="<Your Azure endpoint e.g. `https://your-company.azure.openai.com/>",
api_key=Secret.from_token("<your-api-key>"),
azure_deployment="<this a model name, e.g. gpt-35-turbo>")
response = client.run(messages)
print(response)
{'replies':
[ChatMessage(content='Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
enabling computers to understand, interpret, and generate human language in a way that is meaningful and useful.',
role=<ChatRole.ASSISTANT: 'assistant'>, name=None,
meta={'model': 'gpt-3.5-turbo-0613', 'index': 0, 'finish_reason': 'stop',
'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})]
}
AzureOpenAIChatGenerator.__init__
def __init__(
azure_endpoint: Optional[str] = None,
api_version: Optional[str] = "2023-05-15",
azure_deployment: Optional[str] = "gpt-35-turbo",
api_key: Optional[Secret] = Secret.from_env_var("AZURE_OPENAI_API_KEY",
strict=False),
azure_ad_token: Optional[Secret] = Secret.from_env_var(
"AZURE_OPENAI_AD_TOKEN", strict=False),
organization: Optional[str] = None,
streaming_callback: Optional[Callable[[StreamingChunk], None]] = None,
timeout: Optional[float] = None,
generation_kwargs: Optional[Dict[str, Any]] = None)
Initialize the Azure OpenAI Chat Generator component.
Arguments:
azure_endpoint
: The endpoint of the deployed model, e.g."https://example-resource.azure.openai.com/"
api_version
: The version of the API to use. Defaults to 2023-05-15azure_deployment
: The deployment of the model, usually the model name.api_key
: The API key to use for authentication.azure_ad_token
: Azure Active Directory tokenorganization
: The Organization ID, defaults toNone
. See production best practices.streaming_callback
: A callback function that is called when a new token is received from the stream. The callback function accepts StreamingChunk as an argument.generation_kwargs
: Other parameters to use for the model. These parameters are all sent directly to the OpenAI endpoint. See OpenAI documentation for more details. Some of the supported parameters:max_tokens
: The maximum number of tokens the output text can have.temperature
: What sampling temperature to use. Higher values mean the model will take more risks. Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.top_p
: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.n
: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2, it will generate two completions for each of the three prompts, ending up with 6 completions in total.stop
: One or more sequences after which the LLM should stop generating tokens.presence_penalty
: What penalty to apply if a token is already present at all. Bigger values mean the model will be less likely to repeat the same token in the text.frequency_penalty
: What penalty to apply if a token has already been generated in the text. Bigger values mean the model will be less likely to repeat the same token in the text.logit_bias
: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the values are the bias to add to that token.
AzureOpenAIChatGenerator.to_dict
def to_dict() -> Dict[str, Any]
Serialize this component to a dictionary.
Returns:
The serialized component as a dictionary.
AzureOpenAIChatGenerator.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "AzureOpenAIChatGenerator"
Deserialize this component from a dictionary.
Arguments:
data
: The dictionary representation of this component.
Returns:
The deserialized component instance.
AzureOpenAIChatGenerator.run
@component.output_types(replies=List[ChatMessage])
def run(messages: List[ChatMessage],
generation_kwargs: Optional[Dict[str, Any]] = None)
Invoke the text generation inference based on the provided messages and generation parameters.
Arguments:
messages
: A list of ChatMessage instances representing the input messages.generation_kwargs
: Additional keyword arguments for text generation. These parameters will potentially override the parameters passed in the__init__
method. For more details on the parameters supported by the OpenAI API, refer to the OpenAI documentation.
Returns:
A list containing the generated responses as ChatMessage instances.
Module chat/hugging_face_local
HuggingFaceLocalChatGenerator
A Chat Generator component that uses models available on Hugging Face Hub to generate chat responses locally.
The HuggingFaceLocalChatGenerator
class is a component designed for generating chat responses using models from
Hugging Face's model hub. It is tailored for local runtime text generation tasks and provides a convenient interface
for working with chat-based models, such as HuggingFaceH4/zephyr-7b-beta
or meta-llama/Llama-2-7b-chat-hf
etc.
Usage example:
from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
from haystack.dataclasses import ChatMessage
generator = HuggingFaceLocalChatGenerator(model="HuggingFaceH4/zephyr-7b-beta")
generator.warm_up()
messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
print(generator.run(messages))
{'replies':
[ChatMessage(content=' Natural Language Processing (NLP) is a subfield of artificial intelligence that deals
with the interaction between computers and human language. It enables computers to understand, interpret, and
generate human language in a valuable way. NLP involves various techniques such as speech recognition, text
analysis, sentiment analysis, and machine translation. The ultimate goal is to make it easier for computers to
process and derive meaning from human language, improving communication between humans and machines.',
role=<ChatRole.ASSISTANT: 'assistant'>,
name=None,
meta={'finish_reason': 'stop', 'index': 0, 'model':
'mistralai/Mistral-7B-Instruct-v0.2',
'usage': {'completion_tokens': 90, 'prompt_tokens': 19, 'total_tokens': 109}})
]
}
HuggingFaceLocalChatGenerator.__init__
def __init__(model: str = "HuggingFaceH4/zephyr-7b-beta",
task: Optional[Literal["text-generation",
"text2text-generation"]] = None,
device: Optional[ComponentDevice] = None,
token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
strict=False),
chat_template: Optional[str] = None,
generation_kwargs: Optional[Dict[str, Any]] = None,
huggingface_pipeline_kwargs: Optional[Dict[str, Any]] = None,
stop_words: Optional[List[str]] = None,
streaming_callback: Optional[Callable[[StreamingChunk],
None]] = None)
Initializes the HuggingFaceLocalChatGenerator component.
Arguments:
model
: The name or path of a Hugging Face model for text generation, for example,mistralai/Mistral-7B-Instruct-v0.2
,TheBloke/OpenHermes-2.5-Mistral-7B-16k-AWQ
, etc. The important aspect of the model is that it should be a chat model and that it supports ChatML messaging format. If the model is also specified in thehuggingface_pipeline_kwargs
, this parameter will be ignored.task
: The task for the Hugging Face pipeline. Possible values are "text-generation" and "text2text-generation". Generally, decoder-only models like GPT support "text-generation", while encoder-decoder models like T5 support "text2text-generation". If the task is also specified in thehuggingface_pipeline_kwargs
, this parameter will be ignored. If not specified, the component will attempt to infer the task from the model name, calling the Hugging Face Hub API.device
: The device on which the model is loaded. IfNone
, the default device is automatically selected. If a device/device map is specified inhuggingface_pipeline_kwargs
, it overrides this parameter.token
: The token to use as HTTP bearer authorization for remote files. If the token is also specified in thehuggingface_pipeline_kwargs
, this parameter will be ignored.chat_template
: This optional parameter allows you to specify a Jinja template for formatting chat messages. While high-quality and well-supported chat models typically include their own chat templates accessible through their tokenizer, there are models that do not offer this feature. For such scenarios, or if you wish to use a custom template instead of the model's default, you can use this parameter to set your preferred chat template.generation_kwargs
: A dictionary containing keyword arguments to customize text generation. Some examples:max_length
,max_new_tokens
,temperature
,top_k
,top_p
, etc. See Hugging Face's documentation for more information:- The only generation_kwargs we set by default is max_new_tokens, which is set to 512 tokens.
huggingface_pipeline_kwargs
: Dictionary containing keyword arguments used to initialize the Hugging Face pipeline for text generation. These keyword arguments provide fine-grained control over the Hugging Face pipeline. In case of duplication, these kwargs overridemodel
,task
,device
, andtoken
init parameters. See Hugging Face's documentation for more information on the available kwargs. In this dictionary, you can also includemodel_kwargs
to specify the kwargs for model initializationstop_words
: A list of stop words. If any one of the stop words is generated, the generation is stopped. If you provide this parameter, you should not specify thestopping_criteria
ingeneration_kwargs
. For some chat models, the output includes both the new text and the original prompt. In these cases, it's important to make sure your prompt has no stop words.streaming_callback
: An optional callable for handling streaming responses.
HuggingFaceLocalChatGenerator.warm_up
def warm_up()
Initializes the component.
HuggingFaceLocalChatGenerator.to_dict
def to_dict() -> Dict[str, Any]
Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
HuggingFaceLocalChatGenerator.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "HuggingFaceLocalChatGenerator"
Deserializes the component from a dictionary.
Arguments:
data
: The dictionary to deserialize from.
Returns:
The deserialized component.
HuggingFaceLocalChatGenerator.run
@component.output_types(replies=List[ChatMessage])
def run(messages: List[ChatMessage],
generation_kwargs: Optional[Dict[str, Any]] = None)
Invoke text generation inference based on the provided messages and generation parameters.
Arguments:
messages
: A list of ChatMessage instances representing the input messages.generation_kwargs
: Additional keyword arguments for text generation.
Returns:
A list containing the generated responses as ChatMessage instances.
HuggingFaceLocalChatGenerator.create_message
def create_message(text: str, index: int,
tokenizer: Union["PreTrainedTokenizer",
"PreTrainedTokenizerFast"], prompt: str,
generation_kwargs: Dict[str, Any]) -> ChatMessage
Create a ChatMessage instance from the provided text, populated with metadata.
Arguments:
text
: The generated text.index
: The index of the generated text.tokenizer
: The tokenizer used for generation.prompt
: The prompt used for generation.generation_kwargs
: The generation parameters.
Returns:
A ChatMessage instance.
Module chat/hugging_face_tgi
HuggingFaceTGIChatGenerator
A Chat-based text generation component using Hugging Face's Text Generation Inference (TGI) framework.
Enables text generation using HuggingFace Hub hosted chat-based LLMs. This component is designed to seamlessly inference chat-based models deployed on the Text Generation Inference (TGI) backend.
You can use this component for chat LLMs hosted on Hugging Face inference endpoints, the rate-limited Inference API tier.
Key Features and Compatibility:
-
Primary Compatibility: designed to work seamlessly with any chat-based model deployed using the TGI framework. For more information on TGI, visit text-generation-inference
-
Hugging Face Inference Endpoints: Supports inference of TGI chat LLMs deployed on Hugging Face inference endpoints. For more details, refer to inference-endpoints
-
Inference API Support: supports inference of TGI chat LLMs hosted on the rate-limited Inference API tier. Learn more about the Inference API at inference-api. Discover available chat models using the following command:
wget -qO- https://api-inference.huggingface.co/framework/text-generation-inference | grep chat
and simply use the model ID as the model parameter for this component. You'll also need to provide a valid Hugging Face API token as the token parameter. -
Custom TGI Endpoints: supports inference of TGI chat LLMs deployed on custom TGI endpoints. Anyone can deploy their own TGI endpoint using the TGI framework. For more details, refer to inference-endpoints
Input and Output Format:
- ChatMessage Format: This component uses the ChatMessage format to structure both input and output, ensuring coherent and contextually relevant responses in chat-based text generation scenarios. Details on the ChatMessage format can be found here.
from haystack.components.generators.chat import HuggingFaceTGIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
ChatMessage.from_user("What's Natural Language Processing?")]
client = HuggingFaceTGIChatGenerator(model="HuggingFaceH4/zephyr-7b-beta", token=Secret.from_token("<your-api-key>"))
client.warm_up()
response = client.run(messages, generation_kwargs={"max_new_tokens": 120})
print(response)
For chat LLMs hosted on paid https://huggingface.co/inference-endpoints endpoint and/or your own custom TGI endpoint, you'll need to provide the URL of the endpoint as well as a valid token:
from haystack.components.generators.chat import HuggingFaceTGIChatGenerator
from haystack.dataclasses import ChatMessage
messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
ChatMessage.from_user("What's Natural Language Processing?")]
client = HuggingFaceTGIChatGenerator(model="HuggingFaceH4/zephyr-7b-beta",
url="<your-tgi-endpoint-url>",
token=Secret.from_token("<your-api-key>"))
client.warm_up()
response = client.run(messages, generation_kwargs={"max_new_tokens": 120})
print(response)
HuggingFaceTGIChatGenerator.__init__
def __init__(model: str = "HuggingFaceH4/zephyr-7b-beta",
url: Optional[str] = None,
token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
strict=False),
chat_template: Optional[str] = None,
generation_kwargs: Optional[Dict[str, Any]] = None,
stop_words: Optional[List[str]] = None,
streaming_callback: Optional[Callable[[StreamingChunk],
None]] = None)
Initialize the HuggingFaceTGIChatGenerator instance.
Arguments:
model
: A string representing the model path or URL. Default is "HuggingFaceH4/zephyr-7b-beta".url
: An optional string representing the URL of the TGI endpoint.chat_template
: This optional parameter allows you to specify a Jinja template for formatting chat messages. While high-quality and well-supported chat models typically include their own chat templates accessible through their tokenizer, there are models that do not offer this feature. For such scenarios, or if you wish to use a custom template instead of the model's default, you can use this parameter to set your preferred chat template.token
: The Hugging Face token for HTTP bearer authorization. You can find your HF token at https://huggingface.co/settings/tokens.generation_kwargs
: A dictionary containing keyword arguments to customize text generation. Some examples:max_new_tokens
,temperature
,top_k
,top_p
,... See Hugging Face's documentation for more information.stop_words
: An optional list of strings representing the stop words.streaming_callback
: An optional callable for handling streaming responses.
HuggingFaceTGIChatGenerator.warm_up
def warm_up() -> None
Warm up the tokenizer by loading it from the model.
If the url is not provided, check if the model is deployed on the free tier of the HF inference API. Load the tokenizer
HuggingFaceTGIChatGenerator.to_dict
def to_dict() -> Dict[str, Any]
Serialize this component to a dictionary.
Returns:
A dictionary containing the serialized component.
HuggingFaceTGIChatGenerator.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "HuggingFaceTGIChatGenerator"
Deserialize this component from a dictionary.
HuggingFaceTGIChatGenerator.run
@component.output_types(replies=List[ChatMessage])
def run(messages: List[ChatMessage],
generation_kwargs: Optional[Dict[str, Any]] = None)
Invoke the text generation inference based on the provided messages and generation parameters.
Arguments:
messages
: A list of ChatMessage instances representing the input messages.generation_kwargs
: Additional keyword arguments for text generation.
Returns:
A list containing the generated responses as ChatMessage instances.
Module chat/hugging_face_api
HuggingFaceAPIChatGenerator
A Chat Generator component that uses Hugging Face APIs to generate text.
This component can be used to generate text using different Hugging Face APIs with the ChatMessage format:
Input and Output Format:
- ChatMessage Format: This component uses the ChatMessage format to structure both input and output, ensuring coherent and contextually relevant responses in chat-based text generation scenarios. Details on the ChatMessage format can be found here.
Example usage with the free Serverless Inference API:
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
from haystack.utils.hf import HFGenerationAPIType
messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
ChatMessage.from_user("What's Natural Language Processing?")]
# the api_type can be expressed using the HFGenerationAPIType enum or as a string
api_type = HFGenerationAPIType.SERVERLESS_INFERENCE_API
api_type = "serverless_inference_api" # this is equivalent to the above
generator = HuggingFaceAPIChatGenerator(api_type=api_type,
api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
token=Secret.from_token("<your-api-key>"))
result = generator.run(messages)
print(result)
Example usage with paid Inference Endpoints:
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
ChatMessage.from_user("What's Natural Language Processing?")]
generator = HuggingFaceAPIChatGenerator(api_type="inference_endpoints",
api_params={"url": "<your-inference-endpoint-url>"},
token=Secret.from_token("<your-api-key>"))
result = generator.run(messages)
print(result)
Example usage with self-hosted Text Generation Inference:
```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
ChatMessage.from_user("What's Natural Language Processing?")]
generator = HuggingFaceAPIChatGenerator(api_type="text_generation_inference",
api_params={"url": "http://localhost:8080"})
result = generator.run(messages)
print(result)
HuggingFaceAPIChatGenerator.__init__
def __init__(api_type: Union[HFGenerationAPIType, str],
api_params: Dict[str, str],
token: Optional[Secret] = Secret.from_env_var("HF_API_TOKEN",
strict=False),
generation_kwargs: Optional[Dict[str, Any]] = None,
stop_words: Optional[List[str]] = None,
streaming_callback: Optional[Callable[[StreamingChunk],
None]] = None)
Initialize the HuggingFaceAPIChatGenerator instance.
Arguments:
api_type
: The type of Hugging Face API to use.api_params
: A dictionary containing the following keys:model
: model ID on the Hugging Face Hub. Required whenapi_type
isSERVERLESS_INFERENCE_API
.url
: URL of the inference endpoint. Required whenapi_type
isINFERENCE_ENDPOINTS
orTEXT_GENERATION_INFERENCE
.token
: The HuggingFace token to use as HTTP bearer authorization You can find your HF token in your account settingsgeneration_kwargs
: A dictionary containing keyword arguments to customize text generation. Some examples:max_tokens
,temperature
,top_p
... See Hugging Face's documentation for more information at: chat_completion.stop_words
: An optional list of strings representing the stop words.streaming_callback
: An optional callable for handling streaming responses.
HuggingFaceAPIChatGenerator.to_dict
def to_dict() -> Dict[str, Any]
Serialize this component to a dictionary.
Returns:
A dictionary containing the serialized component.
HuggingFaceAPIChatGenerator.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "HuggingFaceAPIChatGenerator"
Deserialize this component from a dictionary.
HuggingFaceAPIChatGenerator.run
@component.output_types(replies=List[ChatMessage])
def run(messages: List[ChatMessage],
generation_kwargs: Optional[Dict[str, Any]] = None)
Invoke the text generation inference based on the provided messages and generation parameters.
Arguments:
messages
: A list of ChatMessage instances representing the input messages.generation_kwargs
: Additional keyword arguments for text generation.
Returns:
A dictionary with the following keys:
replies
: A list containing the generated responses as ChatMessage instances.
Module chat/openai
OpenAIChatGenerator
A Chat Generator component that uses the OpenAI API to generate text.
Enables text generation using OpenAI's large language models (LLMs). It supports gpt-4
and gpt-3.5-turbo
family of models accessed through the chat completions API endpoint.
Users can pass any text generation parameters valid for the openai.ChatCompletion.create
method
directly to this component via the generation_kwargs
parameter in __init__
or the generation_kwargs
parameter in run
method.
For more details on the parameters supported by the OpenAI API, refer to the OpenAI documentation.
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
messages = [ChatMessage.from_user("What's Natural Language Processing?")]
client = OpenAIChatGenerator()
response = client.run(messages)
print(response)
Output:
{'replies':
[ChatMessage(content='Natural Language Processing (NLP) is a branch of artificial intelligence
that focuses on enabling computers to understand, interpret, and generate human language in
a way that is meaningful and useful.',
role=<ChatRole.ASSISTANT: 'assistant'>, name=None,
meta={'model': 'gpt-3.5-turbo-0613', 'index': 0, 'finish_reason': 'stop',
'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})
]
}
Key Features and Compatibility:
- Primary Compatibility: designed to work seamlessly with the OpenAI API Chat Completion endpoint and
gpt-4
andgpt-3.5-turbo
family of models. - Streaming Support: supports streaming responses from the OpenAI API Chat Completion endpoint.
- Customizability: supports all parameters supported by the OpenAI API Chat Completion endpoint.
Input and Output Format:
- ChatMessage Format: this component uses the ChatMessage format for structuring both input and output, ensuring coherent and contextually relevant responses in chat-based text generation scenarios. Details on the ChatMessage format can be found at here.
OpenAIChatGenerator.__init__
def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
model: str = "gpt-3.5-turbo",
streaming_callback: Optional[Callable[[StreamingChunk],
None]] = None,
api_base_url: Optional[str] = None,
organization: Optional[str] = None,
generation_kwargs: Optional[Dict[str, Any]] = None,
timeout: Optional[float] = None,
max_retries: Optional[int] = None)
Initializes the OpenAIChatGenerator component.
Creates an instance of OpenAIChatGenerator. Unless specified otherwise in the model
, this is for OpenAI's
GPT-3.5 model.
By setting the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES' you can change the timeout and max_retries parameters in the OpenAI client.
Arguments:
api_key
: The OpenAI API key.model
: The name of the model to use.streaming_callback
: A callback function that is called when a new token is received from the stream. The callback function accepts StreamingChunk as an argument.api_base_url
: An optional base URL.organization
: The Organization ID, defaults toNone
. See production best practices.generation_kwargs
: Other parameters to use for the model. These parameters are all sent directly to the OpenAI endpoint. See OpenAI documentation for more details. Some of the supported parameters:max_tokens
: The maximum number of tokens the output text can have.temperature
: What sampling temperature to use. Higher values mean the model will take more risks. Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.top_p
: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.n
: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2, it will generate two completions for each of the three prompts, ending up with 6 completions in total.stop
: One or more sequences after which the LLM should stop generating tokens.presence_penalty
: What penalty to apply if a token is already present at all. Bigger values mean the model will be less likely to repeat the same token in the text.frequency_penalty
: What penalty to apply if a token has already been generated in the text. Bigger values mean the model will be less likely to repeat the same token in the text.logit_bias
: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the values are the bias to add to that token.timeout
: Timeout for OpenAI Client calls, if not set it is inferred from theOPENAI_TIMEOUT
environment variable or set to 30.max_retries
: Maximum retries to stablish contact with OpenAI if it returns an internal error, if not set it is inferred from theOPENAI_MAX_RETRIES
environment variable or set to 5.
OpenAIChatGenerator.to_dict
def to_dict() -> Dict[str, Any]
Serialize this component to a dictionary.
Returns:
The serialized component as a dictionary.
OpenAIChatGenerator.from_dict
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OpenAIChatGenerator"
Deserialize this component from a dictionary.
Arguments:
data
: The dictionary representation of this component.
Returns:
The deserialized component instance.
OpenAIChatGenerator.run
@component.output_types(replies=List[ChatMessage])
def run(messages: List[ChatMessage],
generation_kwargs: Optional[Dict[str, Any]] = None)
Invoke the text generation inference based on the provided messages and generation parameters.
Arguments:
messages
: A list of ChatMessage instances representing the input messages.generation_kwargs
: Additional keyword arguments for text generation. These parameters will potentially override the parameters passed in the__init__
method. For more details on the parameters supported by the OpenAI API, refer to the OpenAI documentation.
Returns:
A list containing the generated responses as ChatMessage instances.