Enables text generation using LLMs.
Module azure
AzureOpenAIGenerator
Generates text using OpenAI's large language models (LLMs).
It works with gpt-4-type models and supports streaming responses
from the OpenAI API.
You can customize how the text is generated by passing parameters to the
OpenAI API. Use the **generation_kwargs argument when you initialize
the component or when you run it. Any parameter that works with
openai.ChatCompletion.create will work here too.
For details on OpenAI API parameters, see
OpenAI documentation.
Usage example
from haystack.components.generators import AzureOpenAIGenerator
from haystack.utils import Secret

client = AzureOpenAIGenerator(
    azure_endpoint="<Your Azure endpoint, e.g. https://your-company.azure.openai.com/>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="<the model name, e.g. gpt-4o-mini>")
response = client.run("What's Natural Language Processing? Be brief.")
print(response)

>> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
>> the interaction between computers and human language. It involves enabling computers to understand, interpret,
>> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
>> 'gpt-4o-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
>> 'completion_tokens': 49, 'total_tokens': 65}}]}
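You can combine both levels of generation_kwargs: init-time values act as defaults, and run-time values override them per call. A minimal sketch (the parameter values are illustrative):

from haystack.components.generators import AzureOpenAIGenerator
from haystack.utils import Secret

client = AzureOpenAIGenerator(
    azure_endpoint="<your-endpoint>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="gpt-4o-mini",
    generation_kwargs={"temperature": 0.2})  # default for every call
# Run-time kwargs take precedence over the init-time ones for this call.
response = client.run("Summarize NLP in one sentence.",
                      generation_kwargs={"temperature": 0.9, "max_completion_tokens": 50})
print(response["replies"][0])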
AzureOpenAIGenerator.__init__
def __init__(azure_endpoint: Optional[str] = None,
api_version: Optional[str] = "2023-05-15",
azure_deployment: Optional[str] = "gpt-4o-mini",
api_key: Optional[Secret] = Secret.from_env_var(
"AZURE_OPENAI_API_KEY", strict=False),
azure_ad_token: Optional[Secret] = Secret.from_env_var(
"AZURE_OPENAI_AD_TOKEN", strict=False),
organization: Optional[str] = None,
streaming_callback: Optional[StreamingCallbackT] = None,
system_prompt: Optional[str] = None,
timeout: Optional[float] = None,
max_retries: Optional[int] = None,
http_client_kwargs: Optional[dict[str, Any]] = None,
generation_kwargs: Optional[dict[str, Any]] = None,
default_headers: Optional[dict[str, str]] = None,
*,
azure_ad_token_provider: Optional[AzureADTokenProvider] = None)

Initialize the Azure OpenAI Generator.
Arguments:
- azure_endpoint: The endpoint of the deployed model, for example https://example-resource.azure.openai.com/.
- api_version: The version of the API to use. Defaults to 2023-05-15.
- azure_deployment: The deployment of the model, usually the model name.
- api_key: The API key to use for authentication.
- azure_ad_token: Azure Active Directory token.
- organization: Your organization ID, defaults to None. For help, see Setting up your organization.
- streaming_callback: A callback function called when a new token is received from the stream.
  It accepts StreamingChunk as an argument.
- system_prompt: The system prompt to use for text generation. If not provided, the system prompt is
  omitted, and the default system prompt of the model is used.
- timeout: Timeout for the AzureOpenAI client. If not set, it is inferred from the
  OPENAI_TIMEOUT environment variable or set to 30.
- max_retries: Maximum retries to establish contact with AzureOpenAI if it returns an internal error.
  If not set, it is inferred from the OPENAI_MAX_RETRIES environment variable or set to 5.
- http_client_kwargs: A dictionary of keyword arguments to configure a custom httpx.Client or httpx.AsyncClient.
  For more information, see the HTTPX documentation.
- generation_kwargs: Other parameters to use for the model, sent directly to the OpenAI endpoint.
  See OpenAI documentation for more details. Some of the supported parameters:
  - max_completion_tokens: An upper bound for the number of tokens that can be generated for a completion,
    including visible output tokens and reasoning tokens.
  - temperature: The sampling temperature to use. Higher values mean the model takes more risks.
    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
  - top_p: An alternative to sampling with temperature, called nucleus sampling, where the model
    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
    comprising the top 10% probability mass are considered.
  - n: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
    the LLM generates two completions per prompt, resulting in 6 completions total.
  - stop: One or more sequences after which the LLM should stop generating tokens.
  - presence_penalty: The penalty applied if a token is already present.
    Higher values make the model less likely to repeat the token.
  - frequency_penalty: The penalty applied if a token has already been generated.
    Higher values make the model less likely to repeat the token.
  - logit_bias: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
    values are the bias to add to that token.
- default_headers: Default headers to use for the AzureOpenAI client.
- azure_ad_token_provider: A function that returns an Azure Active Directory token, invoked on every request.
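For streaming, a hedged sketch using the built-in print_streaming_chunk helper (assuming it is available in haystack.components.generators.utils, as in recent Haystack releases):

from haystack.components.generators import AzureOpenAIGenerator
from haystack.components.generators.utils import print_streaming_chunk
from haystack.utils import Secret

client = AzureOpenAIGenerator(
    azure_endpoint="<your-endpoint>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="gpt-4o-mini",
    streaming_callback=print_streaming_chunk)  # prints each StreamingChunk as it arrives
client.run("What's Natural Language Processing? Be brief.")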
AzureOpenAIGenerator.to_dict
def to_dict() -> dict[str, Any]

Serialize this component to a dictionary.
Returns:
The serialized component as a dictionary.
AzureOpenAIGenerator.from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "AzureOpenAIGenerator"

Deserialize this component from a dictionary.
Arguments:
data: The dictionary representation of this component.
Returns:
The deserialized component instance.
AzureOpenAIGenerator.run
@component.output_types(replies=list[str], meta=list[dict[str, Any]])
def run(prompt: str,
system_prompt: Optional[str] = None,
streaming_callback: Optional[StreamingCallbackT] = None,
generation_kwargs: Optional[dict[str, Any]] = None)

Invoke the text generation inference based on the provided messages and generation parameters.
Arguments:
- prompt: The string prompt to use for text generation.
- system_prompt: The system prompt to use for text generation. If this run-time system prompt is omitted,
  the system prompt defined at initialization time, if any, is used.
- streaming_callback: A callback function that is called when a new token is received from the stream.
- generation_kwargs: Additional keyword arguments for text generation. These parameters will potentially
  override the parameters passed in the __init__ method. For more details on the parameters supported by
  the OpenAI API, refer to the OpenAI documentation.
Returns:
A list of strings containing the generated responses and a list of dictionaries containing the metadata
for each response.
Module hugging_face_local
HuggingFaceLocalGenerator
Generates text using models from Hugging Face that run locally.
LLMs running locally may need powerful hardware.
Usage example
from haystack.components.generators import HuggingFaceLocalGenerator
generator = HuggingFaceLocalGenerator(
model="google/flan-t5-large",
task="text2text-generation",
generation_kwargs={"max_new_tokens": 100, "temperature": 0.9})
generator.warm_up()
print(generator.run("Who is the best American actor?"))
# {'replies': ['John Cusack']}

HuggingFaceLocalGenerator.__init__
def __init__(model: str = "google/flan-t5-base",
task: Optional[Literal["text-generation",
"text2text-generation"]] = None,
device: Optional[ComponentDevice] = None,
token: Optional[Secret] = Secret.from_env_var(
["HF_API_TOKEN", "HF_TOKEN"], strict=False),
generation_kwargs: Optional[dict[str, Any]] = None,
huggingface_pipeline_kwargs: Optional[dict[str, Any]] = None,
stop_words: Optional[list[str]] = None,
streaming_callback: Optional[StreamingCallbackT] = None)

Creates an instance of a HuggingFaceLocalGenerator.
Arguments:
- model: The Hugging Face text generation model name or path.
- task: The task for the Hugging Face pipeline. Possible options:
  - text-generation: Supported by decoder models, like GPT.
  - text2text-generation: Supported by encoder-decoder models, like T5.
  If the task is specified in huggingface_pipeline_kwargs, this parameter is ignored.
  If not specified, the component calls the Hugging Face API to infer the task from the model name.
- device: The device for loading the model. If None, automatically selects the default device.
  If a device or device map is specified in huggingface_pipeline_kwargs, it overrides this parameter.
- token: The token to use as HTTP bearer authorization for remote files.
  If the token is specified in huggingface_pipeline_kwargs, this parameter is ignored.
- generation_kwargs: A dictionary with keyword arguments to customize text generation.
  Some examples: max_length, max_new_tokens, temperature, top_k, top_p.
  See Hugging Face's documentation for more information:
  - customize-text-generation
  - transformers.GenerationConfig
- huggingface_pipeline_kwargs: Dictionary with keyword arguments to initialize the Hugging Face pipeline
  for text generation. These keyword arguments provide fine-grained control over the Hugging Face pipeline.
  In case of duplication, these kwargs override the model, task, device, and token init parameters.
  For available kwargs, see the Hugging Face documentation.
  In this dictionary, you can also include model_kwargs to specify the kwargs for model initialization:
  transformers.PreTrainedModel.from_pretrained.
- stop_words: If the model generates a stop word, the generation stops.
  If you provide this parameter, don't specify stopping_criteria in generation_kwargs.
  For some chat models, the output includes both the new text and the original prompt.
  In these cases, make sure your prompt has no stop words.
- streaming_callback: An optional callable for handling streaming responses.
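To make the override rules concrete, a sketch that drives the pipeline through huggingface_pipeline_kwargs (the kwargs shown, such as device_map, are standard transformers.pipeline options; the values are illustrative):

from haystack.components.generators import HuggingFaceLocalGenerator

generator = HuggingFaceLocalGenerator(
    model="google/flan-t5-large",
    task="text2text-generation",
    huggingface_pipeline_kwargs={"device_map": "auto"},  # takes precedence over the device parameter
    stop_words=["Answer:"],  # generation stops if this word is produced
    generation_kwargs={"max_new_tokens": 64})
generator.warm_up()  # loads the pipeline; call before run()
print(generator.run("Briefly, what is tokenization?")["replies"][0])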
HuggingFaceLocalGenerator.warm_up
def warm_up()

Initializes the component.
HuggingFaceLocalGenerator.to_dict
def to_dict() -> dict[str, Any]

Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
HuggingFaceLocalGenerator.from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceLocalGenerator"

Deserializes the component from a dictionary.
Arguments:
data: The dictionary to deserialize from.
Returns:
The deserialized component.
HuggingFaceLocalGenerator.run
@component.output_types(replies=list[str])
def run(prompt: str,
streaming_callback: Optional[StreamingCallbackT] = None,
generation_kwargs: Optional[dict[str, Any]] = None)

Run the text generation model on the given prompt.
Arguments:
- prompt: A string representing the prompt.
- streaming_callback: A callback function that is called when a new token is received from the stream.
- generation_kwargs: Additional keyword arguments for text generation.
Returns:
A dictionary containing the generated replies.
- replies: A list of strings representing the generated replies.
Module hugging_face_api
HuggingFaceAPIGenerator
Generates text using Hugging Face APIs.
Use it with the following Hugging Face APIs:
- Serverless Inference API
- Paid Inference Endpoints
- Self-hosted Text Generation Inference
Note: As of July 2025, the Hugging Face Inference API no longer offers generative models through the
text_generation endpoint. Generative models are now only available through providers supporting the
chat_completion endpoint. As a result, this component might no longer work with the Hugging Face Inference API.
Use the HuggingFaceAPIChatGenerator component, which supports the chat_completion endpoint.
Usage examples

With Hugging Face Inference Endpoints

from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret

generator = HuggingFaceAPIGenerator(api_type="inference_endpoints",
                                    api_params={"url": "<your-inference-endpoint-url>"},
                                    token=Secret.from_token("<your-api-key>"))

result = generator.run(prompt="What's Natural Language Processing?")
print(result)

With self-hosted text generation inference

from haystack.components.generators import HuggingFaceAPIGenerator

generator = HuggingFaceAPIGenerator(api_type="text_generation_inference",
                                    api_params={"url": "http://localhost:8080"})

result = generator.run(prompt="What's Natural Language Processing?")
print(result)

With the free serverless inference API

Be aware that this example might not work as the Hugging Face Inference API no longer offers models that
support the text_generation endpoint. Use the HuggingFaceAPIChatGenerator for generative models through the
chat_completion endpoint.

from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret

generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api",
                                    api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
                                    token=Secret.from_token("<your-api-key>"))

result = generator.run(prompt="What's Natural Language Processing?")
print(result)

HuggingFaceAPIGenerator.__init__
def __init__(api_type: Union[HFGenerationAPIType, str],
api_params: dict[str, str],
token: Optional[Secret] = Secret.from_env_var(
["HF_API_TOKEN", "HF_TOKEN"], strict=False),
generation_kwargs: Optional[dict[str, Any]] = None,
stop_words: Optional[list[str]] = None,
streaming_callback: Optional[StreamingCallbackT] = None)

Initialize the HuggingFaceAPIGenerator instance.
Arguments:
- api_type: The type of Hugging Face API to use. Available types:
  - text_generation_inference: See TGI.
  - inference_endpoints: See Inference Endpoints.
  - serverless_inference_api: See Serverless Inference API.
    This might no longer work due to changes in the models offered in the Hugging Face Inference API.
    Please use the HuggingFaceAPIChatGenerator component instead.
- api_params: A dictionary with the following keys:
  - model: Hugging Face model ID. Required when api_type is SERVERLESS_INFERENCE_API.
  - url: URL of the inference endpoint. Required when api_type is INFERENCE_ENDPOINTS or TEXT_GENERATION_INFERENCE.
  - Other parameters specific to the chosen API type, such as timeout, headers, provider, etc.
- token: The Hugging Face token to use as HTTP bearer authorization.
  Check your HF token in your account settings.
- generation_kwargs: A dictionary with keyword arguments to customize text generation.
  Some examples: max_new_tokens, temperature, top_k, top_p.
  For details, see the Hugging Face documentation.
- stop_words: An optional list of strings representing the stop words.
- streaming_callback: An optional callable for handling streaming responses.
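For example, a sketch combining generation_kwargs and stop_words against a self-hosted TGI server (the URL and values are illustrative):

from haystack.components.generators import HuggingFaceAPIGenerator

generator = HuggingFaceAPIGenerator(
    api_type="text_generation_inference",
    api_params={"url": "http://localhost:8080"},
    generation_kwargs={"max_new_tokens": 120, "temperature": 0.7},
    stop_words=["###"])  # generation stops when this sequence appears
result = generator.run(prompt="What's Natural Language Processing?")
print(result["replies"][0])
print(result["meta"][0])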
HuggingFaceAPIGenerator.to_dict
def to_dict() -> dict[str, Any]

Serialize this component to a dictionary.
Returns:
A dictionary containing the serialized component.
HuggingFaceAPIGenerator.from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceAPIGenerator"

Deserialize this component from a dictionary.
HuggingFaceAPIGenerator.run
@component.output_types(replies=list[str], meta=list[dict[str, Any]])
def run(prompt: str,
streaming_callback: Optional[StreamingCallbackT] = None,
generation_kwargs: Optional[dict[str, Any]] = None)

Invoke the text generation inference for the given prompt and generation parameters.
Arguments:
- prompt: A string representing the prompt.
- streaming_callback: A callback function that is called when a new token is received from the stream.
- generation_kwargs: Additional keyword arguments for text generation.
Returns:
A dictionary with the generated replies and metadata. Both are lists of length n.
- replies: A list of strings representing the generated replies.
Module openai
OpenAIGenerator
Generates text using OpenAI's large language models (LLMs).
It works with the gpt-4 and o-series models and supports streaming responses
from the OpenAI API. It uses strings as input and output.
You can customize how the text is generated by passing parameters to the
OpenAI API. Use the **generation_kwargs argument when you initialize
the component or when you run it. Any parameter that works with
openai.ChatCompletion.create will work here too.
For details on OpenAI API parameters, see
OpenAI documentation.
Usage example
from haystack.components.generators import OpenAIGenerator
client = OpenAIGenerator()
response = client.run("What's Natural Language Processing? Be brief.")
print(response)
>> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
>> the interaction between computers and human language. It involves enabling computers to understand, interpret,
>> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
>> 'gpt-4o-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
>> 'completion_tokens': 49, 'total_tokens': 65}}]}
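A sketch of setting a system prompt at initialization and overriding it per call (assumes the OPENAI_API_KEY environment variable is set):

from haystack.components.generators import OpenAIGenerator

client = OpenAIGenerator(
    model="gpt-4o-mini",
    system_prompt="You answer in exactly one sentence.")
# A run-time system prompt takes precedence over the init-time one.
response = client.run("What's Natural Language Processing?",
                      system_prompt="You answer briefly, in plain words.")
print(response["replies"][0])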
OpenAIGenerator.__init__

def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
model: str = "gpt-4o-mini",
streaming_callback: Optional[StreamingCallbackT] = None,
api_base_url: Optional[str] = None,
organization: Optional[str] = None,
system_prompt: Optional[str] = None,
generation_kwargs: Optional[dict[str, Any]] = None,
timeout: Optional[float] = None,
max_retries: Optional[int] = None,
http_client_kwargs: Optional[dict[str, Any]] = None)

Creates an instance of OpenAIGenerator. Unless specified otherwise in model, uses OpenAI's gpt-4o-mini.
By setting the OPENAI_TIMEOUT and OPENAI_MAX_RETRIES environment variables, you can change the timeout
and max_retries parameters in the OpenAI client.
Arguments:
- api_key: The OpenAI API key to connect to OpenAI.
- model: The name of the model to use.
- streaming_callback: A callback function that is called when a new token is received from the stream.
  The callback function accepts StreamingChunk as an argument.
- api_base_url: An optional base URL.
- organization: The Organization ID, defaults to None.
- system_prompt: The system prompt to use for text generation. If not provided, the system prompt is
  omitted, and the default system prompt of the model is used.
- generation_kwargs: Other parameters to use for the model. These parameters are all sent directly to
  the OpenAI endpoint. See OpenAI documentation for more details.
  Some of the supported parameters:
  - max_completion_tokens: An upper bound for the number of tokens that can be generated for a completion,
    including visible output tokens and reasoning tokens.
  - temperature: What sampling temperature to use. Higher values mean the model will take more risks.
    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
  - top_p: An alternative to sampling with temperature, called nucleus sampling, where the model
    considers the results of the tokens with top_p probability mass. So, 0.1 means only the tokens
    comprising the top 10% probability mass are considered.
  - n: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
    it will generate two completions for each of the three prompts, ending up with 6 completions in total.
  - stop: One or more sequences after which the LLM should stop generating tokens.
  - presence_penalty: The penalty to apply if a token is already present. Bigger values mean
    the model will be less likely to repeat the same token in the text.
  - frequency_penalty: The penalty to apply if a token has already been generated in the text.
    Bigger values mean the model will be less likely to repeat the same token in the text.
  - logit_bias: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
    values are the bias to add to that token.
- timeout: Timeout for OpenAI client calls. If not set, it is inferred from the OPENAI_TIMEOUT
  environment variable or set to 30.
- max_retries: Maximum retries to establish contact with OpenAI if it returns an internal error.
  If not set, it is inferred from the OPENAI_MAX_RETRIES environment variable or set to 5.
- http_client_kwargs: A dictionary of keyword arguments to configure a custom httpx.Client or httpx.AsyncClient.
  For more information, see the HTTPX documentation.
OpenAIGenerator.to_dict
def to_dict() -> dict[str, Any]

Serialize this component to a dictionary.
Returns:
The serialized component as a dictionary.
OpenAIGenerator.from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "OpenAIGenerator"

Deserialize this component from a dictionary.
Arguments:
data: The dictionary representation of this component.
Returns:
The deserialized component instance.
OpenAIGenerator.run
@component.output_types(replies=list[str], meta=list[dict[str, Any]])
def run(prompt: str,
system_prompt: Optional[str] = None,
streaming_callback: Optional[StreamingCallbackT] = None,
generation_kwargs: Optional[dict[str, Any]] = None)

Invoke the text generation inference based on the provided messages and generation parameters.
Arguments:
- prompt: The string prompt to use for text generation.
- system_prompt: The system prompt to use for text generation. If this run-time system prompt is omitted,
  the system prompt defined at initialization time, if any, is used.
- streaming_callback: A callback function that is called when a new token is received from the stream.
- generation_kwargs: Additional keyword arguments for text generation. These parameters will potentially
  override the parameters passed in the __init__ method. For more details on the parameters supported by
  the OpenAI API, refer to the OpenAI documentation.
Returns:
A list of strings containing the generated responses and a list of dictionaries containing the metadata
for each response.
Module openai_dalle
DALLEImageGenerator
Generates images using OpenAI's DALL-E model.
For details on OpenAI API parameters, see
OpenAI documentation.
Usage example
from haystack.components.generators import DALLEImageGenerator
image_generator = DALLEImageGenerator()
response = image_generator.run("Show me a picture of a black cat.")
print(response)

DALLEImageGenerator.__init__
def __init__(model: str = "dall-e-3",
quality: Literal["standard", "hd"] = "standard",
size: Literal["256x256", "512x512", "1024x1024", "1792x1024",
"1024x1792"] = "1024x1024",
response_format: Literal["url", "b64_json"] = "url",
api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
api_base_url: Optional[str] = None,
organization: Optional[str] = None,
timeout: Optional[float] = None,
max_retries: Optional[int] = None,
http_client_kwargs: Optional[dict[str, Any]] = None)

Creates an instance of DALLEImageGenerator. Unless specified otherwise in model, uses OpenAI's dall-e-3.
Arguments:
- model: The model to use for image generation. Can be "dall-e-2" or "dall-e-3".
- quality: The quality of the generated image. Can be "standard" or "hd".
- size: The size of the generated images.
  Must be one of 256x256, 512x512, or 1024x1024 for dall-e-2.
  Must be one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3 models.
- response_format: The format of the response. Can be "url" or "b64_json".
- api_key: The OpenAI API key to connect to OpenAI.
- api_base_url: An optional base URL.
- organization: The Organization ID, defaults to None.
- timeout: Timeout for OpenAI client calls. If not set, it is inferred from the OPENAI_TIMEOUT
  environment variable or set to 30.
- max_retries: Maximum retries to establish contact with OpenAI if it returns an internal error.
  If not set, it is inferred from the OPENAI_MAX_RETRIES environment variable or set to 5.
- http_client_kwargs: A dictionary of keyword arguments to configure a custom httpx.Client or httpx.AsyncClient.
  For more information, see the HTTPX documentation.
DALLEImageGenerator.warm_up
def warm_up() -> None

Warm up the OpenAI client.
DALLEImageGenerator.run
@component.output_types(images=list[str], revised_prompt=str)
def run(prompt: str,
size: Optional[Literal["256x256", "512x512", "1024x1024", "1792x1024",
"1024x1792"]] = None,
quality: Optional[Literal["standard", "hd"]] = None,
            response_format: Optional[Literal["url", "b64_json"]] = None)

Invokes the image generation inference based on the provided prompt and generation parameters.
Arguments:
- prompt: The prompt to generate the image.
- size: If provided, overrides the size provided during initialization.
- quality: If provided, overrides the quality provided during initialization.
- response_format: If provided, overrides the response format provided during initialization.
Returns:
A dictionary containing the generated list of images and the revised prompt.
Depending on the response_format parameter, the list of images can be URLs or base64 encoded JSON strings.
The revised prompt is the prompt that was used to generate the image, if there was any revision
to the prompt made by OpenAI.
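For instance, a sketch of overriding the init-time image settings per call (assumes OPENAI_API_KEY is set; the prompt and sizes are illustrative):

from haystack.components.generators import DALLEImageGenerator

image_generator = DALLEImageGenerator(size="1024x1024", quality="standard")
image_generator.warm_up()  # initializes the OpenAI client
response = image_generator.run(
    "A watercolor painting of a lighthouse at dawn",
    size="1792x1024",        # overrides the init-time size for this call
    quality="hd",            # overrides the init-time quality
    response_format="url")   # images are returned as URLs
print(response["images"][0])
print(response["revised_prompt"])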
DALLEImageGenerator.to_dict
def to_dict() -> dict[str, Any]

Serialize this component to a dictionary.
Returns:
The serialized component as a dictionary.
DALLEImageGenerator.from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "DALLEImageGenerator"

Deserialize this component from a dictionary.
Arguments:
data: The dictionary representation of this component.
Returns:
The deserialized component instance.
Module chat/azure
AzureOpenAIChatGenerator
Generates text using OpenAI's models on Azure.
It works with gpt-4-type models and supports streaming responses
from the OpenAI API. It uses the ChatMessage
format for input and output.
You can customize how the text is generated by passing parameters to the
OpenAI API. Use the **generation_kwargs argument when you initialize
the component or when you run it. Any parameter that works with
openai.ChatCompletion.create will work here too.
For details on OpenAI API parameters, see
OpenAI documentation.
Usage example
from haystack.components.generators.chat import AzureOpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
messages = [ChatMessage.from_user("What's Natural Language Processing?")]
client = AzureOpenAIChatGenerator(
    azure_endpoint="<Your Azure endpoint, e.g. https://your-company.azure.openai.com/>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="<the model name, e.g. gpt-4o-mini>")
response = client.run(messages)
print(response)

{'replies':
[ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
"Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
enabling computers to understand, interpret, and generate human language in a way that is useful.")],
_name=None,
_meta={'model': 'gpt-4o-mini', 'index': 0, 'finish_reason': 'stop',
'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})]
}
AzureOpenAIChatGenerator.__init__
def __init__(azure_endpoint: Optional[str] = None,
api_version: Optional[str] = "2023-05-15",
azure_deployment: Optional[str] = "gpt-4o-mini",
api_key: Optional[Secret] = Secret.from_env_var(
"AZURE_OPENAI_API_KEY", strict=False),
azure_ad_token: Optional[Secret] = Secret.from_env_var(
"AZURE_OPENAI_AD_TOKEN", strict=False),
organization: Optional[str] = None,
streaming_callback: Optional[StreamingCallbackT] = None,
timeout: Optional[float] = None,
max_retries: Optional[int] = None,
generation_kwargs: Optional[dict[str, Any]] = None,
default_headers: Optional[dict[str, str]] = None,
tools: Optional[ToolsType] = None,
tools_strict: bool = False,
*,
azure_ad_token_provider: Optional[Union[
AzureADTokenProvider, AsyncAzureADTokenProvider]] = None,
http_client_kwargs: Optional[dict[str, Any]] = None)

Initialize the Azure OpenAI Chat Generator component.
Arguments:
- azure_endpoint: The endpoint of the deployed model, for example "https://example-resource.azure.openai.com/".
- api_version: The version of the API to use. Defaults to 2023-05-15.
- azure_deployment: The deployment of the model, usually the model name.
- api_key: The API key to use for authentication.
- azure_ad_token: Azure Active Directory token.
- organization: Your organization ID, defaults to None. For help, see Setting up your organization.
- streaming_callback: A callback function called when a new token is received from the stream.
  It accepts StreamingChunk as an argument.
- timeout: Timeout for OpenAI client calls. If not set, it defaults to either the OPENAI_TIMEOUT
  environment variable or 30 seconds.
- max_retries: Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to either the OPENAI_MAX_RETRIES environment variable or 5.
- generation_kwargs: Other parameters to use for the model. These parameters are sent directly to
  the OpenAI endpoint. For details, see OpenAI documentation.
  Some of the supported parameters:
  - max_completion_tokens: An upper bound for the number of tokens that can be generated for a completion,
    including visible output tokens and reasoning tokens.
  - temperature: The sampling temperature to use. Higher values mean the model takes more risks.
    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
  - top_p: Nucleus sampling, an alternative to sampling with temperature, where the model considers
    tokens with top_p probability mass. For example, 0.1 means only the tokens comprising
    the top 10% probability mass are considered.
  - n: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
    the LLM generates two completions per prompt, resulting in 6 completions total.
  - stop: One or more sequences after which the LLM should stop generating tokens.
  - presence_penalty: The penalty applied if a token is already present.
    Higher values make the model less likely to repeat the token.
  - frequency_penalty: The penalty applied if a token has already been generated.
    Higher values make the model less likely to repeat the token.
  - logit_bias: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
    values are the bias to add to that token.
  - response_format: A JSON schema or a Pydantic model that enforces the structure of the model's response.
    If provided, the output is always validated against this format (unless the model returns a tool call).
    For details, see the OpenAI Structured Outputs documentation.
    Notes:
    - This parameter accepts Pydantic models and JSON schemas for the latest models, starting from GPT-4o.
      Older models only support a basic version of structured outputs through {"type": "json_object"}.
      For detailed information on JSON mode, see the OpenAI Structured Outputs documentation.
    - For structured outputs with streaming, the response_format must be a JSON schema and not a Pydantic model.
- default_headers: Default headers to use for the AzureOpenAI client.
- tools: A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
- tools_strict: Whether to enable strict schema adherence for tool calls. If set to True, the model will follow
  exactly the schema provided in the parameters field of the tool definition, but this may increase latency.
- azure_ad_token_provider: A function that returns an Azure Active Directory token, invoked on every request.
- http_client_kwargs: A dictionary of keyword arguments to configure a custom httpx.Client or httpx.AsyncClient.
  For more information, see the HTTPX documentation.
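As an illustration of response_format with a Pydantic model, a hedged sketch (it assumes a deployment that supports structured outputs; the schema is invented for the example):

from pydantic import BaseModel
from haystack.components.generators.chat import AzureOpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

class CityInfo(BaseModel):  # illustrative schema
    city: str
    country: str

client = AzureOpenAIChatGenerator(
    azure_endpoint="<your-endpoint>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="gpt-4o-mini",
    generation_kwargs={"response_format": CityInfo})
reply = client.run([ChatMessage.from_user("Name one city in France.")])["replies"][0]
print(reply.text)  # JSON text conforming to the CityInfo schema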
AzureOpenAIChatGenerator.warm_up
def warm_up()

Warm up the Azure OpenAI chat generator.
This will warm up the tools registered in the chat generator.
This method is idempotent and will only warm up the tools once.
AzureOpenAIChatGenerator.to_dict
def to_dict() -> dict[str, Any]

Serialize this component to a dictionary.
Returns:
The serialized component as a dictionary.
AzureOpenAIChatGenerator.from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "AzureOpenAIChatGenerator"

Deserialize this component from a dictionary.
Arguments:
data: The dictionary representation of this component.
Returns:
The deserialized component instance.
AzureOpenAIChatGenerator.run
@component.output_types(replies=list[ChatMessage])
def run(messages: list[ChatMessage],
streaming_callback: Optional[StreamingCallbackT] = None,
generation_kwargs: Optional[dict[str, Any]] = None,
*,
tools: Optional[ToolsType] = None,
tools_strict: Optional[bool] = None)

Invokes chat completion based on the provided messages and generation parameters.
Arguments:
- messages: A list of ChatMessage instances representing the input messages.
- streaming_callback: A callback function that is called when a new token is received from the stream.
- generation_kwargs: Additional keyword arguments for text generation. These parameters will
  override the parameters passed during component initialization.
  For details on OpenAI API parameters, see OpenAI documentation.
- tools: A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
  If set, it overrides the tools parameter provided during initialization.
- tools_strict: Whether to enable strict schema adherence for tool calls. If set to True, the model will follow
  exactly the schema provided in the parameters field of the tool definition, but this may increase latency.
  If set, it overrides the tools_strict parameter set during component initialization.
Returns:
A dictionary with the following key:
replies: A list containing the generated responses as ChatMessage instances.
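To ground the tools parameter, a hedged sketch with a single Tool (the weather function and its JSON schema are invented for the example; Tool is assumed importable from haystack.tools, as in recent Haystack releases):

from haystack.components.generators.chat import AzureOpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.tools import Tool
from haystack.utils import Secret

def get_weather(city: str) -> str:  # illustrative tool function
    return f"Sunny in {city}"

weather_tool = Tool(
    name="get_weather",
    description="Get the weather for a city.",
    parameters={"type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"]},
    function=get_weather)

client = AzureOpenAIChatGenerator(
    azure_endpoint="<your-endpoint>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="gpt-4o-mini",
    tools=[weather_tool])
reply = client.run([ChatMessage.from_user("What's the weather in Paris?")])["replies"][0]
print(reply.tool_calls)  # the prepared tool call(s), if any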
AzureOpenAIChatGenerator.run_async
@component.output_types(replies=list[ChatMessage])
async def run_async(messages: list[ChatMessage],
streaming_callback: Optional[StreamingCallbackT] = None,
generation_kwargs: Optional[dict[str, Any]] = None,
*,
tools: Optional[ToolsType] = None,
tools_strict: Optional[bool] = None)

Asynchronously invokes chat completion based on the provided messages and generation parameters.
This is the asynchronous version of the run method. It has the same parameters and return values
but can be used with await in async code.
Arguments:
- messages: A list of ChatMessage instances representing the input messages.
- streaming_callback: A callback function that is called when a new token is received from the stream.
  Must be a coroutine.
- generation_kwargs: Additional keyword arguments for text generation. These parameters will
  override the parameters passed during component initialization.
  For details on OpenAI API parameters, see OpenAI documentation.
- tools: A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
  If set, it overrides the tools parameter provided during initialization.
- tools_strict: Whether to enable strict schema adherence for tool calls. If set to True, the model will follow
  exactly the schema provided in the parameters field of the tool definition, but this may increase latency.
  If set, it overrides the tools_strict parameter set during component initialization.
Returns:
A dictionary with the following key:
replies: A list containing the generated responses as ChatMessage instances.
Module chat/azure_responses
AzureOpenAIResponsesChatGenerator
Completes chats using OpenAI's Responses API on Azure.
It works with the gpt-5 and o-series models and supports streaming responses
from the OpenAI API. It uses the ChatMessage
format for input and output.
You can customize how the text is generated by passing parameters to the
OpenAI API. Use the **generation_kwargs argument when you initialize
the component or when you run it. Any parameter that works with
openai.Responses.create will work here too.
For details on OpenAI API parameters, see
OpenAI documentation.
Usage example
from haystack.components.generators.chat import AzureOpenAIResponsesChatGenerator
from haystack.dataclasses import ChatMessage
messages = [ChatMessage.from_user("What's Natural Language Processing?")]
client = AzureOpenAIResponsesChatGenerator(
azure_endpoint="https://example-resource.azure.openai.com/",
generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}}
)
response = client.run(messages)
print(response)

AzureOpenAIResponsesChatGenerator.__init__
def __init__(*,
api_key: Union[Secret, Callable[[], str],
Callable[[],
Awaitable[str]]] = Secret.from_env_var(
"AZURE_OPENAI_API_KEY", strict=False),
azure_endpoint: Optional[str] = None,
azure_deployment: str = "gpt-5-mini",
streaming_callback: Optional[StreamingCallbackT] = None,
organization: Optional[str] = None,
generation_kwargs: Optional[dict[str, Any]] = None,
timeout: Optional[float] = None,
max_retries: Optional[int] = None,
tools: Optional[ToolsType] = None,
tools_strict: bool = False,
http_client_kwargs: Optional[dict[str, Any]] = None)

Initialize the AzureOpenAIResponsesChatGenerator component.
Arguments:
- api_key: The API key to use for authentication. Can be:
  - A Secret object containing the API key.
  - A Secret object containing the Azure Active Directory token.
  - A function that returns an Azure Active Directory token.
- azure_endpoint: The endpoint of the deployed model, for example "https://example-resource.azure.openai.com/".
- azure_deployment: The deployment of the model, usually the model name.
- organization: Your organization ID, defaults to None. For help, see Setting up your organization.
- streaming_callback: A callback function called when a new token is received from the stream.
  It accepts StreamingChunk as an argument.
- timeout: Timeout for OpenAI client calls. If not set, it defaults to either the OPENAI_TIMEOUT
  environment variable or 30 seconds.
- max_retries: Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to either the OPENAI_MAX_RETRIES environment variable or 5.
- generation_kwargs: Other parameters to use for the model. These parameters are sent directly to
  the OpenAI endpoint. See OpenAI documentation for more details.
  Some of the supported parameters:
  - temperature: What sampling temperature to use. Higher values like 0.8 make the output more random,
    while lower values like 0.2 make it more focused and deterministic.
  - top_p: An alternative to sampling with temperature, called nucleus sampling, where the model
    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
    comprising the top 10% probability mass are considered.
  - previous_response_id: The ID of the previous response. Use this to create multi-turn conversations.
  - text_format: A JSON schema or a Pydantic model that enforces the structure of the model's response.
    If provided, the output is always validated against this format (unless the model returns a tool call).
    For details, see the OpenAI Structured Outputs documentation.
    Notes:
    - This parameter accepts Pydantic models and JSON schemas for the latest models, starting from GPT-4o.
      Older models only support a basic version of structured outputs through {"type": "json_object"}.
      For detailed information on JSON mode, see the OpenAI Structured Outputs documentation.
    - For structured outputs with streaming, the text_format must be a JSON schema and not a Pydantic model.
  - reasoning: A dictionary of parameters for reasoning. For example:
    - summary: The summary of the reasoning.
    - effort: The level of effort to put into the reasoning. Can be low, medium, or high.
    - generate_summary: Whether to generate a summary of the reasoning.
    Note: OpenAI does not return the reasoning tokens, but you can view the summary if it is enabled.
    For details, see the OpenAI Reasoning documentation.
- tools: A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
- tools_strict: Whether to enable strict schema adherence for tool calls. If set to True, the model will follow
  exactly the schema provided in the parameters field of the tool definition, but this may increase latency.
- http_client_kwargs: A dictionary of keyword arguments to configure a custom httpx.Client or httpx.AsyncClient.
  For more information, see the HTTPX documentation.
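For example, a sketch of overriding generation parameters per call; run-time generation_kwargs take precedence over the init-time ones (the values are illustrative):

from haystack.components.generators.chat import AzureOpenAIResponsesChatGenerator
from haystack.dataclasses import ChatMessage

client = AzureOpenAIResponsesChatGenerator(
    azure_endpoint="https://example-resource.azure.openai.com/")
response = client.run(
    [ChatMessage.from_user("Outline the steps of tokenization.")],
    generation_kwargs={"reasoning": {"effort": "high", "summary": "auto"}})
print(response["replies"][0].text)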
AzureOpenAIResponsesChatGenerator.to_dict
def to_dict() -> dict[str, Any]

Serialize this component to a dictionary.
Returns:
The serialized component as a dictionary.
AzureOpenAIResponsesChatGenerator.from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "AzureOpenAIResponsesChatGenerator"

Deserialize this component from a dictionary.
Arguments:
data: The dictionary representation of this component.
Returns:
The deserialized component instance.
AzureOpenAIResponsesChatGenerator.warm_up
def warm_up()

Warm up the OpenAI responses chat generator.
This will warm up the tools registered in the chat generator.
This method is idempotent and will only warm up the tools once.
AzureOpenAIResponsesChatGenerator.run
@component.output_types(replies=list[ChatMessage])
def run(messages: list[ChatMessage],
*,
streaming_callback: Optional[StreamingCallbackT] = None,
generation_kwargs: Optional[dict[str, Any]] = None,
tools: Optional[Union[ToolsType, list[dict]]] = None,
tools_strict: Optional[bool] = None)

Invokes response generation based on the provided messages and generation parameters.
Arguments:
- messages: A list of ChatMessage instances representing the input messages.
- streaming_callback: A callback function that is called when a new token is received from the stream.
- generation_kwargs: Additional keyword arguments for text generation. These parameters will
  override the parameters passed during component initialization.
  For details on OpenAI API parameters, see OpenAI documentation.
- tools: The tools that the model can use to prepare calls. If set, it overrides the tools parameter
  set during component initialization. This parameter can accept either a mixed list of Haystack Tool
  objects and Haystack Toolsets, or a dictionary of OpenAI/MCP tool definitions.
  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
  For details on tool support, see OpenAI documentation.
- tools_strict: Whether to enable strict schema adherence for tool calls. If set to False, the model may not
  exactly follow the schema provided in the parameters field of the tool definition. In the Responses API,
  tool calls are strict by default.
  If set, it overrides the tools_strict parameter set during component initialization.
Returns:
A dictionary with the following key:
replies: A list containing the generated responses as ChatMessage instances.
AzureOpenAIResponsesChatGenerator.run_async
@component.output_types(replies=list[ChatMessage])
async def run_async(messages: list[ChatMessage],
*,
streaming_callback: Optional[StreamingCallbackT] = None,
generation_kwargs: Optional[dict[str, Any]] = None,
tools: Optional[Union[ToolsType, list[dict]]] = None,
tools_strict: Optional[bool] = None)

Asynchronously invokes response generation based on the provided messages and generation parameters.
This is the asynchronous version of the run method. It has the same parameters and return values
but can be used with await in async code.
Arguments:
- messages: A list of ChatMessage instances representing the input messages.
- streaming_callback: A callback function that is called when a new token is received from the stream.
  Must be a coroutine.
- generation_kwargs: Additional keyword arguments for text generation. These parameters will
  override the parameters passed during component initialization.
  For details on OpenAI API parameters, see OpenAI documentation.
- tools: A list of tools or a Toolset for which the model can prepare calls. If set, it overrides the
  tools parameter set during component initialization. This parameter can accept either a mixed list of
  Haystack Tool objects and Haystack Toolsets, or a dictionary of OpenAI/MCP tool definitions.
  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
- tools_strict: Whether to enable strict schema adherence for tool calls. If set to True, the model will follow
  exactly the schema provided in the parameters field of the tool definition, but this may increase latency.
  If set, it overrides the tools_strict parameter set during component initialization.
Returns:
A dictionary with the following key:
replies: A list containing the generated responses as ChatMessage instances.
Module chat/hugging_face_local
default_tool_parser
def default_tool_parser(text: str) -> Optional[list[ToolCall]]

Default implementation for parsing tool calls from model output text.
Uses DEFAULT_TOOL_PATTERN to extract tool calls.
Arguments:
text: The text to parse for tool calls.
Returns:
A list containing a single ToolCall if a valid tool call is found, None otherwise.
HuggingFaceLocalChatGenerator
Generates chat responses using models from Hugging Face that run locally.
Use this component with chat-based models,
such as HuggingFaceH4/zephyr-7b-beta or meta-llama/Llama-2-7b-chat-hf.
LLMs running locally may need powerful hardware.
Usage example
from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
from haystack.dataclasses import ChatMessage
generator = HuggingFaceLocalChatGenerator(model="HuggingFaceH4/zephyr-7b-beta")
generator.warm_up()
messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
print(generator.run(messages))

{'replies':
[ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
"Natural Language Processing (NLP) is a subfield of artificial intelligence that deals
with the interaction between computers and human language. It enables computers to understand, interpret, and
generate human language in a valuable way. NLP involves various techniques such as speech recognition, text
analysis, sentiment analysis, and machine translation. The ultimate goal is to make it easier for computers to
process and derive meaning from human language, improving communication between humans and machines.")],
_name=None,
_meta={'finish_reason': 'stop', 'index': 0, 'model':
'mistralai/Mistral-7B-Instruct-v0.2',
'usage': {'completion_tokens': 90, 'prompt_tokens': 19, 'total_tokens': 109}})
]
}
HuggingFaceLocalChatGenerator.__init__
def __init__(model: str = "HuggingFaceH4/zephyr-7b-beta",
task: Optional[Literal["text-generation",
"text2text-generation"]] = None,
device: Optional[ComponentDevice] = None,
token: Optional[Secret] = Secret.from_env_var(
["HF_API_TOKEN", "HF_TOKEN"], strict=False),
chat_template: Optional[str] = None,
generation_kwargs: Optional[dict[str, Any]] = None,
huggingface_pipeline_kwargs: Optional[dict[str, Any]] = None,
stop_words: Optional[list[str]] = None,
streaming_callback: Optional[StreamingCallbackT] = None,
tools: Optional[ToolsType] = None,
tool_parsing_function: Optional[Callable[
[str], Optional[list[ToolCall]]]] = None,
async_executor: Optional[ThreadPoolExecutor] = None) -> None

Initializes the HuggingFaceLocalChatGenerator component.
Arguments:
- model: The Hugging Face text generation model name or path,
  for example, mistralai/Mistral-7B-Instruct-v0.2 or TheBloke/OpenHermes-2.5-Mistral-7B-16k-AWQ.
  The model must be a chat model supporting the ChatML messaging format.
  If the model is specified in huggingface_pipeline_kwargs, this parameter is ignored.
- task: The task for the Hugging Face pipeline. Possible options:
  - text-generation: Supported by decoder models, like GPT.
  - text2text-generation: Supported by encoder-decoder models, like T5.
  If the task is specified in huggingface_pipeline_kwargs, this parameter is ignored.
  If not specified, the component calls the Hugging Face API to infer the task from the model name.
- device: The device for loading the model. If None, automatically selects the default device.
  If a device or device map is specified in huggingface_pipeline_kwargs, it overrides this parameter.
- token: The token to use as HTTP bearer authorization for remote files.
  If the token is specified in huggingface_pipeline_kwargs, this parameter is ignored.
- chat_template: Specifies an optional Jinja template for formatting chat messages. Most high-quality
  chat models have their own templates, but for models without this feature or if you prefer a custom
  template, use this parameter.
- generation_kwargs: A dictionary with keyword arguments to customize text generation.
  Some examples: max_length, max_new_tokens, temperature, top_k, top_p.
  See Hugging Face's GenerationConfig documentation for more information.
  The only generation_kwargs set by default is max_new_tokens, which is set to 512 tokens.
- huggingface_pipeline_kwargs: Dictionary with keyword arguments to initialize the Hugging Face pipeline
  for text generation. These keyword arguments provide fine-grained control over the Hugging Face pipeline.
  In case of duplication, these kwargs override the model, task, device, and token init parameters.
  For kwargs, see the Hugging Face documentation.
  In this dictionary, you can also include model_kwargs to specify the kwargs for model initialization.
- stop_words: A list of stop words. If the model generates a stop word, the generation stops.
  If you provide this parameter, don't specify stopping_criteria in generation_kwargs.
  For some chat models, the output includes both the new text and the original prompt.
  In these cases, make sure your prompt has no stop words.
- streaming_callback: An optional callable for handling streaming responses.
- tools: A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
- tool_parsing_function: A callable that takes a string and returns a list of ToolCall objects or None.
  If None, the default_tool_parser is used, which extracts tool calls using a predefined pattern.
- async_executor: Optional ThreadPoolExecutor to use for async calls. If not provided, a single-threaded
  executor is initialized and used.
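A hedged streaming sketch, assuming the built-in print_streaming_chunk helper from haystack.components.generators.utils:

from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
from haystack.components.generators.utils import print_streaming_chunk
from haystack.dataclasses import ChatMessage

generator = HuggingFaceLocalChatGenerator(
    model="HuggingFaceH4/zephyr-7b-beta",
    generation_kwargs={"max_new_tokens": 128},
    streaming_callback=print_streaming_chunk)  # prints tokens as they are generated
generator.warm_up()  # loads the model and tokenizer
generator.run([ChatMessage.from_user("What's Natural Language Processing? Be brief.")])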
HuggingFaceLocalChatGenerator.__del__
def __del__() -> None

Cleanup when the instance is being destroyed.
HuggingFaceLocalChatGenerator.shutdown
def shutdown() -> None

Explicitly shutdown the executor if we own it.
HuggingFaceLocalChatGenerator.warm_up
def warm_up() -> None

Initializes the component and warms up tools if provided.
HuggingFaceLocalChatGenerator.to_dict
def to_dict() -> dict[str, Any]

Serializes the component to a dictionary.
Returns:
Dictionary with serialized data.
HuggingFaceLocalChatGenerator.from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceLocalChatGenerator"

Deserializes the component from a dictionary.
Arguments:
data: The dictionary to deserialize from.
Returns:
The deserialized component.
HuggingFaceLocalChatGenerator.run
@component.output_types(replies=list[ChatMessage])
def run(messages: list[ChatMessage],
generation_kwargs: Optional[dict[str, Any]] = None,
streaming_callback: Optional[StreamingCallbackT] = None,
tools: Optional[ToolsType] = None) -> dict[str, list[ChatMessage]]

Invoke text generation inference based on the provided messages and generation parameters.
Arguments:
- messages: A list of ChatMessage objects representing the input messages.
- generation_kwargs: Additional keyword arguments for text generation.
- streaming_callback: An optional callable for handling streaming responses.
- tools: A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
  If set, it overrides the tools parameter provided during initialization.
Returns:
A dictionary with the following keys:
replies: A list containing the generated responses as ChatMessage instances.
HuggingFaceLocalChatGenerator.create_message
def create_message(text: str,
index: int,
tokenizer: Union["PreTrainedTokenizer",
"PreTrainedTokenizerFast"],
prompt: str,
generation_kwargs: dict[str, Any],
parse_tool_calls: bool = False) -> ChatMessage

Create a ChatMessage instance from the provided text, populated with metadata.
Arguments:
- text: The generated text.
- index: The index of the generated text.
- tokenizer: The tokenizer used for generation.
- prompt: The prompt used for generation.
- generation_kwargs: The generation parameters.
- parse_tool_calls: Whether to attempt parsing tool calls from the text.
Returns:
A ChatMessage instance.
HuggingFaceLocalChatGenerator.run_async
@component.output_types(replies=list[ChatMessage])
async def run_async(
messages: list[ChatMessage],
generation_kwargs: Optional[dict[str, Any]] = None,
streaming_callback: Optional[StreamingCallbackT] = None,
tools: Optional[ToolsType] = None) -> dict[str, list[ChatMessage]]

Asynchronously invokes text generation inference based on the provided messages and generation parameters.
This is the asynchronous version of the run method. It has the same parameters
and return values but can be used with await in async code.
Arguments:
- messages: A list of ChatMessage objects representing the input messages.
- generation_kwargs: Additional keyword arguments for text generation.
- streaming_callback: An optional callable for handling streaming responses.
- tools: A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
  If set, it overrides the tools parameter provided during initialization.
Returns:
A dictionary with the following keys:
replies: A list containing the generated responses as ChatMessage instances.
Module chat/hugging_face_api
HuggingFaceAPIChatGenerator
Completes chats using Hugging Face APIs.
HuggingFaceAPIChatGenerator uses the ChatMessage
format for input and output. Use it to generate text with Hugging Face APIs:
- Serverless Inference API (Inference Providers)
- Paid Inference Endpoints
- Self-hosted Text Generation Inference
Usage examples
With the serverless inference API (Inference Providers) - free tier available
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
from haystack.utils.hf import HFGenerationAPIType
messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
ChatMessage.from_user("What's Natural Language Processing?")]
# the api_type can be expressed using the HFGenerationAPIType enum or as a string
api_type = HFGenerationAPIType.SERVERLESS_INFERENCE_API
api_type = "serverless_inference_api" # this is equivalent to the above
generator = HuggingFaceAPIChatGenerator(api_type=api_type,
api_params={"model": "Qwen/Qwen2.5-7B-Instruct",
"provider": "together"},
token=Secret.from_token("<your-api-key>"))
result = generator.run(messages)
print(result)

With the serverless inference API (Inference Providers) and text+image input
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage, ImageContent
from haystack.utils import Secret
from haystack.utils.hf import HFGenerationAPIType
# Create an image from file path, URL, or base64
image = ImageContent.from_file_path("path/to/your/image.jpg")
# Create a multimodal message with both text and image
messages = [ChatMessage.from_user(content_parts=["Describe this image in detail", image])]
generator = HuggingFaceAPIChatGenerator(
api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
api_params={
"model": "Qwen/Qwen2.5-VL-7B-Instruct", # Vision Language Model
"provider": "hyperbolic"
},
token=Secret.from_token("<your-api-key>")
)
result = generator.run(messages)
print(result)

With paid inference endpoints
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
ChatMessage.from_user("What's Natural Language Processing?")]
generator = HuggingFaceAPIChatGenerator(api_type="inference_endpoints",
api_params={"url": "<your-inference-endpoint-url>"},
token=Secret.from_token("<your-api-key>"))
result = generator.run(messages)
print(result)
With self-hosted text generation inference
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
ChatMessage.from_user("What's Natural Language Processing?")]
generator = HuggingFaceAPIChatGenerator(api_type="text_generation_inference",
api_params={"url": "http://localhost:8080"})
result = generator.run(messages)
print(result)

HuggingFaceAPIChatGenerator.__init__
def __init__(api_type: Union[HFGenerationAPIType, str],
api_params: dict[str, str],
token: Optional[Secret] = Secret.from_env_var(
["HF_API_TOKEN", "HF_TOKEN"], strict=False),
generation_kwargs: Optional[dict[str, Any]] = None,
stop_words: Optional[list[str]] = None,
streaming_callback: Optional[StreamingCallbackT] = None,
tools: Optional[ToolsType] = None)

Initialize the HuggingFaceAPIChatGenerator instance.
Arguments:

- `api_type`: The type of Hugging Face API to use. Available types:
  - `text_generation_inference`: See TGI.
  - `inference_endpoints`: See Inference Endpoints.
  - `serverless_inference_api`: See Serverless Inference API - Inference Providers.
- `api_params`: A dictionary with the following keys:
  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
  - `provider`: Provider name. Recommended when `api_type` is `SERVERLESS_INFERENCE_API`.
  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or `TEXT_GENERATION_INFERENCE`.
  - Other parameters specific to the chosen API type, such as `timeout`, `headers`, etc.
- `token`: The Hugging Face token to use as HTTP bearer authorization. Check your HF token in your account settings.
- `generation_kwargs`: A dictionary with keyword arguments to customize text generation. Some examples: `max_tokens`, `temperature`, `top_p`. For details, see Hugging Face chat_completion documentation (a short initialization sketch follows this list).
- `stop_words`: An optional list of strings representing the stop words.
- `streaming_callback`: An optional callable for handling streaming responses.
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. The chosen model should support tool/function calling, according to the model card. Support for tools in the Hugging Face API and TGI is not yet fully refined and you may experience unexpected behavior.
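A minimal initialization sketch for the serverless API, using the documented parameters above; the model ID and stop word are illustrative assumptions, not defaults:

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.utils import Secret
from haystack.utils.hf import HFGenerationAPIType

# Serverless setup with custom generation defaults and a stop word.
# The model ID is illustrative; pick any chat-capable model.
generator = HuggingFaceAPIChatGenerator(
    api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
    api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
    token=Secret.from_env_var(["HF_API_TOKEN", "HF_TOKEN"], strict=False),
    generation_kwargs={"max_tokens": 256, "temperature": 0.7},
    stop_words=["###"],
)
```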
HuggingFaceAPIChatGenerator.warm_up
```python
def warm_up()
```

Warm up the Hugging Face API chat generator.
This will warm up the tools registered in the chat generator.
This method is idempotent and will only warm up the tools once.
HuggingFaceAPIChatGenerator.to_dict
```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.
Returns:
A dictionary containing the serialized component.
HuggingFaceAPIChatGenerator.from_dict
```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceAPIChatGenerator"
```

Deserialize this component from a dictionary.
HuggingFaceAPIChatGenerator.run
```python
@component.output_types(replies=list[ChatMessage])
def run(messages: list[ChatMessage],
        generation_kwargs: Optional[dict[str, Any]] = None,
        tools: Optional[ToolsType] = None,
        streaming_callback: Optional[StreamingCallbackT] = None)
```

Invoke the text generation inference based on the provided messages and generation parameters.
Arguments:

- `messages`: A list of ChatMessage objects representing the input messages.
- `generation_kwargs`: Additional keyword arguments for text generation (see the sketch after this section).
- `tools`: A list of tools or a Toolset for which the model can prepare calls. If set, it will override the `tools` parameter set during component initialization. This parameter can accept either a list of `Tool` objects or a `Toolset` instance.
- `streaming_callback`: An optional callable for handling streaming responses. If set, it will override the `streaming_callback` parameter set during component initialization.

Returns:

A dictionary with the following keys:

- `replies`: A list containing the generated responses as ChatMessage objects.
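A sketch of per-call tuning: run-time generation_kwargs can differ from those set at initialization (assuming a generator configured as in the sketch above):

```python
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_user("Summarize NLP in one sentence.")]
# Generation parameters for this request only.
result = generator.run(messages, generation_kwargs={"max_tokens": 64, "temperature": 0.2})
print(result["replies"][0].text)
```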
HuggingFaceAPIChatGenerator.run_async
```python
@component.output_types(replies=list[ChatMessage])
async def run_async(messages: list[ChatMessage],
                    generation_kwargs: Optional[dict[str, Any]] = None,
                    tools: Optional[ToolsType] = None,
                    streaming_callback: Optional[StreamingCallbackT] = None)
```

Asynchronously invokes the text generation inference based on the provided messages and generation parameters.
This is the asynchronous version of the run method. It has the same parameters
and return values but can be used with await in async code.
Arguments:

- `messages`: A list of ChatMessage objects representing the input messages.
- `generation_kwargs`: Additional keyword arguments for text generation.
- `tools`: A list of tools or a Toolset for which the model can prepare calls. If set, it will override the `tools` parameter set during component initialization. This parameter can accept either a list of `Tool` objects or a `Toolset` instance.
- `streaming_callback`: An optional callable for handling streaming responses. If set, it will override the `streaming_callback` parameter set during component initialization.

Returns:

A dictionary with the following keys:

- `replies`: A list containing the generated responses as ChatMessage objects.
Module chat/openai
OpenAIChatGenerator
Completes chats using OpenAI's large language models (LLMs).
It works with the gpt-4 and o-series models and supports streaming responses
from OpenAI API. It uses ChatMessage
format in input and output.
You can customize how the text is generated by passing parameters to the
OpenAI API. Use the **generation_kwargs argument when you initialize
the component or when you run it. Any parameter that works with
openai.ChatCompletion.create will work here too.
For details on OpenAI API parameters, see
OpenAI documentation.
Usage example

```python
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
messages = [ChatMessage.from_user("What's Natural Language Processing?")]
client = OpenAIChatGenerator()
response = client.run(messages)
print(response)
```

Output:

```
{'replies':
[ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=
[TextContent(text="Natural Language Processing (NLP) is a branch of artificial intelligence
that focuses on enabling computers to understand, interpret, and generate human language in
a way that is meaningful and useful.")],
_name=None,
_meta={'model': 'gpt-4o-mini', 'index': 0, 'finish_reason': 'stop',
'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})
]
}
```
OpenAIChatGenerator.__init__
```python
def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
             model: str = "gpt-4o-mini",
             streaming_callback: Optional[StreamingCallbackT] = None,
             api_base_url: Optional[str] = None,
             organization: Optional[str] = None,
             generation_kwargs: Optional[dict[str, Any]] = None,
             timeout: Optional[float] = None,
             max_retries: Optional[int] = None,
             tools: Optional[ToolsType] = None,
             tools_strict: bool = False,
             http_client_kwargs: Optional[dict[str, Any]] = None)
```

Creates an instance of OpenAIChatGenerator. Unless specified otherwise in `model`, uses OpenAI's gpt-4o-mini.
Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
environment variables to override the timeout and max_retries parameters respectively
in the OpenAI client.
Arguments:

- `api_key`: The OpenAI API key. You can set it with an environment variable `OPENAI_API_KEY`, or pass it with this parameter during initialization.
- `model`: The name of the model to use.
- `streaming_callback`: A callback function that is called when a new token is received from the stream. The callback function accepts StreamingChunk as an argument (see the streaming sketch after this list).
- `api_base_url`: An optional base URL.
- `organization`: Your organization ID, defaults to `None`. See production best practices.
- `generation_kwargs`: Other parameters to use for the model. These parameters are sent directly to the OpenAI endpoint. See OpenAI documentation for more details. Some of the supported parameters:
  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
  - `temperature`: What sampling temperature to use. Higher values mean the model will take more risks. Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens comprising the top 10% probability mass are considered.
  - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2, it will generate two completions for each of the three prompts, ending up with 6 completions in total.
  - `stop`: One or more sequences after which the LLM should stop generating tokens.
  - `presence_penalty`: What penalty to apply if a token is already present at all. Bigger values mean the model will be less likely to repeat the same token in the text.
  - `frequency_penalty`: What penalty to apply if a token has already been generated in the text. Bigger values mean the model will be less likely to repeat the same token in the text.
  - `logit_bias`: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the values are the bias to add to that token.
  - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response. If provided, the output will always be validated against this format (unless the model returns a tool call). For details, see the OpenAI Structured Outputs documentation.
    Notes:
    - This parameter accepts Pydantic models and JSON schemas for the latest models starting from GPT-4o. Older models only support a basic version of structured outputs through `{"type": "json_object"}`. For detailed information on JSON mode, see the OpenAI Structured Outputs documentation.
    - For structured outputs with streaming, the `response_format` must be a JSON schema and not a Pydantic model.
- `timeout`: Timeout for OpenAI client calls. If not set, it defaults to either the `OPENAI_TIMEOUT` environment variable, or 30 seconds.
- `max_retries`: Maximum number of retries to contact OpenAI after an internal error. If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or 5.
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`. For more information, see the HTTPX documentation.
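A short streaming sketch, assuming the print_streaming_chunk helper is available at this path in your installed version; any callable accepting a StreamingChunk works as well:

```python
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.generators.utils import print_streaming_chunk
from haystack.dataclasses import ChatMessage

# Tokens are printed as they arrive instead of waiting for the full reply.
client = OpenAIChatGenerator(streaming_callback=print_streaming_chunk)
client.run([ChatMessage.from_user("What's Natural Language Processing?")])
```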
OpenAIChatGenerator.warm_up
```python
def warm_up()
```

Warm up the OpenAI chat generator.
This will warm up the tools registered in the chat generator.
This method is idempotent and will only warm up the tools once.
OpenAIChatGenerator.to_dict
```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.
Returns:
The serialized component as a dictionary.
OpenAIChatGenerator.from_dict
```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "OpenAIChatGenerator"
```

Deserialize this component from a dictionary.
Arguments:

- `data`: The dictionary representation of this component.

Returns:

The deserialized component instance.
OpenAIChatGenerator.run
```python
@component.output_types(replies=list[ChatMessage])
def run(messages: list[ChatMessage],
        streaming_callback: Optional[StreamingCallbackT] = None,
        generation_kwargs: Optional[dict[str, Any]] = None,
        *,
        tools: Optional[ToolsType] = None,
        tools_strict: Optional[bool] = None)
```

Invokes chat completion based on the provided messages and generation parameters.
Arguments:

- `messages`: A list of ChatMessage instances representing the input messages.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters will override the parameters passed during component initialization. For details on OpenAI API parameters, see OpenAI documentation.
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. If set, it will override the `tools` parameter provided during initialization (see the tool-calling sketch after this section).
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly the schema provided in the `parameters` field of the tool definition, but this may increase latency. If set, it will override the `tools_strict` parameter set during component initialization.

Returns:

A dictionary with the following key:

- `replies`: A list containing the generated responses as ChatMessage instances.
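To illustrate the tools parameter, a minimal sketch with a single Haystack Tool; the get_weather function and its schema are purely illustrative:

```python
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.tools import Tool

def get_weather(city: str) -> str:
    # Hypothetical helper; a real tool would call a weather service.
    return f"Sunny in {city}"

weather_tool = Tool(
    name="get_weather",
    description="Get the current weather for a city.",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
    function=get_weather,
)

client = OpenAIChatGenerator(tools=[weather_tool], tools_strict=True)
result = client.run([ChatMessage.from_user("What's the weather in Paris?")])
# If the model decided to call the tool, the reply carries tool calls
# instead of plain text.
print(result["replies"][0].tool_calls)
```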
OpenAIChatGenerator.run_async
```python
@component.output_types(replies=list[ChatMessage])
async def run_async(messages: list[ChatMessage],
                    streaming_callback: Optional[StreamingCallbackT] = None,
                    generation_kwargs: Optional[dict[str, Any]] = None,
                    *,
                    tools: Optional[ToolsType] = None,
                    tools_strict: Optional[bool] = None)
```

Asynchronously invokes chat completion based on the provided messages and generation parameters.
This is the asynchronous version of the run method. It has the same parameters and return values
but can be used with await in async code.
Arguments:

- `messages`: A list of ChatMessage instances representing the input messages.
- `streaming_callback`: A callback function that is called when a new token is received from the stream. Must be a coroutine.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters will override the parameters passed during component initialization. For details on OpenAI API parameters, see OpenAI documentation.
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. If set, it will override the `tools` parameter provided during initialization.
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly the schema provided in the `parameters` field of the tool definition, but this may increase latency. If set, it will override the `tools_strict` parameter set during component initialization.

Returns:

A dictionary with the following key:

- `replies`: A list containing the generated responses as ChatMessage instances.
Module chat/openai_responses
OpenAIResponsesChatGenerator
Completes chats using OpenAI's Responses API.
It works with the gpt-4 and o-series models and supports streaming responses
from OpenAI API. It uses ChatMessage
format in input and output.
You can customize how the text is generated by passing parameters to the
OpenAI API. Use the **generation_kwargs argument when you initialize
the component or when you run it. Any parameter that works with
openai.Responses.create will work here too.
For details on OpenAI API parameters, see
OpenAI documentation.
Usage example

```python
from haystack.components.generators.chat import OpenAIResponsesChatGenerator
from haystack.dataclasses import ChatMessage
messages = [ChatMessage.from_user("What's Natural Language Processing?")]
client = OpenAIResponsesChatGenerator(generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}})
response = client.run(messages)
print(response)
```

OpenAIResponsesChatGenerator.__init__
```python
def __init__(*,
             api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
             model: str = "gpt-5-mini",
             streaming_callback: Optional[StreamingCallbackT] = None,
             api_base_url: Optional[str] = None,
             organization: Optional[str] = None,
             generation_kwargs: Optional[dict[str, Any]] = None,
             timeout: Optional[float] = None,
             max_retries: Optional[int] = None,
             tools: Optional[Union[ToolsType, list[dict]]] = None,
             tools_strict: bool = False,
             http_client_kwargs: Optional[dict[str, Any]] = None)
```

Creates an instance of OpenAIResponsesChatGenerator. Uses OpenAI's gpt-5-mini by default.
Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
environment variables to override the timeout and max_retries parameters respectively
in the OpenAI client.
Arguments:

- `api_key`: The OpenAI API key. You can set it with an environment variable `OPENAI_API_KEY`, or pass it with this parameter during initialization.
- `model`: The name of the model to use.
- `streaming_callback`: A callback function that is called when a new token is received from the stream. The callback function accepts StreamingChunk as an argument.
- `api_base_url`: An optional base URL.
- `organization`: Your organization ID, defaults to `None`. See production best practices.
- `generation_kwargs`: Other parameters to use for the model. These parameters are sent directly to the OpenAI endpoint. See OpenAI documentation for more details. Some of the supported parameters:
  - `temperature`: What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens comprising the top 10% probability mass are considered.
  - `previous_response_id`: The ID of the previous response. Use this to create multi-turn conversations.
  - `text_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response. If provided, the output will always be validated against this format (unless the model returns a tool call). For details, see the OpenAI Structured Outputs documentation (see the structured-output sketch after this list).
    Notes:
    - This parameter accepts Pydantic models and JSON schemas for the latest models starting from GPT-4o. Older models only support a basic version of structured outputs through `{"type": "json_object"}`. For detailed information on JSON mode, see the OpenAI Structured Outputs documentation.
    - For structured outputs with streaming, the `text_format` must be a JSON schema and not a Pydantic model.
  - `reasoning`: A dictionary of parameters for reasoning. For example:
    - `summary`: The summary of the reasoning.
    - `effort`: The level of effort to put into the reasoning. Can be `low`, `medium`, or `high`.
    - `generate_summary`: Whether to generate a summary of the reasoning.
    Note: OpenAI does not return the reasoning tokens, but you can view the summary if it's enabled. For details, see the OpenAI Reasoning documentation.
- `timeout`: Timeout for OpenAI client calls. If not set, it defaults to either the `OPENAI_TIMEOUT` environment variable, or 30 seconds.
- `max_retries`: Maximum number of retries to contact OpenAI after an internal error. If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or 5.
- `tools`: The tools that the model can use to prepare calls. This parameter can accept either a mixed list of Haystack `Tool` objects and Haystack `Toolset` objects, or a list of dictionaries with OpenAI/MCP tool definitions. Note: You cannot pass OpenAI/MCP tools and Haystack tools together. For details on tool support, see OpenAI documentation.
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls are strict by default.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`. For more information, see the HTTPX documentation.
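A structured-output sketch passing a Pydantic model through text_format, as documented above; the CityInfo model is illustrative:

```python
from pydantic import BaseModel

from haystack.components.generators.chat import OpenAIResponsesChatGenerator
from haystack.dataclasses import ChatMessage

class CityInfo(BaseModel):
    # Illustrative schema the response must conform to.
    name: str
    country: str

client = OpenAIResponsesChatGenerator(
    generation_kwargs={"text_format": CityInfo},
)
response = client.run([ChatMessage.from_user("Name one large European city.")])
print(response["replies"][0].text)
```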
OpenAIResponsesChatGenerator.warm_up
```python
def warm_up()
```

Warm up the OpenAI responses chat generator.
This will warm up the tools registered in the chat generator.
This method is idempotent and will only warm up the tools once.
OpenAIResponsesChatGenerator.to_dict
```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.
Returns:
The serialized component as a dictionary.
OpenAIResponsesChatGenerator.from_dict
```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "OpenAIResponsesChatGenerator"
```

Deserialize this component from a dictionary.
Arguments:

- `data`: The dictionary representation of this component.

Returns:

The deserialized component instance.
OpenAIResponsesChatGenerator.run
```python
@component.output_types(replies=list[ChatMessage])
def run(messages: list[ChatMessage],
        *,
        streaming_callback: Optional[StreamingCallbackT] = None,
        generation_kwargs: Optional[dict[str, Any]] = None,
        tools: Optional[Union[ToolsType, list[dict]]] = None,
        tools_strict: Optional[bool] = None)
```

Invokes response generation based on the provided messages and generation parameters.
Arguments:

- `messages`: A list of ChatMessage instances representing the input messages.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters will override the parameters passed during component initialization. For details on OpenAI API parameters, see OpenAI documentation.
- `tools`: The tools that the model can use to prepare calls. If set, it will override the `tools` parameter set during component initialization. This parameter can accept either a mixed list of Haystack `Tool` objects and Haystack `Toolset` objects, or a list of dictionaries with OpenAI/MCP tool definitions. Note: You cannot pass OpenAI/MCP tools and Haystack tools together. For details on tool support, see OpenAI documentation.
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls are strict by default. If set, it will override the `tools_strict` parameter set during component initialization.

Returns:

A dictionary with the following key:

- `replies`: A list containing the generated responses as ChatMessage instances.
OpenAIResponsesChatGenerator.run_async
```python
@component.output_types(replies=list[ChatMessage])
async def run_async(messages: list[ChatMessage],
                    *,
                    streaming_callback: Optional[StreamingCallbackT] = None,
                    generation_kwargs: Optional[dict[str, Any]] = None,
                    tools: Optional[Union[ToolsType, list[dict]]] = None,
                    tools_strict: Optional[bool] = None)
```

Asynchronously invokes response generation based on the provided messages and generation parameters.
This is the asynchronous version of the run method. It has the same parameters and return values
but can be used with await in async code.
Arguments:

- `messages`: A list of ChatMessage instances representing the input messages.
- `streaming_callback`: A callback function that is called when a new token is received from the stream. Must be a coroutine.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters will override the parameters passed during component initialization. For details on OpenAI API parameters, see OpenAI documentation.
- `tools`: A list of tools or a Toolset for which the model can prepare calls. If set, it will override the `tools` parameter set during component initialization. This parameter can accept either a mixed list of Haystack `Tool` objects and Haystack `Toolset` objects, or a list of dictionaries with OpenAI/MCP tool definitions. Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly the schema provided in the `parameters` field of the tool definition, but this may increase latency. If set, it will override the `tools_strict` parameter set during component initialization.

Returns:

A dictionary with the following key:

- `replies`: A list containing the generated responses as ChatMessage instances.
Module chat/fallback
FallbackChatGenerator
A chat generator wrapper that tries multiple chat generators sequentially.
It forwards all parameters transparently to the underlying chat generators and returns the first successful result.
Calls chat generators sequentially until one succeeds. Falls back on any exception raised by a generator.
If all chat generators fail, it raises a RuntimeError with details.
Timeout enforcement is fully delegated to the underlying chat generators. The fallback mechanism will only
work correctly if the underlying chat generators implement proper timeout handling and raise exceptions
when timeouts occur. For predictable latency guarantees, ensure your chat generators:
- Support a `timeout` parameter in their initialization
- Implement timeout as total wall-clock time (a shared deadline for both streaming and non-streaming)
- Raise timeout exceptions (e.g., TimeoutError, asyncio.TimeoutError, httpx.TimeoutException) when exceeded
Note: Most well-implemented chat generators (OpenAI, Anthropic, Cohere, etc.) support timeout parameters
with consistent semantics. For HTTP-based LLM providers, a single timeout value (e.g., timeout=30)
typically applies to all connection phases: connection setup, read, write, and pool. For streaming
responses, read timeout is the maximum gap between chunks. For non-streaming, it's the time limit for
receiving the complete response.
Failover is automatically triggered when a generator raises any exception, including:
- Timeout errors (if the generator implements and raises them)
- Rate limit errors (429)
- Authentication errors (401)
- Context length errors (400)
- Server errors (500+)
- Any other exception
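A usage sketch, assuming FallbackChatGenerator is exported from haystack.components.generators.chat like its siblings and that the relevant API keys are set in the environment:

```python
from haystack.components.generators.chat import FallbackChatGenerator, OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

# Try the primary generator first; fall back to the backup on any exception.
# An explicit timeout keeps each attempt's latency bounded, as discussed above.
primary = OpenAIChatGenerator(model="gpt-4o-mini", timeout=30)
backup = OpenAIChatGenerator(model="gpt-4o", timeout=30)
generator = FallbackChatGenerator(chat_generators=[primary, backup])

result = generator.run([ChatMessage.from_user("What's Natural Language Processing?")])
print(result["replies"][0].text)
# meta records which generator succeeded and how many attempts were made.
print(result["meta"]["successful_chat_generator_class"], result["meta"]["total_attempts"])
```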
FallbackChatGenerator.__init__
```python
def __init__(chat_generators: list[ChatGenerator])
```

Creates an instance of FallbackChatGenerator.
Arguments:
- `chat_generators`: A non-empty list of chat generator components to try in order.
FallbackChatGenerator.to_dict
```python
def to_dict() -> dict[str, Any]
```

Serialize the component, including nested chat generators when they support serialization.
FallbackChatGenerator.from_dict
```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "FallbackChatGenerator"
```

Rebuild the component from a serialized representation, restoring nested chat generators.
FallbackChatGenerator.warm_up
```python
def warm_up() -> None
```

Warm up all underlying chat generators.
This method calls warm_up() on each underlying generator that supports it.
FallbackChatGenerator.run
```python
@component.output_types(replies=list[ChatMessage], meta=dict[str, Any])
def run(messages: list[ChatMessage],
        generation_kwargs: Union[dict[str, Any], None] = None,
        tools: Optional[ToolsType] = None,
        streaming_callback: Union[StreamingCallbackT, None] = None) -> dict[str, Any]
```

Execute chat generators sequentially until one succeeds.
Arguments:

- `messages`: The conversation history as a list of ChatMessage instances.
- `generation_kwargs`: Optional parameters for the chat generator (e.g., temperature, max_tokens).
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset for function calling capabilities.
- `streaming_callback`: Optional callable for handling streaming responses.
Raises:
RuntimeError: If all chat generators fail.
Returns:
A dictionary with:
- "replies": Generated ChatMessage instances from the first successful generator.
- "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
total_attempts, failed_chat_generators, plus any metadata from the successful generator.
FallbackChatGenerator.run_async
```python
@component.output_types(replies=list[ChatMessage], meta=dict[str, Any])
async def run_async(messages: list[ChatMessage],
                    generation_kwargs: Union[dict[str, Any], None] = None,
                    tools: Optional[ToolsType] = None,
                    streaming_callback: Union[StreamingCallbackT, None] = None) -> dict[str, Any]
```

Asynchronously execute chat generators sequentially until one succeeds.
Arguments:

- `messages`: The conversation history as a list of ChatMessage instances.
- `generation_kwargs`: Optional parameters for the chat generator (e.g., temperature, max_tokens).
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset for function calling capabilities.
- `streaming_callback`: Optional callable for handling streaming responses.
Raises:
RuntimeError: If all chat generators fail.
Returns:
A dictionary with:
- "replies": Generated ChatMessage instances from the first successful generator.
- "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
total_attempts, failed_chat_generators, plus any metadata from the successful generator.
