API Reference

Enables text generation using LLMs.



Completes chats using OpenAI's large language models (LLMs).

It works with the gpt-4 and gpt-3.5-turbo models and supports streaming responses from the OpenAI API. It uses the ChatMessage format for input and output.

You can customize how the text is generated by passing parameters to the OpenAI API. Use the generation_kwargs argument when you initialize the component or when you run it. Any parameter that works with openai.ChatCompletion.create will work here too.

For details on OpenAI API parameters, see the OpenAI documentation.
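Because generation_kwargs can be given both at initialization and at run time, it helps to picture how the two combine. The sketch below assumes a plain dictionary merge in which run-time values win for keys that appear in both; merge_generation_kwargs is a hypothetical helper written for illustration, not part of the component's API.

```python
from typing import Any, Dict, Optional


def merge_generation_kwargs(
    init_kwargs: Optional[Dict[str, Any]],
    run_kwargs: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Hypothetical helper: combine init-time and run-time generation kwargs.

    Keys in run_kwargs replace the same keys from init_kwargs; every other
    key from either dictionary is kept. Neither input is modified.
    """
    return {**(init_kwargs or {}), **(run_kwargs or {})}


# Defaults chosen when the component is created...
init_kwargs = {"temperature": 0.2, "max_tokens": 256}
# ...and adjustments for a single call.
run_kwargs = {"temperature": 0.9, "stop": ["END"]}

merged = merge_generation_kwargs(init_kwargs, run_kwargs)
print(merged)  # {'temperature': 0.9, 'max_tokens': 256, 'stop': ['END']}
```

Keeping the merge non-destructive means the init-time defaults stay intact for the next call.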

Usage example

from import OpenAIChatGenerator
from haystack_experimental.dataclasses import ChatMessage

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = OpenAIChatGenerator()
response = client.run(messages)


{'replies': [
    ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>,
                _content=[TextContent(text='Natural Language Processing (NLP) is a field of artificial ...')],
                _meta={'model': 'gpt-4o-mini', 'index': 0, 'finish_reason': 'stop',
                    'usage': {'completion_tokens': 71, 'prompt_tokens': 13, 'total_tokens': 84}})
]}
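The example above waits for the full reply. To stream instead, you pass a streaming_callback that is called with a StreamingChunk for each new piece of text. The sketch below defines a minimal local stand-in for StreamingChunk (assumed here to carry a content string and a meta dictionary) and feeds it hand-written chunks, so it runs without the library or an API key; make_collecting_callback is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class StreamingChunk:
    """Local stand-in: assumed to hold a piece of text plus metadata."""

    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def make_collecting_callback(sink: List[str]) -> Callable[[StreamingChunk], None]:
    """Hypothetical helper: print each chunk as it arrives and keep a copy in sink."""

    def callback(chunk: StreamingChunk) -> None:
        print(chunk.content, end="", flush=True)
        sink.append(chunk.content)

    return callback


received: List[str] = []
on_chunk = make_collecting_callback(received)

# Hand-written chunks simulate what a streamed reply might deliver.
for piece in ["Natural ", "Language ", "Processing"]:
    on_chunk(StreamingChunk(content=piece))
print()  # finish the streamed line

full_text = "".join(received)
```

With the real component, the same callback would be passed as streaming_callback, either to the constructor or to run().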


def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
             model: str = "gpt-4o-mini",
             streaming_callback: Optional[Callable[[StreamingChunk], None]] = None,
             api_base_url: Optional[str] = None,
             organization: Optional[str] = None,
             generation_kwargs: Optional[Dict[str, Any]] = None,
             timeout: Optional[float] = None,
             max_retries: Optional[int] = None,
             tools: Optional[List[Tool]] = None,
             tools_strict: bool = False)

Creates an instance of OpenAIChatGenerator. Unless specified otherwise in model, uses OpenAI's gpt-4o-mini.

Before initializing the component, you can set the OPENAI_TIMEOUT and OPENAI_MAX_RETRIES environment variables to change the default timeout and max_retries values used by the OpenAI client. These environment variables apply only when the corresponding parameter is not passed explicitly.
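The resolution order implied by the timeout and max_retries descriptions (explicit argument first, then the environment variable, then 30 seconds or 5 retries) can be sketched as follows. resolve_client_settings is a hypothetical helper; the component's actual code may be structured differently.

```python
import os
from typing import Optional, Tuple

# Fallbacks taken from the timeout and max_retries descriptions.
DEFAULT_TIMEOUT = 30.0
DEFAULT_MAX_RETRIES = 5


def resolve_client_settings(
    timeout: Optional[float] = None,
    max_retries: Optional[int] = None,
) -> Tuple[float, int]:
    """Hypothetical helper: explicit argument, then environment variable, then default."""
    if timeout is None:
        timeout = float(os.environ.get("OPENAI_TIMEOUT", DEFAULT_TIMEOUT))
    if max_retries is None:
        max_retries = int(os.environ.get("OPENAI_MAX_RETRIES", DEFAULT_MAX_RETRIES))
    return timeout, max_retries


# Demonstration: set one variable, make sure the other is unset.
os.environ["OPENAI_TIMEOUT"] = "60"
os.environ.pop("OPENAI_MAX_RETRIES", None)

print(resolve_client_settings())              # (60.0, 5): timeout from the environment
print(resolve_client_settings(timeout=10.0))  # (10.0, 5): explicit argument wins
```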


  • api_key: The OpenAI API key. You can set it with the OPENAI_API_KEY environment variable, or pass it with this parameter during initialization.
  • model: The name of the model to use.
  • streaming_callback: A callback function that is called when a new token is received from the stream. The callback function accepts StreamingChunk as an argument.
  • api_base_url: An optional base URL.
  • organization: Your organization ID. Defaults to None. See OpenAI's production best practices.
  • generation_kwargs: Other parameters to use for the model. These parameters are sent directly to the OpenAI endpoint. See the OpenAI documentation for more details. Some of the supported parameters:
      • max_tokens: The maximum number of tokens the output text can have.
      • temperature: What sampling temperature to use. Higher values mean the model will take more risks. Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
      • top_p: An alternative to sampling with temperature, called nucleus sampling, where the model considers only the tokens that make up the top_p probability mass. For example, 0.1 means only the tokens comprising the top 10% probability mass are considered.
      • n: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2, it will generate two completions for each of the three prompts, ending up with 6 completions in total.
      • stop: One or more sequences after which the LLM should stop generating tokens.
      • presence_penalty: The penalty to apply to a token that has already appeared in the text, regardless of how often. Bigger values make the model less likely to repeat the same token.
      • frequency_penalty: The penalty to apply to a token based on how many times it has already been generated in the text. Bigger values make the model less likely to repeat the same token.
      • logit_bias: Add a logit bias to specific tokens. The keys of the dictionary are token IDs, and the values are the bias to add to that token.
  • timeout: Timeout for OpenAI client calls. If not set, it defaults to either the OPENAI_TIMEOUT environment variable, or 30 seconds.
  • max_retries: Maximum number of retries to contact OpenAI after an internal error. If not set, it defaults to either the OPENAI_MAX_RETRIES environment variable, or 5.
  • tools: A list of tools for which the model can prepare calls.
  • tools_strict: Whether to enable strict schema adherence for tool calls. If set to True, the model follows the schema provided in the parameters field of the tool definition exactly, but this may increase latency.
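To build intuition for the sampling-related generation_kwargs, the sketch below applies them to a toy four-word vocabulary. It is a simplified model, not OpenAI's server-side implementation: it assumes the repetition penalties are subtracted from raw scores (a flat presence_penalty once a token has appeared, plus frequency_penalty per previous occurrence), that logit_bias is added to the score, that temperature divides the scores before a softmax, and that top_p keeps the smallest set of most likely tokens whose probabilities reach top_p. Tokens are plain strings here, whereas the real API keys logit_bias by token ID.

```python
import math
from typing import Dict, List, Optional


def adjust_logits(
    logits: Dict[str, float],
    counts: Dict[str, int],
    presence_penalty: float = 0.0,
    frequency_penalty: float = 0.0,
    logit_bias: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Apply repetition penalties and logit_bias to raw token scores."""
    logit_bias = logit_bias or {}
    adjusted = {}
    for token, score in logits.items():
        seen = counts.get(token, 0)
        score -= seen * frequency_penalty  # grows with each repetition
        score -= (1.0 if seen > 0 else 0.0) * presence_penalty  # flat, once seen
        score += logit_bias.get(token, 0.0)
        adjusted[token] = score
    return adjusted


def softmax_with_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Convert scores to probabilities; temperature 0 is treated as argmax."""
    if temperature == 0:
        best = max(logits, key=logits.get)
        return {t: 1.0 if t == best else 0.0 for t in logits}
    scaled = {t: s / temperature for t, s in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {t: math.exp(s - peak) for t, s in scaled.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}


def nucleus(probs: Dict[str, float], top_p: float) -> List[str]:
    """Smallest set of most likely tokens whose combined probability reaches top_p."""
    kept, mass = [], 0.0
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append(token)
        mass += p
        if mass >= top_p:
            break
    return kept


logits = {"cat": 2.0, "dog": 1.5, "fish": 0.5, "bird": -1.0}
counts = {"cat": 3}  # "cat" has already been generated three times

adjusted = adjust_logits(
    logits, counts, presence_penalty=0.5, frequency_penalty=0.2, logit_bias={"bird": 1.0}
)
probs = softmax_with_temperature(adjusted, temperature=0.7)
print(nucleus(probs, top_p=0.9))  # ['dog', 'cat', 'fish']
```

The penalties push the repeated "cat" below "dog", and top_p=0.9 drops the least likely token. OpenAI's documentation generally recommends adjusting temperature or top_p, but not both.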


def to_dict() -> Dict[str, Any]

Serialize this component to a dictionary.


Returns the serialized component as a dictionary.


def from_dict(cls, data: Dict[str, Any]) -> "OpenAIChatGenerator"

Deserialize this component from a dictionary.


  • data: The dictionary representation of this component.


Returns the deserialized component instance.
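The pairing of to_dict and from_dict means a component can be saved and rebuilt from plain data. The sketch below uses a hypothetical ToyChatGenerator with a few of the same init parameters and the {"type": ..., "init_parameters": ...} layout that Haystack components commonly use; the exact keys OpenAIChatGenerator produces may differ. It also assumes that, for an API key taken from an environment variable, only the variable's name is serialized, never the key itself.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ToyChatGenerator:
    """Hypothetical stand-in with a few of OpenAIChatGenerator's init parameters."""

    api_key_env_var: str = "OPENAI_API_KEY"
    model: str = "gpt-4o-mini"
    generation_kwargs: Dict[str, Any] = field(default_factory=dict)
    timeout: Optional[float] = None

    def to_dict(self) -> Dict[str, Any]:
        return {
            "type": f"{type(self).__module__}.{type(self).__name__}",
            "init_parameters": {
                # Only the variable name is stored, never the key itself.
                "api_key": {"type": "env_var", "env_vars": [self.api_key_env_var]},
                "model": self.model,
                "generation_kwargs": dict(self.generation_kwargs),
                "timeout": self.timeout,
            },
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyChatGenerator":
        params = dict(data["init_parameters"])
        api_key = params.pop("api_key")
        return cls(api_key_env_var=api_key["env_vars"][0], **params)


original = ToyChatGenerator(generation_kwargs={"temperature": 0.3}, timeout=20.0)
data = original.to_dict()
restored = ToyChatGenerator.from_dict(data)
print(restored == original)  # True
```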

def run(messages: List[ChatMessage],
        streaming_callback: Optional[Callable[[StreamingChunk], None]] = None,
        generation_kwargs: Optional[Dict[str, Any]] = None,
        tools: Optional[List[Tool]] = None,
        tools_strict: Optional[bool] = None)

Invokes chat completion based on the provided messages and generation parameters.


  • messages: A list of ChatMessage instances representing the input messages.
  • streaming_callback: A callback function that is called when a new token is received from the stream.
  • generation_kwargs: Additional keyword arguments for text generation. These are merged with the generation_kwargs passed during component initialization; for any key present in both, the value passed here takes precedence. For details on OpenAI API parameters, see the OpenAI documentation.
  • tools: A list of tools for which the model can prepare calls. If set, it will override the tools parameter set during component initialization.
  • tools_strict: Whether to enable strict schema adherence for tool calls. If set to True, the model follows the schema provided in the parameters field of the tool definition exactly, but this may increase latency. If set, it will override the tools_strict parameter set during component initialization.


Returns a dictionary with a replies key, whose value is a list of the generated responses as ChatMessage instances.
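The tools and tools_strict parameters end up as tool definitions in the request sent to OpenAI. The sketch below wraps a name/description/parameters spec in the Chat Completions tool format ({"type": "function", "function": {...}}) and, in strict mode, sets strict to True and additionalProperties to False, which OpenAI's strict mode requires. How the component converts its Tool objects internally is an assumption here; to_openai_tools and the get_weather spec are hypothetical.

```python
import copy
from typing import Any, Dict, List


def to_openai_tools(tool_specs: List[Dict[str, Any]], strict: bool) -> List[Dict[str, Any]]:
    """Hypothetical helper: wrap name/description/parameters specs as OpenAI tools."""
    openai_tools = []
    for spec in tool_specs:
        function_spec = copy.deepcopy(spec)  # leave the caller's spec untouched
        if strict:
            # Ask for exact schema adherence; strict mode also requires that
            # no properties beyond those in the schema are allowed.
            function_spec["strict"] = True
            function_spec["parameters"]["additionalProperties"] = False
        openai_tools.append({"type": "function", "function": function_spec})
    return openai_tools


# Hypothetical tool spec used only for illustration.
weather_spec = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

strict_tools = to_openai_tools([weather_spec], strict=True)
print(strict_tools[0]["function"]["strict"])  # True
```

Copying each spec before adding the strict-mode fields keeps the same tool definitions reusable for non-strict calls.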