API Reference

Generators

Enables text generation using LLMs.

Module haystack_experimental.components.generators.chat.openai

OpenAIChatGenerator

Completes chats using OpenAI's large language models (LLMs).

It works with the gpt-4 and gpt-3.5-turbo models and supports streaming responses
from the OpenAI API. It uses the ChatMessage format for both input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the generation_kwargs argument when you initialize
the component or when you run it. Any parameter that works with
openai.ChatCompletion.create will work here too.

For details on OpenAI API parameters, see
OpenAI documentation.

Usage example

from haystack_experimental.components.generators.chat import OpenAIChatGenerator
from haystack_experimental.dataclasses import ChatMessage

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = OpenAIChatGenerator()  # reads the API key from the OPENAI_API_KEY environment variable
response = client.run(messages)
print(response)

Output:

{'replies': [
    ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>,
                _content=[TextContent(text='Natural Language Processing (NLP) is a field of artificial ...')],
                _meta={'model': 'gpt-4o-mini', 'index': 0, 'finish_reason': 'stop',
                    'usage': {'completion_tokens': 71, 'prompt_tokens': 13, 'total_tokens': 84}}
                )
            ]
}
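The generation_kwargs paragraph above can be made concrete with a short sketch. Defaults set at initialization apply to every call and can be overridden per call; the parameter values below are illustrative, not recommendations:

from haystack_experimental.components.generators.chat import OpenAIChatGenerator
from haystack_experimental.dataclasses import ChatMessage

# Defaults passed at initialization apply to every run() call.
client = OpenAIChatGenerator(generation_kwargs={"temperature": 0.2, "max_tokens": 256})

messages = [ChatMessage.from_user("Summarize NLP in one sentence.")]

# Per-call kwargs override the initialization defaults.
response = client.run(messages, generation_kwargs={"temperature": 0.9})
print(response["replies"][0])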

OpenAIChatGenerator.__init__

def __init__(
        api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
        model: str = "gpt-4o-mini",
        streaming_callback: Optional[Union[StreamingCallbackT,
                                           AsyncStreamingCallbackT]] = None,
        api_base_url: Optional[str] = None,
        organization: Optional[str] = None,
        generation_kwargs: Optional[Dict[str, Any]] = None,
        timeout: Optional[float] = None,
        max_retries: Optional[int] = None,
        tools: Optional[List[Tool]] = None,
        tools_strict: bool = False)

Creates an instance of OpenAIChatGenerator.

Arguments:

  • api_key: The OpenAI API key.
  • model: The name of the model to use.
  • streaming_callback: A callback function that is called when a new token is received from the stream.
    The callback function accepts a StreamingChunk as an argument. It must be a coroutine if the component
    is used in an async pipeline. A callback sketch follows this argument list.
  • api_base_url: An optional base URL.
  • organization: Your organization ID. See production best practices.
  • generation_kwargs: Other parameters to use for the model. These parameters are sent directly to the OpenAI endpoint.
    See OpenAI documentation for more details.
    Some of the supported parameters:
      • max_tokens: The maximum number of tokens the output text can have.
      • temperature: The sampling temperature to use. Higher values mean the model takes more risks.
        Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
      • top_p: An alternative to sampling with temperature, called nucleus sampling, where the model
        considers only the tokens comprising the top_p probability mass. For example, 0.1 means only the tokens
        comprising the top 10% probability mass are considered.
      • n: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
        it generates two completions for each of the three prompts, ending up with 6 completions in total.
      • stop: One or more sequences after which the LLM should stop generating tokens.
      • presence_penalty: The penalty applied if a token has already appeared in the text at all. Higher values
        make the model less likely to repeat the same token.
      • frequency_penalty: The penalty applied to a token based on how frequently it has already appeared in the
        text. Higher values make the model less likely to repeat the same token.
      • logit_bias: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
        values are the bias to add to those tokens.
  • timeout: Timeout for OpenAI client calls, in seconds. If not set, it defaults to the
    OPENAI_TIMEOUT environment variable or 30 seconds.
  • max_retries: Maximum number of retries to contact OpenAI after an internal error.
    If not set, it defaults to the OPENAI_MAX_RETRIES environment variable or 5.
  • tools: A list of tools for which the model can prepare calls.
  • tools_strict: Whether to enable strict schema adherence for tool calls. If set to True, the model follows
    the schema provided in the parameters field of the tool definition exactly, but this may increase latency.
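
To make the streaming_callback contract concrete, here is a minimal sketch. It assumes StreamingChunk is importable from haystack.dataclasses and exposes the newly received text via its content attribute:

from haystack.dataclasses import StreamingChunk
from haystack_experimental.components.generators.chat import OpenAIChatGenerator
from haystack_experimental.dataclasses import ChatMessage

def print_chunk(chunk: StreamingChunk) -> None:
    # Print each token as it arrives; chunk.content holds the new text.
    print(chunk.content, end="", flush=True)

client = OpenAIChatGenerator(streaming_callback=print_chunk)
client.run([ChatMessage.from_user("What's Natural Language Processing?")])

In an async pipeline, the same callback would instead be declared with async def and awaited by the component.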

OpenAIChatGenerator.to_dict

def to_dict() -> Dict[str, Any]

Serialize this component to a dictionary.

Returns:

The serialized component as a dictionary.

OpenAIChatGenerator.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "OpenAIChatGenerator"

Deserialize this component from a dictionary.

Arguments:

  • data: The dictionary representation of this component.

Returns:

The deserialized component instance.
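
A serialization round trip through to_dict and from_dict might look like the sketch below; the exact dictionary layout is an implementation detail and may change between versions:

from haystack_experimental.components.generators.chat import OpenAIChatGenerator

client = OpenAIChatGenerator(model="gpt-4o-mini")

# Serialize to a plain dictionary, e.g. to persist a pipeline configuration.
data = client.to_dict()

# Rebuild an equivalent component from that dictionary.
restored = OpenAIChatGenerator.from_dict(data)
print(restored.to_dict() == data)  # expected: True

Secrets such as the API key are serialized as environment-variable references, not as plain values.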

OpenAIChatGenerator.run

@component.output_types(replies=List[ChatMessage])
def run(messages: List[ChatMessage],
        streaming_callback: Optional[Union[StreamingCallbackT,
                                           AsyncStreamingCallbackT]] = None,
        generation_kwargs: Optional[Dict[str, Any]] = None,
        tools: Optional[List[Tool]] = None,
        tools_strict: Optional[bool] = None)

Invokes chat completion based on the provided messages and generation parameters.

Arguments:

  • messages: A list of ChatMessage instances representing the input messages.
  • streaming_callback: A callback function that is called when a new token is received from the stream.
    Cannot be a coroutine.
  • generation_kwargs: Additional keyword arguments for text generation. These parameters will
    override the parameters passed during component initialization.
    For details on OpenAI API parameters, see OpenAI documentation.
  • tools: A list of tools for which the model can prepare calls. If set, it overrides the tools parameter set
    during component initialization. See the tool-calling sketch after this method.
  • tools_strict: Whether to enable strict schema adherence for tool calls. If set to True, the model follows
    the schema provided in the parameters field of the tool definition exactly, but this may increase latency.
    If set, it overrides the tools_strict parameter set during component initialization.

Returns:

A dictionary with a replies key that contains the generated responses as a list of ChatMessage instances.
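
The tools and tools_strict parameters can be exercised with a sketch like the one below. It assumes the experimental Tool dataclass accepts a name, a description, a JSON-schema parameters dictionary, and a callable, and that tool calls prepared by the model are exposed on the reply via a tool_calls property:

from haystack_experimental.components.generators.chat import OpenAIChatGenerator
from haystack_experimental.dataclasses import ChatMessage, Tool

def get_weather(city: str) -> str:
    # Hypothetical helper used only for illustration.
    return f"Sunny in {city}"

weather_tool = Tool(
    name="get_weather",
    description="Get the current weather for a city.",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
    function=get_weather,
)

client = OpenAIChatGenerator()
# Per-call tools override any tools set at initialization.
response = client.run(
    [ChatMessage.from_user("What's the weather in Berlin?")],
    tools=[weather_tool],
    tools_strict=True,
)
print(response["replies"][0].tool_calls)

Note that the component only prepares the tool call; executing the tool and feeding its result back to the model is left to the caller.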

OpenAIChatGenerator.run_async

@component.output_types(replies=List[ChatMessage])
async def run_async(
        messages: List[ChatMessage],
        streaming_callback: Optional[Union[StreamingCallbackT,
                                           AsyncStreamingCallbackT]] = None,
        generation_kwargs: Optional[Dict[str, Any]] = None,
        tools: Optional[List[Tool]] = None,
        tools_strict: Optional[bool] = None)

Asynchronously invokes chat completion based on the provided messages and generation parameters.

Arguments:

  • messages: A list of ChatMessage instances representing the input messages.
  • streaming_callback: A callback function that is called when a new token is received from the stream.
    Must be a coroutine.
  • generation_kwargs: Additional keyword arguments for text generation. These parameters will
    override the parameters passed during component initialization.
    For details on OpenAI API parameters, see OpenAI documentation.
  • tools: A list of tools for which the model can prepare calls. If set, it overrides the tools parameter set
    during component initialization.
  • tools_strict: Whether to enable strict schema adherence for tool calls. If set to True, the model follows
    the schema provided in the parameters field of the tool definition exactly, but this may increase latency.
    If set, it overrides the tools_strict parameter set during component initialization.

Returns:

A dictionary with a replies key that contains the generated responses as a list of ChatMessage instances.
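
A minimal async usage sketch, mirroring the synchronous example above:

import asyncio

from haystack_experimental.components.generators.chat import OpenAIChatGenerator
from haystack_experimental.dataclasses import ChatMessage

async def main() -> None:
    client = OpenAIChatGenerator()
    # run_async awaits the OpenAI call without blocking the event loop.
    response = await client.run_async(
        [ChatMessage.from_user("What's Natural Language Processing?")]
    )
    print(response["replies"][0])

asyncio.run(main())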