AzureOpenAIChatGenerator
This component enables chat completion using OpenAI’s large language models (LLMs) through Azure services.
| Most common position in a pipeline | After a ChatPromptBuilder |
| Mandatory init variables | "api_key": The Azure OpenAI API key. Can be set with the AZURE_OPENAI_API_KEY env var. "azure_ad_token": Microsoft Entra ID token. Can be set with the AZURE_OPENAI_AD_TOKEN env var. |
| Mandatory run variables | "messages": A list of ChatMessage objects representing the chat |
| Output variables | "replies": A list of alternative replies of the LLM to the input chat |
| API reference | Generators |
| GitHub link | https://github.com/deepset-ai/haystack/blob/main/haystack/components/generators/chat/azure.py |
Overview
AzureOpenAIChatGenerator supports OpenAI models deployed through Azure services. To see the list of supported models, head over to the Azure documentation. The default model used with the component is gpt-4o-mini.
To work with Azure components, you will need an Azure OpenAI API key, as well as an Azure OpenAI endpoint. You can learn more about them in the Azure documentation.
The component uses AZURE_OPENAI_API_KEY and AZURE_OPENAI_AD_TOKEN environment variables by default. Otherwise, you can pass api_key and azure_ad_token at initialization:
from haystack.components.generators.chat import AzureOpenAIChatGenerator
from haystack.utils import Secret

client = AzureOpenAIChatGenerator(
    azure_endpoint="<Your Azure endpoint, e.g. https://your-company.openai.azure.com/>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="<a model name>",
)
We recommend using environment variables instead of initialization parameters.
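For example, with AZURE_OPENAI_API_KEY exported in your shell, initialization reduces to a sketch like this (it assumes the endpoint is also read from the AZURE_OPENAI_ENDPOINT environment variable, as in the basic usage example below):
from haystack.components.generators.chat import AzureOpenAIChatGenerator

# Assumes these are set in the environment, for example:
#   export AZURE_OPENAI_API_KEY="<your-api-key>"
#   export AZURE_OPENAI_ENDPOINT="https://your-company.openai.azure.com/"
client = AzureOpenAIChatGenerator(azure_deployment="<a model name>")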
Then, the component needs a list of ChatMessage objects to operate. ChatMessage is a data class that contains a message, a role (who generated the message, such as user, assistant, system, function), and optional metadata. See the usage section for an example.
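For instance, a short chat made of a system and a user message:
from haystack.dataclasses import ChatMessage

messages = [
    ChatMessage.from_system("You are a concise, helpful assistant."),
    ChatMessage.from_user("What's Natural Language Processing? Be brief."),
]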
You can pass any chat completion parameters that are valid for the openai.ChatCompletion.create method directly to AzureOpenAIChatGenerator using the generation_kwargs parameter, both at initialization and in the run() method. For more details on the supported parameters, refer to the Azure documentation.
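For example, you can set defaults at initialization and override them for a single call. This is a sketch using standard OpenAI parameters; it assumes credentials are set through environment variables:
from haystack.components.generators.chat import AzureOpenAIChatGenerator
from haystack.dataclasses import ChatMessage

# Defaults applied to every call
client = AzureOpenAIChatGenerator(generation_kwargs={"temperature": 0.2, "max_tokens": 256})

# A per-call override takes precedence over the init defaults
response = client.run(
    [ChatMessage.from_user("What's Natural Language Processing? Be brief.")],
    generation_kwargs={"temperature": 0.9},
)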
You can also specify a model for this component through the azure_deployment init parameter.
Structured Output
AzureOpenAIChatGenerator supports structured output generation, allowing you to receive responses in a predictable format. You can use Pydantic models or JSON schemas to define the structure of the output through the response_format parameter in generation_kwargs.
This is useful when you need to extract structured data from text or generate responses that match a specific format.
from pydantic import BaseModel
from haystack.components.generators.chat import AzureOpenAIChatGenerator
from haystack.dataclasses import ChatMessage
class NobelPrizeInfo(BaseModel):
    recipient_name: str
    award_year: int
    category: str
    achievement_description: str
    nationality: str
client = AzureOpenAIChatGenerator(
    azure_endpoint="<Your Azure endpoint>",
    azure_deployment="gpt-4o",
    generation_kwargs={"response_format": NobelPrizeInfo}
)
response = client.run(messages=[
    ChatMessage.from_user(
        "In 2021, American scientist David Julius received the Nobel Prize in"
        " Physiology or Medicine for his groundbreaking discoveries on how the human body"
        " senses temperature and touch."
    )
])
print(response["replies"][0].text)
>> {"recipient_name":"David Julius","award_year":2021,"category":"Physiology or Medicine",
>> "achievement_description":"David Julius was awarded for his transformative findings 
>> regarding the molecular mechanisms underlying the human body's sense of temperature 
>> and touch. Through innovative experiments, he identified specific receptors responsible 
>> for detecting heat and mechanical stimuli, ranging from gentle touch to pain-inducing 
>> pressure.","nationality":"American"}
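Because the reply text is a JSON string that matches the schema, you can turn it back into a validated Pydantic object (assuming Pydantic v2):
prize = NobelPrizeInfo.model_validate_json(response["replies"][0].text)
print(prize.award_year)  # 2021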
Model Compatibility and Limitations
- Pydantic models and JSON schemas are supported for the latest models, starting from GPT-4o.
- Older models only support basic JSON mode through {"type": "json_object"}. For details, see the OpenAI JSON mode documentation.
- Streaming limitation: When using streaming with structured outputs, you must provide a JSON schema instead of a Pydantic model for response_format (see the sketch after this list).
- For complete information, check the Azure OpenAI Structured Outputs documentation.
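For the streaming case, here is a sketch of passing a JSON schema as response_format instead of a Pydantic model. The schema below is illustrative and follows OpenAI's json_schema response format:
from haystack.components.generators.chat import AzureOpenAIChatGenerator
from haystack.dataclasses import ChatMessage

# Hypothetical schema; strict mode requires "additionalProperties": false
laureate_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "NobelPrizeInfo",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "recipient_name": {"type": "string"},
                "award_year": {"type": "integer"},
            },
            "required": ["recipient_name", "award_year"],
            "additionalProperties": False,
        },
    },
}

client = AzureOpenAIChatGenerator(
    azure_endpoint="<Your Azure endpoint>",
    azure_deployment="gpt-4o",
    generation_kwargs={"response_format": laureate_schema},
    # Streaming works here because response_format is a JSON schema, not a Pydantic model
    streaming_callback=lambda chunk: print(chunk.content, end="", flush=True),
)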
Streaming
You can stream output as it’s generated. Pass a callback to streaming_callback. Use the built-in print_streaming_chunk to print text tokens and tool events (tool calls and tool results).
from haystack.components.generators.utils import print_streaming_chunk
# Configure any `Generator` or `ChatGenerator` with a streaming callback
component = SomeGeneratorOrChatGenerator(streaming_callback=print_streaming_chunk)
# If this is a `ChatGenerator`, pass a list of messages:
# from haystack.dataclasses import ChatMessage
# component.run([ChatMessage.from_user("Your question here")])
# If this is a (non-chat) `Generator`, pass a prompt:
# component.run(prompt="Your prompt here")
Streaming works only with a single response. If a provider supports multiple candidates, set n=1.
See our Streaming Support docs to learn more about how StreamingChunk works and how to write a custom callback.
Prefer print_streaming_chunk by default. Write a custom callback only if you need a specific transport (for example, SSE or WebSocket) or custom UI formatting.
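A custom callback is just a function that receives each StreamingChunk. A minimal sketch that mirrors the text handling of print_streaming_chunk, with the print standing in for your own transport or UI update:
from haystack.components.generators.chat import AzureOpenAIChatGenerator
from haystack.dataclasses import StreamingChunk

def my_callback(chunk: StreamingChunk) -> None:
    # Replace this print with an SSE/WebSocket send or a UI update
    if chunk.content:
        print(chunk.content, end="", flush=True)

client = AzureOpenAIChatGenerator(streaming_callback=my_callback)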
Usage
On its own
Basic usage:
from haystack.dataclasses import ChatMessage
from haystack.components.generators.chat import AzureOpenAIChatGenerator
client = AzureOpenAIChatGenerator()
response = client.run(
    [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
)
print(response)
With streaming:
from haystack.dataclasses import ChatMessage
from haystack.components.generators.chat import AzureOpenAIChatGenerator
client = AzureOpenAIChatGenerator(streaming_callback=lambda chunk: print(chunk.content, end="", flush=True))
response = client.run(
    [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
)
print(response)
In a pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import AzureOpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack import Pipeline
# no parameter init, we don't use any runtime template variables
prompt_builder = ChatPromptBuilder()
llm = AzureOpenAIChatGenerator()
pipe = Pipeline()
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)
pipe.connect("prompt_builder.prompt", "llm.messages")
location = "Berlin"
messages = [ChatMessage.from_system("Always respond in German even if some input data is in other languages."),
            ChatMessage.from_user("Tell me about {{location}}")]
pipe.run(data={"prompt_builder": {"template_variables":{"location": location}, "template": messages}})