FallbackChatGenerator
A ChatGenerator wrapper that tries multiple Chat Generators sequentially until one succeeds.
Most common position in a pipeline | After a ChatPromptBuilder |
Mandatory init variables | "chat_generators": A non-empty list of Chat Generator components to try in order |
Mandatory run variables | "messages": A list of ChatMessage objects representing the chat |
Output variables | "replies": Generated ChatMessage instances from the first successful generator; "meta": Execution metadata including successful generator details |
API reference | Generators |
GitHub link | https://github.com/deepset-ai/haystack/blob/main/haystack/components/generators/chat/fallback.py |
Overview
FallbackChatGenerator is a wrapper component that tries multiple Chat Generators sequentially until one succeeds. If a Generator fails, the component tries the next one in the list. This handles provider outages, rate limits, and other transient failures.
The component forwards all parameters to the underlying Chat Generators and returns the first successful result. When a Generator raises any exception, the component tries the next Generator. This includes timeout errors, rate limit errors (429), authentication errors (401), context length errors (400), server errors (500+), and any other exception.
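The try-in-order behavior described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not Haystack's actual implementation: `run_with_fallback`, `FlakyGenerator`, and `EchoGenerator` are hypothetical names, replies are plain strings rather than ChatMessage objects, and the metadata keys simply mirror the ones documented on this page.

```python
from typing import Any, Optional


class FlakyGenerator:
    """Hypothetical stand-in for a Chat Generator that always fails."""

    def run(self, messages: list[str], **kwargs: Any) -> dict[str, Any]:
        raise TimeoutError("simulated provider timeout")


class EchoGenerator:
    """Hypothetical stand-in for a Chat Generator that always succeeds."""

    def run(self, messages: list[str], **kwargs: Any) -> dict[str, Any]:
        return {"replies": [f"echo: {messages[-1]}"]}


def run_with_fallback(generators: list[Any], messages: list[str], **kwargs: Any) -> dict[str, Any]:
    """Try each generator in order and return the first successful result."""
    failed: list[str] = []
    last_error: Optional[Exception] = None
    for index, gen in enumerate(generators):
        try:
            # All keyword arguments are forwarded unchanged
            result = gen.run(messages=messages, **kwargs)
        except Exception as exc:
            # Any exception moves on to the next generator
            failed.append(type(gen).__name__)
            last_error = exc
            continue
        return {
            "replies": result["replies"],
            "meta": {
                "successful_chat_generator_index": index,
                "successful_chat_generator_class": type(gen).__name__,
                "total_attempts": index + 1,
                "failed_chat_generators": failed,
            },
        }
    # Reached only when every generator raised
    raise RuntimeError(
        f"All {len(generators)} chat generators failed. Last error: {last_error}"
    )


result = run_with_fallback([FlakyGenerator(), EchoGenerator()], ["Hello"])
print(result["replies"][0])
print(result["meta"])
```

Catching the broad `Exception` class is what makes authentication and context length errors trigger a fallback just like timeouts do, so a misconfigured primary Generator can fail silently on every request unless you inspect the metadata.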
The component returns execution metadata including which Generator succeeded, how many attempts were made, and which Generators failed. All parameters (messages, generation_kwargs, tools, streaming_callback) are forwarded to the underlying Generators.
Timeout enforcement is delegated to the underlying Chat Generators. To control latency, configure your Chat Generators with a timeout parameter. Chat Generators like OpenAI, Anthropic, and Cohere support timeout parameters that raise exceptions when exceeded.
Monitoring and Telemetry
The meta dictionary in the output contains useful information for monitoring:
from haystack.components.generators.chat import FallbackChatGenerator, OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
# Set up generators
primary = OpenAIChatGenerator(model="gpt-4o")
backup = OpenAIChatGenerator(model="gpt-4o-mini")
generator = FallbackChatGenerator(chat_generators=[primary, backup])
# Run and inspect metadata
result = generator.run(messages=[ChatMessage.from_user("Hello")])
meta = result["meta"]
print(f"Successful generator index: {meta['successful_chat_generator_index']}") # 0 for first, 1 for second, etc.
print(f"Successful generator class: {meta['successful_chat_generator_class']}") # e.g., "OpenAIChatGenerator"
print(f"Total attempts made: {meta['total_attempts']}") # How many Generators were tried
print(f"Failed generators: {meta['failed_chat_generators']}") # List of failed Generator names
You can use this metadata to:
- Track which Generators are being used most frequently
- Monitor failure rates for each Generator
- Set up alerts when fallbacks occur
- Adjust Generator ordering based on success rates
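As a rough sketch of the first three points, the function below aggregates a batch of collected meta dictionaries. `summarize_fallback_meta` and the sample data are hypothetical and hand-written for illustration (not real output); the sketch assumes `failed_chat_generators` is a list of class-name strings, as the comment in the example above suggests. Successes are counted by index rather than class name because two Generators of the same class would otherwise be indistinguishable.

```python
from collections import Counter
from typing import Any


def summarize_fallback_meta(metas: list[dict[str, Any]]) -> dict[str, Any]:
    """Aggregate FallbackChatGenerator meta dictionaries from many runs."""
    successes_by_index: Counter = Counter()
    failures_by_name: Counter = Counter()
    fallback_runs = 0
    for meta in metas:
        index = meta["successful_chat_generator_index"]
        successes_by_index[index] += 1
        failures_by_name.update(meta["failed_chat_generators"])
        if index > 0:
            # The primary generator did not produce the reply
            fallback_runs += 1
    return {
        "successes_by_index": dict(successes_by_index),
        "failures_by_name": dict(failures_by_name),
        "fallback_rate": fallback_runs / len(metas) if metas else 0.0,
    }


# Hand-written sample metadata, for illustration only
sample_metas = [
    {
        "successful_chat_generator_index": 0,
        "successful_chat_generator_class": "OpenAIChatGenerator",
        "total_attempts": 1,
        "failed_chat_generators": [],
    },
    {
        "successful_chat_generator_index": 1,
        "successful_chat_generator_class": "AzureOpenAIChatGenerator",
        "total_attempts": 2,
        "failed_chat_generators": ["OpenAIChatGenerator"],
    },
    {
        "successful_chat_generator_index": 0,
        "successful_chat_generator_class": "OpenAIChatGenerator",
        "total_attempts": 1,
        "failed_chat_generators": [],
    },
]

summary = summarize_fallback_meta(sample_metas)
print(summary)
```

A `fallback_rate` above a threshold you choose could drive an alert or prompt you to reorder the Generators.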
Streaming
FallbackChatGenerator supports streaming through the streaming_callback parameter. The callback is passed directly to the underlying Generators.
Usage
On its own
Basic usage with fallback from a primary to a backup model:
from haystack.components.generators.chat import FallbackChatGenerator, OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
# Create primary and backup generators
primary = OpenAIChatGenerator(model="gpt-4o", timeout=30)
backup = OpenAIChatGenerator(model="gpt-4o-mini", timeout=30)
# Wrap them in a FallbackChatGenerator
generator = FallbackChatGenerator(chat_generators=[primary, backup])
# Use it like any other Chat Generator
messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
result = generator.run(messages=messages)
print(result["replies"][0].text)
print(f"Successful generator: {result['meta']['successful_chat_generator_class']}")
print(f"Total attempts: {result['meta']['total_attempts']}")
>> Natural Language Processing (NLP) is a field of artificial intelligence that
>> focuses on the interaction between computers and humans through natural language...
>> Successful generator: OpenAIChatGenerator
>> Total attempts: 1
With multiple providers:
from haystack.components.generators.chat import (
    FallbackChatGenerator,
    OpenAIChatGenerator,
    AzureOpenAIChatGenerator,
)
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
# Create generators from different providers
openai_gen = OpenAIChatGenerator(
    model="gpt-4o-mini",
    api_key=Secret.from_env_var("OPENAI_API_KEY"),
    timeout=30,
)
azure_gen = AzureOpenAIChatGenerator(
    azure_endpoint="<Your Azure endpoint>",
    api_key=Secret.from_env_var("AZURE_OPENAI_API_KEY"),
    azure_deployment="gpt-4o-mini",
    timeout=30,
)
# Fallback will try OpenAI first, then Azure
generator = FallbackChatGenerator(chat_generators=[openai_gen, azure_gen])
messages = [ChatMessage.from_user("Explain quantum computing briefly.")]
result = generator.run(messages=messages)
print(result["replies"][0].text)
With streaming:
from haystack.components.generators.chat import FallbackChatGenerator, OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
primary = OpenAIChatGenerator(model="gpt-4o")
backup = OpenAIChatGenerator(model="gpt-4o-mini")
generator = FallbackChatGenerator(chat_generators=[primary, backup])
messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
result = generator.run(
    messages=messages,
    streaming_callback=lambda chunk: print(chunk.content, end="", flush=True),
)
print("\n", result["meta"])
In a Pipeline
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import FallbackChatGenerator, OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
# Create primary and backup generators with timeouts
primary = OpenAIChatGenerator(model="gpt-4o", timeout=30)
backup = OpenAIChatGenerator(model="gpt-4o-mini", timeout=30)
# Wrap in fallback
fallback_generator = FallbackChatGenerator(chat_generators=[primary, backup])
# Build pipeline
prompt_builder = ChatPromptBuilder()
pipe = Pipeline()
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", fallback_generator)
pipe.connect("prompt_builder.prompt", "llm.messages")
# Run pipeline
messages = [
    ChatMessage.from_system("You are a helpful assistant that provides concise answers."),
    ChatMessage.from_user("Tell me about {{location}}"),
]
result = pipe.run(
    data={
        "prompt_builder": {
            "template": messages,
            "template_variables": {"location": "Paris"},
        }
    }
)
print(result["llm"]["replies"][0].text)
print(f"Generator used: {result['llm']['meta']['successful_chat_generator_class']}")
Error Handling
If all Generators fail, FallbackChatGenerator raises a RuntimeError with details about which Generators failed and the last error encountered:
from haystack.components.generators.chat import FallbackChatGenerator, OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
# Create generators with invalid credentials to demonstrate error handling
primary = OpenAIChatGenerator(api_key=Secret.from_token("invalid-key-1"))
backup = OpenAIChatGenerator(api_key=Secret.from_token("invalid-key-2"))
generator = FallbackChatGenerator(chat_generators=[primary, backup])
try:
    result = generator.run(messages=[ChatMessage.from_user("Hello")])
except RuntimeError as e:
    print(f"All generators failed: {e}")
    # Output: All 2 chat generators failed. Last error: ... Failed chat generators: [OpenAIChatGenerator, OpenAIChatGenerator]
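If your application should degrade gracefully rather than surface the exception, one option is to catch the RuntimeError and return a canned reply. The following is a minimal sketch, not part of Haystack: `run_or_default`, `always_failing_run`, and the `error` metadata key are hypothetical, and the reply is a plain string rather than a ChatMessage so the example runs without any provider credentials.

```python
from typing import Any, Callable


def run_or_default(
    run: Callable[..., dict[str, Any]],
    default_reply: str,
    **run_kwargs: Any,
) -> dict[str, Any]:
    """Call `run`; if it raises RuntimeError, return a canned reply instead."""
    try:
        return run(**run_kwargs)
    except RuntimeError as exc:
        # "error" is a hypothetical key chosen for this sketch
        return {"replies": [default_reply], "meta": {"error": str(exc)}}


def always_failing_run(**kwargs: Any) -> dict[str, Any]:
    """Hypothetical stand-in for generator.run when every Generator fails."""
    raise RuntimeError("All 2 chat generators failed.")


result = run_or_default(
    always_failing_run,
    default_reply="The service is temporarily unavailable.",
    messages=["Hello"],
)
print(result["replies"][0])
```

In real use you would pass `generator.run` as the first argument and wrap the default text in a ChatMessage so downstream components receive the type they expect.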