IBM watsonx.ai integration for Haystack
Module haystack_integrations.components.generators.watsonx.generator
WatsonxGenerator
Enables text completions using IBM's watsonx.ai foundation models.
This component extends WatsonxChatGenerator to provide the standard Generator interface that works with prompt strings instead of ChatMessage objects.
The generator works with IBM's foundation models including:
- granite-13b-chat-v2
- llama-2-70b-chat
- llama-3-70b-instruct
- Other watsonx.ai chat models
You can customize the generation behavior by passing parameters to the watsonx.ai API through the generation_kwargs argument. These parameters are passed directly to the watsonx.ai inference endpoint. For details on watsonx.ai API parameters, see the IBM watsonx.ai documentation.
Usage example
from haystack_integrations.components.generators.watsonx.generator import WatsonxGenerator
from haystack.utils import Secret

generator = WatsonxGenerator(
    api_key=Secret.from_env_var("WATSONX_API_KEY"),
    model="ibm/granite-13b-chat-v2",
    project_id=Secret.from_env_var("WATSONX_PROJECT_ID"),
)
response = generator.run(
    prompt="Explain quantum computing in simple terms",
    system_prompt="You are a helpful physics teacher.",
)
print(response)

Output:

{
    "replies": ["Quantum computing uses quantum-mechanical phenomena like..."],
    "meta": [
        {
            "model": "ibm/granite-13b-chat-v2",
            "project_id": "your-project-id",
            "usage": {
                "prompt_tokens": 12,
                "completion_tokens": 45,
                "total_tokens": 57,
            },
        }
    ],
}
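To tune generation, pass generation_kwargs at construction time. A minimal sketch (the parameter values below are illustrative, not recommendations):

from haystack.utils import Secret
from haystack_integrations.components.generators.watsonx.generator import WatsonxGenerator

generator = WatsonxGenerator(
    api_key=Secret.from_env_var("WATSONX_API_KEY"),
    model="ibm/granite-3-2b-instruct",
    project_id=Secret.from_env_var("WATSONX_PROJECT_ID"),
    # Passed straight through to the watsonx.ai inference endpoint.
    generation_kwargs={
        "max_new_tokens": 128,
        "temperature": 0.2,
        "stop_sequences": ["\n\n"],
    },
)
print(generator.run(prompt="List three uses of text embeddings.")["replies"][0])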
WatsonxGenerator.__init__
def __init__(*,
             api_key: Secret = Secret.from_env_var("WATSONX_API_KEY"),
             model: str = "ibm/granite-3-2b-instruct",
             project_id: Secret = Secret.from_env_var("WATSONX_PROJECT_ID"),
             api_base_url: str = "https://us-south.ml.cloud.ibm.com",
             system_prompt: str | None = None,
             generation_kwargs: dict[str, Any] | None = None,
             timeout: float | None = None,
             max_retries: int | None = None,
             verify: bool | str | None = None,
             streaming_callback: StreamingCallbackT | None = None) -> None
Creates an instance of WatsonxGenerator.
Before initializing the component, you can set environment variables:
- WATSONX_TIMEOUT to override the default timeout
- WATSONX_MAX_RETRIES to override the default retry count
Arguments:
- api_key: IBM Cloud API key for watsonx.ai access. Can be set via the WATSONX_API_KEY environment variable or passed directly.
- model: The model ID to use for completions. Defaults to "ibm/granite-3-2b-instruct". Available models can be found in your IBM Cloud account.
- project_id: IBM Cloud project ID. Can be set via the WATSONX_PROJECT_ID environment variable or passed directly.
- api_base_url: Custom base URL for the API endpoint. Defaults to "https://us-south.ml.cloud.ibm.com".
- system_prompt: The system prompt to use for text generation.
- generation_kwargs: Additional parameters to control text generation, passed directly to the watsonx.ai inference endpoint. Supported parameters include:
  - temperature: Controls randomness (lower = more deterministic)
  - max_new_tokens: Maximum number of tokens to generate
  - min_new_tokens: Minimum number of tokens to generate
  - top_p: Nucleus sampling probability threshold
  - top_k: Number of highest-probability tokens to consider
  - repetition_penalty: Penalty for repeated tokens
  - length_penalty: Penalty based on output length
  - stop_sequences: List of sequences where generation should stop
  - random_seed: Seed for reproducible results
- timeout: Timeout in seconds for API requests. Defaults to the WATSONX_TIMEOUT environment variable or 30 seconds.
- max_retries: Maximum number of retry attempts for failed requests. Defaults to the WATSONX_MAX_RETRIES environment variable or 5.
- verify: SSL verification setting. Can be:
  - True: verify SSL certificates (default)
  - False: skip verification (insecure)
  - a path to a CA bundle for custom certificates
- streaming_callback: A callback function for streaming responses.
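As a sketch of the streaming_callback parameter, here is a custom callback that prints tokens as they arrive; it assumes Haystack's standard StreamingChunk dataclass, whose content attribute holds the newly generated text:

from haystack.dataclasses import StreamingChunk
from haystack.utils import Secret
from haystack_integrations.components.generators.watsonx.generator import WatsonxGenerator

def on_chunk(chunk: StreamingChunk) -> None:
    # Print each token as it arrives instead of waiting for the full reply.
    print(chunk.content, end="", flush=True)

generator = WatsonxGenerator(
    api_key=Secret.from_env_var("WATSONX_API_KEY"),
    project_id=Secret.from_env_var("WATSONX_PROJECT_ID"),
    streaming_callback=on_chunk,
)
generator.run(prompt="Write one sentence about superconductors.")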
WatsonxGenerator.to_dict
def to_dict() -> dict[str, Any]
Serialize the component to a dictionary.
Returns:
The serialized component as a dictionary.
WatsonxGenerator.from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "WatsonxGenerator"
Deserialize this component from a dictionary.
Arguments:
- data: The dictionary representation of this component.
Returns:
The deserialized component instance.
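These two methods are what Haystack pipeline serialization relies on. A round-trip sketch, assuming WATSONX_API_KEY and WATSONX_PROJECT_ID are set in the environment (env-var secrets are serialized as references, not raw values):

from haystack_integrations.components.generators.watsonx.generator import WatsonxGenerator

generator = WatsonxGenerator()          # defaults read credentials from the environment
data = generator.to_dict()              # plain dict, safe to store as YAML/JSON
restored = WatsonxGenerator.from_dict(data)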
WatsonxGenerator.run
@component.output_types(replies=list[str], meta=list[dict[str, Any]])
def run(*,
        prompt: str,
        system_prompt: str | None = None,
        streaming_callback: StreamingCallbackT | None = None,
        generation_kwargs: dict[str, Any] | None = None) -> dict[str, Any]
Generate text completions synchronously.
Arguments:
- prompt: The input prompt string for text generation.
- system_prompt: An optional system prompt to provide context or instructions for the generation. If not provided, the system prompt set in the __init__ method will be used.
- streaming_callback: A callback function that is called when a new token is received from the stream. If provided, this will override the streaming_callback set in the __init__ method.
- generation_kwargs: Additional keyword arguments for text generation. These parameters will potentially override the parameters passed in the __init__ method. Supported parameters include temperature, max_new_tokens, top_p, etc.
Returns:
A dictionary with the following keys:
- replies: A list of generated text completions as strings.
- meta: A list of metadata dictionaries containing information about each generation, including model name, finish reason, and token usage statistics.
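Reusing the generator from the usage example above, a per-call override might look like this (sketch):

response = generator.run(
    prompt="Summarize the plot of Hamlet in two sentences.",
    system_prompt="You are a concise literary assistant.",
    # Per-call settings take precedence over those given in __init__.
    generation_kwargs={"max_new_tokens": 80, "temperature": 0.0},
)
print(response["replies"][0])
print(response["meta"][0]["usage"])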
WatsonxGenerator.run_async
@component.output_types(replies=list[str], meta=list[dict[str, Any]])
async def run_async(
        *,
        prompt: str,
        system_prompt: str | None = None,
        streaming_callback: StreamingCallbackT | None = None,
        generation_kwargs: dict[str, Any] | None = None) -> dict[str, Any]
Generate text completions asynchronously.
Arguments:
- prompt: The input prompt string for text generation.
- system_prompt: An optional system prompt to provide context or instructions for the generation.
- streaming_callback: A callback function that is called when a new token is received from the stream. If provided, this will override the streaming_callback set in the __init__ method.
- generation_kwargs: Additional keyword arguments for text generation. These parameters will potentially override the parameters passed in the __init__ method. Supported parameters include temperature, max_new_tokens, top_p, etc.
Returns:
A dictionary with the following keys:
- replies: A list of generated text completions as strings.
- meta: A list of metadata dictionaries containing information about each generation, including model name, finish reason, and token usage statistics.
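Because run_async is a coroutine, several prompts can be issued concurrently with asyncio. A minimal sketch:

import asyncio

from haystack.utils import Secret
from haystack_integrations.components.generators.watsonx.generator import WatsonxGenerator

async def main() -> None:
    generator = WatsonxGenerator(
        api_key=Secret.from_env_var("WATSONX_API_KEY"),
        project_id=Secret.from_env_var("WATSONX_PROJECT_ID"),
    )
    # Fire both requests concurrently rather than one after the other.
    results = await asyncio.gather(
        generator.run_async(prompt="Define entropy in one sentence."),
        generator.run_async(prompt="Define enthalpy in one sentence."),
    )
    for result in results:
        print(result["replies"][0])

asyncio.run(main())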
Module haystack_integrations.components.generators.watsonx.chat.chat_generator
WatsonxChatGenerator
Enables chat completions using IBM's watsonx.ai foundation models.
This component interacts with IBM's watsonx.ai platform to generate chat responses using various foundation models. It supports the ChatMessage format for both input and output.
The generator works with IBM's foundation models including:
- granite-13b-chat-v2
- llama-2-70b-chat
- llama-3-70b-instruct
- Other watsonx.ai chat models
You can customize the generation behavior by passing parameters to the watsonx.ai API through the generation_kwargs argument. These parameters are passed directly to the watsonx.ai inference endpoint. For details on watsonx.ai API parameters, see the IBM watsonx.ai documentation.
Usage example
from haystack_integrations.components.generators.watsonx.chat.chat_generator import WatsonxChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

messages = [ChatMessage.from_user("Explain quantum computing in simple terms")]
client = WatsonxChatGenerator(
    api_key=Secret.from_env_var("WATSONX_API_KEY"),
    model="ibm/granite-13b-chat-v2",
    project_id=Secret.from_env_var("WATSONX_PROJECT_ID"),
)
response = client.run(messages=messages)
print(response)

Output:

{'replies':
    [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=
        [TextContent(text="Quantum computing uses quantum-mechanical phenomena like ...")],
        _name=None,
        _meta={'model': 'ibm/granite-13b-chat-v2', 'project_id': 'your-project-id',
               'usage': {'prompt_tokens': 12, 'completion_tokens': 45, 'total_tokens': 57}})
    ]
}
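Because replies are ChatMessage objects, multi-turn conversations work by appending the assistant's reply to the message list. A sketch reusing the client above:

messages = [
    ChatMessage.from_system("You are a helpful physics teacher."),
    ChatMessage.from_user("What is quantum entanglement?"),
]
result = client.run(messages=messages)

# Feed the assistant's reply back in, then ask a follow-up question.
messages.append(result["replies"][0])
messages.append(ChatMessage.from_user("Can you give a concrete example?"))
result = client.run(messages=messages)
print(result["replies"][0].text)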
WatsonxChatGenerator.__init__
def __init__(*,
             api_key: Secret = Secret.from_env_var("WATSONX_API_KEY"),
             model: str = "ibm/granite-3-2b-instruct",
             project_id: Secret = Secret.from_env_var("WATSONX_PROJECT_ID"),
             api_base_url: str = "https://us-south.ml.cloud.ibm.com",
             generation_kwargs: dict[str, Any] | None = None,
             timeout: float | None = None,
             max_retries: int | None = None,
             verify: bool | str | None = None,
             streaming_callback: StreamingCallbackT | None = None) -> None
Creates an instance of WatsonxChatGenerator.
Before initializing the component, you can set environment variables:
- WATSONX_TIMEOUT to override the default timeout
- WATSONX_MAX_RETRIES to override the default retry count
Arguments:
- api_key: IBM Cloud API key for watsonx.ai access. Can be set via the WATSONX_API_KEY environment variable or passed directly.
- model: The model ID to use for completions. Defaults to "ibm/granite-3-2b-instruct". Available models can be found in your IBM Cloud account.
- project_id: IBM Cloud project ID. Can be set via the WATSONX_PROJECT_ID environment variable or passed directly.
- api_base_url: Custom base URL for the API endpoint. Defaults to "https://us-south.ml.cloud.ibm.com".
- generation_kwargs: Additional parameters to control text generation, passed directly to the watsonx.ai inference endpoint. Supported parameters include:
  - temperature: Controls randomness (lower = more deterministic)
  - max_new_tokens: Maximum number of tokens to generate
  - min_new_tokens: Minimum number of tokens to generate
  - top_p: Nucleus sampling probability threshold
  - top_k: Number of highest-probability tokens to consider
  - repetition_penalty: Penalty for repeated tokens
  - length_penalty: Penalty based on output length
  - stop_sequences: List of sequences where generation should stop
  - random_seed: Seed for reproducible results
- timeout: Timeout in seconds for API requests. Defaults to the WATSONX_TIMEOUT environment variable or 30 seconds.
- max_retries: Maximum number of retry attempts for failed requests. Defaults to the WATSONX_MAX_RETRIES environment variable or 5.
- verify: SSL verification setting. Can be:
  - True: verify SSL certificates (default)
  - False: skip verification (insecure)
  - a path to a CA bundle for custom certificates
- streaming_callback: A callback function for streaming responses.
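For streaming, Haystack ships a ready-made print_streaming_chunk helper that can be passed as the callback; a sketch:

from haystack.components.generators.utils import print_streaming_chunk
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
from haystack_integrations.components.generators.watsonx.chat.chat_generator import WatsonxChatGenerator

client = WatsonxChatGenerator(
    api_key=Secret.from_env_var("WATSONX_API_KEY"),
    project_id=Secret.from_env_var("WATSONX_PROJECT_ID"),
    streaming_callback=print_streaming_chunk,  # prints tokens as they arrive
)
client.run(messages=[ChatMessage.from_user("Tell me a short fact about IBM.")])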
WatsonxChatGenerator.to_dict
def to_dict() -> dict[str, Any]
Serialize the component to a dictionary.
Returns:
The serialized component as a dictionary.
WatsonxChatGenerator.from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "WatsonxChatGenerator"
Deserialize this component from a dictionary.
Arguments:
- data: The dictionary representation of this component.
Returns:
The deserialized component instance.
WatsonxChatGenerator.run
@component.output_types(replies=list[ChatMessage])
def run(
        *,
        messages: list[ChatMessage],
        generation_kwargs: dict[str, Any] | None = None,
        streaming_callback: StreamingCallbackT | None = None
) -> dict[str, list[ChatMessage]]
Generate chat completions synchronously.
Arguments:
- messages: A list of ChatMessage instances representing the input messages.
- generation_kwargs: Additional keyword arguments for text generation. These parameters will potentially override the parameters passed in the __init__ method.
- streaming_callback: A callback function that is called when a new token is received from the stream. If provided, this will override the streaming_callback set in the __init__ method.
Returns:
A dictionary with the following key:
- replies: A list containing the generated responses as ChatMessage instances.
WatsonxChatGenerator.run_async
@component.output_types(replies=list[ChatMessage])
async def run_async(
        *,
        messages: list[ChatMessage],
        generation_kwargs: dict[str, Any] | None = None,
        streaming_callback: StreamingCallbackT | None = None
) -> dict[str, list[ChatMessage]]
Generate chat completions asynchronously.
Arguments:
- messages: A list of ChatMessage instances representing the input messages.
- generation_kwargs: Additional keyword arguments for text generation. These parameters will potentially override the parameters passed in the __init__ method.
- streaming_callback: A callback function that is called when a new token is received from the stream. If provided, this will override the streaming_callback set in the __init__ method.
Returns:
A dictionary with the following key:
- replies: A list containing the generated responses as ChatMessage instances.
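A minimal async sketch, assuming a WatsonxChatGenerator instance named client constructed as in the examples above:

import asyncio

from haystack.dataclasses import ChatMessage

async def chat() -> None:
    result = await client.run_async(
        messages=[ChatMessage.from_user("Explain tokenization briefly.")],
        # Per-call settings take precedence over those given in __init__.
        generation_kwargs={"max_new_tokens": 120},
    )
    print(result["replies"][0].text)

asyncio.run(chat())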
Module haystack_integrations.components.embedders.watsonx.document_embedder
WatsonxDocumentEmbedder
Computes document embeddings using IBM watsonx.ai models.
Usage example
from haystack import Document
from haystack.utils import Secret
from haystack_integrations.components.embedders.watsonx.document_embedder import WatsonxDocumentEmbedder

documents = [
    Document(content="I love pizza!"),
    Document(content="Pasta is great too"),
]
document_embedder = WatsonxDocumentEmbedder(
    model="ibm/slate-30m-english-rtrvr",
    api_key=Secret.from_env_var("WATSONX_API_KEY"),
    api_base_url="https://us-south.ml.cloud.ibm.com",
    project_id=Secret.from_env_var("WATSONX_PROJECT_ID"),
)
result = document_embedder.run(documents=documents)
print(result["documents"][0].embedding)
# [0.017020374536514282, -0.023255806416273117, ...]
WatsonxDocumentEmbedder.__init__
def __init__(*,
             model: str = "ibm/slate-30m-english-rtrvr",
             api_key: Secret = Secret.from_env_var("WATSONX_API_KEY"),
             api_base_url: str = "https://us-south.ml.cloud.ibm.com",
             project_id: Secret = Secret.from_env_var("WATSONX_PROJECT_ID"),
             truncate_input_tokens: int | None = None,
             prefix: str = "",
             suffix: str = "",
             batch_size: int = 1000,
             concurrency_limit: int = 5,
             timeout: float | None = None,
             max_retries: int | None = None,
             meta_fields_to_embed: list[str] | None = None,
             embedding_separator: str = "\n")
Creates a WatsonxDocumentEmbedder component.
Arguments:
- model: The name of the model to use for calculating embeddings. Default is "ibm/slate-30m-english-rtrvr".
- api_key: The WATSONX API key. Can be set via the WATSONX_API_KEY environment variable.
- api_base_url: The WATSONX URL for the watsonx.ai service. Default is "https://us-south.ml.cloud.ibm.com".
- project_id: The ID of the Watson Studio project. Can be set via the WATSONX_PROJECT_ID environment variable.
- truncate_input_tokens: Maximum number of tokens to use from the input text. If set to None (or not provided), the full input text is used, up to the model's maximum token limit.
- prefix: A string to add at the beginning of each text.
- suffix: A string to add at the end of each text.
- batch_size: Number of documents to embed in one API call. Default is 1000.
- concurrency_limit: Number of parallel requests to make. Default is 5.
- timeout: Timeout for API requests in seconds.
- max_retries: Maximum number of retries for API requests.
- meta_fields_to_embed: List of metadata fields to embed along with the document content.
- embedding_separator: Separator used to join metadata fields and document content. Default is "\n".
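The meta_fields_to_embed and embedding_separator parameters control how document metadata is folded into the embedded text, as shown in this sketch (the "title" field is illustrative):

from haystack import Document
from haystack.utils import Secret
from haystack_integrations.components.embedders.watsonx.document_embedder import WatsonxDocumentEmbedder

embedder = WatsonxDocumentEmbedder(
    api_key=Secret.from_env_var("WATSONX_API_KEY"),
    project_id=Secret.from_env_var("WATSONX_PROJECT_ID"),
    # Prepend each document's "title" meta field to its content,
    # joined by embedding_separator, before computing the embedding.
    meta_fields_to_embed=["title"],
    embedding_separator="\n",
)
docs = [Document(content="Haystack pipelines connect components.", meta={"title": "Intro"})]
result = embedder.run(documents=docs)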
WatsonxDocumentEmbedder.to_dict
def to_dict() -> dict[str, Any]
Serialize the component to a dictionary.
Returns:
The serialized component as a dictionary.
WatsonxDocumentEmbedder.from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "WatsonxDocumentEmbedder"
Deserializes the component from a dictionary.
Arguments:
data
: The dictionary representation of this component.
Returns:
The deserialized component instance.
WatsonxDocumentEmbedder.run
@component.output_types(documents=list[Document], meta=dict[str, Any])
def run(documents: list[Document]) -> dict[str, list[Document] | dict[str, Any]]
Embeds a list of documents.
Arguments:
- documents: A list of documents to embed.
Returns:
A dictionary with:
- 'documents': List of Documents with embeddings added
- 'meta': Information about the model usage
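In practice the embedder usually sits in an indexing pipeline in front of a writer. A sketch using Haystack's in-memory document store:

from haystack import Document, Pipeline
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.utils import Secret
from haystack_integrations.components.embedders.watsonx.document_embedder import WatsonxDocumentEmbedder

store = InMemoryDocumentStore()
indexing = Pipeline()
indexing.add_component("embedder", WatsonxDocumentEmbedder(
    api_key=Secret.from_env_var("WATSONX_API_KEY"),
    project_id=Secret.from_env_var("WATSONX_PROJECT_ID"),
))
indexing.add_component("writer", DocumentWriter(document_store=store))
# The embedder's "documents" output feeds the writer's "documents" input.
indexing.connect("embedder.documents", "writer.documents")
indexing.run({"embedder": {"documents": [Document(content="I love pizza!")]}})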
Module haystack_integrations.components.embedders.watsonx.text_embedder
WatsonxTextEmbedder
Embeds strings using IBM watsonx.ai foundation models.
You can use it to embed a user query and send it to an embedding retriever.
Usage example
from haystack.utils import Secret
from haystack_integrations.components.embedders.watsonx.text_embedder import WatsonxTextEmbedder

text_to_embed = "I love pizza!"
text_embedder = WatsonxTextEmbedder(
    model="ibm/slate-30m-english-rtrvr",
    api_key=Secret.from_env_var("WATSONX_API_KEY"),
    api_base_url="https://us-south.ml.cloud.ibm.com",
    project_id=Secret.from_env_var("WATSONX_PROJECT_ID"),
)
print(text_embedder.run(text_to_embed))
# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
#  'meta': {'model': 'ibm/slate-30m-english-rtrvr',
#           'truncated_input_tokens': 3}}
WatsonxTextEmbedder.__init__
def __init__(*,
             model: str = "ibm/slate-30m-english-rtrvr",
             api_key: Secret = Secret.from_env_var("WATSONX_API_KEY"),
             api_base_url: str = "https://us-south.ml.cloud.ibm.com",
             project_id: Secret = Secret.from_env_var("WATSONX_PROJECT_ID"),
             truncate_input_tokens: int | None = None,
             prefix: str = "",
             suffix: str = "",
             timeout: float | None = None,
             max_retries: int | None = None)
Creates a WatsonxTextEmbedder component.
Arguments:
- model: The name of the IBM watsonx model to use for calculating embeddings. Default is "ibm/slate-30m-english-rtrvr".
- api_key: The WATSONX API key. Can be set via the WATSONX_API_KEY environment variable.
- api_base_url: The WATSONX URL for the watsonx.ai service. Default is "https://us-south.ml.cloud.ibm.com".
- project_id: The ID of the Watson Studio project. Can be set via the WATSONX_PROJECT_ID environment variable.
- truncate_input_tokens: Maximum number of tokens to use from the input text. If set to None (or not provided), the full input text is used, up to the model's maximum token limit.
- prefix: A string to add at the beginning of each text to embed.
- suffix: A string to add at the end of each text to embed.
- timeout: Timeout for API requests in seconds.
- max_retries: Maximum number of retries for API requests.
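As a sketch of truncation and prefixing (the prefix string is illustrative; whether your model expects one depends on its model card):

from haystack.utils import Secret
from haystack_integrations.components.embedders.watsonx.text_embedder import WatsonxTextEmbedder

embedder = WatsonxTextEmbedder(
    api_key=Secret.from_env_var("WATSONX_API_KEY"),
    project_id=Secret.from_env_var("WATSONX_PROJECT_ID"),
    truncate_input_tokens=128,  # only the first 128 tokens of each input are used
    prefix="query: ",
)
result = embedder.run(text="What toppings go well on pizza?")
print(len(result["embedding"]))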
WatsonxTextEmbedder.to_dict
def to_dict() -> dict[str, Any]
Serialize the component to a dictionary.
Returns:
The serialized component as a dictionary.
WatsonxTextEmbedder.from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "WatsonxTextEmbedder"
Deserializes the component from a dictionary.
Arguments:
- data: The dictionary representation of this component.
Returns:
The deserialized component instance.
WatsonxTextEmbedder.run
@component.output_types(embedding=list[float], meta=dict[str, Any])
def run(text: str) -> dict[str, list[float] | dict[str, Any]]
Embeds a single string.
Arguments:
- text: Text to embed.
Returns:
A dictionary with:
- 'embedding': The embedding of the input text
- 'meta': Information about the model usage
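A typical use is embedding the query in a retrieval pipeline. A sketch with Haystack's in-memory retriever, assuming the store already holds embedded documents:

from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.utils import Secret
from haystack_integrations.components.embedders.watsonx.text_embedder import WatsonxTextEmbedder

store = InMemoryDocumentStore()  # assumed to be pre-populated with embedded documents
query_pipeline = Pipeline()
query_pipeline.add_component("embedder", WatsonxTextEmbedder(
    api_key=Secret.from_env_var("WATSONX_API_KEY"),
    project_id=Secret.from_env_var("WATSONX_PROJECT_ID"),
))
query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=store))
# The query embedding produced by the embedder drives the retriever.
query_pipeline.connect("embedder.embedding", "retriever.query_embedding")
result = query_pipeline.run({"embedder": {"text": "I love pizza!"}})
print(result["retriever"]["documents"])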