TogetherAIGenerator

This component enables text generation using models hosted on Together AI.

Most common position in a pipeline: After a PromptBuilder
Mandatory init variables: "api_key": A Together API key. Can be set with the TOGETHER_API_KEY env var.
Mandatory run variables: "prompt": A string containing the prompt for the LLM
Output variables: "replies": A list of strings with all the replies generated by the LLM; "meta": A list of dictionaries with the metadata associated with each reply, such as token count, finish reason, and so on
API reference: Together AI
GitHub link: https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/togetherai

Overview

TogetherAIGenerator supports models hosted on Together AI, such as meta-llama/Llama-3.3-70B-Instruct-Turbo. For the full list of supported models, see Together AI documentation.

This component needs a prompt string to operate. You can pass any text generation parameters valid for the Together AI chat completion API directly to this component through the generation_kwargs parameter, either in __init__ or in the run method, as shown below. For more details on the parameters supported by the Together AI API, see the Together AI API documentation.
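
For example, you can set defaults at initialization and pass different values per call. A minimal sketch; the max_tokens and temperature values below are illustrative, not recommendations:

from haystack_integrations.components.generators.togetherai import TogetherAIGenerator

# Defaults set at init apply to every call
client = TogetherAIGenerator(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    generation_kwargs={"max_tokens": 256, "temperature": 0.7},
)

# generation_kwargs can also be passed per call to run()
response = client.run(
    "What's Natural Language Processing? Be brief.",
    generation_kwargs={"temperature": 0.2},
)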

You can also provide an optional system_prompt to set context or instructions for text generation. If you don't provide one, the system prompt is omitted and the model's default behavior applies.

To use this integration, you need an active Together AI account with sufficient credits and an API key. You can provide the key with:

  • The TOGETHER_API_KEY environment variable (recommended)
  • The api_key init parameter together with Haystack's Secret API, for example Secret.from_token("your-api-key-here"); both options are shown in the sketch below
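
A minimal sketch of both options; the token string is a placeholder:

from haystack.utils import Secret
from haystack_integrations.components.generators.togetherai import TogetherAIGenerator

# Option 1: read the key from the TOGETHER_API_KEY environment variable (recommended)
client = TogetherAIGenerator(model="meta-llama/Llama-3.3-70B-Instruct-Turbo")

# Option 2: pass the key explicitly via Haystack's Secret API
client = TogetherAIGenerator(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    api_key=Secret.from_token("your-api-key-here"),
)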

By default, the component uses Together AI's OpenAI-compatible base URL https://api.together.xyz/v1, which you can override with api_base_url if needed.
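
For example, to route requests through a different OpenAI-compatible endpoint; the URL below is a placeholder, not a real service:

from haystack_integrations.components.generators.togetherai import TogetherAIGenerator

# api_base_url replaces the default https://api.together.xyz/v1
client = TogetherAIGenerator(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    api_base_url="https://my-gateway.example.com/v1",
)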

Streaming

TogetherAIGenerator supports streaming responses from the LLM, allowing tokens to be emitted as they are generated. To enable streaming, pass a callable to the streaming_callback parameter during initialization.
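
Besides writing your own callable (see the streaming example under Usage), you can use Haystack's built-in console callback. A minimal sketch, assuming the standard print_streaming_chunk utility:

from haystack.components.generators.utils import print_streaming_chunk
from haystack_integrations.components.generators.togetherai import TogetherAIGenerator

# print_streaming_chunk writes each chunk's content to stdout as it arrives
client = TogetherAIGenerator(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    streaming_callback=print_streaming_chunk,
)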

📘 This component is designed for text generation, not for chat. If you want to use Together AI LLMs for chat, use TogetherAIChatGenerator instead.

Usage

Install the togetherai-haystack package to use the TogetherAIGenerator:

pip install togetherai-haystack

On its own

Basic usage:

from haystack_integrations.components.generators.togetherai import TogetherAIGenerator

client = TogetherAIGenerator(model="meta-llama/Llama-3.3-70B-Instruct-Turbo")
response = client.run("What's Natural Language Processing? Be brief.")
print(response)

>> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence
>> that focuses on enabling computers to understand, interpret, and generate human language
>> in a way that is meaningful and useful.'],
>> 'meta': [{'model': 'meta-llama/Llama-3.3-70B-Instruct-Turbo', 'index': 0,
>> 'finish_reason': 'stop', 'usage': {'prompt_tokens': 15, 'completion_tokens': 36,
>> 'total_tokens': 51}}]}

With streaming:

from haystack_integrations.components.generators.togetherai import TogetherAIGenerator

client = TogetherAIGenerator(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    streaming_callback=lambda chunk: print(chunk.content, end="", flush=True),
)

response = client.run("What's Natural Language Processing? Be brief.")
print(response)

With system prompt:

from haystack_integrations.components.generators.togetherai import TogetherAIGenerator

client = TogetherAIGenerator(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    system_prompt="You are a helpful assistant that provides concise answers."
)

response = client.run("What's Natural Language Processing?")
print(response["replies"][0])

In a Pipeline

from haystack import Pipeline, Document
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.generators.togetherai import TogetherAIGenerator

docstore = InMemoryDocumentStore()
docstore.write_documents([
    Document(content="Rome is the capital of Italy"),
    Document(content="Paris is the capital of France")
])

query = "What is the capital of France?"

template = """
Given the following information, answer the question.

Context: 
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{ query }}
"""

pipe = Pipeline()
pipe.add_component("retriever", InMemoryBM25Retriever(document_store=docstore))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", TogetherAIGenerator(model="meta-llama/Llama-3.3-70B-Instruct-Turbo"))

pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

result = pipe.run({
    "prompt_builder": {"query": query},
    "retriever": {"query": query}
})

print(result)

>> {'llm': {'replies': ['The capital of France is Paris.'],
>> 'meta': [{'model': 'meta-llama/Llama-3.3-70B-Instruct-Turbo', ...}]}}