LangfuseConnector
Learn how to work with Langfuse in Haystack.
Most common position in a pipeline | Anywhere, as it’s not connected to other components |
Mandatory init variables | "name": The name of the pipeline or component to identify the tracing run |
Output variables | "name": The name of the tracing component; "trace_url": A link to the tracing data |
API reference | langfuse |
GitHub link | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/langfuse |
Overview
LangfuseConnector integrates tracing capabilities into Haystack pipelines using Langfuse. It captures detailed information about pipeline runs, such as API calls, context data, prompts, and more. Use this component to:
- Monitor model performance, such as token usage and cost.
- Find areas for pipeline improvement by identifying low-quality outputs and collecting user feedback.
- Create datasets for fine-tuning and testing from your pipeline executions.
To work with the integration, add the LangfuseConnector to your pipeline, run the pipeline, and then view the tracing data on the Langfuse website. Don’t connect this component to any other; LangfuseConnector will simply run in your pipeline’s background.
You can optionally define two more parameters when working with this component:
- httpx_client: An optional custom httpx.Client instance for Langfuse API calls. Note that custom clients are discarded when deserializing a pipeline from YAML, as HTTPX clients cannot be serialized. In such cases, Langfuse creates a default client.
- span_handler: An optional custom handler for processing spans. If not provided, the DefaultSpanHandler is used. The span handler defines how spans are created and processed, enabling customization of span types based on component types and post-processing of spans. See more details in the Advanced Usage section below.
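As a rough sketch of the httpx_client parameter, the snippet below passes a client with a custom timeout to the connector. The helper name build_connector_with_client and the timeout value are illustrative assumptions, not part of the integration. The imports sit inside the function so that nothing from Haystack is loaded until you call it.

```python
def build_connector_with_client(timeout_seconds: float = 10.0):
    """Create a LangfuseConnector that uses a custom HTTPX client.

    Hypothetical helper for illustration only. It assumes the httpx and
    langfuse-haystack packages are installed.
    """
    # Deferred imports: Haystack is loaded only when the helper is called,
    # so environment variables can be set beforehand.
    import httpx
    from haystack_integrations.components.connectors.langfuse import LangfuseConnector

    # The timeout is an arbitrary example value.
    client = httpx.Client(timeout=timeout_seconds)

    # "Chat example" is the tracing run name; pick any name you like.
    return LangfuseConnector(name="Chat example", httpx_client=client)
```

Keep in mind that a client passed this way is not preserved when the pipeline is serialized to YAML and loaded again.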
Prerequisites
These are the things that you need before working with LangfuseConnector:
- Make sure you have an active Langfuse account.
- Set the HAYSTACK_CONTENT_TRACING_ENABLED environment variable to true. This enables tracing in your pipelines.
- Set the LANGFUSE_SECRET_KEY and LANGFUSE_PUBLIC_KEY environment variables with your Langfuse secret and public keys found in your account profile.
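If you want to verify these prerequisites from code, a minimal sketch could look like the one below. The function name check_langfuse_env is a hypothetical helper, not part of Haystack or Langfuse; it only reads the three environment variables listed above and reports what is missing.

```python
import os
from typing import List, Mapping

# The two key variables named in the prerequisites.
REQUIRED_VARS = ("LANGFUSE_SECRET_KEY", "LANGFUSE_PUBLIC_KEY")


def check_langfuse_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    # Tracing is considered enabled only when the flag is set to "true".
    if env.get("HAYSTACK_CONTENT_TRACING_ENABLED", "").lower() != "true":
        problems.append("HAYSTACK_CONTENT_TRACING_ENABLED is not set to true")
    return problems


if __name__ == "__main__":
    for problem in check_langfuse_env():
        print(problem)
```

Running such a check at the very start of a script, before any Haystack import, makes a misconfigured environment visible early.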
Installation
First, install the langfuse-haystack package to use the LangfuseConnector:
pip install langfuse-haystack
Usage Notice
To ensure proper tracing, always set environment variables before importing any Haystack components. This is crucial because Haystack initializes its internal tracing components during import. In the example below, we first set the environment variables and then import the relevant Haystack components.
Alternatively, an even better practice is to set these environment variables in your shell before running the script. This approach keeps configuration separate from code and allows for easier management of different environments.
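One way to make the import order hard to get wrong is to keep the Haystack imports inside a function that configures the environment first. This is only a sketch: configure_tracing_env and build_traced_pipeline are hypothetical helper names, and the sketch assumes that values already exported in your shell should take precedence over the defaults.

```python
import os
from typing import MutableMapping


def configure_tracing_env(env: MutableMapping[str, str] = os.environ) -> None:
    """Set tracing-related defaults without overriding values that already exist."""
    env.setdefault("HAYSTACK_CONTENT_TRACING_ENABLED", "true")
    env.setdefault("LANGFUSE_HOST", "https://cloud.langfuse.com")


def build_traced_pipeline(name: str):
    """Configure the environment first, then import Haystack and build a pipeline."""
    configure_tracing_env()

    # Imported only after the environment is configured (assumes haystack-ai
    # and langfuse-haystack are installed).
    from haystack import Pipeline
    from haystack_integrations.components.connectors.langfuse import LangfuseConnector

    pipe = Pipeline()
    pipe.add_component("tracer", LangfuseConnector(name))
    return pipe
```

Because setdefault is used, anything you export in the shell still wins over the values hard-coded here.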
Usage
In the example below, we are adding LangfuseConnector to the pipeline as a tracer. Each pipeline run will produce one trace that captures the entire execution context, including prompts, completions, and metadata.
You can then view the trace by following a URL link printed in the output.
import os

# Set the environment variables before importing any Haystack components
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["HAYSTACK_CONTENT_TRACING_ENABLED"] = "true"

from haystack.components.builders import DynamicChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack import Pipeline
from haystack_integrations.components.connectors.langfuse import LangfuseConnector

if __name__ == "__main__":
    pipe = Pipeline()
    # The tracer is added to the pipeline but not connected to any other component
    pipe.add_component("tracer", LangfuseConnector("Chat example"))
    pipe.add_component("prompt_builder", DynamicChatPromptBuilder())
    pipe.add_component("llm", OpenAIChatGenerator(model="gpt-3.5-turbo"))
    pipe.connect("prompt_builder.prompt", "llm.messages")

    messages = [
        ChatMessage.from_system("Always respond in German even if some input data is in other languages."),
        ChatMessage.from_user("Tell me about {{location}}"),
    ]

    response = pipe.run(
        data={"prompt_builder": {"template_variables": {"location": "Berlin"}, "prompt_source": messages}}
    )
    print(response["llm"]["replies"][0])
    # Link to the trace on the Langfuse website
    print(response["tracer"]["trace_url"])
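If you prefer to handle the trace link in code rather than printing it, a small helper along these lines may be useful. It is a sketch only: get_trace_url is a hypothetical function, the response dictionary below is a stand-in for the result of a pipeline run, and the URL in it is a made-up placeholder.

```python
from typing import Any, Mapping, Optional


def get_trace_url(response: Mapping[str, Any], tracer_name: str = "tracer") -> Optional[str]:
    """Return the trace URL reported by the tracer component, or None if it is absent."""
    return response.get(tracer_name, {}).get("trace_url")


# Stand-in for the dictionary returned by a pipeline run; the URL is a placeholder.
example_response = {
    "tracer": {"name": "Chat example", "trace_url": "https://example.invalid/trace/123"}
}
print(get_trace_url(example_response))
```

The tracer_name argument must match the name under which the LangfuseConnector was added to the pipeline.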
Advanced Usage
Customizing Langfuse Traces with SpanHandler
The SpanHandler interface in Haystack allows you to customize how spans are created and processed for Langfuse trace creation. This enables you to log custom metrics, add tags, or integrate metadata.
By extending SpanHandler or its default implementation, DefaultSpanHandler, you can define custom logic for span processing, providing precise control over what data is logged to Langfuse for tracking and analyzing pipeline executions.
Here's an example:
from typing import Optional

from haystack_integrations.components.connectors.langfuse import LangfuseConnector
from haystack_integrations.tracing.langfuse import DefaultSpanHandler, LangfuseSpan


class CustomSpanHandler(DefaultSpanHandler):
    def handle(self, span: LangfuseSpan, component_type: Optional[str]) -> None:
        # Custom logic to add metadata or modify span
        if component_type == "OpenAIChatGenerator":
            output = span._data.get("haystack.component.output", {})
            # Flag very short responses with a warning on the span
            if len(output.get("text", "")) < 10:
                span._span.update(level="WARNING", status_message="Response too short")


# Add the custom handler to the LangfuseConnector (name is a mandatory init variable)
connector = LangfuseConnector(name="Chat example", span_handler=CustomSpanHandler())
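To keep a custom handler easy to test, the decision logic can live in a plain function that needs neither Haystack nor Langfuse. The sketch below mirrors the "Response too short" rule from the example above; short_reply_update and its min_chars parameter are hypothetical names introduced for illustration.

```python
from typing import Any, Dict, Mapping, Optional


def short_reply_update(output: Mapping[str, Any], min_chars: int = 10) -> Optional[Dict[str, str]]:
    """Return keyword arguments for a span update, or None when no warning is needed."""
    text = output.get("text", "") or ""
    if len(text) < min_chars:
        return {"level": "WARNING", "status_message": "Response too short"}
    return None


print(short_reply_update({"text": "Hi"}))
print(short_reply_update({"text": "A sufficiently long reply"}))
```

Inside handle, you could then read the component output from the span as in the example above, pass it to this function, and call span._span.update with the returned keyword arguments when the result is not None.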