LangfuseConnector
Learn how to work with Langfuse in Haystack.
Most common position in a pipeline | Anywhere, as it’s not connected to other components |
Mandatory init variables | "name": The name of the pipeline or component to identify the tracing run |
Output variables | "name": The name of the tracing component; "trace_url": A link to the tracing data |
API reference | langfuse |
GitHub link | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/langfuse |
Overview
LangfuseConnector integrates tracing capabilities into Haystack pipelines using Langfuse. It captures detailed information about pipeline runs, such as API calls, context data, prompts, and more. Use this component to:
- Monitor model performance, such as token usage and cost.
- Find areas for pipeline improvement by identifying low-quality outputs and collecting user feedback.
- Create datasets for fine-tuning and testing from your pipeline executions.
To work with the integration, add the LangfuseConnector to your pipeline, run the pipeline, and then view the tracing data on the Langfuse website. Don’t connect this component to any other component: LangfuseConnector simply runs in the background of your pipeline.
Prerequisites
These are the things that you need before working with LangfuseConnector:
- Make sure you have an active Langfuse account.
- Set the HAYSTACK_CONTENT_TRACING_ENABLED environment variable to true. This enables tracing in your pipelines.
- Set the LANGFUSE_SECRET_KEY and LANGFUSE_PUBLIC_KEY environment variables to your Langfuse secret and public keys, which you can find in your account profile.
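A quick way to confirm these prerequisites before a run is a small pre-flight check. The sketch below uses only the Python standard library. `missing_langfuse_settings` is a hypothetical helper name, not part of Haystack or Langfuse, and treating only the value "true" (case-insensitive) as enabled is an assumption about how strictly you want to validate the flag.

```python
import os

# Variables named in the prerequisites above.
REQUIRED_VARS = ("LANGFUSE_SECRET_KEY", "LANGFUSE_PUBLIC_KEY")
TRACING_FLAG = "HAYSTACK_CONTENT_TRACING_ENABLED"


def missing_langfuse_settings(env=None):
    """Return a list of problems found in the environment (empty if none).

    Hypothetical helper for illustration; not part of Haystack or Langfuse.
    """
    env = os.environ if env is None else env
    # A variable counts as missing if it is absent or empty.
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    # Assumption: anything other than "true" (case-insensitive) means disabled.
    if env.get(TRACING_FLAG, "").lower() != "true":
        problems.append(f"{TRACING_FLAG} is not set to true")
    return problems


if __name__ == "__main__":
    for problem in missing_langfuse_settings():
        print(f"Warning: {problem}")
```

Running a check like this at the very top of a script, before any Haystack import, surfaces a missing key early instead of leaving you with a run that produces no trace.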
Installation
First, install the langfuse-haystack package to use the LangfuseConnector:
pip install langfuse-haystack
Usage Notice
To ensure proper tracing, always set the environment variables before importing any Haystack components. This is crucial because Haystack initializes its internal tracing components during import. In the example below, we first set the environment variables and then import the relevant Haystack components.
Alternatively, an even better practice is to set these environment variables in your shell before running the script. This approach keeps configuration separate from code and allows for easier management of different environments.
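If you do keep fallback values in the script, one option is `os.environ.setdefault`, which writes a value only when the variable is not already set, so anything exported in the shell takes precedence. This is a minimal sketch: the helper name `apply_tracing_defaults` is hypothetical, and the two defaults are simply the values used in the usage example on this page.

```python
import os

# Fallback values taken from the usage example on this page. Anything
# already exported in the shell is left untouched.
TRACING_DEFAULTS = {
    "LANGFUSE_HOST": "https://cloud.langfuse.com",
    "HAYSTACK_CONTENT_TRACING_ENABLED": "true",
}


def apply_tracing_defaults(env=None):
    """Fill in missing tracing settings without overriding existing ones.

    Hypothetical helper for illustration; not part of Haystack or Langfuse.
    """
    env = os.environ if env is None else env
    for name, value in TRACING_DEFAULTS.items():
        env.setdefault(name, value)
    return env


# Call this before importing any Haystack components.
apply_tracing_defaults()
```

Secrets such as LANGFUSE_SECRET_KEY are deliberately left out of the defaults, so they still have to come from the shell or another configuration source.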
Usage
In the example below, we add LangfuseConnector to the pipeline as a tracer. Each pipeline run produces one trace that captures the entire execution context, including prompts, completions, and metadata.
You can then view the trace by following the URL printed in the output.
import os

# Set these before importing any Haystack components (see the Usage Notice).
# LANGFUSE_SECRET_KEY and LANGFUSE_PUBLIC_KEY are expected to be set already,
# and OpenAIChatGenerator reads OPENAI_API_KEY from the environment.
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["HAYSTACK_CONTENT_TRACING_ENABLED"] = "true"

from haystack.components.builders import DynamicChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack import Pipeline
from haystack_integrations.components.connectors.langfuse import LangfuseConnector

if __name__ == "__main__":
    pipe = Pipeline()
    # The tracer is added but intentionally not connected to any other component.
    pipe.add_component("tracer", LangfuseConnector("Chat example"))
    pipe.add_component("prompt_builder", DynamicChatPromptBuilder())
    pipe.add_component("llm", OpenAIChatGenerator(model="gpt-3.5-turbo"))

    pipe.connect("prompt_builder.prompt", "llm.messages")

    messages = [
        ChatMessage.from_system("Always respond in German even if some input data is in other languages."),
        ChatMessage.from_user("Tell me about {{location}}"),
    ]

    response = pipe.run(
        data={"prompt_builder": {"template_variables": {"location": "Berlin"}, "prompt_source": messages}}
    )
    # Print the model's reply and the link to the trace in Langfuse.
    print(response["llm"]["replies"][0])
    print(response["tracer"]["trace_url"])