LangfuseConnector

Learn how to work with Langfuse in Haystack.

Most common position in a pipeline: Anywhere, as it’s not connected to other components
Mandatory init variables: "name": The name of the pipeline or component to identify the tracing run
Output variables: "name": The name of the tracing component; "trace_url": A link to the tracing data
API reference: langfuse
GitHub link: https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/langfuse

Overview

LangfuseConnector integrates tracing capabilities into Haystack pipelines using Langfuse. It captures detailed information about pipeline runs, like API calls, context data, prompts, and more. Use this component to:

  • Monitor model performance, such as token usage and cost.
  • Find areas for pipeline improvement by identifying low-quality outputs and collecting user feedback.
  • Create datasets for fine-tuning and testing from your pipeline executions.

To work with the integration, add the LangfuseConnector to your pipeline, run the pipeline, and then view the tracing data on the Langfuse website. Don’t connect this component to any other – LangfuseConnector will simply run in your pipeline’s background.

Prerequisites

Before working with LangfuseConnector, you need the following:

  1. Make sure you have an active Langfuse account.
  2. Set the HAYSTACK_CONTENT_TRACING_ENABLED environment variable to true – this will enable tracing in your pipelines.
  3. Set the LANGFUSE_SECRET_KEY and LANGFUSE_PUBLIC_KEY environment variables with your Langfuse secret and public keys found in your account profile.
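
The prerequisites above can be checked before a pipeline runs. The following is a minimal sketch, not part of the integration: the helper name missing_tracing_vars is hypothetical, and only the three variable names come from this page.

```python
import os

# Variable names taken from the prerequisites above.
REQUIRED_VARS = (
    "HAYSTACK_CONTENT_TRACING_ENABLED",
    "LANGFUSE_SECRET_KEY",
    "LANGFUSE_PUBLIC_KEY",
)


def missing_tracing_vars(env=None):
    """Return the required variable names that are unset or empty in env.

    Hypothetical helper for illustration; env defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_tracing_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All tracing variables are set.")
```

Running a check like this at the top of a script, before any Haystack import, surfaces a missing key early rather than after a pipeline run that produced no trace.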

Installation

First, install the langfuse-haystack package to use the LangfuseConnector:

pip install langfuse-haystack

Usage Notice

To ensure proper tracing, always set the environment variables before importing any Haystack components. This is crucial because Haystack initializes its internal tracing components during import. In the example below, we first set the environment variables and then import the relevant Haystack components.

Alternatively, an even better practice is to set these environment variables in your shell before running the script. This approach keeps configuration separate from code and allows for easier management of different environments.
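
When both approaches are combined, a script can supply fallback values without overriding what the shell already exported. The following is a minimal sketch under that assumption: the helper name apply_tracing_defaults is hypothetical, and the two default values mirror the ones used in the example on this page.

```python
import os

# Fallback values; these mirror the values used in this page's example.
TRACING_DEFAULTS = {
    "HAYSTACK_CONTENT_TRACING_ENABLED": "true",
    "LANGFUSE_HOST": "https://cloud.langfuse.com",
}


def apply_tracing_defaults(env=None):
    """Fill in tracing variables that are not set yet (hypothetical helper).

    setdefault leaves existing entries untouched, so values exported
    in the shell take precedence over the defaults above.
    """
    env = os.environ if env is None else env
    for name, value in TRACING_DEFAULTS.items():
        env.setdefault(name, value)
    return env


# Configure the environment first...
apply_tracing_defaults()

# ...and only then import Haystack components, for example:
# from haystack import Pipeline
```

The secret and public keys are deliberately left out of the defaults so that credentials stay in the shell or a secrets manager rather than in code.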

Usage

In the example below, we add LangfuseConnector to the pipeline as a tracer. Each pipeline run produces one trace that captures the entire execution context, including prompts, completions, and metadata.

You can then view the trace by following the URL printed in the output.

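import os

# Set tracing-related variables before any Haystack import (see the Usage Notice above).
# LANGFUSE_SECRET_KEY and LANGFUSE_PUBLIC_KEY are expected to be set already (see Prerequisites).
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["HAYSTACK_CONTENT_TRACING_ENABLED"] = "true"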

from haystack.components.builders import DynamicChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack import Pipeline

from haystack_integrations.components.connectors.langfuse import LangfuseConnector

if __name__ == "__main__":
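    pipe = Pipeline()
    # The tracer is added but intentionally left unconnected; it runs in the background.
    pipe.add_component("tracer", LangfuseConnector("Chat example"))
    pipe.add_component("prompt_builder", DynamicChatPromptBuilder())
    pipe.add_component("llm", OpenAIChatGenerator(model="gpt-3.5-turbo"))

    # Only the prompt builder and the generator are connected to each other.
    pipe.connect("prompt_builder.prompt", "llm.messages")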

    messages = [
        ChatMessage.from_system("Always respond in German even if some input data is in other languages."),
        ChatMessage.from_user("Tell me about {{location}}"),
    ]

    response = pipe.run(
        data={"prompt_builder": {"template_variables": {"location": "Berlin"}, "prompt_source": messages}}
    )
    print(response["llm"]["replies"][0])
    print(response["tracer"]["trace_url"])
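
If the trace URL is needed elsewhere, for instance in logs, it can be read from the result defensively. The following is a minimal sketch: the helper name get_trace_url is hypothetical, it assumes the pipeline result is a nested dictionary shaped like the one printed above, and the URL in the usage lines is a placeholder rather than real output.

```python
def get_trace_url(response, component_name="tracer"):
    """Return the trace URL from a pipeline result, or None if it is absent.

    Hypothetical helper; component_name must match the name used in
    pipe.add_component (here "tracer").
    """
    return response.get(component_name, {}).get("trace_url")


# Placeholder result for illustration only, not real pipeline output.
example_response = {"tracer": {"name": "Chat example", "trace_url": "https://example.invalid/trace/123"}}
print(get_trace_url(example_response))
```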

Related Links

Check out the API reference in the GitHub repo: