Version: 2.28

Ragas

haystack_integrations.components.evaluators.ragas.evaluator

RagasEvaluator

A component that uses the Ragas framework to evaluate inputs against specified Ragas metrics.

See the Ragas framework for more details.

This component supports the modern Ragas metrics API (ragas.metrics.collections). Each metric must be a SimpleBaseMetric instance with its LLM configured at construction time.

Usage example:

```python
from openai import AsyncOpenAI
from ragas.llms import llm_factory
from ragas.metrics.collections import Faithfulness
from haystack_integrations.components.evaluators.ragas import RagasEvaluator

client = AsyncOpenAI()
llm = llm_factory("gpt-4o-mini", client=client)

evaluator = RagasEvaluator(
    ragas_metrics=[Faithfulness(llm=llm)],
)
output = evaluator.run(
    query="Which is the most popular global sport?",
    documents=[
        "Football is undoubtedly the world's most popular sport with"
        " major events like the FIFA World Cup and sports personalities"
        " like Ronaldo and Messi, drawing a followership of more than 4"
        " billion people."
    ],
    reference="Football is the most popular sport with around 4 billion"
    " followers worldwide",
)

output["result"]
```

__init__

```python
__init__(ragas_metrics: list[SimpleBaseMetric]) -> None
```

Constructs a new Ragas evaluator.

Parameters:

  • ragas_metrics (list[SimpleBaseMetric]) – A list of modern Ragas metrics from ragas.metrics.collections. Each metric must be fully configured (including its LLM) at construction time. Available metrics can be found in the Ragas documentation.

to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

Returns:

  • dict[str, Any] – Dictionary with serialized data.

from_dict

```python
from_dict(data: dict[str, Any]) -> RagasEvaluator
```

Deserialize this component from a dictionary.

Metrics are reconstructed from their stored class path and LLM/embedding configuration. Only the openai provider is supported for automatic deserialization; the API key is read from the OPENAI_API_KEY environment variable at load time.

Parameters:

  • data (dict[str, Any]) – Dictionary to deserialize from.

Returns:

  • RagasEvaluator – Deserialized component.
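Because `from_dict` reads the OpenAI API key from the `OPENAI_API_KEY` environment variable at load time, the key must be set before deserializing. A minimal sketch (the key value is a placeholder, and the round-trip calls are shown as comments since they require a fully configured metric):

```python
import os

# from_dict reconstructs metric LLMs via the openai provider and reads
# the key from OPENAI_API_KEY at load time, so set it first.
os.environ.setdefault("OPENAI_API_KEY", "sk-placeholder")

# data = evaluator.to_dict()                 # serialize a configured evaluator
# restored = RagasEvaluator.from_dict(data)  # key must be set by this point
```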

run

```python
run(
    query: str | None = None,
    response: list[ChatMessage] | str | None = None,
    documents: list[Document | str] | None = None,
    reference_contexts: list[str] | None = None,
    multi_responses: list[str] | None = None,
    reference: str | None = None,
    rubrics: dict[str, str] | None = None,
) -> dict[str, dict[str, MetricResult]]
```

Evaluates the provided inputs against each metric and returns the results.

Parameters:

  • query (str | None) – The input query from the user.
  • response (list[ChatMessage] | str | None) – The response to evaluate, either as a list of ChatMessage objects (typically from a language model or agent) or as a plain string.
  • documents (list[Document | str] | None) – A list of Haystack Document or strings that were retrieved for the query.
  • reference_contexts (list[str] | None) – A list of reference contexts that should have been retrieved for the query.
  • multi_responses (list[str] | None) – List of multiple responses generated for the query.
  • reference (str | None) – A string reference answer for the query.
  • rubrics (dict[str, str] | None) – A dictionary of evaluation rubrics, where each key represents a score and the corresponding value describes the evaluation criteria for that score.

Returns:

  • dict[str, dict[str, MetricResult]] – A dictionary with a single key, result, mapping each metric name to its MetricResult.
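The nested return shape can be illustrated with a plain dict standing in for the real MetricResult objects (the metric name and score below are illustrative placeholders, not output from an actual run):

```python
# Illustrative shape of run()'s return value: the top-level "result" key
# maps each metric name to its result (here a stand-in float).
output = {"result": {"faithfulness": 1.0}}

# Iterate metric names and their results.
for metric_name, metric_result in output["result"].items():
    print(f"{metric_name}: {metric_result}")
```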