Ragas
haystack_integrations.components.evaluators.ragas.evaluator
RagasEvaluator
A component that uses the Ragas framework to evaluate inputs against specified Ragas metrics.
See the Ragas framework for more details.
This component supports the modern Ragas metrics API (ragas.metrics.collections).
Each metric must be a SimpleBaseMetric instance with its LLM configured at construction time.
Usage example:

```python
from openai import AsyncOpenAI
from ragas.llms import llm_factory
from ragas.metrics.collections import Faithfulness

from haystack_integrations.components.evaluators.ragas import RagasEvaluator

client = AsyncOpenAI()
llm = llm_factory("gpt-4o-mini", client=client)

evaluator = RagasEvaluator(
    ragas_metrics=[Faithfulness(llm=llm)],
)
output = evaluator.run(
    query="Which is the most popular global sport?",
    documents=[
        "Football is undoubtedly the world's most popular sport with"
        " major events like the FIFA World Cup and sports personalities"
        " like Ronaldo and Messi, drawing a followership of more than 4"
        " billion people."
    ],
    reference="Football is the most popular sport with around 4 billion"
    " followers worldwide",
)
output["result"]
```
init
Constructs a new Ragas evaluator.
Parameters:
- ragas_metrics (list[SimpleBaseMetric]) – A list of modern Ragas metrics from ragas.metrics.collections. Each metric must be fully configured (including its LLM) at construction time. Available metrics can be found in the Ragas documentation.
to_dict
Serialize this component to a dictionary.
Returns:
dict[str, Any] – Dictionary with serialized data.
from_dict
Deserialize this component from a dictionary.
Metrics are reconstructed from their stored class path and LLM/embedding
configuration. Only the openai provider is supported for automatic
deserialization; the API key is read from the OPENAI_API_KEY environment
variable at load time.
Parameters:
- data (dict[str, Any]) – Dictionary to deserialize from.
Returns:
RagasEvaluator – Deserialized component.
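Because from_dict reads the API key from the OPENAI_API_KEY environment variable at load time, it can help to fail fast when the variable is missing rather than hitting an error mid-deserialization. A minimal sketch (the helper name below is hypothetical, not part of this integration):

```python
import os


def ensure_openai_key() -> str:
    """Return the OpenAI API key, raising a clear error if it is unset.

    RagasEvaluator.from_dict reads OPENAI_API_KEY at load time, so an
    up-front check gives a clearer failure than a later deserialization
    error.
    """
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; RagasEvaluator.from_dict needs it "
            "to reconstruct metric LLMs."
        )
    return key
```

Call this before RagasEvaluator.from_dict(data) in pipeline-loading code.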
run
```python
run(
    query: str | None = None,
    response: list[ChatMessage] | str | None = None,
    documents: list[Document | str] | None = None,
    reference_contexts: list[str] | None = None,
    multi_responses: list[str] | None = None,
    reference: str | None = None,
    rubrics: dict[str, str] | None = None,
) -> dict[str, dict[str, MetricResult]]
```
Evaluates the provided inputs against each metric and returns the results.
Parameters:
- query (str | None) – The input query from the user.
- response (list[ChatMessage] | str | None) – The response to evaluate: a list of ChatMessage objects (typically from a language model or agent) or a plain string.
- documents (list[Document | str] | None) – A list of Haystack Document objects or strings that were retrieved for the query.
- reference_contexts (list[str] | None) – A list of reference contexts that should have been retrieved for the query.
- multi_responses (list[str] | None) – A list of multiple responses generated for the query.
- reference (str | None) – A reference answer for the query.
- rubrics (dict[str, str] | None) – A dictionary of evaluation rubrics, where keys represent scores and values represent the corresponding evaluation criteria.
Returns:
dict[str, dict[str, MetricResult]] – A dictionary with key "result" mapping metric names to their MetricResult.
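The returned dictionary can be consumed by indexing "result" and then each metric name. A sketch of that shape, using a stand-in dataclass instead of Ragas's MetricResult (running the real evaluator requires a configured LLM, and the `value` field here is an assumption about MetricResult's interface):

```python
from dataclasses import dataclass


@dataclass
class FakeMetricResult:
    """Stand-in for ragas's MetricResult; the real class's fields may differ."""

    value: float


# RagasEvaluator.run returns {"result": {metric_name: MetricResult, ...}}.
output = {"result": {"faithfulness": FakeMetricResult(value=0.92)}}

# Collect a plain metric-name -> score mapping from the nested result dict.
scores = {name: res.value for name, res in output["result"].items()}
```

With the real component, replace the stand-in output with the dictionary returned by evaluator.run(...).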