API Reference

Ragas integration for Haystack

Module haystack_integrations.components.evaluators.ragas.evaluator

RagasEvaluator

A component that uses the Ragas framework to evaluate inputs against a specific metric. Supported metrics are defined by RagasMetric.

Usage example:

from haystack_integrations.components.evaluators.ragas import RagasEvaluator, RagasMetric

evaluator = RagasEvaluator(
    metric=RagasMetric.CONTEXT_PRECISION,
)
output = evaluator.run(
    questions=["Which is the most popular global sport?"],
    contexts=[
        [
            "Football is undoubtedly the world's most popular sport with "
            "major events like the FIFA World Cup and sports personalities "
            "like Ronaldo and Messi, drawing a followership of more than 4 "
            "billion people."
        ]
    ],
    ground_truths=["Football is the most popular sport with around 4 billion followers worldwide"],
)
print(output["results"])

RagasEvaluator.__init__

def __init__(metric: Union[str, RagasMetric],
             metric_params: Optional[Dict[str, Any]] = None)

Construct a new Ragas evaluator.

Arguments:

  • metric: The metric to use for evaluation.
  • metric_params: Parameters to pass to the metric's constructor. Refer to the RagasMetric class for more details on required parameters.

RagasEvaluator.run

@component.output_types(results=List[List[Dict[str, Any]]])
def run(**inputs) -> Dict[str, Any]

Run the Ragas evaluator on the provided inputs.

Arguments:

  • inputs: The inputs to evaluate. These are determined by the metric being calculated. See RagasMetric for more information.

Returns:

A dictionary with a single results entry that contains a nested list of metric results. Each input can have one or more results, depending on the metric. Each result is a dictionary containing the following keys and values:

  • name - The name of the metric.
  • score - The score of the metric.
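
The nested shape of results can be awkward to consume directly. As a minimal sketch, the snippet below flattens it into one dictionary of scores per input. The values in sample_output are invented placeholders that follow the documented shape, not real evaluator output, and scores_per_input is a hypothetical helper that is not part of the integration.

```python
from typing import Any, Dict, List


def scores_per_input(output: Dict[str, Any]) -> List[Dict[str, float]]:
    """Map each input's list of metric results to a {name: score} dictionary."""
    return [
        {result["name"]: result["score"] for result in results_for_input}
        for results_for_input in output["results"]
    ]


# Placeholder data in the documented shape (scores are invented for illustration).
sample_output = {
    "results": [
        [{"name": "context_precision", "score": 0.5}],
        [{"name": "context_precision", "score": 1.0}],
    ]
}

print(scores_per_input(sample_output))
```

If a metric returns several results for one input, each of them becomes a separate key in that input's dictionary.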

RagasEvaluator.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Raises:

  • DeserializationError: If the component cannot be serialized.

Returns:

Dictionary with serialized data.
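
One plausible way serialization can fail is when metric_params holds values that have no JSON representation. The sketch below is a pre-flight check you could run on your own parameters before calling to_dict. The helper is_json_serializable is hypothetical, and whether the integration performs an equivalent check internally is an assumption.

```python
import json
from typing import Any


def is_json_serializable(value: Any) -> bool:
    """Return True if `value` can be encoded with the standard json module."""
    try:
        json.dumps(value)
        return True
    except (TypeError, OverflowError, ValueError):
        # TypeError: unsupported type; ValueError: e.g. circular reference.
        return False


# Plain numbers, strings, lists, and tuples are fine.
print(is_json_serializable({"weights": (0.75, 0.25)}))
# A function object is not JSON-encodable.
print(is_json_serializable({"callback": print}))
```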

RagasEvaluator.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "RagasEvaluator"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

Module haystack_integrations.components.evaluators.ragas.metrics

RagasBaseEnum

Base functionality for a Ragas enum.

RagasBaseEnum.from_str

@classmethod
def from_str(cls, string: str) -> "RagasMetric"

Create a metric type from a string.

Arguments:

  • string: The string to convert.

Returns:

The metric.
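
The string-to-enum lookup can be sketched with a stand-in enum. This illustrates the general pattern only and is not the integration's actual implementation; ExampleMetric, its member values, and the error message are assumptions made for the example.

```python
from enum import Enum


class ExampleMetric(Enum):
    """Stand-in enum; the member values here are assumptions for illustration."""

    CONTEXT_PRECISION = "context_precision"
    FAITHFULNESS = "faithfulness"

    @classmethod
    def from_str(cls, string: str) -> "ExampleMetric":
        # Build a value -> member lookup and fail loudly on unknown names.
        lookup = {member.value: member for member in cls}
        try:
            return lookup[string]
        except KeyError:
            supported = ", ".join(lookup)
            raise ValueError(
                f"Unknown metric '{string}'. Supported: {supported}"
            ) from None


print(ExampleMetric.from_str("faithfulness"))
```

Raising a ValueError that lists the supported names makes a typo in a configuration file easier to diagnose than a bare KeyError would.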

RagasMetric

Metrics supported by Ragas.

ANSWER_CORRECTNESS

Answer correctness.
Inputs - questions: List[str], responses: List[str], ground_truths: List[str]
Parameters - weights: Tuple[float, float]

FAITHFULNESS

Faithfulness.
Inputs - questions: List[str], contexts: List[List[str]], responses: List[str]

ANSWER_SIMILARITY

Answer similarity.
Inputs - responses: List[str], ground_truths: List[str]
Parameters - threshold: float

CONTEXT_PRECISION

Context precision.
Inputs - questions: List[str], contexts: List[List[str]], ground_truths: List[str]

CONTEXT_UTILIZATION

Context utilization.
Inputs - questions: List[str], contexts: List[List[str]], responses: List[str]

CONTEXT_RECALL

Context recall.
Inputs - questions: List[str], contexts: List[List[str]], ground_truths: List[str]

ASPECT_CRITIQUE

Aspect critique.
Inputs - questions: List[str], contexts: List[List[str]], responses: List[str]
Parameters - name: str, definition: str, strictness: int

ANSWER_RELEVANCY

Answer relevancy.
Inputs - questions: List[str], contexts: List[List[str]], responses: List[str]
Parameters - strictness: int
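
Because each metric expects a different set of inputs, it can help to validate them before calling run. The sketch below transcribes the required inputs from the listing above into a lookup table and checks a call against it. REQUIRED_INPUTS and check_inputs are hypothetical names, not part of the integration, and the equal-length check assumes one entry per evaluated sample in every input list.

```python
from typing import Any, Dict, List, Set

# Required inputs per metric, transcribed from the listing above.
REQUIRED_INPUTS: Dict[str, Set[str]] = {
    "ANSWER_CORRECTNESS": {"questions", "responses", "ground_truths"},
    "FAITHFULNESS": {"questions", "contexts", "responses"},
    "ANSWER_SIMILARITY": {"responses", "ground_truths"},
    "CONTEXT_PRECISION": {"questions", "contexts", "ground_truths"},
    "CONTEXT_UTILIZATION": {"questions", "contexts", "responses"},
    "CONTEXT_RECALL": {"questions", "contexts", "ground_truths"},
    "ASPECT_CRITIQUE": {"questions", "contexts", "responses"},
    "ANSWER_RELEVANCY": {"questions", "contexts", "responses"},
}


def check_inputs(metric_name: str, **inputs: List[Any]) -> None:
    """Raise ValueError if required inputs are missing or lengths differ."""
    missing = REQUIRED_INPUTS[metric_name] - set(inputs)
    if missing:
        raise ValueError(f"{metric_name} is missing inputs: {sorted(missing)}")
    # Assumption: every input list carries one entry per evaluated sample.
    lengths = {name: len(values) for name, values in inputs.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Input lengths differ: {lengths}")


# Passes silently: all three required inputs are present with one sample each.
check_inputs(
    "CONTEXT_PRECISION",
    questions=["Which is the most popular global sport?"],
    contexts=[["Football is the world's most popular sport."]],
    ground_truths=["Football"],
)
```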

MetricResult

Result of a metric evaluation.

Arguments:

  • name: The name of the metric.
  • score: The score of the metric.