Version: 2.25-unstable

Evaluation

eval_run_result

EvaluationRunResult

Contains the inputs and the outputs of an evaluation pipeline and provides methods to inspect them.

init

```python
__init__(
    run_name: str,
    inputs: dict[str, list[Any]],
    results: dict[str, dict[str, Any]],
)
```

Initialize a new evaluation run result.

Parameters:

  • run_name (str) – Name of the evaluation run.
  • inputs (dict[str, list[Any]]) – Dictionary containing the inputs used for the run. Each key is the name of an input and its value is a list of input values. All lists must have the same length.
  • results (dict[str, dict[str, Any]]) – Dictionary containing the results of the evaluators used in the evaluation pipeline. Each key is the name of a metric and its value is a dictionary with the following keys:
    • 'score': The aggregated score for the metric.
    • 'individual_scores': A list of scores, one for each input sample.
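A minimal sketch of constructing an EvaluationRunResult. The import path is assumed from the module shown above (eval_run_result) and may differ between versions; the input and metric names ("questions", "faithfulness", and so on) are illustrative only.

```python
# Illustrative construction of an EvaluationRunResult; the import path is an
# assumption based on the module name above and may vary across versions.
from haystack.evaluation.eval_run_result import EvaluationRunResult

# Two input columns, each with one value per evaluated sample (same length).
inputs = {
    "questions": ["What is Haystack?", "Who maintains it?"],
    "contexts": ["Haystack is an LLM framework.", "It is maintained by deepset."],
}

# One entry per metric: an aggregated 'score' plus per-sample 'individual_scores'.
results = {
    "faithfulness": {"score": 0.85, "individual_scores": [0.9, 0.8]},
    "exact_match": {"score": 0.5, "individual_scores": [1.0, 0.0]},
}

run_result = EvaluationRunResult(run_name="baseline_run", inputs=inputs, results=results)
```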

aggregated_report

```python
aggregated_report(
    output_format: Literal["json", "csv", "df"] = "json",
    csv_file: str | None = None,
) -> Union[dict[str, list[Any]], DataFrame, str]
```

Generates a report with aggregated scores for each metric.

Parameters:

  • output_format (Literal['json', 'csv', 'df']) – The output format for the report: "json", "csv", or "df". Defaults to "json".
  • csv_file (str | None) – Filepath for saving the CSV output; must be provided when output_format is "csv".

Returns:

  • Union[dict[str, list[Any]], DataFrame, str] – A JSON dictionary or DataFrame with the aggregated scores; if the output is written to a CSV file, a message confirming the successful write or an error message.
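A short usage sketch for aggregated_report, reusing the hypothetical run_result built above; the "df" format assumes pandas is installed, and the output filename is arbitrary.

```python
# Aggregated scores, one value per metric.
agg_json = run_result.aggregated_report()                  # default "json": dict of lists
agg_df = run_result.aggregated_report(output_format="df")  # pandas DataFrame

# When writing CSV, csv_file must be provided; the return value is a status message.
message = run_result.aggregated_report(output_format="csv", csv_file="aggregated_scores.csv")
print(message)
```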

detailed_report

```python
detailed_report(
    output_format: Literal["json", "csv", "df"] = "json",
    csv_file: str | None = None,
) -> Union[dict[str, list[Any]], DataFrame, str]
```

Generates a report with detailed scores for each metric.

Parameters:

  • output_format (Literal['json', 'csv', 'df']) – The output format for the report: "json", "csv", or "df". Defaults to "json".
  • csv_file (str | None) – Filepath for saving the CSV output; must be provided when output_format is "csv".

Returns:

  • Union[dict[str, list[Any]], DataFrame, str] – A JSON dictionary or DataFrame with the detailed scores; if the output is written to a CSV file, a message confirming the successful write or an error message.
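The same pattern applies to detailed_report, which reports per-sample scores; run_result and the filename below are carried over from the earlier hypothetical example.

```python
# Per-sample scores for every metric alongside the run inputs.
detailed_json = run_result.detailed_report()                  # default "json": dict of lists
detailed_df = run_result.detailed_report(output_format="df")  # pandas DataFrame
run_result.detailed_report(output_format="csv", csv_file="detailed_scores.csv")
```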

comparative_detailed_report

```python
comparative_detailed_report(
    other: EvaluationRunResult,
    keep_columns: list[str] | None = None,
    output_format: Literal["json", "csv", "df"] = "json",
    csv_file: str | None = None,
) -> Union[str, DataFrame, None]
```

Generates a report that compares the detailed scores of two evaluation runs for each metric.

Parameters:

  • other (EvaluationRunResult) – Results of another evaluation run to compare with.
  • keep_columns (list[str] | None) – Names of input columns, common to both runs, to keep in the comparison report.
  • output_format (Literal['json', 'csv', 'df']) – The output format for the report: "json", "csv", or "df". Defaults to "json".
  • csv_file (str | None) – Filepath for saving the CSV output; must be provided when output_format is "csv".

Returns:

  • Union[str, DataFrame, None] – A JSON dictionary or DataFrame comparing the detailed scores of the two runs; if the output is written to a CSV file, a message confirming the successful write or an error message.
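A hedged sketch of comparing two runs; other_run is a hypothetical second EvaluationRunResult built from the same inputs but with different scores, and "questions" refers to the illustrative input column used earlier.

```python
# A second run over the same inputs, e.g. from a modified pipeline.
other_results = {
    "faithfulness": {"score": 0.9, "individual_scores": [0.95, 0.85]},
    "exact_match": {"score": 1.0, "individual_scores": [1.0, 1.0]},
}
other_run = EvaluationRunResult(run_name="improved_run", inputs=inputs, results=other_results)

# Side-by-side detailed scores; keep only the shared "questions" input column.
comparison_df = run_result.comparative_detailed_report(
    other=other_run,
    keep_columns=["questions"],
    output_format="df",
)
```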