API Reference

Represents the results of evaluation.

Module eval_run_result

EvaluationRunResult

Contains the inputs and the outputs of an evaluation pipeline and provides methods to inspect them.

EvaluationRunResult.__init__

def __init__(run_name: str, inputs: Dict[str, List[Any]],
             results: Dict[str, Dict[str, Any]])

Initialize a new evaluation run result.

Arguments:

  • run_name: Name of the evaluation run.
  • inputs: Dictionary containing the inputs used for the run. Each key is the name of an input and its value is a list of values for that input, one per sample. All lists must have the same length.
  • results: Dictionary containing the results of the evaluators used in the evaluation pipeline. Each key is the name of a metric and its value is a dictionary with the following keys:
      • 'score': The aggregated score for the metric.
      • 'individual_scores': A list of scores, one for each input sample.
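
A minimal construction sketch; the input columns, metric names, and scores below are illustrative assumptions, not part of this reference, and the import is assumed to come from the eval_run_result module shown above:

from haystack.evaluation.eval_run_result import EvaluationRunResult

# One entry per evaluated sample; all lists must have the same length.
inputs = {
    "questions": ["What is Haystack?", "Who maintains Haystack?"],
    "contexts": ["Haystack is an open source LLM framework.", "Haystack is maintained by deepset."],
}

# One entry per metric: an aggregated score plus one individual score per sample.
results = {
    "faithfulness": {"score": 0.85, "individual_scores": [0.9, 0.8]},
    "exact_match": {"score": 0.5, "individual_scores": [1.0, 0.0]},
}

run = EvaluationRunResult(run_name="baseline_run", inputs=inputs, results=results)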

EvaluationRunResult.aggregated_report

def aggregated_report(
    output_format: Literal["json", "csv", "df"] = "json",
    csv_file: Optional[str] = None
) -> Union[Dict[str, List[Any]], "DataFrame", str]

Generates a report with aggregated scores for each metric.

Arguments:

  • output_format: The output format for the report: "json", "csv", or "df". Defaults to "json".
  • csv_file: Filepath where the CSV output is saved. Required when output_format is "csv".

Returns:

JSON or DataFrame with the aggregated scores. If the output is written to a CSV file, a message confirming the successful write or an error message.
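
A usage sketch, reusing the run object from the construction example under __init__; the CSV file name is an illustrative assumption:

# Default "json" output: a dictionary with the aggregated score of each metric.
aggregated = run.aggregated_report()

# "df" output: a pandas DataFrame (pandas must be installed).
aggregated_df = run.aggregated_report(output_format="df")

# "csv" output: writes to csv_file and returns a confirmation or error message.
message = run.aggregated_report(output_format="csv", csv_file="aggregated_scores.csv")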

EvaluationRunResult.detailed_report

def detailed_report(
    output_format: Literal["json", "csv", "df"] = "json",
    csv_file: Optional[str] = None
) -> Union[Dict[str, List[Any]], "DataFrame", str]

Generates a report with detailed scores for each metric.

Arguments:

  • output_format: The output format for the report: "json", "csv", or "df". Defaults to "json".
  • csv_file: Filepath where the CSV output is saved. Required when output_format is "csv".

Returns:

JSON or DataFrame with the detailed scores. If the output is written to a CSV file, a message confirming the successful write or an error message.
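
The call pattern mirrors aggregated_report; a brief sketch, again using the run object from the construction example:

# One row per input sample, with each metric's individual score.
detailed_df = run.detailed_report(output_format="df")

# Or write the per-sample scores to disk.
message = run.detailed_report(output_format="csv", csv_file="detailed_scores.csv")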

EvaluationRunResult.comparative_detailed_report

def comparative_detailed_report(
        other: "EvaluationRunResult",
        keep_columns: Optional[List[str]] = None,
        output_format: Literal["json", "csv", "df"] = "json",
        csv_file: Optional[str] = None) -> Union[str, "DataFrame", None]

Generates a report with detailed scores for each metric from two evaluation runs for comparison.

Arguments:

  • other: Results of another evaluation run to compare with.
  • keep_columns: List of common column names to keep from the inputs of the evaluation runs to compare.
  • output_format: The output format for the report: "json", "csv", or "df". Defaults to "json".
  • csv_file: Filepath where the CSV output is saved. Required when output_format is "csv".

Returns:

JSON or DataFrame with a comparison of the detailed scores of the two runs. If the output is written to a CSV file, a message confirming the successful write or an error message.
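
A comparison sketch, reusing run and inputs from the construction example; the second run's name, scores, and the kept column are illustrative assumptions:

# Results of a second, hypothetical run over the same inputs.
other_results = {
    "faithfulness": {"score": 0.8, "individual_scores": [0.7, 0.9]},
    "exact_match": {"score": 1.0, "individual_scores": [1.0, 1.0]},
}
other_run = EvaluationRunResult(run_name="candidate_run", inputs=inputs, results=other_results)

comparison_df = run.comparative_detailed_report(
    other=other_run,
    keep_columns=["questions"],   # common input columns to carry into the comparison
    output_format="df",
)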

EvaluationRunResult.score_report

def score_report() -> "DataFrame"

Generates a DataFrame report with aggregated scores for each metric.
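
A one-line sketch, using the run object from the construction example under __init__:

score_df = run.score_report()   # DataFrame of the aggregated score for each metric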

EvaluationRunResult.to_pandas

def to_pandas() -> "DataFrame"

Generates a DataFrame report with detailed scores for each metric.
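
A one-line sketch:

df = run.to_pandas()   # DataFrame with the per-sample (individual) scores for each metric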

EvaluationRunResult.comparative_individual_scores_report

def comparative_individual_scores_report(
        other: "EvaluationRunResult") -> "DataFrame"

Generates a DataFrame report with detailed scores for each metric from two evaluation runs for comparison.
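
A one-line sketch, reusing run and other_run from the earlier examples:

comparison = run.comparative_individual_scores_report(other_run)   # DataFrame comparing the per-sample scores of both runs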