AnswerExactMatchEvaluator
The AnswerExactMatchEvaluator evaluates answers predicted by Haystack pipelines using ground truth labels. It checks character by character whether a predicted answer exactly matches the ground truth answer. This metric is called exact match.
Most common position in a pipeline | On its own or in an evaluation pipeline. To be used after a separate pipeline that has generated the inputs for the Evaluator. |
Mandatory run variables | "ground_truth_answers": A list of strings containing the ground truth answers. "predicted_answers": A list of strings containing the predicted answers to be evaluated. |
Output variables | A dictionary containing: - score: A number from 0.0 to 1.0 representing the proportion of predicted answers that exactly match their ground truth answer. - individual_scores: A list of 0s and 1s, where 1 means that the predicted answer matched its ground truth answer. |
API reference | Evaluators |
GitHub link | https://github.com/deepset-ai/haystack/blob/main/haystack/components/evaluators/answer_exact_match.py |
Overview
You can use the AnswerExactMatchEvaluator component to evaluate answers predicted by a Haystack pipeline, such as an extractive question answering pipeline, against ground truth labels. Because the AnswerExactMatchEvaluator checks whether a predicted answer exactly matches the ground truth answer, it is not suited to evaluating answers generated by LLMs, for example, in a RAG pipeline. Use FaithfulnessEvaluator or SASEvaluator instead.
No parameters are required to initialize an AnswerExactMatchEvaluator.
Note that only one predicted answer is compared to one ground truth answer at a time. The component does not support multiple ground truth answers for the same question or multiple answers predicted for the same question.
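The one-to-one comparison described above can be illustrated with a minimal plain-Python sketch. It is not the component's actual implementation: the function name exact_match_scores, the length check, and the handling of empty input are assumptions made for illustration.

```python
from typing import Dict, List, Union


def exact_match_scores(
    ground_truth_answers: List[str], predicted_answers: List[str]
) -> Dict[str, Union[float, List[int]]]:
    """Hypothetical helper: compare each prediction with exactly one ground truth."""
    # Assumption: the lists are aligned by index, so they must have equal length.
    if len(ground_truth_answers) != len(predicted_answers):
        raise ValueError("Both lists must contain the same number of answers.")

    # 1 if the strings are identical character by character, otherwise 0.
    individual_scores = [
        1 if truth == predicted else 0
        for truth, predicted in zip(ground_truth_answers, predicted_answers)
    ]

    # Assumption: an empty input yields a score of 0.0 instead of dividing by zero.
    score = sum(individual_scores) / len(individual_scores) if individual_scores else 0.0
    return {"score": score, "individual_scores": individual_scores}


# A difference in case or trailing whitespace already counts as a mismatch.
result = exact_match_scores(["Berlin", "Paris"], ["Berlin", "paris "])
print(result["individual_scores"])  # [1, 0]
print(result["score"])  # 0.5
```

Since the comparison is strict, any normalization you want (for example lowercasing or stripping whitespace) would have to be applied to both lists before they are passed in.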
Usage
On its own
Below is an example of using an AnswerExactMatchEvaluator component to evaluate two predicted answers against their ground truth answers.
from haystack.components.evaluators import AnswerExactMatchEvaluator

evaluator = AnswerExactMatchEvaluator()
result = evaluator.run(
    ground_truth_answers=["Berlin", "Paris"],
    predicted_answers=["Berlin", "Lyon"],
)

print(result["individual_scores"])
# [1, 0]
print(result["score"])
# 0.5
In a pipeline
Below is an example where we use an AnswerExactMatchEvaluator and a SASEvaluator in a pipeline to evaluate two predicted answers against their ground truth answers. Running a pipeline instead of the individual components simplifies calculating more than one metric.
from haystack import Pipeline
from haystack.components.evaluators import AnswerExactMatchEvaluator
from haystack.components.evaluators import SASEvaluator

pipeline = Pipeline()
em_evaluator = AnswerExactMatchEvaluator()
sas_evaluator = SASEvaluator()
pipeline.add_component("em_evaluator", em_evaluator)
pipeline.add_component("sas_evaluator", sas_evaluator)

ground_truth_answers = ["Berlin", "Paris"]
predicted_answers = ["Berlin", "Lyon"]

# Both evaluators receive the same ground truths and predictions.
result = pipeline.run(
    {
        "em_evaluator": {"ground_truth_answers": ground_truth_answers,
                         "predicted_answers": predicted_answers},
        "sas_evaluator": {"ground_truth_answers": ground_truth_answers,
                          "predicted_answers": predicted_answers}
    }
)

# The result is keyed by component name.
for evaluator in result:
    print(result[evaluator]["individual_scores"])
# [1, 0]
# [array([[0.99999994]], dtype=float32), array([[0.51747656]], dtype=float32)]

for evaluator in result:
    print(result[evaluator]["score"])
# 0.5
# 0.7587383
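Because the pipeline result holds one output dictionary per evaluator, keyed by component name, the aggregate scores can be collected into a single summary. The following is a minimal sketch under stated assumptions: the helper name summarize_scores is hypothetical, and example_result is a hand-written stand-in built from the scores printed above, not the output of a real run.

```python
from typing import Any, Dict


def summarize_scores(result: Dict[str, Dict[str, Any]]) -> Dict[str, float]:
    """Hypothetical helper: map each evaluator name to its aggregate score."""
    return {name: float(output["score"]) for name, output in result.items()}


# Hand-written stand-in for a pipeline result (assumption, not a real run).
example_result = {
    "em_evaluator": {"individual_scores": [1, 0], "score": 0.5},
    "sas_evaluator": {"score": 0.7587383},
}

summary = summarize_scores(example_result)
for name, score in summary.items():
    print(f"{name}: {score:.2f}")
# em_evaluator: 0.50
# sas_evaluator: 0.76
```

Keeping the summary as a plain dictionary makes it easy to log or compare across runs without depending on the shape of each evaluator's individual_scores.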