Evaluation harness for Haystack.
Module haystack_experimental.evaluation.harness.evaluation_harness
EvaluationRunOverrides
Overrides for an evaluation run.
Used to override the init parameters of components in either
(or both) the evaluated and evaluation pipelines. Each key is
a component name, and its value is a dictionary of init
parameters to override.
Arguments:
evaluated_pipeline_overrides: Overrides for the evaluated pipeline.
evaluation_pipeline_overrides: Overrides for the evaluation pipeline.
EvaluationHarness
Executes a pipeline with a given set of parameters and inputs, then evaluates its outputs with an evaluation pipeline.
EvaluationHarness.run
@abstractmethod
def run(inputs: EvalRunInputT,
        *,
        overrides: Optional[EvalRunOverridesT] = None,
        run_name: Optional[str] = None) -> EvalRunOutputT
Launch an evaluation run.
Arguments:
inputs: Inputs to the evaluated and evaluation pipelines.
overrides: Overrides for the harness.
run_name: A name for the evaluation run.
Returns:
The output of the evaluation pipeline.
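The call pattern is the same for every concrete harness. A minimal sketch, assuming `harness` is an already-constructed subclass (such as the RAGEvaluationHarness documented below) and `eval_inputs` is an instance of its input type:

```python
# Sketch of the generic run contract; `harness` and `eval_inputs` are assumed
# to come from a concrete subclass, e.g. RAGEvaluationHarness (see below).
output = harness.run(
    eval_inputs,           # inputs for the evaluated and evaluation pipelines
    overrides=None,        # optional init-parameter overrides (see above)
    run_name="baseline",   # optional label for this run
)
```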
Module haystack_experimental.evaluation.harness.rag.harness
DefaultRAGArchitecture
Represents default RAG pipeline architectures that can be used with the evaluation harness.
EMBEDDING_RETRIEVAL
A RAG pipeline with:
- A query embedder component named 'query_embedder' with a 'text' input.
- A document retriever component named 'retriever' with a 'documents' output.
KEYWORD_RETRIEVAL
A RAG pipeline with:
- A document retriever component named 'retriever' with a 'query' input and a 'documents' output.
GENERATION_WITH_EMBEDDING_RETRIEVAL
A RAG pipeline with:
- A query embedder component named 'query_embedder' with a 'text' input.
- A document retriever component named 'retriever' with a 'documents' output.
- A response generator component named 'generator' with a 'replies' output.
GENERATION_WITH_KEYWORD_RETRIEVAL
A RAG pipeline with:
- A document retriever component named 'retriever' with a 'query' input and a 'documents' output.
- A response generator component named 'generator' with a 'replies' output.
DefaultRAGArchitecture.expected_components
@property
def expected_components() -> Dict[RAGExpectedComponent, RAGExpectedComponentMetadata]
Returns the expected components for the architecture.
Returns:
The expected components.
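The contract of a default architecture can be inspected before wiring up a pipeline. A minimal sketch, using the module path from this reference:

```python
from haystack_experimental.evaluation.harness.rag.harness import DefaultRAGArchitecture

arch = DefaultRAGArchitecture.GENERATION_WITH_EMBEDDING_RETRIEVAL
for role, metadata in arch.expected_components.items():
    # Each entry maps a RAGExpectedComponent role to the component name and
    # the input/output mappings the harness expects to find in the pipeline.
    print(role, metadata.name, metadata.input_mapping, metadata.output_mapping)
```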
RAGEvaluationHarness
Evaluation harness for evaluating RAG pipelines.
RAGEvaluationHarness.__init__
def __init__(rag_pipeline: Pipeline,
             rag_components: Union[
                 DefaultRAGArchitecture,
                 Dict[RAGExpectedComponent, RAGExpectedComponentMetadata],
             ],
             metrics: Set[RAGEvaluationMetric],
             *,
             progress_bar: bool = True)
Create an evaluation harness for evaluating basic RAG pipelines.
Arguments:
rag_pipeline: The RAG pipeline to evaluate.
rag_components: Either a default RAG architecture or a mapping of expected components to their metadata.
metrics: The metrics to use during evaluation.
progress_bar: Whether to display a progress bar during evaluation.
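A minimal construction sketch. The pipeline setup is elided and assumed to contain components named 'query_embedder', 'retriever' and 'generator', as required by the chosen architecture:

```python
from haystack import Pipeline
from haystack_experimental.evaluation.harness.rag.harness import (
    DefaultRAGArchitecture,
    RAGEvaluationHarness,
)
from haystack_experimental.evaluation.harness.rag.parameters import RAGEvaluationMetric

rag_pipeline = Pipeline()
# ... add and connect 'query_embedder', 'retriever' and 'generator' here ...

harness = RAGEvaluationHarness(
    rag_pipeline=rag_pipeline,
    rag_components=DefaultRAGArchitecture.GENERATION_WITH_EMBEDDING_RETRIEVAL,
    metrics={
        RAGEvaluationMetric.DOCUMENT_MAP,
        RAGEvaluationMetric.SEMANTIC_ANSWER_SIMILARITY,
    },
    progress_bar=False,
)
```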
Module haystack_experimental.evaluation.harness.rag.parameters
RAGExpectedComponent
Represents the basic components in a RAG pipeline that are, by default, required to be present for evaluation.
Each of these can be a separate component in the pipeline, or a single component can perform
multiple of these tasks.
QUERY_PROCESSOR
The component in a RAG pipeline that accepts the user query.
Expected inputs: query - Name of the input that contains the query string.
DOCUMENT_RETRIEVER
The component in a RAG pipeline that retrieves documents based on the query.
Expected outputs: retrieved_documents - Name of the output containing the retrieved documents.
RESPONSE_GENERATOR
The component in a RAG pipeline that generates responses based on the query and the retrieved documents.
Optional if the harness only evaluates retrieval.
Expected outputs: replies - Name of the output containing the LLM responses. Only the first response is used.
RAGExpectedComponentMetadata
Metadata for a RAGExpectedComponent.
Arguments:
name: Name of the component in the pipeline.
input_mapping: Mapping of the expected inputs to corresponding component input names.
output_mapping: Mapping of the expected outputs to corresponding component output names.
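When a pipeline's component or socket names differ from the defaults, the mapping can be spelled out explicitly and passed as rag_components to the harness. A sketch with illustrative component names:

```python
from haystack_experimental.evaluation.harness.rag.parameters import (
    RAGExpectedComponent,
    RAGExpectedComponentMetadata,
)

# Component names ('embedder', 'bm25_retriever', 'llm') are illustrative;
# they must match the names used in the evaluated pipeline.
rag_components = {
    RAGExpectedComponent.QUERY_PROCESSOR: RAGExpectedComponentMetadata(
        name="embedder", input_mapping={"query": "text"}
    ),
    RAGExpectedComponent.DOCUMENT_RETRIEVER: RAGExpectedComponentMetadata(
        name="bm25_retriever", output_mapping={"retrieved_documents": "documents"}
    ),
    RAGExpectedComponent.RESPONSE_GENERATOR: RAGExpectedComponentMetadata(
        name="llm", output_mapping={"replies": "replies"}
    ),
}
```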
RAGEvaluationMetric
Represents the metrics that can be used to evaluate a RAG pipeline.
DOCUMENT_MAP
Document Mean Average Precision.
Required RAG components: Query Processor, Document Retriever.
DOCUMENT_MRR
Document Mean Reciprocal Rank.
Required RAG components: Query Processor, Document Retriever.
DOCUMENT_RECALL_SINGLE_HIT
Document Recall with a single hit.
Required RAG components: Query Processor, Document Retriever.
DOCUMENT_RECALL_MULTI_HIT
Document Recall with multiple hits.
Required RAG components: Query Processor, Document Retriever.
SEMANTIC_ANSWER_SIMILARITY
Semantic Answer Similarity.
Required RAG components: Query Processor, Response Generator.
FAITHFULNESS
Faithfulness.
Required RAG components: Query Processor, Document Retriever, Response Generator.
CONTEXT_RELEVANCE
Context Relevance.
Required RAG components: Query Processor, Document Retriever.
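Since each metric only needs a subset of the expected components, the metric set can be matched to the pipeline. A sketch selecting retrieval-only metrics:

```python
from haystack_experimental.evaluation.harness.rag.parameters import RAGEvaluationMetric

# These metrics need only the Query Processor and Document Retriever, so they
# also work with architectures that have no Response Generator.
retrieval_only_metrics = {
    RAGEvaluationMetric.DOCUMENT_MAP,
    RAGEvaluationMetric.DOCUMENT_MRR,
    RAGEvaluationMetric.DOCUMENT_RECALL_MULTI_HIT,
}
```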
RAGEvaluationInput
Input passed to the RAG evaluation harness.
Arguments:
queries: The queries passed to the RAG pipeline.
ground_truth_documents: The ground truth documents passed to the evaluation pipeline. Only required by metrics that use them. Aligned with the queries by index.
ground_truth_answers: The ground truth answers passed to the evaluation pipeline. Only required by metrics that use them. Aligned with the queries by index.
rag_pipeline_inputs: Additional inputs to pass to the RAG pipeline. Each key is a component name, and its value is a dictionary mapping an input name to a list of values, one per query.
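A minimal input sketch for a single query; the ground-truth lists are aligned with the queries by index, and the component name under rag_pipeline_inputs is illustrative:

```python
from haystack import Document
from haystack_experimental.evaluation.harness.rag.parameters import RAGEvaluationInput

eval_inputs = RAGEvaluationInput(
    queries=["What is the capital of France?"],
    # One list of ground-truth documents per query.
    ground_truth_documents=[[Document(content="Paris is the capital of France.")]],
    # One ground-truth answer per query.
    ground_truth_answers=["Paris"],
    # Extra per-query inputs, keyed by component name (name is illustrative).
    rag_pipeline_inputs={
        "prompt_builder": {"question": ["What is the capital of France?"]},
    },
)
```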
RAGEvaluationOverrides
Overrides for a RAG evaluation run.
Used to override the init parameters of components in
either (or both) the evaluated and evaluation pipelines.
Arguments:
rag_pipeline: Overrides for the RAG pipeline. Each key is a component name, and its value is a dictionary with init parameters to override.
eval_pipeline: Overrides for the evaluation pipeline. Each key is a RAG metric, and its value is a dictionary with init parameters to override.
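A sketch of overriding init parameters on both sides of a run; the parameter names shown are illustrative and must match the init signatures of the actual components and evaluators:

```python
from haystack_experimental.evaluation.harness.rag.parameters import (
    RAGEvaluationMetric,
    RAGEvaluationOverrides,
)

overrides = RAGEvaluationOverrides(
    # Re-run the RAG pipeline with a different retriever setting.
    rag_pipeline={"retriever": {"top_k": 5}},
    # Re-configure the evaluator behind a metric (parameter name illustrative).
    eval_pipeline={
        RAGEvaluationMetric.SEMANTIC_ANSWER_SIMILARITY: {
            "model": "sentence-transformers/all-MiniLM-L6-v2"
        }
    },
)
```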
RAGEvaluationOutput
Represents the output of a RAG evaluation run.
Arguments:
evaluated_pipeline: Serialized version of the evaluated pipeline, including overrides.
evaluation_pipeline: Serialized version of the evaluation pipeline, including overrides.
inputs: Input passed to the evaluation harness.
results: Results of the evaluation run.
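Putting it together: a sketch of launching a run and consuming the output, assuming results is Haystack's EvaluationRunResult, which exposes score_report() and to_pandas():

```python
# `harness`, `eval_inputs` and `overrides` as constructed in the sketches above.
output = harness.run(eval_inputs, overrides=overrides, run_name="baseline")

print(output.results.score_report())  # aggregate score per metric
df = output.results.to_pandas()       # per-query results as a DataFrame
```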
