API Reference

Evaluation harness for Haystack.

Module haystack_experimental.evaluation.harness.evaluation_harness

EvaluationRunOverrides

Overrides for an evaluation run.

Used to override the init parameters of components in either (or both) the evaluated and evaluation pipelines. Each key is a component name and its value a dictionary with init parameters to override.

Arguments:

  • evaluated_pipeline_overrides: Overrides for the evaluated pipeline.
  • evaluation_pipeline_overrides: Overrides for the evaluation pipeline.
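
For example, a minimal sketch of constructing an override set (this assumes the class is a plain dataclass whose fields default to None; the component names 'retriever' and 'evaluator' are placeholders):

from haystack_experimental.evaluation.harness.evaluation_harness import (
    EvaluationRunOverrides,
)

overrides = EvaluationRunOverrides(
    # Change the retriever's init parameters in the evaluated pipeline.
    evaluated_pipeline_overrides={"retriever": {"top_k": 5}},
    # Change an evaluator's init parameters in the evaluation pipeline.
    evaluation_pipeline_overrides={"evaluator": {"model": "gpt-4o-mini"}},
)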

EvaluationHarness

Executes a pipeline with a given set of parameters and inputs, then evaluates its outputs with an evaluation pipeline.

EvaluationHarness.run

@abstractmethod
def run(inputs: EvalRunInputT,
        *,
        overrides: Optional[EvalRunOverridesT] = None,
        run_name: Optional[str] = None) -> EvalRunOutputT

Launch an evaluation run.

Arguments:

  • inputs: Inputs to the evaluated and evaluation pipelines.
  • overrides: Overrides for the harness.
  • run_name: A name for the evaluation run.

Returns:

The output of the evaluation pipeline.
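
A minimal usage sketch (hypothetical: harness is a concrete EvaluationHarness subclass, and eval_input matches the input type it expects):

output = harness.run(
    inputs=eval_input,        # inputs for the evaluated and evaluation pipelines
    overrides=None,           # optional init-parameter overrides
    run_name="baseline_run",  # optional label for this run
)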

Module haystack_experimental.evaluation.harness.rag.harness

DefaultRAGArchitecture

Represents default RAG pipeline architectures that can be used with the evaluation harness.

EMBEDDING_RETRIEVAL

A RAG pipeline with:

  • A query embedder component named 'query_embedder' with a 'text' input.
  • A document retriever component named 'retriever' with a 'documents' output.

KEYWORD_RETRIEVAL

A RAG pipeline with:

  • A document retriever component named 'retriever' with a 'query' input and a 'documents' output.

GENERATION_WITH_EMBEDDING_RETRIEVAL

A RAG pipeline with:

  • A query embedder component named 'query_embedder' with a 'text' input.
  • A document retriever component named 'retriever' with a 'documents' output.
  • A response generator component named 'generator' with a 'replies' output.

GENERATION_WITH_KEYWORD_RETRIEVAL

A RAG pipeline with:

  • A document retriever component named 'retriever' with a 'query' input and a 'documents' output.
  • A response generator component named 'generator' with a 'replies' output.
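
As a sketch, a Haystack pipeline matching GENERATION_WITH_EMBEDDING_RETRIEVAL could be assembled as follows (the prompt template, models, and the 'prompt_builder' component are illustrative; only the 'query_embedder', 'retriever', and 'generator' names and their inputs/outputs are prescribed by the architecture):

from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

template = """Answer the question using the context.
Context:
{% for doc in documents %}{{ doc.content }}{% endfor %}
Question: {{ question }}"""

rag_pipeline = Pipeline()
# Component names must match the architecture's expectations.
rag_pipeline.add_component("query_embedder", SentenceTransformersTextEmbedder())
rag_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=InMemoryDocumentStore()))
rag_pipeline.add_component("prompt_builder", PromptBuilder(template=template))
rag_pipeline.add_component("generator", OpenAIGenerator(model="gpt-4o-mini"))
rag_pipeline.connect("query_embedder.embedding", "retriever.query_embedding")
rag_pipeline.connect("retriever.documents", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder.prompt", "generator.prompt")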

DefaultRAGArchitecture.expected_components

@property
def expected_components(
) -> Dict[RAGExpectedComponent, RAGExpectedComponentMetadata]

Returns the expected components for the architecture.

Returns:

The expected components.
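
For instance, to inspect how a default architecture maps expected components to concrete component names:

from haystack_experimental.evaluation.harness.rag.harness import DefaultRAGArchitecture

for expected, metadata in DefaultRAGArchitecture.KEYWORD_RETRIEVAL.expected_components.items():
    print(expected, "->", metadata.name, metadata.input_mapping, metadata.output_mapping)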

RAGEvaluationHarness

Evaluation harness for evaluating RAG pipelines.

RAGEvaluationHarness.__init__

def __init__(rag_pipeline: Pipeline,
             rag_components: Union[
                 DefaultRAGArchitecture,
                 Dict[RAGExpectedComponent, RAGExpectedComponentMetadata],
             ],
             metrics: Set[RAGEvaluationMetric],
             *,
             progress_bar: bool = True)

Create an evaluation harness for evaluating basic RAG pipelines.

Arguments:

  • rag_pipeline: The RAG pipeline to evaluate.
  • rag_components: Either a default RAG architecture or a mapping of expected components to their metadata.
  • metrics: The metrics to use during evaluation.
  • progress_bar: Whether to display a progress bar during evaluation.
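
A minimal construction sketch, reusing the rag_pipeline built above:

from haystack_experimental.evaluation.harness.rag.harness import (
    DefaultRAGArchitecture,
    RAGEvaluationHarness,
)
from haystack_experimental.evaluation.harness.rag.parameters import RAGEvaluationMetric

harness = RAGEvaluationHarness(
    rag_pipeline=rag_pipeline,
    rag_components=DefaultRAGArchitecture.GENERATION_WITH_EMBEDDING_RETRIEVAL,
    metrics={
        RAGEvaluationMetric.DOCUMENT_MAP,
        RAGEvaluationMetric.SEMANTIC_ANSWER_SIMILARITY,
    },
)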

Module haystack_experimental.evaluation.harness.rag.parameters

RAGExpectedComponent

Represents the basic components in a RAG pipeline that are, by default, required to be present for evaluation.

Each of these can be a separate component in the pipeline, or a single component can perform more than one of these tasks.

QUERY_PROCESSOR

The component in a RAG pipeline that accepts the user query. Expected inputs: query - Name of input that contains the query string.

DOCUMENT_RETRIEVER

The component in a RAG pipeline that retrieves documents based on the query. Expected outputs: retrieved_documents - Name of output containing retrieved documents.

RESPONSE_GENERATOR

The component in a RAG pipeline that generates responses based on the query and the retrieved documents. Optional if the harness only evaluates retrieval. Expected outputs: replies - Name of output containing the LLM responses. Only the first response is used.

RAGExpectedComponentMetadata

Metadata for a RAGExpectedComponent.

Arguments:

  • name: Name of the component in the pipeline.
  • input_mapping: Mapping of the expected inputs to corresponding component input names.
  • output_mapping: Mapping of the expected outputs to corresponding component output names.
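
For a pipeline that deviates from the default architectures, a hand-written mapping might look like this (the component names are taken from the embedding-retrieval example above; treating input_mapping and output_mapping as optional keyword arguments is an assumption):

from haystack_experimental.evaluation.harness.rag.parameters import (
    RAGExpectedComponent,
    RAGExpectedComponentMetadata,
)

rag_components = {
    RAGExpectedComponent.QUERY_PROCESSOR: RAGExpectedComponentMetadata(
        name="query_embedder",
        input_mapping={"query": "text"},  # expected input -> component input name
    ),
    RAGExpectedComponent.DOCUMENT_RETRIEVER: RAGExpectedComponentMetadata(
        name="retriever",
        output_mapping={"retrieved_documents": "documents"},  # expected output -> component output name
    ),
    RAGExpectedComponent.RESPONSE_GENERATOR: RAGExpectedComponentMetadata(
        name="generator",
        output_mapping={"replies": "replies"},
    ),
}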

RAGEvaluationMetric

Represents the metrics that can be used to evaluate a RAG pipeline.

DOCUMENT_MAP

Document Mean Average Precision. Required RAG components: Query Processor, Document Retriever.

DOCUMENT_MRR

Document Mean Reciprocal Rank. Required RAG components: Query Processor, Document Retriever.

DOCUMENT_RECALL_SINGLE_HIT

Document Recall with a single hit. Required RAG components: Query Processor, Document Retriever.

DOCUMENT_RECALL_MULTI_HIT

Document Recall with multiple hits. Required RAG components: Query Processor, Document Retriever.

SEMANTIC_ANSWER_SIMILARITY

Semantic Answer Similarity. Required RAG components: Query Processor, Response Generator.

FAITHFULNESS

Faithfulness. Required RAG components: Query Processor, Document Retriever, Response Generator.

CONTEXT_RELEVANCE

Context Relevance. Required RAG components: Query Processor, Document Retriever.
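
Since each metric declares the components it requires, a harness built on a retrieval-only architecture (EMBEDDING_RETRIEVAL or KEYWORD_RETRIEVAL) should stick to metrics that do not need the Response Generator, e.g.:

retrieval_metrics = {
    RAGEvaluationMetric.DOCUMENT_MAP,
    RAGEvaluationMetric.DOCUMENT_MRR,
    RAGEvaluationMetric.DOCUMENT_RECALL_MULTI_HIT,
}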

RAGEvaluationInput

Input passed to the RAG evaluation harness.

Arguments:

  • queries: The queries passed to the RAG pipeline.
  • ground_truth_documents: The ground truth documents passed to the evaluation pipeline. Only required for metrics that need them. Each entry corresponds to a query.
  • ground_truth_answers: The ground truth answers passed to the evaluation pipeline. Only required for metrics that need them. Each entry corresponds to a query.
  • rag_pipeline_inputs: Additional inputs to pass to the RAG pipeline. Each key is the name of the component and its value a dictionary with the input name and a list of values, each corresponding to a query.
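
A single-query input sketch (the 'prompt_builder'/'question' entry assumes the example pipeline above; replace it with your own component and input names):

from haystack import Document
from haystack_experimental.evaluation.harness.rag.parameters import RAGEvaluationInput

eval_input = RAGEvaluationInput(
    queries=["What is the tallest mountain on Earth?"],
    ground_truth_documents=[[Document(content="Mount Everest is Earth's highest mountain.")]],
    ground_truth_answers=["Mount Everest"],
    rag_pipeline_inputs={"prompt_builder": {"question": ["What is the tallest mountain on Earth?"]}},
)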

RAGEvaluationOverrides

Overrides for a RAG evaluation run.

Used to override the init parameters of components in either (or both) the evaluated and evaluation pipelines.

Arguments:

  • rag_pipeline: Overrides for the RAG pipeline. Each key is a component name and its value a dictionary with init parameters to override.
  • eval_pipeline: Overrides for the evaluation pipeline. Each key is a RAG metric and its value a dictionary with init parameters to override.
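
For example, to re-run an evaluation with a different generator model and a different embedding model for semantic answer similarity (the accepted init parameters depend on the concrete components and evaluators in use; these are illustrative):

from haystack_experimental.evaluation.harness.rag.parameters import (
    RAGEvaluationMetric,
    RAGEvaluationOverrides,
)

overrides = RAGEvaluationOverrides(
    rag_pipeline={"generator": {"model": "gpt-4o"}},
    eval_pipeline={
        RAGEvaluationMetric.SEMANTIC_ANSWER_SIMILARITY: {
            "model": "sentence-transformers/all-mpnet-base-v2",
        },
    },
)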

RAGEvaluationOutput

Represents the output of a RAG evaluation run.

Arguments:

  • evaluated_pipeline: Serialized version of the evaluated pipeline, including overrides.
  • evaluation_pipeline: Serialized version of the evaluation pipeline, including overrides.
  • inputs: Input passed to the evaluation harness.
  • results: Results of the evaluation run.
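
Putting the pieces together (assuming results is a Haystack EvaluationRunResult, which exposes score_report() and to_pandas(); the concrete type is not stated above):

output = harness.run(inputs=eval_input, overrides=overrides, run_name="baseline")
print(output.results.score_report())  # aggregate score per metric
df = output.results.to_pandas()       # per-query results as a DataFrame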