Ragas integration for Haystack
Module haystack_integrations.components.evaluators.ragas.evaluator
RagasEvaluator
A component that uses the Ragas framework to evaluate inputs against specified Ragas metrics.
Usage example:
```python
from haystack_integrations.components.evaluators.ragas import RagasEvaluator
from ragas.metrics import ContextPrecision
from ragas.llms import LangchainLLMWrapper
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")
evaluator_llm = LangchainLLMWrapper(llm)

evaluator = RagasEvaluator(
    ragas_metrics=[ContextPrecision()],
    evaluator_llm=evaluator_llm,
)
output = evaluator.run(
    query="Which is the most popular global sport?",
    documents=[
        "Football is undoubtedly the world's most popular sport with"
        " major events like the FIFA World Cup and sports personalities"
        " like Ronaldo and Messi, drawing a followership of more than 4"
        " billion people."
    ],
    reference="Football is the most popular sport with around 4 billion"
    " followers worldwide",
)
output["result"]
```
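The value under the `result` key is a Ragas `EvaluationResult` containing the computed metric scores.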
RagasEvaluator.__init__
```python
def __init__(ragas_metrics: List[Metric],
             evaluator_llm: Optional[Union[BaseRagasLLM, LangchainLLM]] = None,
             evaluator_embedding: Optional[Union[BaseRagasEmbeddings,
                                                 LangchainEmbeddings]] = None)
```
Constructs a new Ragas evaluator.
Arguments:
ragas_metrics
: A list of evaluation metrics from the Ragas library.

evaluator_llm
: A language model used by metrics that require an LLM for evaluation.

evaluator_embedding
: An embedding model used by metrics that require embeddings for evaluation.
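As a minimal sketch of passing an embedding model, using Ragas' `AnswerRelevancy` metric (which needs both an LLM and embeddings) and assuming `langchain-openai` is installed and an OpenAI API key is configured:

```python
from haystack_integrations.components.evaluators.ragas import RagasEvaluator
from ragas.metrics import AnswerRelevancy
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# AnswerRelevancy needs both an LLM and an embedding model,
# so both wrappers are passed to the evaluator.
evaluator = RagasEvaluator(
    ragas_metrics=[AnswerRelevancy()],
    evaluator_llm=LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini")),
    evaluator_embedding=LangchainEmbeddingsWrapper(OpenAIEmbeddings()),
)
```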
RagasEvaluator.run
```python
@component.output_types(result=EvaluationResult)
def run(query: Optional[str] = None,
        response: Optional[Union[List[ChatMessage], str]] = None,
        documents: Optional[List[Union[Document, str]]] = None,
        reference_contexts: Optional[List[str]] = None,
        multi_responses: Optional[List[str]] = None,
        reference: Optional[str] = None,
        rubrics: Optional[Dict[str, str]] = None) -> Dict[str, Any]
```
Evaluates the provided inputs against the configured Ragas metrics and returns the evaluation result.
Arguments:
query
: The input query from the user.

response
: A list of ChatMessage responses (typically from a language model or agent), or a single response string.

documents
: A list of Haystack Document objects or strings that were retrieved for the query.

reference_contexts
: A list of reference contexts that should have been retrieved for the query.

multi_responses
: A list of multiple responses generated for the query.

reference
: A string reference answer for the query.

rubrics
: A dictionary of evaluation rubrics, where keys represent the score and values represent the corresponding evaluation criteria.
Returns:
A dictionary with a `result` key holding the Ragas `EvaluationResult`.
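A hedged sketch of a full `run` call that also scores a generated answer, assuming Ragas' `Faithfulness` metric and Haystack's `ChatMessage`; `scores` and `to_pandas()` are accessors on the Ragas `EvaluationResult`, and the exact scores depend on the evaluator LLM:

```python
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.evaluators.ragas import RagasEvaluator
from ragas.metrics import Faithfulness
from ragas.llms import LangchainLLMWrapper
from langchain_openai import ChatOpenAI

evaluator = RagasEvaluator(
    ragas_metrics=[Faithfulness()],
    evaluator_llm=LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini")),
)

output = evaluator.run(
    query="Which is the most popular global sport?",
    documents=[
        "Football is the world's most popular sport, with more than"
        " 4 billion followers."
    ],
    response=[ChatMessage.from_assistant(
        "Football is the most popular sport in the world."
    )],
)

result = output["result"]  # a Ragas EvaluationResult
print(result.scores)       # per-sample metric scores
print(result.to_pandas())  # or inspect as a pandas DataFrame
```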