Module base
BaseRetriever
class BaseRetriever(BaseComponent)
Abstract base class for regular Retrievers.
BaseRetriever.retrieve
@abstractmethod
def retrieve(
query: str,
filters: Optional[FilterType] = None,
top_k: Optional[int] = None,
index: Optional[str] = None,
headers: Optional[Dict[str, str]] = None,
scale_score: Optional[bool] = None,
document_store: Optional[BaseDocumentStore] = None) -> List[Document]
Scan through the documents in a DocumentStore and return a small number of documents
that are most relevant to the query.
Arguments:
query
: The query string.
filters
: A dictionary where the keys specify a metadata field and the value is a list of accepted values for that field.
top_k
: How many documents to return per query.
index
: The name of the index in the DocumentStore from which to retrieve documents.
headers
: Custom HTTP headers to pass to the document store client if supported (e.g. {'Authorization': 'Basic YWRtaW46cm9vdA=='} for basic authentication).
scale_score
: Whether to scale the similarity score to the unit interval (range of [0,1]). If true (default), similarity scores (e.g. cosine or dot_product) which naturally have a different value range will be scaled to a range of [0,1], where 1 means extremely relevant. Otherwise, raw similarity scores (e.g. cosine or dot_product) will be used.
document_store
: The DocumentStore to use for retrieval. If None, the one given in the init is used instead.
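To illustrate the `retrieve` contract, here is a minimal toy subclass sketch. It does not import Haystack: the `Document` dataclass, `FilterType` alias, and the keyword-overlap scoring are simplified stand-ins for illustration only, not the real classes or any real Retriever's ranking logic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Simplified stand-ins (assumptions) for haystack's Document and FilterType.
@dataclass
class Document:
    content: str
    meta: Dict[str, str] = field(default_factory=dict)

FilterType = Dict[str, List[str]]

class KeywordRetriever:
    """Toy retriever honoring the `retrieve` signature: rank documents
    by how many query terms they contain, after applying metadata filters."""

    def __init__(self, documents: List[Document]):
        self.documents = documents

    def retrieve(self,
                 query: str,
                 filters: Optional[FilterType] = None,
                 top_k: Optional[int] = None) -> List[Document]:
        candidates = self.documents
        if filters:
            # Keep only documents whose metadata matches every filter.
            candidates = [
                d for d in candidates
                if all(d.meta.get(k) in v for k, v in filters.items())
            ]
        terms = query.lower().split()
        ranked = sorted(
            candidates,
            key=lambda d: sum(t in d.content.lower() for t in terms),
            reverse=True,
        )
        return ranked[: top_k or 10]

docs = [
    Document("Paris is the capital of France", {"lang": "en"}),
    Document("Berlin is the capital of Germany", {"lang": "en"}),
]
retriever = KeywordRetriever(docs)
top = retriever.retrieve("capital of France", top_k=1)
```

A real subclass would additionally accept `index`, `headers`, `scale_score`, and `document_store`, delegating storage access to the DocumentStore client.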
BaseRetriever.timing
def timing(fn, attr_name)
Wrapper method used to time functions.
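The wrapping pattern can be sketched as follows: each call to the wrapped function has its wall-clock duration accumulated on an object under `attr_name`. This is a free-function sketch of the idea (in Haystack the method lives on the retriever instance and times itself); names here are illustrative.

```python
import time
from functools import wraps

def timing(obj, fn, attr_name):
    """Wrap `fn` so the wall-clock duration of every call is summed
    into the attribute `attr_name` on `obj`."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        if not hasattr(obj, attr_name):
            setattr(obj, attr_name, 0.0)
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        setattr(obj, attr_name, getattr(obj, attr_name) + elapsed)
        return result
    return wrapper

class Holder:
    pass

h = Holder()
timed_sleep = timing(h, lambda: time.sleep(0.01), "retrieve_time")
timed_sleep()
timed_sleep()
```

After the two calls, `h.retrieve_time` holds their total duration, so repeated retrievals accumulate into one figure.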
BaseRetriever.eval
def eval(label_index: str = "label",
doc_index: str = "eval_document",
label_origin: str = "gold-label",
top_k: int = 10,
open_domain: bool = False,
return_preds: bool = False,
headers: Optional[Dict[str, str]] = None,
document_store: Optional[BaseDocumentStore] = None) -> dict
Performs evaluation on the Retriever.
The Retriever is evaluated on whether it finds the correct document for a given query string, and at which position in the ranking of retrieved documents that correct document appears.
Returns a dict containing the following metrics:
- "recall": Proportion of questions for which correct document is among retrieved documents
- "mrr": Mean of reciprocal rank. Rewards retrievers that give relevant documents a higher rank.
Only considers the highest ranked relevant document.
- "map": Mean of average precision for each question. Rewards retrievers that give relevant
documents a higher rank. Considers all retrieved relevant documents. If ``open_domain=True``,
average precision is normalized by the number of retrieved relevant documents per query.
If ``open_domain=False``, average precision is normalized by the number of all relevant documents
per query.
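The three metrics above can be computed from ranked document ids as in the following sketch (a simplified illustration, not the library's implementation; it uses the closed-domain normalization, dividing average precision by the number of all relevant documents per query):

```python
from typing import List, Set

def retrieval_metrics(retrieved: List[List[str]],
                      relevant: List[Set[str]]) -> dict:
    """Compute recall, MRR, and MAP over queries.
    `retrieved[i]` is the ranked list of doc ids for query i;
    `relevant[i]` is the set of gold doc ids for query i."""
    recall = mrr = ap_sum = 0.0
    for ranked, gold in zip(retrieved, relevant):
        # 1-based positions of relevant documents in the ranking.
        hit_positions = [pos for pos, doc_id in enumerate(ranked, 1)
                         if doc_id in gold]
        if hit_positions:
            recall += 1                        # correct doc was retrieved
            mrr += 1 / hit_positions[0]        # reciprocal rank of first hit
            # Precision at each relevant hit: rank-of-hit / position.
            precisions = [rank / pos
                          for rank, pos in enumerate(hit_positions, 1)]
            ap_sum += sum(precisions) / len(gold)   # closed-domain norm.
    n = len(retrieved)
    return {"recall": recall / n, "mrr": mrr / n, "map": ap_sum / n}

metrics = retrieval_metrics(
    retrieved=[["d1", "d2", "d3"], ["d9", "d4"]],
    relevant=[{"d2"}, {"d4"}],
)
```

Both queries find their gold document at rank 2, so recall is 1.0 while MRR and MAP are each 0.5.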
Arguments:
label_index
: Index/Table in the DocumentStore where the labeled questions are stored.
doc_index
: Index/Table in the DocumentStore where the documents used for evaluation are stored.
top_k
: How many documents to return per query.
open_domain
: If True, retrieval is evaluated by checking if the answer string to a question is contained in the retrieved docs (common approach in open-domain QA). If False, retrieval uses a stricter evaluation that checks if the retrieved document ids are among the ids explicitly stated in the labels.
return_preds
: Whether to add predictions to the returned dictionary. If True, the returned dictionary contains the keys "predictions" and "metrics".
headers
: Custom HTTP headers to pass to the document store client if supported (e.g. {'Authorization': 'Basic YWRtaW46cm9vdA=='} for basic authentication).
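The difference between the two `open_domain` matching modes can be sketched as below. The dicts stand in for Haystack's Document and Label objects, and the field names are illustrative assumptions:

```python
def is_correct(doc: dict, label: dict, open_domain: bool) -> bool:
    """Open-domain: a retrieved doc counts as correct if it contains the
    gold answer string. Closed-domain: its id must be explicitly listed
    in the label's document ids."""
    if open_domain:
        return label["answer"].lower() in doc["content"].lower()
    return doc["id"] in label["document_ids"]

doc = {"id": "d7", "content": "Paris is the capital of France."}
label = {"answer": "Paris", "document_ids": ["d1"]}
```

Here the document would count as correct under open-domain evaluation (it contains "Paris") but not under the stricter closed-domain check (its id "d7" is not in the label).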
BaseRetriever.run
def run(root_node: str,
query: Optional[str] = None,
filters: Optional[FilterType] = None,
top_k: Optional[int] = None,
documents: Optional[List[Document]] = None,
index: Optional[str] = None,
headers: Optional[Dict[str, str]] = None,
scale_score: Optional[bool] = None)
Arguments:
root_node
: The root node of the pipeline's graph.
query
: Query string.
filters
: A dictionary where the keys specify a metadata field and the value is a list of accepted values for that field.
top_k
: How many documents to return per query.
documents
: List of Documents to retrieve.
index
: The name of the index in the DocumentStore from which to retrieve documents.
headers
: Custom HTTP headers to pass to the document store client if supported (e.g. {'Authorization': 'Basic YWRtaW46cm9vdA=='} for basic authentication).
scale_score
: Whether to scale the similarity score to the unit interval (range of [0,1]). If true (default), similarity scores (e.g. cosine or dot_product) which naturally have a different value range will be scaled to a range of [0,1], where 1 means extremely relevant. Otherwise, raw similarity scores (e.g. cosine or dot_product) will be used.
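One plausible realization of the `scale_score` behavior is sketched below. The exact transforms are an assumption for illustration: cosine similarity lives in [-1, 1] and can be shifted linearly into [0, 1], while unbounded dot-product scores need a squashing function such as a scaled sigmoid.

```python
import math

def scale_to_unit_interval(score: float, similarity: str) -> float:
    """Map a raw similarity score into [0, 1] (illustrative sketch).
    cosine: linear shift from [-1, 1] to [0, 1].
    otherwise: scaled sigmoid for unbounded dot-product scores."""
    if similarity == "cosine":
        return (score + 1) / 2
    return 1 / (1 + math.exp(-score / 100))
```

With this scheme a perfect cosine match (1.0) maps to 1.0, an opposite vector (-1.0) to 0.0, and a zero dot product to 0.5.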
BaseRetriever.run_batch
def run_batch(root_node: str,
queries: Optional[List[str]] = None,
filters: Optional[Union[FilterType,
List[Optional[FilterType]]]] = None,
top_k: Optional[int] = None,
documents: Optional[Union[List[Document],
List[List[Document]]]] = None,
index: Optional[str] = None,
headers: Optional[Dict[str, str]] = None)
Arguments:
root_node
: The root node of the pipeline's graph.
queries
: The list of query strings.
filters
: A dictionary where the keys specify a metadata field and the value is a list of accepted values for that field. Can be a single filter applied to every query, or a list of filters, one per query.
top_k
: How many documents to return per query.
documents
: List of Documents to retrieve.
index
: The name of the index in the DocumentStore from which to retrieve documents.
headers
: Custom HTTP headers to pass to the document store client if supported (e.g. {'Authorization': 'Basic YWRtaW46cm9vdA=='} for basic authentication).
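The `Union[FilterType, List[Optional[FilterType]]]` type on `filters` means a batch call can take either one filter dict applied to every query or a per-query list. A hypothetical normalization helper (illustrative only, not the actual Haystack implementation) might look like:

```python
from typing import Dict, List, Optional, Union

FilterType = Dict[str, List[str]]

def normalize_filters(
    queries: List[str],
    filters: Optional[Union[FilterType, List[Optional[FilterType]]]],
) -> List[Optional[FilterType]]:
    """Expand `filters` into one entry per query: broadcast a single
    dict, validate a list's length, or fill with None."""
    if filters is None:
        return [None] * len(queries)
    if isinstance(filters, list):
        if len(filters) != len(queries):
            raise ValueError("Number of filters must match number of queries")
        return filters
    return [filters] * len(queries)

per_query = normalize_filters(["q1", "q2"], {"lang": ["en"]})
```

After normalization, each query can be paired with its own (possibly None) filter when dispatching the batch to the DocumentStore.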