Module base
BaseRetriever
class BaseRetriever(BaseComponent)
Abstract base class for regular Retrievers.
BaseRetriever.retrieve
@abstractmethod
def retrieve(
query: str,
filters: Optional[FilterType] = None,
top_k: Optional[int] = None,
index: Optional[str] = None,
headers: Optional[Dict[str, str]] = None,
scale_score: Optional[bool] = None,
document_store: Optional[BaseDocumentStore] = None) -> List[Document]
Scan through the documents in a DocumentStore and return a small number of documents
that are most relevant to the query.
Arguments:
query
: The query string.
filters
: A dictionary where the keys specify a metadata field and the value is a list of accepted values for that field.
top_k
: How many documents to return per query.
index
: The name of the index in the DocumentStore from which to retrieve documents.
headers
: Custom HTTP headers to pass to the document store client if supported (e.g. {'Authorization': 'Basic YWRtaW46cm9vdA=='} for basic authentication).
scale_score
: Whether to scale the similarity score to the unit interval (range of [0,1]). If true (default), similarity scores (e.g. cosine or dot_product) which naturally have a different value range will be scaled to a range of [0,1], where 1 means extremely relevant. Otherwise, raw similarity scores (e.g. cosine or dot_product) will be used.
document_store
: The DocumentStore to use for retrieval. If None, the one given in the init is used instead.
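To illustrate the `retrieve` contract, here is a minimal toy subclass sketch. It does not import Haystack: the `Document` dataclass, `FilterType` alias, and the keyword-overlap scoring are simplified stand-ins for illustration only, not the real classes or any real Retriever's ranking logic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Simplified stand-ins (assumptions) for haystack's Document and FilterType.
@dataclass
class Document:
    content: str
    meta: Dict[str, str] = field(default_factory=dict)

FilterType = Dict[str, List[str]]

class KeywordRetriever:
    """Toy retriever honoring the `retrieve` signature: rank documents
    by how many query terms they contain, after applying metadata filters."""

    def __init__(self, documents: List[Document]):
        self.documents = documents

    def retrieve(self,
                 query: str,
                 filters: Optional[FilterType] = None,
                 top_k: Optional[int] = None) -> List[Document]:
        candidates = self.documents
        if filters:
            # Keep only documents whose metadata matches every filter.
            candidates = [
                d for d in candidates
                if all(d.meta.get(k) in v for k, v in filters.items())
            ]
        terms = query.lower().split()
        ranked = sorted(
            candidates,
            key=lambda d: sum(t in d.content.lower() for t in terms),
            reverse=True,
        )
        return ranked[: top_k or 10]

docs = [
    Document("Paris is the capital of France", {"lang": "en"}),
    Document("Berlin is the capital of Germany", {"lang": "en"}),
]
retriever = KeywordRetriever(docs)
top = retriever.retrieve("capital of France", top_k=1)
```

A real subclass would additionally accept `index`, `headers`, `scale_score`, and `document_store`, delegating storage access to the DocumentStore client.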
BaseRetriever.timing
def timing(fn, attr_name)
Wrapper method used to time functions.
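The wrapping pattern can be sketched as follows: each call to the wrapped function has its wall-clock duration accumulated on an object under `attr_name`. This is a free-function sketch of the idea (in Haystack the method lives on the retriever instance and times itself); names here are illustrative.

```python
import time
from functools import wraps

def timing(obj, fn, attr_name):
    """Wrap `fn` so the wall-clock duration of every call is summed
    into the attribute `attr_name` on `obj`."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        if not hasattr(obj, attr_name):
            setattr(obj, attr_name, 0.0)
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        setattr(obj, attr_name, getattr(obj, attr_name) + elapsed)
        return result
    return wrapper

class Holder:
    pass

h = Holder()
timed_sleep = timing(h, lambda: time.sleep(0.01), "retrieve_time")
timed_sleep()
timed_sleep()
```

After the two calls, `h.retrieve_time` holds their total duration, so repeated retrievals accumulate into one figure.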
BaseRetriever.eval
def eval(label_index: str = "label",
doc_index: str = "eval_document",
label_origin: str = "gold-label",
top_k: int = 10,
open_domain: bool = False,
return_preds: bool = False,
headers: Optional[Dict[str, str]] = None,
document_store: Optional[BaseDocumentStore] = None) -> dict
Performs evaluation on the Retriever.
The Retriever is evaluated on whether it finds the correct document for a given query string, and at which position in the ranking of retrieved documents that correct document appears.
Returns a dict containing the following metrics:
- "recall": Proportion of questions for which correct document is among retrieved documents
- "mrr": Mean of reciprocal rank. Rewards retrievers that give relevant documents a higher rank.
Only considers the highest ranked relevant document.
- "map": Mean of average precision for each question. Rewards retrievers that give relevant
documents a higher rank. Considers all retrieved relevant documents. If ``open_domain=True``,
average precision is normalized by the number of retrieved relevant documents per query.
If ``open_domain=False``, average precision is normalized by the number of all relevant documents
per query.
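The three metrics above can be computed from ranked document ids as in the following sketch (a simplified illustration, not the library's implementation; it uses the closed-domain normalization, dividing average precision by the number of all relevant documents per query):

```python
from typing import List, Set

def retrieval_metrics(retrieved: List[List[str]],
                      relevant: List[Set[str]]) -> dict:
    """Compute recall, MRR, and MAP over queries.
    `retrieved[i]` is the ranked list of doc ids for query i;
    `relevant[i]` is the set of gold doc ids for query i."""
    recall = mrr = ap_sum = 0.0
    for ranked, gold in zip(retrieved, relevant):
        # 1-based positions of relevant documents in the ranking.
        hit_positions = [pos for pos, doc_id in enumerate(ranked, 1)
                         if doc_id in gold]
        if hit_positions:
            recall += 1                        # correct doc was retrieved
            mrr += 1 / hit_positions[0]        # reciprocal rank of first hit
            # Precision at each relevant hit: rank-of-hit / position.
            precisions = [rank / pos
                          for rank, pos in enumerate(hit_positions, 1)]
            ap_sum += sum(precisions) / len(gold)   # closed-domain norm.
    n = len(retrieved)
    return {"recall": recall / n, "mrr": mrr / n, "map": ap_sum / n}

metrics = retrieval_metrics(
    retrieved=[["d1", "d2", "d3"], ["d9", "d4"]],
    relevant=[{"d2"}, {"d4"}],
)
```

Both queries find their gold document at rank 2, so recall is 1.0 while MRR and MAP are each 0.5.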
Arguments:
label_index
: Index/Table in the DocumentStore where the labeled questions are stored.
doc_index
: Index/Table in the DocumentStore where the documents used for evaluation are stored.
top_k
: How many documents to return per query.
open_domain
: If True, retrieval is evaluated by checking if the answer string to a question is contained in the retrieved docs (common approach in open-domain QA). If False, retrieval uses a stricter evaluation that checks if the retrieved document ids are among the ids explicitly stated in the labels.
return_preds
: Whether to add predictions to the returned dictionary. If True, the returned dictionary contains the keys "predictions" and "metrics".
headers
: Custom HTTP headers to pass to the document store client if supported (e.g. {'Authorization': 'Basic YWRtaW46cm9vdA=='} for basic authentication).
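The difference between the two `open_domain` matching modes can be sketched as below. The dicts stand in for Haystack's Document and Label objects, and the field names are illustrative assumptions:

```python
def is_correct(doc: dict, label: dict, open_domain: bool) -> bool:
    """Open-domain: a retrieved doc counts as correct if it contains the
    gold answer string. Closed-domain: its id must be explicitly listed
    in the label's document ids."""
    if open_domain:
        return label["answer"].lower() in doc["content"].lower()
    return doc["id"] in label["document_ids"]

doc = {"id": "d7", "content": "Paris is the capital of France."}
label = {"answer": "Paris", "document_ids": ["d1"]}
```

Here the document would count as correct under open-domain evaluation (it contains "Paris") but not under the stricter closed-domain check (its id "d7" is not in the label).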
BaseRetriever.run
def run(root_node: str,
query: Optional[str] = None,
filters: Optional[FilterType] = None,
top_k: Optional[int] = None,
documents: Optional[List[Document]] = None,
index: Optional[str] = None,
headers: Optional[Dict[str, str]] = None,
scale_score: Optional[bool] = None)
Arguments:
root_node
: The root node of the pipeline's graph.
query
: Query string.
filters
: A dictionary where the keys specify a metadata field and the value is a list of accepted values for that field.
top_k
: How many documents to return per query.
documents
: List of Documents to retrieve.
index
: The name of the index in the DocumentStore from which to retrieve documents.
headers
: Custom HTTP headers to pass to the document store client if supported (e.g. {'Authorization': 'Basic YWRtaW46cm9vdA=='} for basic authentication).
scale_score
: Whether to scale the similarity score to the unit interval (range of [0,1]). If true (default), similarity scores (e.g. cosine or dot_product) which naturally have a different value range will be scaled to a range of [0,1], where 1 means extremely relevant. Otherwise, raw similarity scores (e.g. cosine or dot_product) will be used.
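One plausible realization of the `scale_score` behavior is sketched below. The exact transforms are an assumption for illustration: cosine similarity lives in [-1, 1] and can be shifted linearly into [0, 1], while unbounded dot-product scores need a squashing function such as a scaled sigmoid.

```python
import math

def scale_to_unit_interval(score: float, similarity: str) -> float:
    """Map a raw similarity score into [0, 1] (illustrative sketch).
    cosine: linear shift from [-1, 1] to [0, 1].
    otherwise: scaled sigmoid for unbounded dot-product scores."""
    if similarity == "cosine":
        return (score + 1) / 2
    return 1 / (1 + math.exp(-score / 100))
```

With this scheme a perfect cosine match (1.0) maps to 1.0, an opposite vector (-1.0) to 0.0, and a zero dot product to 0.5.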
BaseRetriever.run_batch
def run_batch(root_node: str,
queries: Optional[List[str]] = None,
filters: Optional[Union[FilterType,
List[Optional[FilterType]]]] = None,
top_k: Optional[int] = None,
documents: Optional[Union[List[Document],
List[List[Document]]]] = None,
index: Optional[str] = None,
headers: Optional[Dict[str, str]] = None)
Arguments:
root_node
: The root node of the pipeline's graph.
queries
: The list of query strings.
filters
: A dictionary where the keys specify a metadata field and the value is a list of accepted values for that field. Can be a single filter applied to every query, or a list of filters, one per query.
top_k
: How many documents to return per query.
documents
: List of Documents to retrieve.
index
: The name of the index in the DocumentStore from which to retrieve documents.
headers
: Custom HTTP headers to pass to the document store client if supported (e.g. {'Authorization': 'Basic YWRtaW46cm9vdA=='} for basic authentication).
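The `Union[FilterType, List[Optional[FilterType]]]` type on `filters` means a batch call can take either one filter dict applied to every query or a per-query list. A hypothetical normalization helper (illustrative only, not the actual Haystack implementation) might look like:

```python
from typing import Dict, List, Optional, Union

FilterType = Dict[str, List[str]]

def normalize_filters(
    queries: List[str],
    filters: Optional[Union[FilterType, List[Optional[FilterType]]]],
) -> List[Optional[FilterType]]:
    """Expand `filters` into one entry per query: broadcast a single
    dict, validate a list's length, or fill with None."""
    if filters is None:
        return [None] * len(queries)
    if isinstance(filters, list):
        if len(filters) != len(queries):
            raise ValueError("Number of filters must match number of queries")
        return filters
    return [filters] * len(queries)

per_query = normalize_filters(["q1", "q2"], {"lang": ["en"]})
```

After normalization, each query can be paired with its own (possibly None) filter when dispatching the batch to the DocumentStore.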