HomeGuidesAPI ReferenceTutorials
Haystack

Reader

The Reader takes a question and a set of Documents as input and returns an Answer by selecting a text span within the Documents. Readers use models to perform QA. Learn about the available Readers and the recommended models.

The Reader is also known as an open-domain QA system in Machine Learning speak.

Position in a PipelineGenerally after a Retriever
InputDocuments
OutputAnswers
ClassesFARMReader
TransformersReader
TableReader

Pros

  • Built on the latest transformer based language models
  • Strong in their grasp of semantics
  • Sensitive to syntactic structure
  • State-of-the-art in QA tasks like SQuAD and Natural Questions

Cons

  • Requires a GPU to run quickly

Usage

To initialize a reader, run:

from haystack.nodes import FARMReader
model = "deepset/roberta-base-squad2"
reader = FARMReader(model, use_gpu=True)
from haystack.nodes import TransformersReader
model = "deepset/roberta-base-squad2"
reader = TransformersReader(model, use_gpu=1)

To run a reader on its own, use the predict() method:

result = reader.predict(
    query="Which country is Canberra located in?",
    documents=documents,
    top_k=10
)

This will return a dictionary of the following format:

{
    'query': 'Which country is Canberra located in?',
    'answers':[
                 {'answer': 'Australia',
                 'context': "Canberra, federal capital of the Commonwealth of Australia. It occupies part of the Australian Capital Territory (ACT),",
                 'offset_answer_start': 147,
                 'offset_answer_end': 154,
                 'score': 0.9787139466668613,
                 'document_id': '1337'
                 },...
              ],
}

If you want to set up Haystack as a service, use the Reader in a pipeline:

from haystack.pipelines import ExtractiveQAPipeline

pipe = ExtractiveQAPipeline(reader, retriever)

prediction = pipe.run(
    query='Which country is Canberra located in?',
    params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 10}}
)

TableReader

With the TableReader, you can get answers to your questions even if the answer is buried in a table. It is designed to use the TAPAS model created by Google.

These models are able to return a single single cell as answer or pick a set of cells and then perform an aggregation operation to form a final answer. Have a look at our guide on Table Question Answering to find out more.

Models

The different versions of Reader models are referred to as models. Different models have different strengths and weaknesses. Larger models are generally more accurate but sacrifice some speed. Models trained on different data may be more suited to certain domains.

You will find many open source Reader models on the HuggingFace Model Hub. These models are directly usable in Haystack if you provide the Model Hub name to the Reader's initialization. Haystack will automatically handle the downloading and loading of the model.

If you're unsure which model to use, here are our recommendations that you can start with:

FARM

RoBERTa (base)
A well-rounded model and our recommended starting point.

from haystack.nodes import FARMReader
reader = FARMReader("deepset/roberta-base-squad2")

Pro: Strong all round model that performs capably on a Nvidia V100 GPU.

Con: There are other models that are either faster or more accurate

MiniLM

A cleverly distilled model that sacrifices a little accuracy for
speed.

from haystack.nodes import FARMReader
reader = FARMReader("deepset/minilm-uncased-squad2")

Pro: 40% smaller, 50% faster inference speed and better accuracy than BERT base

Con: Still doesn’t match the best base sized models in accuracy

ALBERT (XXL)

Large, powerful, SotA model.

from haystack.nodes import FARMReader
reader = FARMReader("ahotrod/albert_xxlargev1_squad2_512")

Pro: Better accuracy than any other open source model in QA

Con: The computational power needed make it impractical for most use cases

Transformers

RoBERTa (base)

A well-rounded model and our recommended starting point.

from haystack.nodes import TransformersReader

reader = TransformersReader("deepset/roberta-base-squad2")

Pro: Strong all round model that performs capably on a Nvidia V100 GPU

Con: There are other models that are either faster or more accurate

MiniLM

A cleverly distilled model that sacrifices a little accuracy for speed.

from haystack.nodes import TransformersReader
reader = TransformersReader("deepset/minilm-uncased-squad2")

Pro: 40% smaller, 50% faster inference speed and better accuracy than BERT base

Con: Still doesn’t match the best base sized models in accuracy

ALBERT (XXL)

Large, powerful, SotA model.

from haystack.nodes import TransformersReader
reader = TransformersReader("ahotrod/albert_xxlargev1_squad2_512")

Pro: Better accuracy than any other open source model in QA

Con: The computational power needed make it impractical for most use cases

Fine-tuning, Saving, Loading and Converting

In Haystack, it is possible to fine-tune your FARMReader model on any SQuAD format QA dataset. To kick off training, call the train() method. This method will also save your model in the specified save directory.

reader.train(
    data_dir=data_dir,
    train_filename="dev-v2.0.json",
    use_gpu=True,
    n_epochs=1,
    save_dir="my_model"
)

If you want to load the model at a later point, initialize a FARMReader object as follows:

new_reader = FARMReader(model_name_or_path="my_model")

If you would like to convert your model from or into the HuggingFace Transformers format we provide a conversion function. Calling reader.inferencer.model.convert_to_transformers() will return a list of HuggingFace models. This can be particularly useful if you'd like to upload the model to the HuggingFace Model Hub.

transformers_models = reader.inferencer.model.convert_to_transformers()

👍

Tutorial

If you'd like a hands on example, check out our tutorial on fine-tuning here

Confidence Scores

When printing the full results of a Reader, you will see that each prediction is accompanied by a value in the range of 0 to 1 reflecting the model's confidence in that prediction.

In the output of print_answers(), you will find the model's confidence score in dictionary key called score.

from haystack.utils import print_answers

print_answers(prediction, details="all")
{
    'answers': [
        {   'answer': 'Eddard',
            'context': 's Nymeria after a legendary warrior queen. '
                       'She travels with her father, Eddard, to '
                       "King's Landing when he is made Hand of the "
                       'King. Before she leaves,',
            'score': 0.9899835586547852,
            ...
        },
    ]
}

The intuition behind this score is the following: if a model has on average a confidence score of 0.9 that means we can expect the model's predictions to be correct in about 9 out of 10 cases. However, if the model's training data strongly differs from the data it needs to make predictions on, we cannot guarantee that the confidence score and the model's accuracy are well aligned. In order to better align this confidence score with the model's accuracy, finetuning needs to be performed on a specific dataset. To this end, the reader has a method calibrate_confidence_scores(document_store, device, label_index, doc_index, label_origin). The parameters of this method are the same as for the eval() method because the calibration of confidence scores is performed on a dataset that comes with gold labels. The calibration calls the eval() method internally and therefore needs a DocumentStore containing labeled questions and evaluation documents.

Have a look at this FARM tutorial to see how to compare calibrated confidence scores with uncalibrated confidence scores within FARM. Note that a finetuned confidence score is specific to the domain that it is finetuned on. There is no guarantee that this performance can transfer to a new domain.

Having a confidence score is particularly useful in cases where you need Haystack to work with a certain accuracy threshold. Many of our users have built systems where predictions below a certain confidence value are routed on to a fallback system.

Deeper Dive: FARM vs Transformers

Apart from the model weights, Haystack Readers contain all the components found in end-to-end open domain QA systems. This includes tokenization, embedding computation, span prediction and candidate aggregation. While the handling of model weights is the same between the FARM and Transformers libraries, their QA pipelines differ in some ways. The major points are:

  • The TransformersReader will sometimes predict the same span twice while duplicates are removed in the FARMReader
  • The FARMReader currently uses the tokenizers from the HuggingFace Transformers library while the TransformersReader uses the tokenizers from the HuggingFace Tokenizers library
  • Start and end logits are normalized per passage and multiplied in the TransformersReader while they are summed and not normalised in the FARMReader

If you’re interested in the finer details of these points, have a look at this GitHub comment.

We see value in maintaining both kinds of Readers since Transformers is a very familiar library to many of Haystack’s users but we at deepset can more easily update and optimise the FARM pipeline for speed and performance.

Haystack also has a close integration with FARM which means that you can further fine-tune your Readers on labelled data using a FARMReader. See our tutorials for an end-to-end example or below for a shortened example.

from haystack.nodes import FARMReader

# Initialise Reader
model = "deepset/roberta-base-squad2"
reader = FARMReader(model)

# Perform finetuning
train_data = "PATH/TO_YOUR/TRAIN_DATA"
train_filename = "train.json"
save_dir = "finetuned_model"
reader.train(train_data, train_filename, save_dir=save_dir)

# Load
finetuned_reader = FARMReader(save_dir)

Deeper Dive: From Language Model to Haystack Reader

Language models form the core of most modern NLP systems and that includes the Readers in Haystack. They build a general understanding of language when performing training tasks such as Masked Language Modeling or Replaced Token Detection on large amounts of text. Well trained language models capture the word distribution in one or more languages but more importantly, convert input text into a set of word vectors that capture elements of syntax and semantics.

In order to convert a language model into a Reader model, it needs first to be trained on a Question Answering dataset. To do so requires the addition of a question answering prediction head on top of the language model. The task can be thought of as a token classification task where every input token is assigned a probability of being either the start or end token of the correct answer. In cases where the answer is not contained within the passage, the prediction head is also expected to return a no_answer prediction.

Since language models are limited in the number of tokens which they can process in a single forward pass, a sliding window mechanism is implemented to handle variable length documents. This functions by slicing the document into overlapping passages of (approximately) max_seq_length that are each offset by doc_stride number of tokens. These can be set when the Reader is initialized.

from haystack.nodes import FARMReader

reader = FARMReader(... max_seq_len=384, doc_stride=128 ...)
from haystack.nodes import TransformersReader

reader = TransformersReader(... max_seq_len=384, doc_stride=128

Predictions are made on each individual passage and the process of aggregation picks the best candidates across all passages. If you’d like to learn more about what is happening behind the scenes, have a look at this article.


Related Links