Reader
The Reader takes a question and a set of Documents as input and returns an Answer by selecting a text span within the Documents. Readers use models to perform QA. Learn about the available Reader classes and the recommended models.
The Reader is also known as an Open-Domain QA system in Machine Learning speak.
Pros
- Built on the latest transformer-based language models.
- Strong in their grasp of semantics.
- Sensitive to syntactic structure.
- State-of-the-art in QA tasks like SQuAD and Natural Questions.
Haystack Readers contain all the components of end-to-end, open-domain QA systems, including:
- Loading of model weights.
- Tokenization.
- Embedding computation.
- Span prediction.
- Candidate aggregation.
Cons
- Requires a GPU to run quickly.
Usage
To initialize a Reader, run:
from haystack.nodes import FARMReader
model_name_or_path = "deepset/roberta-base-squad2"
reader = FARMReader(model_name_or_path, use_gpu=True)
Alternatively, to use the TransformersReader, run:
from haystack.nodes import TransformersReader
model = "deepset/roberta-base-squad2"
reader = TransformersReader(model, use_gpu=True)
To run a Reader on its own, use the Reader.predict() method:
result = reader.predict(
    query="Which country is Canberra located in?",
    documents=documents,
    top_k=10
)
This returns a dictionary of the following format:
{
    'query': 'Which country is Canberra located in?',
    'answers': [
        {'answer': 'Australia',
         'context': "Canberra, federal capital of the Commonwealth of Australia. It occupies part of the Australian Capital Territory (ACT),",
         'offset_answer_start': 147,
         'offset_answer_end': 154,
         'score': 0.9787139466668613,
         'document_id': '1337'
        },
        ...
    ],
}
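Based on this format, you can, for example, read the highest-scoring answer and its score like this:
top_answer = result["answers"][0]
print(top_answer["answer"], top_answer["score"])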
To set up Haystack as a service, use the Reader in a pipeline:
from haystack.pipelines import ExtractiveQAPipeline
pipe = ExtractiveQAPipeline(reader, retriever)
prediction = pipe.run(
    query='Which country is Canberra located in?',
    params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 10}}
)
TableReader
With the TableReader, you can get answers to your questions even if the answer is in a table. It uses the TAPAS model created by Google.
These models are able to return a single cell as an answer or pick a set of cells and then perform an aggregation operation to form a final answer. To find out more, have a look at our guide on Table Question Answering.
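As a rough sketch of what this can look like (the model name and the table contents below are illustrative; see the Table Question Answering guide for the full walkthrough):
import pandas as pd
from haystack import Document
from haystack.nodes import TableReader

# Illustrative table wrapped in a Haystack Document with content_type "table"
table = pd.DataFrame({"City": ["Canberra", "Sydney"], "Population": ["453,558", "5,312,163"]})
table_doc = Document(content=table, content_type="table")

table_reader = TableReader(model_name_or_path="google/tapas-base-finetuned-wtq")
prediction = table_reader.predict(query="Which city has the larger population?", documents=[table_doc])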
Models
Readers use models. Different models have different strengths and weaknesses. Larger models are generally more accurate but sacrifice some speed. Models trained on different data may be more suited to certain domains. For more information about models, see Language Models.
You can find many open source Reader models on the Hugging Face Model Hub. Haystack automatically downloads and loads the model if you provide the Model Hub name when initializing the Reader.
Compatible models
The Reader supports extractive question answering models with a BERT-based architecture, such as:
- BERT
- RoBERTa
- ALBERT
- MiniLM
- XLM
- DistilBERT
- XLM-RoBERTa
- DeBERTa
If you're using a sequence-to-sequence model, such as BART, use the Answer Generator instead.
If you're unsure which model to use, here are our recommendations to start with:
FARM
RoBERTa (base)
A well-rounded model and our recommended starting point.
from haystack.nodes import FARMReader
reader = FARMReader("deepset/roberta-base-squad2")
Pro: Strong, all-round model that performs well on an Nvidia V100 GPU.
Con: There are other models that are either faster or more accurate.
MiniLM
A distilled model that sacrifices a little accuracy for speed.
from haystack.nodes import FARMReader
reader = FARMReader("deepset/minilm-uncased-squad2")
Pro: 40% smaller, 50% faster inference speed, and better accuracy than BERT base.
Con: Still doesn’t match the best base-sized models in accuracy.
ALBERT (XXL)
Large, powerful, SotA model.
from haystack.nodes import FARMReader
reader = FARMReader("ahotrod/albert_xxlargev1_squad2_512")
Pro: Better accuracy than any other open source model in QA.
Con: The computational power needed makes it impractical for most use cases.
Transformers
RoBERTa (base)
A well-rounded model and our recommended starting point.
from haystack.nodes import TransformersReader
reader = TransformersReader("deepset/roberta-base-squad2")
Pro: Strong all-round model that performs well on an Nvidia V100 GPU.
Con: There are other models that are either faster or more accurate.
MiniLM
A distilled model that sacrifices a little accuracy for speed.
from haystack.nodes import TransformersReader
reader = TransformersReader("deepset/minilm-uncased-squad2")
Pro: 40% smaller, 50% faster inference speed, and better accuracy than BERT base.
Con: Still doesn’t match the best base-sized models in accuracy.
ALBERT (XXL)
Large, powerful, SotA model.
from haystack.nodes import TransformersReader
reader = TransformersReader("ahotrod/albert_xxlargev1_squad2_512")
Pro: Better accuracy than any other open source model in QA.
Con: The computational power needed makes it impractical for most use cases.
Fine-tuning, Saving, Loading, and Converting
In Haystack, you can fine-tune your FARMReader model on any SQuAD-format QA dataset. To kick off training, call the train() method. This method also saves your model in the specified directory.
reader.train(
    data_dir=data_dir,
    train_filename="dev-v2.0.json",
    use_gpu=True,
    n_epochs=1,
    save_dir="my_model"
)
To load the model at a later point, initialize a FARMReader object as follows:
new_reader = FARMReader(model_name_or_path="my_model")
To convert your model from or into the Hugging Face Transformers format, use a conversion function. Calling reader.inferencer.model.convert_to_transformers() returns a list of Hugging Face models. This can be particularly useful if you want to upload the model to the Hugging Face Model Hub.
transformers_models = reader.inferencer.model.convert_to_transformers()
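Once converted, you can, for example, save a model in the standard Transformers format so it can be loaded with the Transformers library. This is a minimal sketch; the output directory name is illustrative:
# Save the first converted model in Transformers format
transformers_models[0].save_pretrained("roberta-base-squad2-transformers")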
Instead of defining a fixed number of training epochs, you can train the model using a method called early stopping. This method performs cycles of training and evaluation until the model is no longer improving. To use this approach, run:
from haystack.nodes import FARMReader
from haystack.utils.early_stopping import EarlyStopping
early_stopping = EarlyStopping(
    metric='top_n_accuracy',
    save_dir='early_stop_model',
    mode='max',
    patience=10,
    min_delta=0.001,
    min_evals=0,
)
reader = FARMReader(model_name_or_path='deepset/roberta-base-squad2', use_gpu=True)
reader.train(
    data_dir='./data',
    train_filename='train.json',
    dev_filename='dev.json',
    use_gpu=True,
    early_stopping=early_stopping
)
For more details about the EarlyStopping class, see the EarlyStopping API documentation.
Tutorial
For a hands-on example, check out our tutorial on fine-tuning.
Confidence Scores
When printing the full results of a Reader, each prediction has a value in the range of 0 to 1. This value reflects the model's confidence in that prediction.
You can find the model's confidence score in the output of print_answers(). It's in a dictionary key called score.
from haystack.utils import print_answers
print_answers(prediction, details="all")
{
    'answers': [
        {   'answer': 'Eddard',
            'context': 's Nymeria after a legendary warrior queen. '
                       'She travels with her father, Eddard, to '
                       "King's Landing when he is made Hand of the "
                       'King. Before she leaves,',
            'score': 0.9899835586547852,
            ...
        },
    ]
}
The confidence score tells you how sure the model is about its answer. If the confidence score is 0.9, the model's predictions are correct in roughly nine out of ten cases.
If the model's training data is very different from the data it makes predictions on, the confidence score and the model's accuracy may not be well aligned. To improve this alignment, fine-tune your model on your specific dataset.
The Reader has a calibrate_confidence_scores() method that you can use for this purpose. Its parameters are the same as for the eval() method because confidence scores are calibrated on a dataset with gold labels and the calibration calls eval() internally. That's why it needs a DocumentStore containing labeled questions and evaluation documents.
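A minimal sketch of a calibration call, assuming a DocumentStore that already contains evaluation documents and gold labels (the store type and index names below are assumptions for illustration):
from haystack.document_stores import ElasticsearchDocumentStore

# Assumed to already hold labeled questions and evaluation documents
document_store = ElasticsearchDocumentStore()

reader.calibrate_confidence_scores(
    document_store=document_store,
    label_index="label",        # assumed name of the index with gold labels
    doc_index="eval_document",  # assumed name of the index with evaluation documents
)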
This FARM tutorial can teach you how to compare calibrated confidence scores with uncalibrated ones in FARM. Note that a fine-tuned confidence score is specific to the domain that it is fine-tuned on. There is no guarantee that this performance can transfer to a new domain.
A confidence score is useful if you need Haystack to work with a certain accuracy threshold. You can then route predictions that fall below the defined confidence threshold to a fallback system, as in the sketch below.
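A minimal sketch of such routing, based on the reader.predict() output format shown above (the threshold value and the fallback handler are illustrative, not part of Haystack):
CONFIDENCE_THRESHOLD = 0.8

# Keep only answers the model is sufficiently confident about
confident_answers = [a for a in result["answers"] if a["score"] >= CONFIDENCE_THRESHOLD]
if not confident_answers:
    handle_with_fallback(result["query"])  # hypothetical fallback system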
Deeper Dive: FARM vs Transformers
Apart from the model weights, Haystack Readers contain all the components found in end-to-end open-domain QA systems. This includes tokenization, embedding computation, span prediction, and candidate aggregation. FARM and Transformers libraries handle weights in the same way but their QA pipelines differ. The major points are:
- The TransformersReader sometimes predicts the same span twice. The FARMReader removes duplicates.
- The FARMReader currently uses the tokenizers from the Hugging Face Transformers library. The TransformersReader uses the tokenizers from the Hugging Face Tokenizers library.
- The TransformersReader normalizes start and end logits per passage and multiplies them. The FARMReader sums them and doesn't normalize.
If you’re interested in the finer details of these points, have a look at this GitHub comment.
We see value in maintaining both kinds of Readers. Transformers is a library familiar to many of Haystack's users. For us at deepset, it's easier to update and optimize the FARM pipeline for speed and performance.
Haystack also closely integrates with FARM. This means you can further fine-tune your Readers on labeled data using a FARMReader. See our tutorials for an end-to-end example, or see the shortened example below.
from haystack.nodes import FARMReader
# Initialise Reader
model = "deepset/roberta-base-squad2"
reader = FARMReader(model)
# Perform fine-tuning
train_data = "PATH/TO_YOUR/TRAIN_DATA"
train_filename = "train.json"
save_dir = "finetuned_model"
reader.train(train_data, train_filename, save_dir=save_dir)
# Load
finetuned_reader = FARMReader(save_dir)
Deeper Dive: From Language Model to Haystack Reader
Language models form the core of most modern NLP systems. That includes the Readers in Haystack. Models build a general understanding of language during training. Training includes tasks such as masked language modeling or replaced token detection on large amounts of text. Well-trained language models capture the word distribution in one or more languages. They also convert input text into word vectors that capture syntax and semantics.
To convert a language model into a Reader model, you must first train it on a question answering dataset. To do so, add a question answering prediction head on top of the language model. You can think of it as a token classification task. In such a task, every input token is assigned a probability of being the start or end token of the correct answer. If a passage doesn't contain the answer, the prediction head should return a no_answer prediction.
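To make the start/end prediction idea concrete, here is a minimal sketch using the Hugging Face Transformers API directly (the model name is one of the recommended checkpoints above); the question answering head outputs one start logit and one end logit per input token:
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_name = "deepset/roberta-base-squad2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

inputs = tokenizer(
    "Which country is Canberra located in?",
    "Canberra is the federal capital of Australia.",
    return_tensors="pt",
)
outputs = model(**inputs)
# One logit per input token for the answer start and end positions
print(outputs.start_logits.shape, outputs.end_logits.shape)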
The number of tokens language models can process in a single forward pass is limited. That's why we implemented a sliding window mechanism that can handle documents of variable length. This mechanism slices the document into overlapping passages of approximately max_seq_len tokens. Each passage is offset by doc_stride tokens. You can set these parameters when initializing the Reader:
from haystack.nodes import FARMReader
reader = FARMReader(... max_seq_len=384, doc_stride=128 ...)
from haystack.nodes import TransformersReader
reader = TransformersReader(... max_seq_len=384, doc_stride=128 ...)
Models make predictions on each individual passage. Then, the process of aggregation picks the best candidates across all passages. To learn about what is happening behind the scenes, have a look at Modern Question Answering Systems Explained.