Reads a set of documents and generates an answer to a question, word by word
Module base
BaseGenerator
class BaseGenerator(BaseComponent)
Abstract class for Generators
BaseGenerator.predict
@abstractmethod
def predict(query: str, documents: List[Document], top_k: Optional[int],
max_tokens: Optional[int]) -> Dict
Abstract method to generate answers.
Arguments:
query
: Query
documents
: Related documents (for example, coming from a retriever) the answer should be based on.
top_k
: Number of returned answers.
max_tokens
: The maximum number of tokens the generated answer can have.
Returns:
Generated answers plus additional information in a dict
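The abstract interface above can be illustrated with a small, self-contained sketch. The Document and BaseGenerator classes below are stand-ins written for this example (they do not import the library), and EchoGenerator is a hypothetical toy subclass that only shows the shape of a predict implementation and of its return value.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Document:
    """Stand-in for the library's Document (assumption: only `content` matters here)."""
    content: str


class BaseGenerator(ABC):
    """Stand-in that mirrors the abstract interface documented above."""

    @abstractmethod
    def predict(self, query: str, documents: List[Document],
                top_k: Optional[int], max_tokens: Optional[int]) -> Dict:
        ...


class EchoGenerator(BaseGenerator):
    """Hypothetical toy generator: 'answers' with the first words of each document."""

    def predict(self, query, documents, top_k=None, max_tokens=None):
        top_k = top_k if top_k is not None else 1
        answers = []
        for doc in documents[:top_k]:
            words = doc.content.split()
            if max_tokens is not None:
                words = words[:max_tokens]  # crude whitespace "tokens"
            answers.append({"query": query, "answer": " ".join(words)})
        return {"query": query, "answers": answers}


docs = [Document("The first Nobel Prize in Physics was awarded in 1901."),
        Document("The prize is awarded in Stockholm.")]
result = EchoGenerator().predict("who got the first nobel prize in physics",
                                 docs, top_k=1, max_tokens=2)
print(result)
```

Because predict is marked abstract, the stand-in BaseGenerator itself cannot be instantiated; only subclasses that implement predict can.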
BaseGenerator.predict_batch
def predict_batch(queries: List[str],
documents: Union[List[Document], List[List[Document]]],
top_k: Optional[int] = None,
batch_size: Optional[int] = None,
max_tokens: Optional[int] = None)
Generate the answer to the input queries. The generation will be conditioned on the supplied documents.
These documents can for example be retrieved via the Retriever.
- If you provide a list containing a single query...
  - ... and a single list of Documents, the query will be applied to each Document individually.
  - ... and a list of lists of Documents, the query will be applied to each list of Documents and the Answers will be aggregated per Document list.
- If you provide a list of multiple queries...
  - ... and a single list of Documents, each query will be applied to each Document individually.
  - ... and a list of lists of Documents, each query will be applied to its corresponding list of Documents and the Answers will be aggregated per query-Document pair.
Arguments:
queries
: List of queries.
documents
: Related documents (for example, coming from a retriever) the answer should be based on. Can be a single list of Documents or a list of lists of Documents.
top_k
: Number of returned answers per query.
batch_size
: Not applicable.
max_tokens
: The maximum number of tokens the generated answer can have.
Returns:
Generated answers plus additional information in a dict like this:
{'queries': 'who got the first nobel prize in physics',
 'answers':
     [{'query': 'who got the first nobel prize in physics',
       'answer': ' albert einstein',
       'meta': {'doc_ids': [...],
                'doc_scores': [80.42758, ...],
                'doc_probabilities': [40.71379089355469, ...],
                'content': ['Albert Einstein was a ...', ...],
                'titles': ['"Albert Einstein"', ...]
       }}]}
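The four input combinations described for predict_batch can be sketched as a small planning function. This is an illustration of the documented pairing rules only, not the library's implementation: plan_batch is a hypothetical helper, plain strings stand in for Documents, and raising ValueError on mismatched lengths is an assumption.

```python
from typing import List, Tuple, Union

Doc = str  # stand-in for a Document


def plan_batch(queries: List[str],
               documents: Union[List[Doc], List[List[Doc]]]
               ) -> List[Tuple[str, List[Doc]]]:
    """Return the (query, documents) calls implied by the four cases above."""
    nested = len(documents) > 0 and isinstance(documents[0], list)
    if not nested:
        # Single list of Documents: each query is applied to each Document individually.
        return [(q, [d]) for q in queries for d in documents]
    if len(queries) == 1:
        # One query, list of lists: the query is applied to each list of Documents.
        return [(queries[0], docs) for docs in documents]
    if len(queries) != len(documents):
        # Assumption: multiple queries need exactly one Document list each.
        raise ValueError("Need one list of Documents per query.")
    # Multiple queries, list of lists: each query goes with its corresponding list.
    return list(zip(queries, documents))


single = plan_batch(["q1"], ["a", "b"])
paired = plan_batch(["q1", "q2"], [["a"], ["b", "c"]])
print(single)
print(paired)
```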
Module transformers
RAGenerator
class RAGenerator(BaseGenerator)
Implementation of Facebook's Retrieval-Augmented Generator (https://arxiv.org/abs/2005.11401) based on Hugging Face's transformers (https://huggingface.co/transformers/model_doc/rag.html).
Instead of "finding" the answer within a document, these models generate the answer. In that sense, RAG follows a similar approach to GPT-3, but it has two major advantages for real-world applications: a) it has a manageable model size, and b) the answer generation is conditioned on retrieved documents, so the model can adapt to domain documents even after training has finished (in contrast, GPT-3 relies on the web data seen during training).
Example
query = "who got the first nobel prize in physics?"
# Retrieve related documents from retriever
retrieved_docs = retriever.retrieve(query=query)
# Now generate answer from query and retrieved documents
generator.predict(
    query=query,
    documents=retrieved_docs,
    top_k=1
)
# Answer
{'query': 'who got the first nobel prize in physics',
 'answers':
     [{'query': 'who got the first nobel prize in physics',
       'answer': ' albert einstein',
       'meta': {'doc_ids': [...],
                'doc_scores': [80.42758, ...],
                'doc_probabilities': [40.71379089355469, ...],
                'content': ['Albert Einstein was a ...', ...],
                'titles': ['"Albert Einstein"', ...]
       }}]}
RAGenerator.__init__
def __init__(model_name_or_path: str = "facebook/rag-token-nq",
model_version: Optional[str] = None,
retriever=None,
generator_type: str = "token",
top_k: int = 2,
max_length: int = 200,
min_length: int = 2,
num_beams: int = 2,
embed_title: bool = True,
prefix: Optional[str] = None,
use_gpu: bool = True,
progress_bar: bool = True,
use_auth_token: Optional[Union[str, bool]] = None,
devices: Optional[List[Union[str, torch.device]]] = None)
This component is now deprecated and will be removed in future versions. Use PromptNode instead of RAGenerator.
Load a RAG model from Transformers along with passage_embedding_model. See https://huggingface.co/transformers/model_doc/rag.html for more details.
Arguments:
model_name_or_path
: Directory of a saved model or the name of a public model, for example 'facebook/rag-token-nq' or 'facebook/rag-sequence-nq'. See https://huggingface.co/models for the full list of available models.
model_version
: The version of the model to use from the Hugging Face model hub. Can be a tag name, branch name, or commit hash.
retriever
: DensePassageRetriever used to embed passages for the docs passed to predict(). This is optional and is only needed if the docs you pass don't already contain embeddings in Document.embedding.
generator_type
: Which RAG generator implementation to use ("token" or "sequence").
top_k
: Number of independently generated texts to return.
max_length
: Maximum length of generated text.
min_length
: Minimum length of generated text.
num_beams
: Number of beams for beam search. 1 means no beam search.
embed_title
: Whether to embed the title of the passage when generating embeddings.
prefix
: The prefix used by the generator's tokenizer.
use_gpu
: Whether to use GPU. Falls back on CPU if no GPU is available.
progress_bar
: Whether to show a tqdm progress bar or not.
use_auth_token
: The API token used to download private models from Hugging Face. If this parameter is set to True, then the token generated when running transformers-cli login (stored in ~/.huggingface) will be used. Additional information can be found here: https://huggingface.co/transformers/main_classes/model.html#transformers.PreTrainedModel.from_pretrained
devices
: List of torch devices (for example cuda, cpu, mps) to limit inference to specific devices. A list containing torch device objects and/or strings is supported (for example [torch.device('cuda:0'), "mps", "cuda:1"]). When specifying use_gpu=False, the devices parameter is not used and a single cpu device is used for inference.
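The interaction between use_gpu and devices described above can be summarized in a few lines. This is a minimal sketch of the documented rules, not the library's code: resolve_devices is a hypothetical helper, device names are plain strings instead of torch.device objects, and GPU availability is passed in as a flag so the example runs without torch.

```python
from typing import List, Optional


def resolve_devices(use_gpu: bool,
                    devices: Optional[List[str]],
                    gpu_available: bool) -> List[str]:
    """Pick inference devices following the rules stated in the argument list."""
    if not use_gpu:
        # use_gpu=False: the devices parameter is not used, a single cpu device is used.
        return ["cpu"]
    if devices:
        # Explicit devices limit inference to exactly those devices.
        return list(devices)
    # No explicit devices: use the GPU if there is one, otherwise fall back on CPU.
    return ["cuda"] if gpu_available else ["cpu"]


print(resolve_devices(use_gpu=False, devices=["cuda:0"], gpu_available=True))
print(resolve_devices(use_gpu=True, devices=["cuda:0", "mps"], gpu_available=True))
print(resolve_devices(use_gpu=True, devices=None, gpu_available=False))
```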
RAGenerator.predict
def predict(query: str,
documents: List[Document],
top_k: Optional[int] = None,
max_tokens: Optional[int] = None) -> Dict
Generate the answer to the input query. The generation will be conditioned on the supplied documents.
These documents can for example be retrieved via the Retriever.
Arguments:
query
: Query
documents
: Related documents (for example, coming from a retriever) that the answer shall be conditioned on.
top_k
: Number of returned answers
max_tokens
: Maximum number of tokens to generate
Returns:
Generated answers plus additional information in a dict like this:
{'query': 'who got the first nobel prize in physics',
 'answers':
     [{'query': 'who got the first nobel prize in physics',
       'answer': ' albert einstein',
       'meta': {'doc_ids': [...],
                'doc_scores': [80.42758, ...],
                'doc_probabilities': [40.71379089355469, ...],
                'content': ['Albert Einstein was a ...', ...],
                'titles': ['"Albert Einstein"', ...]
       }}]}
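A short sketch of how a caller might read the returned dict. It assumes the structure shown above; the document ids, the second score, and the second title are placeholders invented for the example, and summarize is a hypothetical helper.

```python
result = {
    "query": "who got the first nobel prize in physics",
    "answers": [
        {
            "query": "who got the first nobel prize in physics",
            "answer": " albert einstein",
            "meta": {
                "doc_ids": ["doc-1", "doc-2"],   # placeholder ids
                "doc_scores": [80.42758, 78.1],  # second score is a placeholder
                "titles": ['"Albert Einstein"', '"Placeholder title"'],
            },
        }
    ],
}


def summarize(result):
    """Return (answer text, best document id, best document score) per answer."""
    rows = []
    for ans in result["answers"]:
        meta = ans.get("meta", {})
        scored = zip(meta.get("doc_scores", []), meta.get("doc_ids", []))
        best_score, best_id = max(scored, default=(None, None))
        # Generated answers in the sample output start with a space, so strip them.
        rows.append((ans["answer"].strip(), best_id, best_score))
    return rows


print(summarize(result))
```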
Seq2SeqGenerator
class Seq2SeqGenerator(BaseGenerator)
A generic sequence-to-sequence generator based on Hugging Face's transformers.
We recommend that you use the bart-eli5 and bart_lfqa models from the Hugging Face hub with this generator.
As language models prepare model input in their specific encoding, each model you specify in the model_name_or_path parameter in this Seq2SeqGenerator should have an accompanying model input converter that takes care of prefixes, separator tokens, and so on.
By default, we provide model input converters for a few well-known seq2seq language models (for example ELI5).
It is the responsibility of the Seq2SeqGenerator user to ensure that an appropriate model input converter is either already registered or specified on a per-model basis in the Seq2SeqGenerator constructor.
For more details on custom model input converters, refer to _BartEli5Converter (check the code).
Example
query = "Why is Dothraki language important?"
# Retrieve related documents from retriever
retrieved_docs = retriever.retrieve(query=query)
# Now generate answer from query and retrieved documents
generator.predict(
    query=query,
    documents=retrieved_docs,
    top_k=1
)
# Answer (sample output format; the output shown is for a different query)
{'query': 'who got the first nobel prize in physics',
 'answers':
     [{'query': 'who got the first nobel prize in physics',
       'answer': ' albert einstein',
       'meta': {'doc_ids': [...],
                'doc_scores': [80.42758, ...],
                'doc_probabilities': [40.71379089355469, ...],
                'content': ['Albert Einstein was a ...', ...],
                'titles': ['"Albert Einstein"', ...]
       }}]}
Seq2SeqGenerator.__init__
def __init__(model_name_or_path: str,
input_converter: Optional[Callable] = None,
top_k: int = 1,
max_length: int = 200,
min_length: int = 2,
num_beams: int = 8,
use_gpu: bool = True,
progress_bar: bool = True,
use_auth_token: Optional[Union[str, bool]] = None,
devices: Optional[List[Union[str, torch.device]]] = None)
This component is now deprecated and will be removed in future versions. Use PromptNode instead of Seq2SeqGenerator.
Arguments:
model_name_or_path
: A Hugging Face model name for an auto-regressive language model such as GPT2, XLNet, XLM, Bart, T5, and so on.
input_converter
: An optional callable to prepare model input for the underlying language model specified in the model_name_or_path parameter. The required __call__ method signature for the callable is: __call__(tokenizer: PreTrainedTokenizer, query: str, documents: List[Document], top_k: Optional[int] = None) -> BatchEncoding
top_k
: Number of independently generated texts to return.
max_length
: Maximum length of generated text.
min_length
: Minimum length of generated text.
num_beams
: Number of beams for beam search. 1 means no beam search.
use_gpu
: Whether to use the GPU or the CPU. Falls back on CPU if no GPU is available.
progress_bar
: Whether to show a tqdm progress bar or not.
use_auth_token
: The API token used to download private models from Hugging Face. If this parameter is set to True, then the token generated when running transformers-cli login (stored in ~/.huggingface) will be used. Additional information can be found here: https://huggingface.co/transformers/main_classes/model.html#transformers.PreTrainedModel.from_pretrained
devices
: List of torch devices (for example cuda, cpu, mps) to limit inference to specific devices. A list containing torch device objects and/or strings is supported (for example [torch.device('cuda:0'), "mps", "cuda:1"]). When specifying use_gpu=False, the devices parameter is not used and a single cpu device is used for inference.
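The input_converter signature documented above can be satisfied by any callable class. The sketch below is a hypothetical converter, not the built-in _BartEli5Converter: the "question: ... context: ..." layout is an assumption, Document is a stand-in dataclass, and fake_tokenizer replaces a real Hugging Face tokenizer so the example runs without transformers installed.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class Document:
    """Stand-in for the library's Document (only `content` is used here)."""
    content: str


class QuestionContextConverter:
    """Hypothetical input converter; the prompt layout is an assumption."""

    def __call__(self, tokenizer: Callable[..., Any], query: str,
                 documents: List[Document], top_k: Optional[int] = None) -> Any:
        docs = documents if top_k is None else documents[:top_k]
        context = " ".join(d.content for d in docs)
        text = f"question: {query} context: {context}"
        # With a real Hugging Face tokenizer this call returns a BatchEncoding;
        # a real converter would likely also request tensors via return_tensors.
        return tokenizer([text], truncation=True, padding=True)


def fake_tokenizer(texts, **kwargs):
    """Whitespace tokenizer used only to make the sketch runnable."""
    return {"input_ids": [t.split() for t in texts]}


encoded = QuestionContextConverter()(
    fake_tokenizer, "Why?", [Document("A."), Document("B.")], top_k=1
)
print(encoded)
```

An instance of such a class would be passed as the input_converter argument of the constructor.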
Seq2SeqGenerator.predict
def predict(query: str,
documents: List[Document],
top_k: Optional[int] = None,
max_tokens: Optional[int] = None) -> Dict
Generate the answer to the input query. The generation will be conditioned on the supplied documents.
These documents can be retrieved via the Retriever or supplied directly via the predict method.
Arguments:
query
: Query
documents
: Related documents (for example, coming from a retriever) that the answer shall be conditioned on.
top_k
: Number of returned answers
max_tokens
: Maximum number of tokens in the generated answer
Returns:
Generated answers
Module openai
OpenAIAnswerGenerator
class OpenAIAnswerGenerator(BaseGenerator)
Uses the GPT-3 models from the OpenAI API to generate Answers based on the Documents it receives. The Documents can come from a Retriever or you can supply them manually.
To use this Node, you need an API key from an active OpenAI account. You can sign up for an account on the OpenAI API website.
OpenAIAnswerGenerator.__init__
def __init__(api_key: str,
azure_base_url: Optional[str] = None,
azure_deployment_name: Optional[str] = None,
model: str = "text-davinci-003",
max_tokens: int = 50,
api_version: str = "2022-12-01",
top_k: int = 5,
temperature: float = 0.2,
presence_penalty: float = 0.1,
frequency_penalty: float = 0.1,
examples_context: Optional[str] = None,
examples: Optional[List[List[str]]] = None,
stop_words: Optional[List[str]] = None,
progress_bar: bool = True,
prompt_template: Optional[PromptTemplate] = None,
context_join_str: str = " ")
Arguments:
api_key
: Your API key from OpenAI. It is required for this node to work.
azure_base_url
: The base URL for the Azure OpenAI API. If not supplied, Azure OpenAI API will not be used. This parameter is an OpenAI Azure endpoint, usually in the form https://.openai.azure.com
azure_deployment_name
: The name of the Azure OpenAI API deployment. If not supplied, Azure OpenAI API will not be used.
model
: ID of the engine to use for generating the answer. You can select one of "text-ada-001", "text-babbage-001", "text-curie-001", or "text-davinci-003" (from worst to best and from cheapest to most expensive). For more information about the models, refer to the OpenAI Documentation.
max_tokens
: The maximum number of tokens reserved for the generated Answer. A higher number allows for longer answers without exceeding the max prompt length of the OpenAI model. A lower number allows longer prompts with more documents passed as context, but the generated answer might be cut after max_tokens.
api_version
: The version of the Azure OpenAI API to use. The default is the 2022-12-01 version.
top_k
: Number of generated Answers.
temperature
: The sampling temperature to use. Higher values mean the model takes more risks; a value of 0 (argmax sampling) works better for scenarios with a well-defined Answer.
presence_penalty
: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they have already appeared in the text. This increases the model's likelihood to talk about new topics. For more information about frequency and presence penalties, see the parameter details in the OpenAI documentation.
frequency_penalty
: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. For more information about frequency and presence penalties, see the parameter details in the OpenAI documentation.
examples_context
: A text snippet containing the contextual information used to generate the Answers for the examples you provide. If not supplied, the default from OpenAI API docs is used: "In 2017, U.S. life expectancy was 78.6 years."
examples
: List of (question, answer) pairs that help steer the model towards the tone and answer format you'd like. We recommend adding 2 to 3 examples. If not supplied, the default from OpenAI API docs is used: [["Q: What is human life expectancy in the United States?", "A: 78 years."]]
stop_words
: Up to four sequences where the API stops generating further tokens. The returned text does not contain the stop sequence. If you don't provide any stop words, the default value from OpenAI API docs is used: ["\n", "<|endoftext|>"].
prompt_template
: A PromptTemplate that tells the model how to generate answers given a context and query supplied at runtime. The context is automatically constructed at runtime from a list of provided documents. Use examples_context and a list of examples to provide the model with examples to steer it towards the tone and answer format you would like. If not supplied, the default prompt template is:
PromptTemplate(
    name="question-answering-with-examples",
    prompt_text="Please answer the question according to the above context."
                "\n===\nContext: {examples_context}\n===\n{examples}\n\n"
                "===\nContext: {context}\n===\n{query}",
)
To learn how variables, such as '{context}', are substituted in the prompt_text, see PromptTemplate.
context_join_str
: The separation string used to join the input documents to create the context used by the PromptTemplate.
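To make the roles of prompt_template, examples_context, examples, and context_join_str concrete, the sketch below fills the default prompt text with plain str.format. It is an illustration only: build_prompt is a hypothetical helper, the way the example pairs are rendered into text is an assumption, the document texts are placeholders, and the library's own PromptTemplate substitution is not reproduced here.

```python
PROMPT_TEXT = (
    "Please answer the question according to the above context."
    "\n===\nContext: {examples_context}\n===\n{examples}\n\n"
    "===\nContext: {context}\n===\n{query}"
)


def build_prompt(query, documents, examples_context, examples, context_join_str=" "):
    """Fill the default prompt text from plain strings."""
    # The context is the document texts joined with context_join_str.
    context = context_join_str.join(documents)
    # Assumption: each (question, answer) pair is rendered on two lines.
    examples_text = "\n".join(f"{q}\n{a}" for q, a in examples)
    return PROMPT_TEXT.format(
        examples_context=examples_context,
        examples=examples_text,
        context=context,
        query=query,
    )


prompt = build_prompt(
    query="Q: Who is the father of Arya Stark?",
    documents=["Arya Stark is a daughter of Eddard Stark.",  # placeholder texts
               "She grew up in Winterfell."],
    examples_context="In 2017, U.S. life expectancy was 78.6 years.",
    examples=[["Q: What is human life expectancy in the United States?",
               "A: 78 years."]],
)
print(prompt)
```

A larger max_tokens leaves less room for this prompt, which is why fewer documents fit into the context when longer answers are requested.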
OpenAIAnswerGenerator.predict
def predict(query: str,
documents: List[Document],
top_k: Optional[int] = None,
max_tokens: Optional[int] = None,
timeout: Union[float, Tuple[float, float]] = OPENAI_TIMEOUT)
Use the loaded QA model to generate Answers for a query based on the Documents it receives.
Returns dictionaries containing Answers. Note that OpenAI doesn't return scores for those Answers.
Example:
{
'query': 'Who is the father of Arya Stark?',
'answers':[Answer(
'answer': 'Eddard,',
'score': None,
),...
]
}
Arguments:
query
: The query you want to provide. It's a string.documents
: List of Documents in which to search for the Answer.top_k
: The maximum number of Answers to return.max_tokens
: The maximum number of tokens the generated Answer can have.timeout
: How many seconds to wait for the server to send data before giving up, as a float, or a :ref:(connect timeout, read timeout) <timeouts>
tuple. Defaults to 10 seconds.
Returns:
Dictionary containing query and Answers.
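The timeout argument accepts either a single float or a (connect timeout, read timeout) tuple. The sketch below shows one way to normalize both forms, assuming the value is interpreted the way the requests library interprets its timeout argument (a single number applies to both phases). split_timeout is a hypothetical helper, not part of the library.

```python
from typing import Tuple, Union


def split_timeout(timeout: Union[float, Tuple[float, float]]) -> Tuple[float, float]:
    """Return (connect_timeout, read_timeout) in seconds."""
    if isinstance(timeout, tuple):
        connect, read = timeout
        return float(connect), float(read)
    # A single number is used for both the connect and the read phase.
    return float(timeout), float(timeout)


print(split_timeout(10))
print(split_timeout((3.05, 27)))
```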