
Takes a Document as input and generates questions which it believes the Document can answer.

Module question_generator


class QuestionGenerator(BaseComponent)

The QuestionGenerator takes only a document as input and outputs questions that it thinks this document can answer. In the current implementation, it splits input texts into chunks of 50 words with a 10-word overlap. This is because the default model valhalla/t5-base-e2e-qg seems to generate only about 3 questions per passage, regardless of the passage's length.

This approach prioritizes generating more questions over processing efficiency (T5 can digest much more than 50 words at once). The returned questions generally follow the order of their answers, so questions early in the list generally come from earlier in the document.
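The chunking described above can be sketched as follows. This is a minimal illustration, not the library's own code: the helper name split_words is hypothetical, and the sketch assumes that chunks are built from whitespace-separated words and that each new chunk starts split_length - split_overlap words after the previous one.

```python
from typing import List


def split_words(text: str, split_length: int = 50, split_overlap: int = 10) -> List[str]:
    """Hypothetical helper: split text into overlapping word chunks."""
    if split_overlap >= split_length:
        raise ValueError("split_overlap must be smaller than split_length")
    words = text.split()
    step = split_length - split_overlap  # distance between chunk starts
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + split_length]))
        # Stop once the current chunk has reached the end of the text.
        if start + split_length >= len(words):
            break
    return chunks


# A 120-word text with the default settings yields chunks of 50, 50, and 40 words.
text = " ".join(f"w{i}" for i in range(120))
chunks = split_words(text)
```

With the defaults, the last 10 words of each chunk are repeated as the first 10 words of the next one, so a sentence that straddles a chunk boundary still appears whole in at least one chunk more often than it would without overlap.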


def __init__(model_name_or_path: str = "valhalla/t5-base-e2e-qg",
             model_version: Optional[str] = None,
             num_beams: int = 4,
             max_length: int = 256,
             no_repeat_ngram_size: int = 3,
             length_penalty: float = 1.5,
             early_stopping: bool = True,
             split_length: int = 50,
             split_overlap: int = 10,
             use_gpu: bool = True,
             prompt: str = "generate questions:",
             num_queries_per_doc: int = 1,
             sep_token: str = "<sep>",
             batch_size: int = 16,
             progress_bar: bool = True,
             use_auth_token: Optional[Union[str, bool]] = None,
             devices: Optional[List[Union[str, "torch.device"]]] = None)

Uses the valhalla/t5-base-e2e-qg model by default. This class supports any question generation model that is implemented as a Seq2SeqLM in Hugging Face Transformers.

Note that this style of question generation (where the only input is a document) is sometimes referred to as end-to-end question generation. Answer-supervised question generation is not currently supported.


  • model_name_or_path: Directory of a saved model or the name of a public model, for example "valhalla/t5-base-e2e-qg". See Hugging Face models for a full list of available models.
  • model_version: The version of the model to use from the Hugging Face model hub. Can be a tag name, a branch name, or a commit hash.
  • num_beams: The number of beams for beam search. 1 means no beam search.
  • max_length: The maximum length of the generated text. The value is passed to the underlying Hugging Face model, which measures length in tokens rather than characters.
  • no_repeat_ngram_size: If set to a number larger than 0, all ngrams whose size equals this number can only occur once. For example, if you set it to 3, each 3-gram can appear only once in the generated text.
  • length_penalty: Encourages the model to generate longer or shorter texts, depending on the value you specify. Values greater than 0.0 promote longer sequences, while values less than 0.0 promote shorter sequences. Used with text generation based on beams.
  • early_stopping: Defines the stopping condition for beam search. True means the model stops generating text as soon as there are num_beams complete candidates. False means the model stops generating text only if it's unlikely to find better candidates.
  • split_length: Determines the length, in words, of each split (a chunk of a document). Used by num_queries_per_doc.
  • split_overlap: Configures the amount of overlap between two adjacent splits. Setting it to a positive number enables the sliding window approach.
  • use_gpu: Whether to use the GPU or the CPU. Falls back on CPU if no GPU is available.
  • prompt: Contains the prompt with instructions for the model.
  • batch_size: Number of documents to process at a time.
  • num_queries_per_doc: Number of questions to generate per split, not per whole document. split_length determines the length of each split and split_overlap determines the overlap between splits, so the total number of questions generated per document is this value multiplied by the resulting number of splits. This value is capped at 3.
  • sep_token: A special token that separates two sentences in the same output.
  • progress_bar: Whether to show a tqdm progress bar or not.
  • use_auth_token: The API token used to download private models from Hugging Face. If set to True, the token generated when running transformers-cli login (stored in ~/.huggingface) is used. For more information, see Hugging Face.
  • devices: List of torch devices (for example cuda, cpu, mps) to limit inference to specific devices. A list containing torch device objects or strings is supported (for example [torch.device('cuda:0'), "mps", "cuda:1"]). If you specify use_gpu=False, the devices parameter is not used and a single CPU device is used for inference.
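To make the interaction between num_queries_per_doc, split_length, split_overlap, and sep_token more concrete, here is a small sketch. It does not call the library. The helper names count_splits, estimate_questions, and split_questions are hypothetical. The sketch assumes a stride of split_length - split_overlap words between splits, reads the cap of 3 as applying to num_queries_per_doc itself, and assumes the model joins several questions in one output string with sep_token.

```python
import math
from typing import List


def count_splits(num_words: int, split_length: int = 50, split_overlap: int = 10) -> int:
    """Hypothetical helper: number of overlapping splits for a text of num_words words."""
    if num_words <= 0:
        return 0
    if num_words <= split_length:
        return 1
    step = split_length - split_overlap
    return 1 + math.ceil((num_words - split_length) / step)


def estimate_questions(num_words: int, num_queries_per_doc: int = 1,
                       split_length: int = 50, split_overlap: int = 10) -> int:
    """Estimated total questions: capped per-split count times the number of splits."""
    per_split = min(num_queries_per_doc, 3)  # assumption: the cap applies here
    return per_split * count_splits(num_words, split_length, split_overlap)


def split_questions(generated: str, sep_token: str = "<sep>") -> List[str]:
    """Split one generated string into individual questions on sep_token."""
    return [q.strip() for q in generated.split(sep_token) if q.strip()]


total = estimate_questions(num_words=120, num_queries_per_doc=5)
questions = split_questions("Who wrote it?<sep> When was it written?<sep>")
```

Under these assumptions, a 120-word document produces 3 splits, and a requested value of 5 is capped at 3 questions per split, for an estimated total of 9. The model may still return fewer questions than the estimate.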


def generate_batch(
    texts: Union[List[str], List[List[str]]],
    batch_size: Optional[int] = None
) -> Union[List[List[str]], List[List[List[str]]]]

Generates questions for a list of strings or a list of lists of strings.


  • texts: The texts to generate questions for, given as a list of strings or a list of lists of strings.
  • batch_size: Number of texts to process at a time.
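The signature above implies that the nesting of the return value follows the nesting of texts. The sketch below illustrates only that shape relationship. It uses a hypothetical stand-in, fake_questions, in place of the model, and generate_batch_shape is likewise a made-up name rather than the real method.

```python
from typing import List, Union


def fake_questions(text: str) -> List[str]:
    """Hypothetical stand-in for the model: one placeholder question per text."""
    return [f"Question about: {text}"]


def generate_batch_shape(
    texts: Union[List[str], List[List[str]]]
) -> Union[List[List[str]], List[List[List[str]]]]:
    """Mirror the input nesting: each text maps to its own list of questions."""
    if texts and isinstance(texts[0], list):
        # List of lists of strings: keep the outer grouping intact.
        return [[fake_questions(t) for t in group] for group in texts]
    # Flat list of strings: one list of questions per text.
    return [fake_questions(t) for t in texts]


flat = generate_batch_shape(["doc a", "doc b"])
nested = generate_batch_shape([["doc a", "doc b"], ["doc c"]])
```

Here flat is a list with one list of questions per input text, while nested adds one more level so that the questions stay grouped the same way the input texts were grouped.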