What's nucleus (top_p) sampling?

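Nucleus sampling is controlled by the top_p parameter of generative models, such as those used in generative question answering. At each generation step, the model samples the next token only from the smallest set of candidates whose cumulative probability reaches top_p. This controls the level of randomness and diversity in the generated text.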

When top_p is set to a high value, the model is more likely to generate diverse and creative outputs. When set to a low value, the model is more likely to generate predictable and less risky outputs.
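As a rough illustration of how the value of top_p changes the set of candidate tokens, here is a minimal sketch in plain Python. The function name `nucleus_filter` and the toy probabilities are made up for this example and don't come from any library; real models apply the same idea to a distribution over their whole vocabulary.

```python
def nucleus_filter(probs, top_p):
    """Keep the smallest set of top-ranked tokens whose cumulative
    probability reaches top_p, then renormalize what is left.

    `probs` is a hypothetical dict mapping token -> probability.
    """
    # Rank tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)

    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break  # the nucleus is complete

    # Renormalize so the kept probabilities sum to 1 again.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Toy next-token distribution (illustrative values only).
probs = {"the": 0.5, "a": 0.3, "this": 0.15, "zebra": 0.05}

low = nucleus_filter(probs, top_p=0.4)   # only "the" survives
high = nucleus_filter(probs, top_p=0.9)  # "the", "a", and "this" survive
```

With the low value, only the most likely token remains, so the output is predictable. With the high value, less likely tokens stay in the pool and the model can pick among them.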

Nucleus sampling is often used in combination with other parameters, such as temperature and top_k, to balance creativity and coherence in the generated text.

See also Model Parameters.

While nucleus, or top p, sampling is usually mentioned in the context of the next token selection in generative NLP models, we can also use it to filter documents based on the cumulative probability of the similarity scores between the query and the documents.

In this context, top p sampling keeps the documents that are most relevant to the query and removes unrelated ones. The technique ranks the documents by their similarity to the query, calculates the cumulative probability of their scores, and keeps only the top-ranked documents that together account for the top_p share of that cumulative probability.
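The filtering idea can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the TopPSampler implementation: the function name `top_p_filter`, the use of softmax to turn scores into probabilities, and the choice to include the document that crosses the threshold are all assumptions made for this example.

```python
import math


def top_p_filter(scored_docs, top_p):
    """Keep the top-ranked documents whose cumulative probability reaches top_p.

    `scored_docs` is a hypothetical list of (document, similarity_score) pairs.
    """
    if not scored_docs:
        return []

    # Rank documents from most to least similar to the query.
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)

    # Softmax over the scores (assumption), shifted by the max for stability.
    max_score = ranked[0][1]
    weights = [math.exp(score - max_score) for _, score in ranked]
    total = sum(weights)

    kept = []
    cumulative = 0.0
    for (doc, _), weight in zip(ranked, weights):
        kept.append(doc)
        cumulative += weight / total
        if cumulative >= top_p:
            break  # enough probability mass is covered
    return kept


# Toy similarity scores (illustrative values only).
scored = [("doc_a", 9.0), ("doc_c", 1.0), ("doc_b", 8.5), ("doc_d", -3.0)]

relevant = top_p_filter(scored, top_p=0.95)
everything = top_p_filter(scored, top_p=1.0)
```

Here `doc_a` and `doc_b` together carry almost all of the probability mass, so `doc_c` and `doc_d` are dropped at top_p=0.95. Unlike a fixed top_k cutoff, the number of documents kept depends on how the scores are distributed.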


Position in a Pipeline	After the Retriever
Input	Documents
Output	Documents
Classes	TopPSampler

By default, TopPSampler uses the ms-marco-MiniLM-L-6-v2 model, but you can replace it with any other cross-encoder model. For a full list of models, see Hugging Face.

Usage

TopPSampler is used in combination with other nodes, such as WebRetriever, to limit the number of results they return. Here's an example of TopPSampler in a pipeline:

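from haystack import Pipeline
from haystack.nodes import TopPSampler, WebRetriever

# Fetch documents from the web for the query
retriever = WebRetriever(api_key="<your_api_key_here>", mode="preprocessed_documents")
# Keep only the documents that fall within the top 95% of cumulative probability
sampler = TopPSampler(top_p=0.95)

p = Pipeline()
p.add_node(component=retriever, name="Retriever", inputs=["Query"])
p.add_node(component=sampler, name="Sampler", inputs=["Retriever"])
print(p.run(query="What's the secret of the Universe?"))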