
Summarizer

The Summarizer creates a short overview of a long Document. In a querying pipeline, it gives you a quick glimpse of the Documents your Retriever returns. You can run it stand-alone or in a pipeline.

You can use any summarization model from Hugging Face Transformers by passing its model name. By default, the Summarizer loads Google's Pegasus model (google/pegasus-xsum).

Position in a Pipeline: After preprocessing in an indexing Pipeline or after the Retriever in a querying Pipeline
Input: Documents
Output: Documents
Classes: TransformersSummarizer

Usage

To initialize and run a stand-alone Summarizer:

from haystack.nodes import TransformersSummarizer
from haystack import Document

# Adjacent string literals are joined, so no indentation leaks into the text
docs = [Document(content="PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions. "
                         "The aim is to reduce the risk of wildfires. Nearly 800 thousand customers were scheduled to be affected by "
                         "the shutoffs which were expected to last through at least midday tomorrow.")]

# Load a Pegasus summarization model from the Hugging Face Hub and summarize each Document
summarizer = TransformersSummarizer(model_name_or_path="google/pegasus-xsum")
summary = summarizer.predict(documents=docs)

predict() returns a list of Document objects. Each Document keeps its original text in document.content and stores the summary in document.meta["summary"].

summary[0].content

# "PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions..."

summary[0].meta["summary"]

# "California's largest electricity provider has turned off power to hundreds of thousands of customers."
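To make this input/output contract concrete, here is a minimal, self-contained sketch that needs no Haystack install. Doc, toy_summarize, and predict are hypothetical names: Doc only mimics the content and meta fields of a Haystack Document, and a first-sentence heuristic stands in for the Transformers model. The sketch returns copies instead of modifying its inputs; that is a choice made here, not necessarily what Haystack does.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for haystack.Document (content and meta only)."""
    content: str
    meta: Dict[str, str] = field(default_factory=dict)


def toy_summarize(text: str) -> str:
    # Placeholder for the Transformers model: keep only the first sentence.
    first, _, _ = text.partition(".")
    return first.strip() + "."


def predict(documents: List[Doc]) -> List[Doc]:
    # One output Doc per input: content is copied unchanged and the
    # summary is added under meta["summary"].
    return [
        replace(d, meta={**d.meta, "summary": toy_summarize(d.content)})
        for d in documents
    ]


docs = [Doc("PG&E scheduled the blackouts. The aim is to reduce wildfire risk.")]
result = predict(docs)
print(result[0].content)          # PG&E scheduled the blackouts. The aim is to reduce wildfire risk.
print(result[0].meta["summary"])  # PG&E scheduled the blackouts.
```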

To use a Summarizer in an indexing pipeline, run:

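from haystack import Pipeline
from haystack.document_stores import InMemoryDocumentStore

# Any Haystack DocumentStore works here; InMemoryDocumentStore needs no setup
document_store = InMemoryDocumentStore()

# Summarize incoming Documents, then write them (with their summaries) to the DocumentStore
p = Pipeline()
p.add_node(component=summarizer, name="Summarizer", inputs=["File"])
p.add_node(component=document_store, name="DocumentStore", inputs=["Summarizer"])
p.run(documents=docs)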

After indexing, each stored Document's summary is in its meta attribute under the key summary:

documents = document_store.get_all_documents()
documents[0].meta["summary"]

# "California's largest electricity provider has turned off power to hundreds of thousands of customers."
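The inputs list decides where each node gets its Documents from. The toy pipeline below is a hypothetical sketch of that wiring for a simple linear chain, not Haystack's Pipeline implementation: "File" stands for the documents passed to run(), the summarizer node adds meta["summary"] with a first-sentence placeholder instead of a model, and a plain list plays the role of the DocumentStore.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical node type: takes and returns a list of {"content", "meta"} dicts.
Node = Callable[[List[dict]], List[dict]]


class ToyPipeline:
    """Illustrative linear pipeline; not Haystack's Pipeline implementation."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Tuple[Node, str]] = {}

    def add_node(self, component: Node, name: str, inputs: List[str]) -> None:
        # This sketch supports one upstream node per node; "File" is the
        # name used for the documents passed to run().
        self.nodes[name] = (component, inputs[0])

    def run(self, documents: List[dict]) -> Dict[str, List[dict]]:
        outputs: Dict[str, List[dict]] = {"File": documents}
        # Nodes run in the order they were added, so each upstream node
        # must be added before the nodes that read from it.
        for name, (component, source) in self.nodes.items():
            outputs[name] = component(outputs[source])
        return outputs


store: List[dict] = []  # stand-in for a DocumentStore


def summarizer_node(docs: List[dict]) -> List[dict]:
    # Placeholder "model": keep the first sentence as the summary.
    return [
        {**d, "meta": {**d["meta"], "summary": d["content"].split(".")[0] + "."}}
        for d in docs
    ]


def store_node(docs: List[dict]) -> List[dict]:
    store.extend(docs)  # "write" the summarized documents
    return docs


p = ToyPipeline()
p.add_node(summarizer_node, name="Summarizer", inputs=["File"])
p.add_node(store_node, name="DocumentStore", inputs=["Summarizer"])
p.run(documents=[{"content": "Power was cut. Winds were high.", "meta": {}}])
print(store[0]["meta"]["summary"])  # Power was cut.
```

Because the store node reads from "Summarizer", every Document it receives already carries a summary, which is why reading meta["summary"] back from the store works.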

To get a single summary for multiple Documents, first merge them into one Document with the DocumentMerger node, then pass the merged Document to the Summarizer.
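Conceptually, merging concatenates the contents of several Documents so the Summarizer sees one text. The function below is a hypothetical sketch of that idea, not DocumentMerger's actual code: it joins contents in order with a separator (a single space here, which is an assumption) and ignores metadata entirely.

```python
from typing import Dict, List


def merge_documents(documents: List[Dict], separator: str = " ") -> Dict:
    """Hypothetical merge: concatenate contents into one document.

    Only `content` is combined; metadata handling is omitted in this sketch.
    """
    merged_text = separator.join(d["content"].strip() for d in documents)
    return {"content": merged_text, "meta": {}}


docs = [
    {"content": "PG&E scheduled blackouts due to high winds.", "meta": {}},
    {"content": "Nearly 800 thousand customers were affected.", "meta": {}},
]
merged = merge_documents(docs)
print(merged["content"])
# PG&E scheduled blackouts due to high winds. Nearly 800 thousand customers were affected.
```

The single merged document would then be the only input to the summarizer, so it yields one summary instead of one per original Document.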
