Summarizer
The Summarizer produces a short overview of a long Document. You can use it to get a glimpse of the Documents your Retriever returns, and you can run it stand-alone or in a pipeline.
You can use any summarization model from Hugging Face Transformers by providing the model name. By default, the Google Pegasus model is loaded.
Usage
To initialize and run a stand-alone Summarizer:
from haystack.nodes import TransformersSummarizer
from haystack import Document
docs = [Document("PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions. "
                 "The aim is to reduce the risk of wildfires. Nearly 800 thousand customers were scheduled to be affected by "
                 "the shutoffs which were expected to last through at least midday tomorrow.")]
summarizer = TransformersSummarizer(model_name_or_path="google/pegasus-xsum")
summary = summarizer.predict(documents=docs)
The summary is a list of Document instances. The original text is stored on each Document in document.content and the summary is in document.meta["summary"].
summary[0].content
# "PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions..."
summary[0].meta["summary"]
# "California's largest electricity provider has turned off power to hundreds of thousands of customers."
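Because each returned Document carries both the original text and its summary, you can pair the two for inspection. The following is a minimal sketch of that pattern. It assumes only the content and meta attributes described above; SimpleDoc is a hypothetical stand-in for the Haystack Document class, collect_summaries is a hypothetical helper, and the summary string is placeholder data, so the snippet runs without loading a model.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for a Haystack Document: only `content` and `meta`."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def collect_summaries(docs: List[SimpleDoc]) -> List[Tuple[str, str]]:
    """Return (original text, summary) pairs, skipping documents without a summary."""
    pairs = []
    for doc in docs:
        summary = doc.meta.get("summary")
        if summary is not None:
            pairs.append((doc.content, summary))
    return pairs


# Placeholder data shaped like the Summarizer output described above.
results = [
    SimpleDoc("PG&E stated it scheduled the blackouts ...", {"summary": "Power was shut off."}),
    SimpleDoc("A document that was never summarized."),
]
pairs = collect_summaries(results)
```

With real Summarizer output, the same loop should work on the returned list, provided each Document exposes content and meta as described.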
To use a Summarizer in an indexing pipeline, run:
from haystack import Pipeline
p = Pipeline()
p.add_node(component=summarizer, name="Summarizer", inputs=["File"])
p.add_node(component=document_store, name="DocumentStore", inputs=["Summarizer"])
p.run(documents=[d1, d2, d3...])
The summary will be found in the Document's meta attribute in a field called summary:
documents = document_store.get_all_documents()
documents[0].meta["summary"]
# "California's largest electricity provider has turned off power to hundreds of thousands of customers."
If you have multiple Documents that you want a single summary for, you can merge Documents together using the DocumentMerger node.
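Conceptually, merging means joining the text of several Documents so that the Summarizer receives a single input. The sketch below illustrates only that idea and does not use the DocumentMerger node itself; SimpleDoc and merge_docs are hypothetical names introduced for illustration, and the single-space separator is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for a Haystack Document: only `content` and `meta`."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def merge_docs(docs: List[SimpleDoc], separator: str = " ") -> SimpleDoc:
    """Join the contents of several documents into one document (illustrative only)."""
    merged_text = separator.join(doc.content for doc in docs)
    return SimpleDoc(content=merged_text)


parts = [
    SimpleDoc("PG&E scheduled the blackouts in response to forecasts for high winds."),
    SimpleDoc("The aim is to reduce the risk of wildfires."),
]
merged = merge_docs(parts)
```

Passing the merged Document to the Summarizer as a one-element list would then yield one summary covering all of the original texts.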