The Summarizer produces a short overview of a long Document, which gives you a quick glimpse of the Documents your Retriever is returning. You can run it stand-alone or in a pipeline.
You can use any summarization model from Hugging Face Transformers by providing the model name. By default, the Google Pegasus model is loaded.
To initialize and run a stand-alone Summarizer:
    from haystack import Document
    from haystack.nodes import TransformersSummarizer

    # The text to summarize, wrapped in a Document
    docs = [
        Document(
            content="PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions. "
            "The aim is to reduce the risk of wildfires. Nearly 800 thousand customers were scheduled to be affected by "
            "the shutoffs which were expected to last through at least midday tomorrow."
        )
    ]

    # Load the summarization model and summarize every Document in the list
    summarizer = TransformersSummarizer(model_name_or_path="google/pegasus-xsum")
    summary = summarizer.predict(documents=docs)
summary is a list of Document instances. The original text is stored on each Document in document.content and the summary is in document.meta["summary"]:

    summary[0].content
    # "PGE stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions..."
    summary[0].meta["summary"]
    # "California's largest electricity provider has turned off power to hundreds of thousands of customers."
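Because the result is a list with one Document per input, it can be handy to pair each original text with its summary. The following is a minimal sketch under stated assumptions: pair_summaries is a hypothetical helper, not part of Haystack, and it relies only on each item exposing content and meta as described above. SimpleNamespace objects stand in for real Documents so the snippet runs without downloading a model.

```python
from types import SimpleNamespace


def pair_summaries(documents, preview_chars=60):
    """Return (preview of original text, summary) pairs.

    Hypothetical helper, not part of Haystack. It assumes each item has a
    `content` string and a `meta` dict. Items without a "summary" entry
    in `meta` get None as their summary.
    """
    pairs = []
    for doc in documents:
        preview = doc.content[:preview_chars]
        pairs.append((preview, doc.meta.get("summary")))
    return pairs


# Stand-ins for the Documents returned by summarizer.predict().
results = [
    SimpleNamespace(
        content="PG&E stated it scheduled the blackouts in response to forecasts for high winds.",
        meta={
            "summary": "California's largest electricity provider has turned off "
            "power to hundreds of thousands of customers."
        },
    ),
    # A Document that was never summarized has no "summary" key.
    SimpleNamespace(content="A document that was never summarized.", meta={}),
]

for preview, summary in pair_summaries(results, preview_chars=20):
    print(f"{preview!r} -> {summary!r}")
```

Reading the summary with meta.get rather than indexing directly avoids a KeyError for Documents that did not pass through the Summarizer.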
To use a Summarizer in an indexing pipeline, run:
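    from haystack import Pipeline

    # summarizer is the node created above; document_store is an existing DocumentStore
    p = Pipeline()
    p.add_node(component=summarizer, name="Summarizer", inputs=["File"])
    p.add_node(component=document_store, name="DocumentStore", inputs=["Summarizer"])

    # d1, d2, d3, ... are the Documents to index
    p.run(documents=[d1, d2, d3])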
The summary is stored in each Document's meta attribute, in a field called summary:

    documents = document_store.get_all_documents()
    documents[0].meta["summary"]
    # "California's largest electricity provider has turned off power to hundreds of thousands of customers."
If you have multiple Documents that you want a single summary for, you can merge Documents together using the DocumentMerger node.
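To illustrate the idea, merging boils down to joining the texts of several Documents into one before summarizing. The sketch below is not the DocumentMerger implementation: Doc and merge_docs are hypothetical stand-ins written in plain Python, and keeping only the metadata shared by all inputs is an assumption made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a Haystack Document (content plus meta)."""
    content: str
    meta: dict = field(default_factory=dict)


def merge_docs(docs, separator=" "):
    """Join the contents of `docs` into a single Doc.

    Assumption for this sketch: only metadata key/value pairs shared by
    every input are kept on the merged Doc.
    """
    if not docs:
        return Doc(content="")
    merged_text = separator.join(d.content for d in docs)
    common = dict(docs[0].meta)
    for d in docs[1:]:
        common = {k: v for k, v in common.items() if k in d.meta and d.meta[k] == v}
    return Doc(content=merged_text, meta=common)


parts = [
    Doc("PG&E scheduled the blackouts.", {"source": "news", "page": 1}),
    Doc("The aim is to reduce the risk of wildfires.", {"source": "news", "page": 2}),
]
merged = merge_docs(parts)
print(merged.content)
print(merged.meta)
```

In a real pipeline, the DocumentMerger node plays this role ahead of the Summarizer, so that the model sees one combined text instead of several short ones.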