Complete the code to load a pre-trained summarization model using Hugging Face Transformers.
from transformers import pipeline

summarizer = pipeline('summarization', model='[1]')
The facebook/bart-large-cnn model is a popular pre-trained model for summarization tasks.
Complete the code to split a long document into chunks of 500 tokens for summarization.
def chunk_text(text, chunk_size=500):
    tokens = text.split()
    return [tokens[i:i+[1]] for i in range(0, len(tokens), [1])]
We split the text into chunks of 500 whitespace-separated words, a rough proxy for model tokens, so long documents stay within the summarizer's input limit.
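With the stated answer (both blanks take chunk_size), the completed helper can be sketched like this; the small chunk size in the usage example is just for illustration:

```python
def chunk_text(text, chunk_size=500):
    # Whitespace split is a rough word-level proxy for model tokens.
    tokens = text.split()
    # Both blanks take chunk_size: the slice length and the range step.
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

# Example with a small chunk size to make the grouping visible:
words = ' '.join(str(n) for n in range(10))
chunks = chunk_text(words, chunk_size=4)
# chunks -> [['0','1','2','3'], ['4','5','6','7'], ['8','9']]
```

Note that the final chunk may be shorter than chunk_size; the slice simply stops at the end of the token list.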
Fill in the blank in the code that summarizes each chunk and combines the results.
def summarize_long_text(text, summarizer):
    chunks = chunk_text(text, 500)
    summaries = []
    for chunk in chunks:
        summary = summarizer(' '.join(chunk), max_length=150, min_length=40, do_sample=False)[[1]]['summary_text']
        summaries.append(summary)
    return ' '.join(summaries)
The summarizer returns a list of dictionaries; the first element at index 0 contains the summary text.
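To see why index 0 is needed without downloading a model, here is a hypothetical stub that mimics the return shape of the Hugging Face summarization pipeline (a list with one dict per input, keyed by 'summary_text'); fake_summarizer and its truncation behavior are illustrative assumptions, not part of the library:

```python
def fake_summarizer(text, max_length=150, min_length=40, do_sample=False):
    # Stub mimicking the pipeline's return shape: a list of dicts.
    # Here we "summarize" by keeping the first 20 characters.
    return [{'summary_text': text[:20]}]

result = fake_summarizer('A long passage about chunked summarization.')
summary = result[0]['summary_text']  # index 0 selects the first (only) result
```

Forgetting the `[0]` leaves you with the whole list, and `['summary_text']` would then raise a TypeError.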
Fill both blanks to create a function that uses a sliding window to summarize overlapping chunks.
def sliding_window_chunks(text, window_size=500, step_size=[1]):
    tokens = text.split()
    return [' '.join(tokens[i:i+window_size])
            for i in range(0, len(tokens), [2])
            if i + window_size <= len(tokens)]
A step size of 250 with a window of 500 makes each chunk overlap the previous one by 250 tokens, so content near chunk boundaries is not cut off mid-context.
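With the stated answers (blank [1] is 250, blank [2] is step_size), the completed function can be sketched as follows; the tiny window in the example just makes the overlap visible:

```python
def sliding_window_chunks(text, window_size=500, step_size=250):
    # Blank [1] is 250 (half the window); blank [2] is step_size.
    tokens = text.split()
    return [' '.join(tokens[i:i + window_size])
            for i in range(0, len(tokens), step_size)
            if i + window_size <= len(tokens)]

# With window_size=4 and step_size=2, consecutive chunks share two words:
words = ' '.join(str(n) for n in range(8))
print(sliding_window_chunks(words, window_size=4, step_size=2))
# -> ['0 1 2 3', '2 3 4 5', '4 5 6 7']
```

One design caveat: the `if i + window_size <= len(tokens)` guard drops any trailing partial window, so text shorter than window_size yields no chunks at all.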
Fill all three blanks to build a pipeline that summarizes a long document by chunking, summarizing, and then summarizing the combined summary.
def hierarchical_summarization(text, summarizer):
    chunks = chunk_text(text, [1])
    summaries = [summarizer(' '.join(chunk), max_length=150, min_length=40, do_sample=False)[[2]]['summary_text']
                 for chunk in chunks]
    combined_summary = ' '.join(summaries)
    final_summary = summarizer(combined_summary, max_length=200, min_length=50, do_sample=False)[[3]]['summary_text']
    return final_summary
Chunks of 500 tokens are summarized first. The summarizer output is accessed at index 0 for both steps.
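The full hierarchical flow can be traced end to end with a stub in place of the real pipeline; stub_summarizer and the extra chunk_size parameter are assumptions for this sketch (the real function hard-codes 500 and the Hugging Face pipeline call):

```python
def chunk_text(text, chunk_size=500):
    tokens = text.split()
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

def stub_summarizer(text, max_length=150, min_length=40, do_sample=False):
    # Mimics the pipeline's output shape; "summarizes" by keeping the first 5 words.
    return [{'summary_text': ' '.join(text.split()[:5])}]

def hierarchical_summarization(text, summarizer, chunk_size=500):
    # Blank [1]: the chunk size; blanks [2] and [3]: index 0 into each result list.
    chunks = chunk_text(text, chunk_size)
    summaries = [summarizer(' '.join(chunk))[0]['summary_text'] for chunk in chunks]
    combined_summary = ' '.join(summaries)   # second-level input
    return summarizer(combined_summary)[0]['summary_text']
```

Running the stub on twenty words with chunk_size=10 produces two first-level summaries, joins them, and summarizes the result once more, which is the two-pass structure the exercise describes.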