Long document summarization strategies in NLP - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to load a pre-trained summarization model using Hugging Face Transformers.

from transformers import pipeline
summarizer = pipeline('summarization', model='[1]')
Options:
A. gpt2
B. roberta-base
C. facebook/bart-large-cnn
D. bert-base-uncased
Common Mistakes
Choosing a model that is not designed for summarization, such as the encoder-only 'bert-base-uncased'.
Using a general-purpose language model such as 'gpt2', which is not fine-tuned for summarization.
2. Fill in the blank (medium)

Complete the code to split a long document into chunks of 500 whitespace-separated tokens for summarization.

def chunk_text(text, chunk_size=500):
    tokens = text.split()
    return [tokens[i:i+[1]] for i in range(0, len(tokens), [1])]
Options:
A. 500
B. 250
C. 1000
D. 50
Common Mistakes
Using different values for the slice width and the step size, which makes chunks overlap or skips tokens between them.
Choosing a chunk size that is too small to give the model useful context, or too large for the model's input limit.
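
As a minimal sketch of the completed function, the same value is used for both the slice width and the step. It assumes that whitespace-separated words stand in for tokens; a model's own tokenizer will usually count more tokens than words. The small `chunk_size=2` is chosen only to keep the illustration short.

```python
def chunk_text(text, chunk_size=500):
    # Whitespace splitting only approximates model tokens.
    tokens = text.split()
    # Using chunk_size for both the slice width and the step gives
    # contiguous, non-overlapping chunks; the last one may be shorter.
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


# Tiny illustrative input (not from the exercise).
chunks = chunk_text("one two three four five", chunk_size=2)
print(chunks)  # [['one', 'two'], ['three', 'four'], ['five']]
```

Note that each chunk is a list of words, so it has to be joined back into a string before it is passed to a summarizer.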
3. Fill in the blank (hard)

Complete the code that summarizes each chunk and combines the results.

def summarize_long_text(text, summarizer):
    chunks = chunk_text(text, 500)
    summaries = []
    for chunk in chunks:
        summary = summarizer(' '.join(chunk), max_length=150, min_length=40, do_sample=False)[[1]]['summary_text']
        summaries.append(summary)
    return ' '.join(summaries)
Options:
A. 0
B. text
C. summary
D. 1
Common Mistakes
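Using index 1, which raises an IndexError because the pipeline returns a one-element list for a single input string.
Trying to access the keys 'summary' or 'text', which do not exist; the result dictionary uses 'summary_text'.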
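
The chunk-and-summarize loop can be sketched without downloading a model. In the sketch below, `fake_summarizer` is a hypothetical stand-in that only mimics the return shape described above (a list holding one dictionary with a `'summary_text'` key), and the `chunk_size` parameter is an addition made for illustration. A real pipeline object could be passed in its place.

```python
def chunk_text(text, chunk_size=500):
    tokens = text.split()
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def summarize_long_text(text, summarizer, chunk_size=500):
    summaries = []
    for chunk in chunk_text(text, chunk_size):
        # One input string gives a one-element list, so index 0 is used.
        result = summarizer(' '.join(chunk), max_length=150,
                            min_length=40, do_sample=False)
        summaries.append(result[0]['summary_text'])
    return ' '.join(summaries)


def fake_summarizer(text, **kwargs):
    # Hypothetical stub: "summarizes" by keeping the first three words.
    return [{'summary_text': ' '.join(text.split()[:3])}]


text = ' '.join(f"w{i}" for i in range(12))  # "w0 w1 ... w11"
result = summarize_long_text(text, fake_summarizer, chunk_size=5)
print(result)  # w0 w1 w2 w5 w6 w7 w10 w11
```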
4. Fill in the blank (hard)

Fill both blanks to create a function that uses a sliding window to split the text into overlapping chunks for summarization.

def sliding_window_chunks(text, window_size=500, step_size=[1]):
    tokens = text.split()
    return [' '.join(tokens[i:i+window_size]) for i in range(0, len(tokens), [2]) if i + window_size <= len(tokens)]
Options:
A. 250
B. 500
C. 100
D. 50
Common Mistakes
Using a step size equal to the window size, which produces no overlap between chunks.
Using a step size larger than the window size, which leaves gaps: the tokens between consecutive windows are never summarized.
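
To make the overlap visible, here is a small sketch of the sliding window with a step of half the window size. The tiny `window_size=4` and `step_size=2` values are chosen only for illustration. It also shows a side effect of the `i + window_size <= len(tokens)` filter: any tokens that do not fill a complete final window are dropped.

```python
def sliding_window_chunks(text, window_size=500, step_size=250):
    tokens = text.split()
    return [
        ' '.join(tokens[i:i + window_size])
        for i in range(0, len(tokens), step_size)
        # Only full windows are kept; a short tail is discarded.
        if i + window_size <= len(tokens)
    ]


ten_words = ' '.join(str(n) for n in range(10))     # "0 1 ... 9"
windows = sliding_window_chunks(ten_words, window_size=4, step_size=2)
print(windows)  # ['0 1 2 3', '2 3 4 5', '4 5 6 7', '6 7 8 9']

eleven_words = ' '.join(str(n) for n in range(11))  # "0 1 ... 10"
tail_dropped = sliding_window_chunks(eleven_words, window_size=4, step_size=2)
print(tail_dropped[-1])  # 6 7 8 9  (the token "10" appears in no window)
```

If the tail matters, one option is to remove the filter so that the final windows are allowed to be shorter than `window_size`.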
5. Fill in the blank (hard)

Fill all three blanks to build a pipeline that summarizes a long document by chunking, summarizing, and then summarizing the combined summary.

def hierarchical_summarization(text, summarizer):
    chunks = chunk_text(text, [1])
    summaries = [summarizer(' '.join(chunk), max_length=150, min_length=40, do_sample=False)[[2]]['summary_text'] for chunk in chunks]
    combined_summary = ' '.join(summaries)
    final_summary = summarizer(combined_summary, max_length=200, min_length=50, do_sample=False)[[3]]['summary_text']
    return final_summary
Options:
A. 500
B. 0
C. 1
D. 250
Common Mistakes
Using an index such as 1, which raises an IndexError because the pipeline returns a one-element list for a single input string.
Using inconsistent chunk sizes.
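
The two-stage flow (summarize each chunk, then summarize the joined summaries) can be traced with a stub in place of a real model. In this sketch, `fake_summarizer` and its `calls` list are hypothetical helpers that record how often the summarizer is invoked, and the `chunk_size` parameter is added for illustration.

```python
def chunk_text(text, chunk_size=500):
    tokens = text.split()
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def hierarchical_summarization(text, summarizer, chunk_size=500):
    chunks = chunk_text(text, chunk_size)
    # Stage 1: one summary per chunk.
    summaries = [
        summarizer(' '.join(chunk), max_length=150,
                   min_length=40, do_sample=False)[0]['summary_text']
        for chunk in chunks
    ]
    # Stage 2: summarize the concatenated chunk summaries.
    combined_summary = ' '.join(summaries)
    return summarizer(combined_summary, max_length=200,
                      min_length=50, do_sample=False)[0]['summary_text']


calls = []  # records every input the stub receives


def fake_summarizer(text, **kwargs):
    # Hypothetical stub: keeps the first two words of its input.
    calls.append(text)
    return [{'summary_text': ' '.join(text.split()[:2])}]


text = ' '.join(f"w{i}" for i in range(9))  # "w0 w1 ... w8"
final = hierarchical_summarization(text, fake_summarizer, chunk_size=4)
print(len(calls))  # 4  (three chunks plus one final pass)
print(calls[-1])   # w0 w1 w4 w5 w8
print(final)       # w0 w1
```

With a real model, the joined chunk summaries can themselves exceed the input limit for very long documents; in that case the second stage may need to be chunked and repeated.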