Extractive summarization in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
What is the main goal of extractive summarization?
Choose the best description of extractive summarization.
A. Generating a summary by selecting important sentences directly from the original text.
B. Creating new sentences that paraphrase the original text to form a summary.
C. Translating the original text into another language before summarizing.
D. Removing all stop words and punctuation to shorten the text.
💡 Hint
Think about whether the summary uses original sentences or new ones.
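Because an extractive summary is built only from sentences that already exist in the text, even a very simple selection rule qualifies. A minimal sketch of a position-based baseline (keep the first k sentences, often called Lead-k or Lead-3) follows. The function name lead_k_summary and the regex sentence splitter are illustrative assumptions, not a standard API.

```python
import re


def lead_k_summary(text: str, k: int = 2) -> list[str]:
    """Extractive baseline (hypothetical helper): keep the first k sentences verbatim."""
    # Naive sentence splitter (assumption): break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return sentences[:k]


doc = ("Extractive methods copy sentences. Abstractive methods write new ones. "
       "Both aim to shorten a document.")
summary = lead_k_summary(doc, k=2)
print(summary)
# ['Extractive methods copy sentences.', 'Abstractive methods write new ones.']

# Every summary sentence appears word-for-word in the source text.
assert all(s in doc for s in summary)
```

An abstractive system, by contrast, could output sentences that fail that final check.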
Predict Output
intermediate
Output of sentence scoring in extractive summarization
What is the output of this code that scores sentences by word frequency?
text = "Machine learning is fun. Learning machines can improve. Fun machines learn fast."
words = text.lower().replace('.', '').split()
freq = {}
for w in words:
    freq[w] = freq.get(w, 0) + 1
sentences = text.split('. ')
scores = {}
for s in sentences:
    score = 0
    for word in s.lower().split():
        score += freq.get(word, 0)
    scores[s] = score
print(scores)
A. SyntaxError due to missing colon in for loop
B. {'Machine learning is fun': 9, 'Learning machines can improve': 8, 'Fun machines learn fast.': 8}
C. {'Machine learning is fun': 7, 'Learning machines can improve': 7, 'Fun machines learn fast.': 7}
D. {'Machine learning is fun': 6, 'Learning machines can improve': 6, 'Fun machines learn fast.': 5}
💡 Hint
Count how many times each word appears and sum for each sentence.
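One pitfall with frequency-based scoring is tokenizing the frequency table and the sentences differently. A sketch of a variant that runs one tokenizer over both the whole text and each sentence follows. The tokenize helper, the \w+ token rule, and the punctuation-based sentence splitter are simplifying assumptions for illustration.

```python
import re
from collections import Counter

text = "Machine learning is fun. Learning machines can improve. Fun machines learn fast."


def tokenize(s: str) -> list[str]:
    # Hypothetical helper: lowercase and keep only runs of word characters,
    # so "fast." and "fast" become the same token.
    return re.findall(r"\w+", s.lower())


freq = Counter(tokenize(text))

# Split on sentence-ending punctuation and drop empty pieces.
sentences = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]

# Sentence score = sum of corpus frequencies of its words (Counter returns 0 for unseen words).
scores = {s: sum(freq[w] for w in tokenize(s)) for s in sentences}
print(scores)
# {'Machine learning is fun': 6, 'Learning machines can improve': 6, 'Fun machines learn fast': 6}
```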
Model Choice
advanced
Best model type for extractive summarization on long documents
Which model is best suited for extractive summarization of very long documents?
A. A convolutional neural network designed for image classification.
B. A simple logistic regression model using bag-of-words features.
C. A transformer model with a long input window like Longformer or BigBird.
D. A recurrent neural network without attention mechanisms.
💡 Hint
Consider models that handle long text efficiently.
Metrics
advanced
Which metric best evaluates extractive summarization quality?
Choose the metric that best measures how well an extractive summary matches a human summary.
A. BLEU score used mainly for machine translation.
B. ROUGE score, which compares overlapping n-grams between summaries.
C. Accuracy of a classification model on sentiment labels.
D. Mean Squared Error between summary word counts.
💡 Hint
Think about metrics that compare text overlap.
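ROUGE-1 counts how many unigrams a system summary shares with a reference summary. The sketch below computes clipped unigram overlap with plain whitespace tokenization. Reference implementations (for example Google's rouge-score package) add their own tokenization, optional stemming, and ROUGE-2/ROUGE-L variants, so scores from this hypothetical rouge_1 helper will not match them exactly.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict[str, float]:
    """Simplified ROUGE-1 precision, recall, and F1 using whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most min(count in candidate, count in reference).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


system = "the cat sat on the mat"
human = "the cat lay on the mat"
print(rouge_1(system, human))
# 5 of 6 unigrams overlap, so precision = recall = F1 = 5/6 (about 0.833)
```

Recall (how much of the human summary is covered) is the figure most often reported for extractive systems.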
🔧 Debug
expert
Why does this extractive summarization code produce an empty summary?
Given the code below, why does the summary list remain empty after execution?
text = "AI is transforming industries. It helps automate tasks."
sentences = text.split('. ')
summary = []
for s in sentences:
    if 'machine' in s.lower():
        summary.append(s)
print(summary)
A. Because none of the sentences contain the word 'machine', so the condition is never true.
B. Because the split method removes the last sentence, leaving summary empty.
C. Because the summary list is overwritten inside the loop instead of appended.
D. Because the print statement is outside the loop and summary is not defined globally.
💡 Hint
Check the condition inside the loop and the text content.
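A keyword filter can only return sentences in which the keyword actually occurs. This sketch (keyword_summary is a hypothetical helper) also matches whole words only, so that a short keyword such as "ai" does not fire inside a word like "maintain". The regex sentence splitter is a simplifying assumption.

```python
import re


def keyword_summary(text: str, keyword: str) -> list[str]:
    """Keep sentences that contain `keyword` as a whole word (case-insensitive)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # \b anchors restrict matches to whole words; re.escape guards special characters.
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]


text = "AI is transforming industries. It helps automate tasks."
print(keyword_summary(text, "machine"))  # [] (no sentence mentions "machine")
print(keyword_summary(text, "ai"))       # ['AI is transforming industries.']
```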

Practice

1. What is the main goal of extractive summarization in NLP?
easy
A. To translate the text into another language
B. To rewrite the text using simpler words
C. To select important sentences from the original text to create a summary
D. To generate new sentences that explain the text

Solution

  1. Step 1: Understand extractive summarization

    Extractive summarization picks key sentences directly from the original text without changing them.
  2. Step 2: Compare options

    Only option C, "To select important sentences from the original text to create a summary," describes picking existing sentences from the source, which is what extractive summarization does.
  3. Final Answer:

    To select important sentences from the original text to create a summary -> Option C
  4. Quick Check:

    Extractive summarization = selecting key sentences [OK]
Hint: Extractive means picking sentences directly from the original text [OK]
Common Mistakes:
  • Confusing extractive with abstractive summarization
  • Thinking it rewrites or translates text
  • Assuming it generates new sentences
2. Which of the following is a common technique used in extractive summarization?
easy
A. Neural machine translation
B. Text generation with GPT
C. Part-of-speech tagging
D. TF-IDF scoring of sentences

Solution

  1. Step 1: Identify techniques for extractive summarization

    Extractive summarization often scores each sentence with TF-IDF, which weights a word by how often it appears and how rare it is across the collection, then ranks sentences by their total weight.
  2. Step 2: Eliminate unrelated options

    Neural machine translation and text generation address other NLP tasks, and part-of-speech tagging may help with preprocessing but does not itself score sentences.
  3. Final Answer:

    TF-IDF scoring of sentences -> Option D
  4. Quick Check:

    TF-IDF = common extractive technique [OK]
Hint: TF-IDF ranks sentence importance in extractive summarization [OK]
Common Mistakes:
  • Confusing summarization with translation or generation
  • Thinking POS tagging directly creates summaries
  • Ignoring TF-IDF's role in scoring
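A minimal sketch of TF-IDF sentence scoring, assuming each sentence is treated as its own document, raw term counts for TF, and the unsmoothed idf = ln(N / df). This is one textbook variant among several (scikit-learn, for instance, smooths idf and normalizes rows by default). Because it sums raw weights, longer sentences tend to score higher, and a word that appears in every sentence gets idf 0. The function name is hypothetical.

```python
import math
import re
from collections import Counter


def tfidf_sentence_scores(sentences: list[str]) -> list[float]:
    """Score each sentence by the sum of its words' TF-IDF weights.

    Each sentence is one "document"; tf = raw count, idf = ln(N / df).
    """
    docs = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(docs)
    # Document frequency: in how many sentences each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append(sum(count * math.log(n / df[w]) for w, count in tf.items()))
    return scores


sentences = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
    "Quantum computers use qubits.",
]
for s, score in zip(sentences, tfidf_sentence_scores(sentences)):
    print(f"{score:.2f}  {s}")
```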
3. Given the following Python code snippet using TF-IDF for extractive summarization, what will be the output (values rounded to one decimal place)?
from sklearn.feature_extraction.text import TfidfVectorizer

texts = ["Cats are great pets.", "Dogs are loyal animals.", "Cats and dogs can live together."]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)
scores = X.sum(axis=1)
print(scores)
medium
A. [[0.0], [0.0], [0.0]]
B. [[2.0], [2.0], [2.4]]
C. [[2.0], [2.0], [3.0]]
D. [[1.0], [1.0], [1.0]]

Solution

  1. Step 1: Understand TF-IDF vectorization and summing

    The code vectorizes three sentences and sums TF-IDF scores per sentence (row-wise sum).
  2. Step 2: Calculate approximate sums

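    With its default settings, TfidfVectorizer L2-normalizes each row to unit length, so a row's sum tends to grow with the number of distinct terms. The first two sentences have four terms each and sum to about 1.98; the third has six and sums to about 2.43. Rounded, these are 2.0, 2.0, and 2.4.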
  3. Final Answer:

    [[2.0], [2.0], [2.4]] -> Option B
  4. Quick Check:

    Sum TF-IDF per sentence ≈ [[2.0], [2.0], [2.4]] [OK]
Hint: Sum TF-IDF scores per sentence to get importance [OK]
Common Mistakes:
  • Assuming zero scores for all sentences
  • Confusing sum with average
  • Misunderstanding TF-IDF output shape
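To see where the numbers in option B come from, here is a pure-Python re-computation that assumes scikit-learn's documented TfidfVectorizer defaults: lowercasing, a token pattern of two or more word characters, raw term counts, smooth_idf=True (idf = ln((1 + n) / (1 + df)) + 1), and norm='l2'. scikit-learn itself prints the row sums as a 3-row, 1-column matrix; this sketch prints a plain list.

```python
import math
import re
from collections import Counter

texts = ["Cats are great pets.", "Dogs are loyal animals.", "Cats and dogs can live together."]

# Mimic the assumed defaults: lowercase, tokens of 2+ word characters.
docs = [re.findall(r"\b\w\w+\b", t.lower()) for t in texts]
n = len(docs)
df = Counter(w for d in docs for w in set(d))
# Smoothed idf, as if one extra document contained every term once.
idf = {w: math.log((1 + n) / (1 + c)) + 1 for w, c in df.items()}

row_sums = []
for d in docs:
    weights = [count * idf[w] for w, count in Counter(d).items()]
    norm = math.sqrt(sum(x * x for x in weights))  # L2 norm of the row
    row_sums.append(sum(weights) / norm)            # sum of the normalized row

print([round(s, 2) for s in row_sums])  # [1.98, 1.98, 2.43]
```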
4. You have this extractive summarization code snippet:
sentences = ["AI is fascinating.", "It helps solve problems.", "AI can learn from data."]
scores = [0.8, 0.9, 0.85]
summary = []
for i in range(len(sentences)):
    if scores[i] > 0.85:
        summary.append(sentences[i])
print(summary)
What is the output and is there any bug?
medium
A. ['It helps solve problems.'] with no bug
B. ['AI is fascinating.', 'It helps solve problems.', 'AI can learn from data.'] with no bug
C. ['It helps solve problems.', 'AI can learn from data.'] but index error bug
D. [] because scores are not compared correctly

Solution

  1. Step 1: Check score filtering condition

    The code keeps a sentence only when its score is strictly greater than 0.85; a score equal to 0.85 does not pass.
  2. Step 2: Determine which sentences are included

    Scores: 0.8 (no), 0.9 (yes), 0.85 (no, because 0.85 is not > 0.85). Only "It helps solve problems." is included. The loop stays within range(len(sentences)), so there is no index error or other bug.
  3. Final Answer:

    ['It helps solve problems.'] -> Option A
  4. Quick Check:

    Scores > 0.85 filter sentences correctly [OK]
Hint: Check strict > vs >= in score filtering [OK]
Common Mistakes:
  • Including sentences with score equal to threshold
  • Expecting index errors where none exist
  • Misreading the comparison operator
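The same filter can be written with zip, which pairs each sentence with its score and removes the index bookkeeping. The sketch below shows how choosing > versus >= changes the result. If scores come from arithmetic rather than literals, a value meant to equal the threshold may differ slightly through floating-point rounding, so borderline cases can flip either way.

```python
sentences = ["AI is fascinating.", "It helps solve problems.", "AI can learn from data."]
scores = [0.8, 0.9, 0.85]

# Strict threshold: a score exactly equal to 0.85 is excluded.
strict = [s for s, score in zip(sentences, scores) if score > 0.85]
# Inclusive threshold: a score exactly equal to 0.85 is kept.
inclusive = [s for s, score in zip(sentences, scores) if score >= 0.85]

print(strict)     # ['It helps solve problems.']
print(inclusive)  # ['It helps solve problems.', 'AI can learn from data.']
```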
5. You want to create an extractive summarizer that picks the top 2 sentences from a document based on TF-IDF scores. Given these sentences and their scores:
sentences = ["Machine learning is fun.", "It allows computers to learn.", "Summarization helps understand text.", "TF-IDF ranks sentence importance."]
scores = [0.7, 0.9, 0.6, 0.8]
Which two sentences should your summarizer select?
hard
A. ["It allows computers to learn.", "TF-IDF ranks sentence importance."]
B. ["Machine learning is fun.", "Summarization helps understand text."]
C. ["Summarization helps understand text.", "TF-IDF ranks sentence importance."]
D. ["Machine learning is fun.", "It allows computers to learn."]

Solution

  1. Step 1: Identify top 2 scores

    The scores are 0.7, 0.9, 0.6, 0.8. The top two are 0.9 and 0.8.
  2. Step 2: Match scores to sentences

    0.9 corresponds to "It allows computers to learn." and 0.8 corresponds to "TF-IDF ranks sentence importance."
  3. Final Answer:

    ["It allows computers to learn.", "TF-IDF ranks sentence importance."] -> Option A
  4. Quick Check:

    Top 2 scores = 0.9 and 0.8 sentences [OK]
Hint: Pick sentences with highest TF-IDF scores [OK]
Common Mistakes:
  • Choosing sentences with lower scores
  • Mixing up sentence-score pairs
  • Selecting more or fewer than top 2
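A short sketch of top-k selection that returns the chosen sentences in their original document order, which usually reads better in a summary than score order. The helper name top_k_sentences is hypothetical; heapq.nlargest is from the Python standard library. With these scores the result is the same either way, because the two winners already appear in ascending position.

```python
import heapq

sentences = ["Machine learning is fun.", "It allows computers to learn.",
             "Summarization helps understand text.", "TF-IDF ranks sentence importance."]
scores = [0.7, 0.9, 0.6, 0.8]


def top_k_sentences(sentences, scores, k=2):
    # Indices of the k highest scores.
    top = heapq.nlargest(k, range(len(sentences)), key=lambda i: scores[i])
    # Re-sort the chosen indices by position so the summary follows the document.
    return [sentences[i] for i in sorted(top)]


print(top_k_sentences(sentences, scores))
# ['It allows computers to learn.', 'TF-IDF ranks sentence importance.']
```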