Extractive summarization helps pick the most important sentences from a text to make a shorter version. It keeps the original words, so the summary is clear and easy to understand.
Extractive summarization in NLP
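from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

def extractive_summary(text, num_sentences=3):
    # Naive split on '. '; drop trailing periods so the join below does not double them
    sentences = [s.strip().rstrip('.') for s in text.split('. ') if s.strip()]
    # Build one TF-IDF vector per sentence
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(sentences)
    # Score each sentence by its total cosine similarity to all sentences
    sim_matrix = cosine_similarity(X)
    scores = sim_matrix.sum(axis=1)
    # Rank sentences from highest to lowest score and keep the top ones
    ranked_sentences = [sentences[i] for i in np.argsort(scores)[::-1]]
    summary = '. '.join(ranked_sentences[:num_sentences]) + '.'
    return summary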
This code splits text into sentences by '. ' which works for simple cases.
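TF-IDF weights each word by how distinctive it is across the sentences, and cosine similarity measures how similar two sentence vectors are. Each sentence's score is the sum of its similarities to all sentences, so sentences that overlap most with the rest of the text rank highest.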
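To make the two ideas concrete, here is a minimal hand-rolled sketch that uses only the standard library. It is not how scikit-learn is implemented. The helper names `tfidf_vectors` and `cosine` are made up for this illustration, the whitespace tokenization is a simplifying assumption, and the IDF formula follows the smoothed form that scikit-learn documents for its defaults.

```python
import math
from collections import Counter

def tfidf_vectors(sentences):
    """Hypothetical helper: one {word: weight} dict per sentence."""
    # Assumption: lowercase + whitespace split stands in for a real tokenizer
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    # Document frequency: in how many sentences does each word appear?
    df = Counter(word for tokens in tokenized for word in set(tokens))
    # Smoothed IDF: rarer words get larger weights
    idf = {word: math.log((1 + n) / (1 + count)) + 1 for word, count in df.items()}
    return [{w: tf * idf[w] for w, tf in Counter(tokens).items()} for tokens in tokenized]

def cosine(a, b):
    """Hypothetical helper: cosine similarity between two sparse word-weight dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

sentences = [
    "machine learning finds patterns in data",
    "deep learning is a kind of machine learning",
    "the weather is sunny today",
]
vectors = tfidf_vectors(sentences)

# Score = sum of similarities to every sentence (including itself)
scores = [sum(cosine(v, other) for other in vectors) for v in vectors]
best = max(range(len(sentences)), key=lambda i: scores[i])
print(sentences[best])
```

The second sentence shares words with both of the others, so it collects the highest total similarity. The first and third sentences share no words, so their similarity is exactly 0.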
summary = extractive_summary(text, num_sentences=2)
print(summary)

summary = extractive_summary(text, num_sentences=5)
print(summary)
This program summarizes a short paragraph about machine learning by selecting the 2 most important sentences.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

def extractive_summary(text, num_sentences=3):
    # Naive split on '. '; drop trailing periods so the join below does not double them
    sentences = [s.strip().rstrip('.') for s in text.split('. ') if s.strip()]
    # Build one TF-IDF vector per sentence
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(sentences)
    # Score each sentence by its total cosine similarity to all sentences
    sim_matrix = cosine_similarity(X)
    scores = sim_matrix.sum(axis=1)
    # Rank sentences from highest to lowest score and keep the top ones
    ranked_sentences = [sentences[i] for i in np.argsort(scores)[::-1]]
    summary = '. '.join(ranked_sentences[:num_sentences]) + '.'
    return summary

sample_text = (
    "Machine learning is a method of data analysis that automates analytical model building. "
    "It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention. "
    "Because of new computing technologies, machine learning today is not like machine learning of the past. "
    "It was born from pattern recognition and the theory that computers can learn without being programmed to perform specific tasks. "
    "Researchers interested in artificial intelligence wanted to see if computers could learn from data."
)

summary = extractive_summary(sample_text, num_sentences=2)
print(summary)
Extractive summarization keeps original sentences, so the summary is easy to read.
It may not always produce perfectly smooth summaries because it just picks sentences.
More advanced methods can give better results, but this simple approach is a good starting point for beginners.
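One common way to make a picked-sentence summary read more smoothly is to show the selected sentences in the order they appear in the source rather than in score order. The sketch below assumes you already have one score per sentence. The function name `select_in_document_order` and the example scores are made up for illustration.

```python
def select_in_document_order(sentences, scores, k):
    """Pick the k highest-scoring sentences, then return them in original order."""
    # Indices of the k best sentences, best first
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Re-sort the chosen indices so the summary follows the source text
    return [sentences[i] for i in sorted(top)]

sentences = ["First point.", "A minor aside.", "Second point.", "Closing remark."]
scores = [0.6, 0.1, 0.9, 0.4]  # hypothetical scores, one per sentence

picked = select_in_document_order(sentences, scores, 2)
print(picked)  # ['First point.', 'Second point.']
```

The highest score belongs to "Second point.", but it is printed after "First point." because that is where it sits in the text.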
Extractive summarization picks key sentences from text to make a short summary.
It uses techniques like TF-IDF and similarity to find important sentences.
This method keeps the original wording, making summaries easy to understand.
Practice
Solution
Step 1: Understand extractive summarization
Extractive summarization picks key sentences directly from the original text without changing them.
Step 2: Compare options
Only the option "To select important sentences from the original text to create a summary" describes taking sentences directly from the source, which matches extractive summarization.
Final Answer:
To select important sentences from the original text to create a summary -> Option C
Quick Check:
Extractive summarization = selecting key sentences [OK]
- Confusing extractive with abstractive summarization
- Thinking it rewrites or translates text
- Assuming it generates new sentences
Solution
Step 1: Identify techniques for extractive summarization
Extractive summarization often uses TF-IDF to score sentences by importance based on word frequency.
Step 2: Eliminate unrelated options
Neural machine translation and text generation are for other NLP tasks, and POS tagging is not directly used for summarization scoring.
Final Answer:
TF-IDF scoring of sentences -> Option D
Quick Check:
TF-IDF = common extractive technique [OK]
- Confusing summarization with translation or generation
- Thinking POS tagging directly creates summaries
- Ignoring TF-IDF's role in scoring
from sklearn.feature_extraction.text import TfidfVectorizer

texts = ["Cats are great pets.", "Dogs are loyal animals.", "Cats and dogs can live together."]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)
scores = X.sum(axis=1)
print(scores)
Solution
Step 1: Understand TF-IDF vectorization and summing
The code vectorizes three sentences and sums the TF-IDF weights in each row, giving one total per sentence.
Step 2: Calculate approximate sums
The row sums come out to roughly 2.0, 2.0, and 2.4. The first two sentences have four words each, so their totals match. The third sentence has six words, so its total is higher.
Final Answer:
[[2.0], [2.0], [2.4]] -> Option B
Quick Check:
Sum TF-IDF per sentence ≈ [[2.0], [2.0], [2.4]] [OK]
- Assuming zero scores for all sentences
- Confusing sum with average
- Misunderstanding TF-IDF output shape
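The row sums in the TF-IDF question above can be checked by hand. This is a minimal sketch that assumes scikit-learn's documented defaults (smoothed IDF and L2-normalized rows). The sentences are written lowercase and without punctuation to stand in for the vectorizer's own tokenization.

```python
import math
from collections import Counter

# Assumption: lowercase, punctuation-free text mimics the default tokenizer here
texts = ["cats are great pets", "dogs are loyal animals", "cats and dogs can live together"]
tokenized = [t.split() for t in texts]
n = len(tokenized)

# Smoothed IDF: ln((1 + n) / (1 + df)) + 1
df = Counter(word for tokens in tokenized for word in set(tokens))
idf = {word: math.log((1 + n) / (1 + count)) + 1 for word, count in df.items()}

row_sums = []
for tokens in tokenized:
    weights = [count * idf[word] for word, count in Counter(tokens).items()]
    # L2-normalize the row, then add up its entries
    norm = math.sqrt(sum(w * w for w in weights))
    row_sums.append(sum(weights) / norm)

print([round(s, 2) for s in row_sums])  # [1.98, 1.98, 2.43]
```

Because each row is scaled to length 1, a sentence with more distinct words spreads its weight over more entries and ends up with a larger sum.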
sentences = ["AI is fascinating.", "It helps solve problems.", "AI can learn from data."]
scores = [0.8, 0.9, 0.85]
summary = []
for i in range(len(sentences)):
    if scores[i] > 0.85:
        summary.append(sentences[i])
print(summary)
What is the output, and is there any bug?
Solution
Step 1: Check score filtering condition
The code appends a sentence only when its score is strictly greater than 0.85, so a score equal to 0.85 does not pass.
Step 2: Determine which sentences are included
Scores: 0.8 (no), 0.9 (yes), 0.85 (no, because 0.85 is not > 0.85). Only "It helps solve problems." is included, and the code runs without errors.
Final Answer:
['It helps solve problems.'] -> Option A
Quick Check:
Scores > 0.85 filter sentences correctly [OK]
- Including sentences with score equal to threshold
- Expecting index errors where none exist
- Misreading the comparison operator
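The difference between `>` and `>=` in the threshold question above is easy to see side by side. This is a small sketch. The helper `filter_by_score` and its `inclusive` flag are made up for illustration.

```python
sentences = ["AI is fascinating.", "It helps solve problems.", "AI can learn from data."]
scores = [0.8, 0.9, 0.85]

def filter_by_score(sentences, scores, threshold, inclusive=False):
    """Hypothetical helper: keep sentences whose score passes the threshold."""
    if inclusive:
        return [s for s, score in zip(sentences, scores) if score >= threshold]
    return [s for s, score in zip(sentences, scores) if score > threshold]

strict = filter_by_score(sentences, scores, 0.85)
loose = filter_by_score(sentences, scores, 0.85, inclusive=True)

print(strict)  # ['It helps solve problems.']
print(loose)   # ['It helps solve problems.', 'AI can learn from data.']
```

Using `zip` also avoids indexing by position, which removes one common source of off-by-one mistakes.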
sentences = ["Machine learning is fun.", "It allows computers to learn.", "Summarization helps understand text.", "TF-IDF ranks sentence importance."]
scores = [0.7, 0.9, 0.6, 0.8]
Which two sentences should your summarizer select?
Solution
Step 1: Identify top 2 scores
The scores are 0.7, 0.9, 0.6, 0.8. The top two are 0.9 and 0.8.
Step 2: Match scores to sentences
0.9 corresponds to "It allows computers to learn.", and 0.8 corresponds to "TF-IDF ranks sentence importance.".
Final Answer:
["It allows computers to learn.", "TF-IDF ranks sentence importance."] -> Option A
Quick Check:
Top 2 scores = 0.9 and 0.8 sentences [OK]
- Choosing sentences with lower scores
- Mixing up sentence-score pairs
- Selecting more or fewer than top 2
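The top-2 selection from the question above can be written in a few lines. This sketch uses `heapq.nlargest` from the standard library, which is one of several reasonable ways to do it. Pairing each score with its sentence keeps the two from getting mixed up.

```python
import heapq

sentences = [
    "Machine learning is fun.",
    "It allows computers to learn.",
    "Summarization helps understand text.",
    "TF-IDF ranks sentence importance.",
]
scores = [0.7, 0.9, 0.6, 0.8]

# Pair each score with its sentence, then take the two largest pairs
top_two = heapq.nlargest(2, zip(scores, sentences))
selected = [sentence for _, sentence in top_two]

print(selected)  # ['It allows computers to learn.', 'TF-IDF ranks sentence importance.']
```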
