What is extractive summarization in NLP?

Introduction

Extractive summarization builds a shorter version of a text by selecting its most important sentences and copying them unchanged. Because nothing is rephrased, the summary stays faithful to the original wording.

It is useful in situations such as these:

You want a quick summary of a long news article.

You need to highlight key points from a meeting transcript.

You want to create a brief overview of a research paper.

You want to help readers get the main ideas from a long blog post.

You want to reduce the length of customer reviews while keeping important opinions.

Syntax

NLP

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

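def extractive_summary(text, num_sentences=3):
    # Naive split on '. '; drop empty pieces and any trailing period
    sentences = [s.strip().rstrip('.') for s in text.split('. ') if s.strip()]
    # Represent each sentence as a TF-IDF vector
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(sentences)
    # Score each sentence by its total similarity to all sentences
    sim_matrix = cosine_similarity(X)
    scores = sim_matrix.sum(axis=1)
    # Rank sentences from highest to lowest score
    ranked_sentences = [sentences[i] for i in np.argsort(scores)[::-1]]
    # Join the top sentences and restore the final period
    summary = '. '.join(ranked_sentences[:num_sentences]) + '.'
    return summary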

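This code splits the text into sentences on '. ', which works for simple cases but misses sentences that end in '?' or '!' and wrongly splits after abbreviations such as 'Dr. '.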
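One possible refinement of the splitting step is a regular expression that breaks after '.', '!' or '?' when whitespace follows. The sketch below is only an illustration: split_sentences is a hypothetical helper, not part of the code above, and it still mishandles abbreviations.

```python
import re

def split_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace.

    Hypothetical helper for illustration only.
    """
    # The lookbehind keeps the punctuation attached to each sentence.
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    # Discard empty pieces, e.g. when the input is an empty string.
    return [p for p in parts if p]

print(split_sentences("It works. Does it scale? Yes! Dr. Smith agrees."))
# → ['It works.', 'Does it scale?', 'Yes!', 'Dr.', 'Smith agrees.']
```

The last two items show the remaining weakness: 'Dr.' is treated as its own sentence. A dedicated sentence tokenizer from an NLP library is usually the safer choice for real text.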

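TF-IDF turns each sentence into a vector in which words that are frequent in that sentence but rare in the others get higher weights, and cosine similarity measures how close two of these vectors are. A sentence's score is the sum of its similarities to every sentence, so sentences that share vocabulary with many others rank highest.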
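To make the scoring idea concrete, here is a minimal sketch in plain Python with no external libraries. The helpers tfidf_vectors, cosine and sentence_scores are hypothetical names, and the weighting is a simplified TF-IDF formula (count times log(n / df) + 1) without the smoothing and tokenization rules that TfidfVectorizer applies by default, so the numbers will differ from scikit-learn's.

```python
import math
from collections import Counter

def tfidf_vectors(sentences):
    """Return one {word: weight} dict per sentence (simplified TF-IDF)."""
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    # Document frequency: in how many sentences each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Rare words get a larger weight than words found in many sentences.
        vectors.append({w: c * (math.log(n / df[w]) + 1.0) for w, c in tf.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def sentence_scores(sentences):
    """Score each sentence by summing its similarity to every sentence."""
    vectors = tfidf_vectors(sentences)
    return [sum(cosine(v, other) for other in vectors) for v in vectors]

sentences = [
    "cats chase mice",
    "dogs chase cats",
    "birds sing songs",
]
for sentence, score in zip(sentences, sentence_scores(sentences)):
    print(f"{score:.2f}  {sentence}")
```

The first two sentences share the words "cats" and "chase", so they score higher than the third, which is similar only to itself.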

Examples

Get a summary made of the 2 highest-scoring sentences, assuming text holds the document as a string.

NLP

summary = extractive_summary(text, num_sentences=2)
print(summary)

Get a longer summary with 5 sentences.

NLP

summary = extractive_summary(text, num_sentences=5)
print(summary)

Sample Model

This program summarizes a short paragraph about machine learning by selecting the 2 most important sentences.

NLP

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

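def extractive_summary(text, num_sentences=3):
    # Naive split on '. '; drop empty pieces and any trailing period
    sentences = [s.strip().rstrip('.') for s in text.split('. ') if s.strip()]
    # Represent each sentence as a TF-IDF vector
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(sentences)
    # Score each sentence by its total similarity to all sentences
    sim_matrix = cosine_similarity(X)
    scores = sim_matrix.sum(axis=1)
    # Rank sentences from highest to lowest score
    ranked_sentences = [sentences[i] for i in np.argsort(scores)[::-1]]
    # Join the top sentences and restore the final period
    summary = '. '.join(ranked_sentences[:num_sentences]) + '.'
    return summary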

sample_text = (
    "Machine learning is a method of data analysis that automates analytical model building. "
    "It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention. "
    "Because of new computing technologies, machine learning today is not like machine learning of the past. "
    "It was born from pattern recognition and the theory that computers can learn without being programmed to perform specific tasks. "
    "Researchers interested in artificial intelligence wanted to see if computers could learn from data."
)

summary = extractive_summary(sample_text, num_sentences=2)
print(summary)

Important Notes

Extractive summarization keeps original sentences, so the summary is easy to read.

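Because sentences are lifted out of their context, the summary can read as disjointed, for example when a selected sentence starts with a pronoun whose referent was left out.

More advanced methods, such as graph-based ranking (for example TextRank) or neural sentence scorers, can give better results, but this simple method is a good starting point for beginners.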
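One inexpensive way to reduce choppiness is to present the selected sentences in their original document order instead of score order, which is what extractive_summary above returns. The sketch below assumes the sentences and their scores are already available as lists; top_sentences_in_order is a hypothetical helper and the scores are made up for illustration.

```python
def top_sentences_in_order(sentences, scores, num_sentences=3):
    """Pick the highest-scoring sentences, then return them in document order."""
    # Indices sorted from highest to lowest score.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Keep the top indices and sort them back into their original positions.
    chosen = sorted(ranked[:num_sentences])
    return [sentences[i] for i in chosen]

sentences = ["A opens the topic", "B adds a detail", "C gives the result"]
scores = [1.2, 0.8, 1.9]  # made-up scores for illustration
print(top_sentences_in_order(sentences, scores, num_sentences=2))
# → ['A opens the topic', 'C gives the result']
```

Sentence C has the highest score, yet it appears after A in the output because A comes first in the document.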

Summary

Extractive summarization picks key sentences from text to make a short summary.

It uses techniques like TF-IDF and cosine similarity to score sentences and find the important ones.

This method keeps the original wording, making summaries easy to understand.

Practice

1. What is the main goal of extractive summarization in NLP?

easy

A. To translate the text into another language

B. To rewrite the text using simpler words

C. To select important sentences from the original text to create a summary

D. To generate new sentences that explain the text

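Solution

Step 1: Understand extractive summarization

Extractive summarization selects existing sentences from the text and keeps their original wording.

Step 2: Compare options

Option A describes translation, option B describes simplification, and option D describes generating new sentences, which is abstractive summarization. Only option C describes selecting sentences from the original text.

Final Answer:

C. To select important sentences from the original text to create a summary

Quick Check:

Extractive means selecting original sentences, so C is correct.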
