
Extractive summarization in NLP

Introduction

Extractive summarization shortens a text by selecting its most important sentences and copying them word for word. Because the original wording is kept, the summary stays close to the source and is easy to understand.

Typical situations where it helps:

You want a quick summary of a long news article.
You need to highlight key points from a meeting transcript.
You want to create a brief overview of a research paper.
You want to help readers get the main ideas from a long blog post.
You want to reduce the length of customer reviews while keeping important opinions.
Syntax
NLP
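import re

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def extractive_summary(text, num_sentences=3):
    # Split after '.', '!' or '?' followed by whitespace; each sentence keeps its punctuation
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s.strip()]
    if len(sentences) <= num_sentences:
        return ' '.join(sentences)  # nothing to cut
    # One TF-IDF vector per sentence
    X = TfidfVectorizer().fit_transform(sentences)
    # Score = total cosine similarity to every sentence (how "central" it is)
    scores = cosine_similarity(X).sum(axis=1)
    # Take the top-scoring sentences, then put them back in document order
    top = sorted(np.argsort(scores)[::-1][:num_sentences])
    return ' '.join(sentences[i] for i in top)

This code splits text into sentences with a simple rule: a new sentence starts after '.', '!' or '?' followed by whitespace. That works for plain prose but breaks on abbreviations such as "Dr." or "e.g.". The selected sentences are returned in their original order so the summary reads naturally.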
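To see what a simple rule-based splitter gets right and wrong, the sketch below compares two approaches on a made-up example sentence. The first splits on '. '. The second is a regular-expression split on '.', '!' or '?' followed by whitespace.

```python
import re

# Made-up text containing abbreviations, a question and an exclamation
text = "Dr. Smith arrived. The talk began at 9 a.m. sharp. Was it useful? Yes!"

# Naive approach: split on period + space
naive = text.split('. ')

# Regex approach: split on whitespace that follows '.', '!' or '?'
regex = re.split(r'(?<=[.!?])\s+', text)

print(naive)
print(regex)
```

Splitting on '. ' drops the final period from every sentence except the last, and it ignores '?' and '!'. The regex keeps the punctuation. Like any simple rule, though, it still cuts after abbreviations such as "Dr." and "a.m.". For real-world text, a trained sentence tokenizer such as NLTK's sent_tokenize is usually more robust.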

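TF-IDF turns each sentence into a vector that weights each word by how often it appears in that sentence and how rare it is across all sentences. Cosine similarity then measures how alike two sentence vectors are. A sentence's score is the sum of its similarities to all sentences. Sentences that share vocabulary with many others are therefore treated as central, and so as important.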
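A small made-up example makes the scoring concrete. The snippet below scores four sentences the same way the function does (TF-IDF vectors, cosine similarity, row sums) and prints each score next to its sentence.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative sentences: three about animals, one off-topic
sentences = [
    "Cats chase mice.",
    "Cats and dogs are pets.",
    "Dogs chase cats.",
    "The stock market fell today.",
]

X = TfidfVectorizer().fit_transform(sentences)  # one TF-IDF row per sentence
sim = cosine_similarity(X)                      # sim[i, j] = similarity of sentences i and j
scores = sim.sum(axis=1)                        # row sums = centrality scores

for sentence, score in zip(sentences, scores):
    print(f"{score:.3f}  {sentence}")
```

The off-topic sentence shares no words with the others, so its score is just its similarity to itself (1.0), and it ranks last. "Dogs chase cats." overlaps with both of the other animal sentences, so it ranks first. Every sentence contributes the same self-similarity of 1.0, so that term never changes the ranking.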

Examples
Get a 2-sentence summary (assuming the variable text holds the document to summarize).
NLP
summary = extractive_summary(text, num_sentences=2)
print(summary)
Get a longer summary with 5 sentences.
NLP
summary = extractive_summary(text, num_sentences=5)
print(summary)
Sample Model

This program summarizes a short paragraph about machine learning by selecting the 2 most important sentences.

NLP
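import re

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def extractive_summary(text, num_sentences=3):
    # Split after '.', '!' or '?' followed by whitespace; each sentence keeps its punctuation
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s.strip()]
    if len(sentences) <= num_sentences:
        return ' '.join(sentences)  # nothing to cut
    # One TF-IDF vector per sentence
    X = TfidfVectorizer().fit_transform(sentences)
    # Score = total cosine similarity to every sentence (how "central" it is)
    scores = cosine_similarity(X).sum(axis=1)
    # Take the top-scoring sentences, then put them back in document order
    top = sorted(np.argsort(scores)[::-1][:num_sentences])
    return ' '.join(sentences[i] for i in top)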

sample_text = (
    "Machine learning is a method of data analysis that automates analytical model building. "
    "It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention. "
    "Because of new computing technologies, machine learning today is not like machine learning of the past. "
    "It was born from pattern recognition and the theory that computers can learn without being programmed to perform specific tasks. "
    "Researchers interested in artificial intelligence wanted to see if computers could learn from data."
)

summary = extractive_summary(sample_text, num_sentences=2)
print(summary)
Important Notes

Extractive summarization keeps original sentences, so the summary is easy to read.

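Because it only selects sentences, the result can read choppily. A selected sentence may start with "It" or "This" while the sentence it refers to was left out. In the sample paragraph, for instance, the second sentence begins with "It", which only makes sense after the first.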

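For better results, consider more advanced methods, such as graph-based ranking (TextRank) or sentence embeddings in place of TF-IDF vectors. The simple TF-IDF method shown here is a good starting point because it needs no training data.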
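As one example of a more advanced method, the sketch below ranks sentences in the style of TextRank. It builds a graph whose edge weights are sentence similarities and runs PageRank over that graph with power iteration. A sentence therefore scores highly when it is similar to other high-scoring sentences, not just to many sentences. This is an illustrative sketch, not a reference implementation. The original TextRank measures sentence similarity by word overlap, whereas this version reuses TF-IDF cosine similarity. The names textrank_summary, damping and iterations are hypothetical choices made for this example.

```python
import re

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def textrank_summary(text, num_sentences=3, damping=0.85, iterations=50):
    """Hypothetical TextRank-style sketch: PageRank over a TF-IDF similarity graph."""
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
    n = len(sentences)
    if n <= num_sentences:
        return ' '.join(sentences)
    # Edge weights = cosine similarity between sentences; no self-loops
    sim = cosine_similarity(TfidfVectorizer().fit_transform(sentences))
    np.fill_diagonal(sim, 0.0)
    # Row-normalize into transition probabilities; sentences with no edges jump uniformly
    row_sums = sim.sum(axis=1, keepdims=True)
    trans = np.where(row_sums > 0, sim / np.where(row_sums > 0, row_sums, 1.0), 1.0 / n)
    # Power iteration for PageRank with damping factor
    scores = np.full(n, 1.0 / n)
    for _ in range(iterations):
        scores = (1 - damping) / n + damping * trans.T @ scores
    # Top-ranked sentences, returned in document order
    top = sorted(np.argsort(scores)[::-1][:num_sentences])
    return ' '.join(sentences[i] for i in top)

text = "Cats chase mice. Cats and dogs are pets. Dogs chase cats. The stock market fell today."
print(textrank_summary(text, num_sentences=2))
```

The row-sum score used earlier counts only direct similarities. PageRank instead lets importance propagate through the graph, which can help in longer documents that cover several topics.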

Summary

Extractive summarization picks key sentences from text to make a short summary.

It represents sentences as TF-IDF vectors and uses cosine similarity to find the most central, and therefore most important, sentences.

This method keeps the original wording, making summaries easy to understand.