Extractive summarization selects the most important sentences from a text and joins them into a shorter version. Because it reuses the original wording, every sentence in the summary is one that appears in the source.
Extractive summarization in NLP
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity
    import numpy as np

    def extractive_summary(text, num_sentences=3):
        # Naive sentence split on '. '
        sentences = text.split('. ')
        # One TF-IDF vector per sentence
        vectorizer = TfidfVectorizer()
        X = vectorizer.fit_transform(sentences)
        # Pairwise similarity between all sentences
        sim_matrix = cosine_similarity(X)
        # Score each sentence by its total similarity to all sentences
        scores = sim_matrix.sum(axis=1)
        # Highest-scoring sentences first
        ranked_sentences = [sentences[i] for i in np.argsort(scores)[::-1]]
        summary = '. '.join(ranked_sentences[:num_sentences])
        return summary
This code splits the text into sentences on '. ', which works for simple cases but misses sentences that end in '?' or '!' and breaks on abbreviations such as 'Dr. Smith'.
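To illustrate the limits of the '. ' split, here is a minimal sketch of a slightly more tolerant splitter. The helper name `split_sentences` and the regular expression are assumptions made for this example and are not part of the code above.

```python
import re

def split_sentences(text):
    """Hypothetical helper: split on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    # Drop empty pieces (for example when the input is an empty string)
    return [p for p in parts if p]

text = "Is it fast? Yes! It also keeps punctuation."
print(text.split('. '))       # naive split: the text stays in one piece
print(split_sentences(text))  # three sentences, punctuation kept
```

This sketch still splits after abbreviations such as 'Dr.', so a dedicated sentence tokenizer is likely a better choice for real text.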
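TF-IDF turns each sentence into a vector that gives more weight to words that are distinctive for that sentence, and cosine similarity measures how similar two sentence vectors are. A sentence's score is the sum of its similarities to all sentences, so sentences that overlap with many others rank highest.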
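The scoring step can be inspected on its own. The sketch below uses three made-up sentences (an assumption for illustration, not the tutorial's sample text) and prints each sentence's total similarity score.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative sentences, chosen so that two overlap and one does not
sentences = [
    "Cats chase mice",
    "Cats chase birds",
    "Stock prices fell sharply",
]

X = TfidfVectorizer().fit_transform(sentences)  # one TF-IDF row per sentence
sim_matrix = cosine_similarity(X)               # sim_matrix[i, j] compares sentence i with sentence j
scores = sim_matrix.sum(axis=1)                 # total similarity of each sentence to all sentences

for sentence, score in zip(sentences, scores):
    print(f"{score:.2f}  {sentence}")
```

The third sentence shares no words with the other two, so it is similar only to itself and its score is about 1.0, while the two overlapping sentences score higher.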
    # `text` is the input string to summarize
    # Two-sentence summary
    summary = extractive_summary(text, num_sentences=2)
    print(summary)

    # Five-sentence summary
    summary = extractive_summary(text, num_sentences=5)
    print(summary)
This program summarizes a short paragraph about machine learning by selecting its two highest-scoring sentences.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity
    import numpy as np

    def extractive_summary(text, num_sentences=3):
        # Naive sentence split on '. '
        sentences = text.split('. ')
        vectorizer = TfidfVectorizer()
        X = vectorizer.fit_transform(sentences)
        sim_matrix = cosine_similarity(X)
        # Score each sentence by its total similarity to all sentences
        scores = sim_matrix.sum(axis=1)
        ranked_sentences = [sentences[i] for i in np.argsort(scores)[::-1]]
        summary = '. '.join(ranked_sentences[:num_sentences])
        return summary

    sample_text = (
        "Machine learning is a method of data analysis that automates analytical model building. "
        "It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention. "
        "Because of new computing technologies, machine learning today is not like machine learning of the past. "
        "It was born from pattern recognition and the theory that computers can learn without being programmed to perform specific tasks. "
        "Researchers interested in artificial intelligence wanted to see if computers could learn from data."
    )

    summary = extractive_summary(sample_text, num_sentences=2)
    print(summary)
Extractive summarization keeps original sentences, so the summary is easy to read.
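The summary may not read smoothly, because the selected sentences are lifted out of their context and, in this implementation, returned in score order rather than in the order they appear in the text.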
More advanced methods, such as graph-based ranking (for example TextRank) or abstractive models, can give better results, but this simple method is a good starting point for beginners.
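One small change that can help readability is to return the selected sentences in their original order. The sketch below is a hypothetical variant (the name `extractive_summary_in_order` is an assumption for this example); it uses the same '. ' split and the same TF-IDF scoring, and only changes how the chosen sentences are ordered and joined.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def extractive_summary_in_order(text, num_sentences=3):
    """Hypothetical variant: same scoring, but selected sentences keep document order."""
    # Same simple '. ' split; drop empty pieces and the trailing period
    sentences = [s.strip().rstrip('.') for s in text.split('. ') if s.strip()]
    X = TfidfVectorizer().fit_transform(sentences)
    scores = cosine_similarity(X).sum(axis=1)
    # Indices of the top-scoring sentences, sorted back into document order
    top = sorted(np.argsort(scores)[::-1][:num_sentences])
    return '. '.join(sentences[i] for i in top) + '.'

demo_text = "Cats chase mice. Stock prices fell sharply. Cats chase birds."
print(extractive_summary_in_order(demo_text, num_sentences=2))
```

In the demo text, the two sentences about cats overlap with each other and the middle sentence overlaps with nothing, so the two cat sentences are selected and printed in the order they appear.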
Extractive summarization picks key sentences from text to make a short summary.
It uses TF-IDF and cosine similarity to score sentences and find the most important ones.
This method keeps the original wording, making summaries easy to understand.