NLP Program to Summarize Text Using Python

Use the Hugging Face transformers library and its pipeline('summarization') helper to build a simple NLP program that summarizes text. For example: from transformers import pipeline; summarizer = pipeline('summarization'); summary = summarizer(text)[0]['summary_text'].

📋

Examples

Input: Machine learning is a method of data analysis that automates analytical model building.

Output: Machine learning automates analytical model building.

Input: Natural language processing enables computers to understand human language. It is widely used in chatbots, translation, and sentiment analysis.

Output: Natural language processing helps computers understand human language and is used in chatbots, translation, and sentiment analysis.

🧠

How to Think About It

Summarizing means identifying the main points of the full text and then producing a shorter version that keeps them, either by picking the key sentences (extractive) or by generating new wording (abstractive). An NLP model trained for summarization automates this by analyzing the text and producing a concise summary.

📐

Algorithm

Get the input text to summarize.

Load a pre-trained summarization model.

Pass the input text to the model to generate a summary.

Extract the summary text from the model output.

Return or print the summary.

💻

Code

python

from transformers import pipeline

# Load summarization pipeline
summarizer = pipeline('summarization')

# Input text
text = "Machine learning is a method of data analysis that automates analytical model building."

# Generate summary
summary = summarizer(text, max_length=30, min_length=5, do_sample=False)[0]['summary_text']

print(summary)

Output (exact wording may vary with the model and library version)

Machine learning automates analytical model building.

🔍

Dry Run

Let's trace the example text through the summarization code.

Load summarization pipeline

summarizer is set to a model that can summarize text.

Input text

text = "Machine learning is a method of data analysis that automates analytical model building."

Generate summary

summarizer processes the text and returns [{'summary_text': 'Machine learning automates analytical model building.'}]

Extract summary

summary = 'Machine learning automates analytical model building.'

Print summary

Output: Machine learning automates analytical model building.

Step	Action	Value
1	Load model	summarizer pipeline ready
2	Input text	Machine learning is a method of data analysis that automates analytical model building.
3	Model output	[{'summary_text': 'Machine learning automates analytical model building.'}]
4	Extract summary	Machine learning automates analytical model building.
5	Print	Machine learning automates analytical model building.
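
The extraction step in the trace can be tried without loading a model. In this minimal sketch, the list is a hand-written stand-in that mirrors the shape shown in step 3 (a list with one dictionary per input), and extract_summaries is a hypothetical helper name, not part of transformers.

```python
def extract_summaries(outputs):
    """Pull the 'summary_text' string out of each pipeline-style result."""
    return [item['summary_text'] for item in outputs]


# Hand-written stand-in for the value traced in step 3 (not real model output)
model_output = [{'summary_text': 'Machine learning automates analytical model building.'}]

summaries = extract_summaries(model_output)
print(summaries[0])
```

Because the result is a list, the same helper would also cover the case where several texts are summarized in one call.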

💡

Why This Works

Step 1: Load summarization pipeline

Calling pipeline('summarization') without a model argument downloads and loads a default pre-trained summarization model. Pass model=... to choose a specific checkpoint instead.

Step 2: Input text

We provide the full text we want to summarize as input to the model.

Step 3: Generate summary

The model processes the input and creates a shorter version that captures the main ideas.

Step 4: Output summary

We extract the summary text from the model's output and print it for the user.

🔄

Alternative Approaches

Extractive summarization with NLTK

python

import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from collections import defaultdict

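# Tokenizer and stop-word data; newer NLTK releases look for 'punkt_tab' instead of 'punkt'
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('stopwords')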

text = "Machine learning is a method of data analysis that automates analytical model building."
sentences = sent_tokenize(text)
stop_words = set(stopwords.words('english'))
word_frequencies = defaultdict(int)

for word in word_tokenize(text.lower()):
    if word.isalpha() and word not in stop_words:
        word_frequencies[word] += 1

max_freq = max(word_frequencies.values())
for word in word_frequencies:
    word_frequencies[word] /= max_freq

sentence_scores = defaultdict(int)
for sent in sentences:
    for word in word_tokenize(sent.lower()):
        if word in word_frequencies:
            sentence_scores[sent] += word_frequencies[word]

summary_sentences = sorted(sentence_scores, key=sentence_scores.get, reverse=True)[:1]
summary = ' '.join(summary_sentences)
print(summary)

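This method picks the highest-scoring sentences based on normalized word frequency. It copies sentences verbatim, so it cannot rephrase anything and may miss context compared to neural models. With a single-sentence input like the one above, it simply returns that sentence.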
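
To see the ranking actually choose between sentences, here is a dependency-free sketch of the same frequency-scoring idea applied to the two-sentence example from earlier. It is not the NLTK code above: the sentence splitter is a naive regular expression, the stop-word list is a tiny illustrative assumption, and extractive_summary is a hypothetical name. Unlike the NLTK version, it returns the selected sentences in their original order.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; NLTK's list is much longer)
STOP_WORDS = {'a', 'an', 'and', 'are', 'in', 'is', 'it', 'of', 'that', 'the', 'to', 'used'}


def extractive_summary(text, num_sentences=1):
    """Return the top-scoring sentences, kept in their original order."""
    # Naive sentence split: break after ., ! or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    words = [w for w in re.findall(r'[a-z]+', text.lower()) if w not in STOP_WORDS]
    if not sentences or not words:
        return text

    freq = Counter(words)
    max_freq = max(freq.values())

    def score(sentence):
        # Sum of normalized frequencies of the sentence's non-stop words
        tokens = re.findall(r'[a-z]+', sentence.lower())
        return sum(freq[w] / max_freq for w in tokens if w in freq)

    top = sorted(sentences, key=score, reverse=True)[:num_sentences]
    return ' '.join(s for s in sentences if s in top)


text = (
    "Natural language processing enables computers to understand human language. "
    "It is widely used in chatbots, translation, and sentiment analysis."
)
print(extractive_summary(text))
```

The first sentence wins here because "language" appears twice and the sentence contains more scored words, which also shows a known bias of this method toward longer sentences.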

Using the OpenAI API for summarization

python

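from openai import OpenAI

# Requires openai>=1.0. The legacy openai.Completion interface was removed
# in that release, and the text-davinci-003 model has been retired.
client = OpenAI(api_key='YOUR_API_KEY')

text = "Machine learning is a method of data analysis that automates analytical model building."
response = client.chat.completions.create(
    model='gpt-3.5-turbo',  # assumed model name; use any chat model available to your account
    messages=[{'role': 'user', 'content': f'Summarize this: {text}'}],
    max_tokens=50
)
summary = response.choices[0].message.content.strip()
print(summary)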

This calls a hosted language model through an API, so it requires an internet connection and an API key.
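
Hard-coding the key as in the snippet above makes it easy to leak. A common alternative, sketched here, is to read it from an environment variable. load_api_key is a hypothetical helper; OPENAI_API_KEY is the variable name the OpenAI client conventionally reads.

```python
import os


def load_api_key(var_name='OPENAI_API_KEY'):
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f'Set the {var_name} environment variable before running.')
    return key
```

The returned value can then be passed wherever the example uses 'YOUR_API_KEY'.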

⚡

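Complexity: roughly O(n²) time for the transformer, O(n) for the extractive method

Time Complexity

Standard transformer self-attention compares every input token with every other token, so encoding time grows roughly quadratically with input length n (in tokens), up to the model's maximum input size. Generating the summary adds cost that grows with the summary length. The frequency-based extractive method makes a few passes over the text, so its time is close to linear.

Space Complexity

The transformer holds the model weights plus attention scores, which in a standard implementation also grow quadratically with input length. The extractive method stores only word counts and sentence scores, so its space grows linearly with input size.

Which Approach is Fastest?

Extractive methods are faster but can only copy existing sentences; transformer models such as those in Hugging Face are slower but usually produce more fluent summaries.

Approach	Time	Space	Best For
Transformer summarization	~O(n²)	~O(n²)	High-quality summaries, moderate text length
Extractive summarization	O(n)	O(n)	Fast summaries, simple use cases
API-based GPT-3 summarization	Depends on API latency	Minimal local	Very high-quality, requires internet and API key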
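
The table lists the transformer approach as best for moderate text length, since these models accept only a bounded number of input tokens. One common workaround for longer inputs, sketched below, is to split the text into chunks and summarize each one. The function names and the max_words default are hypothetical, word count is only a rough proxy for tokens, and the stand-in summarizer exists purely so the sketch runs without a model.

```python
def split_into_chunks(text, max_words=200):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [' '.join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_in_chunks(text, summarize_fn, max_words=200):
    """Summarize each chunk with summarize_fn and join the partial summaries."""
    chunks = split_into_chunks(text, max_words)
    return ' '.join(summarize_fn(chunk) for chunk in chunks)


# Stand-in summarizer for demonstration: keeps the first three words of a chunk
def first_three_words(chunk):
    return ' '.join(chunk.split()[:3])


long_text = 'one two three four five six seven eight nine ten'
print(summarize_in_chunks(long_text, first_three_words, max_words=5))
```

With the pipeline from the Code section, summarize_fn could be something like lambda chunk: summarizer(chunk)[0]['summary_text']. Splitting on word boundaries can cut sentences in half, so splitting on sentence boundaries may give better results.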

💡

Use the max_length and min_length parameters to control summary size. Both are measured in tokens, not words or characters.

⚠️

Trying to summarize very short text often returns the original text without change.
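
Both notes above can be folded into a small pre-check. This is a sketch under stated assumptions: choose_lengths and should_summarize are hypothetical helpers, the ratio, floor, and min_words defaults are arbitrary starting points, and word count stands in for the token count the model actually uses.

```python
def choose_lengths(text, ratio=0.5, floor=5):
    """Pick (min_length, max_length) from the input's word count."""
    n_words = len(text.split())
    max_length = max(floor, int(n_words * ratio))
    # Keep min_length strictly below max_length
    min_length = max(1, min(floor, max_length - 1))
    return min_length, max_length


def should_summarize(text, min_words=20):
    """Skip inputs so short that a summary would mostly repeat them."""
    return len(text.split()) >= min_words


text = "Machine learning is a method of data analysis that automates analytical model building."
min_len, max_len = choose_lengths(text)
print(min_len, max_len, should_summarize(text))
```

The values could then be passed as summarizer(text, max_length=max_len, min_length=min_len, do_sample=False), using the summarizer from the Code section.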