
NLP Program to Summarize Text Using Python

Use pipeline('summarization') from the Hugging Face transformers library to build a simple NLP program that summarizes text. A minimal version looks like this: from transformers import pipeline; summarizer = pipeline('summarization'); summary = summarizer(text)[0]['summary_text'].

Examples

Input: Machine learning is a method of data analysis that automates analytical model building.
Output: Machine learning automates analytical model building.

Input: Natural language processing enables computers to understand human language. It is widely used in chatbots, translation, and sentiment analysis.
Output: Natural language processing helps computers understand human language and is used in chatbots, translation, and sentiment analysis.

How to Think About It

To summarize text by hand, you read the full text, identify its main points, and then either pick the most important sentences (extractive summarization) or write a shorter version in new words (abstractive summarization). An NLP model trained for summarization automates this: it analyzes the text and produces a concise summary.

Algorithm

1. Get the input text to summarize.
2. Load a pre-trained summarization model.
3. Pass the input text to the model to generate a summary.
4. Extract the summary text from the model output.
5. Return or print the summary.

Code

python
from transformers import pipeline

# Load summarization pipeline
summarizer = pipeline('summarization')

# Input text
text = "Machine learning is a method of data analysis that automates analytical model building."

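# Generate summary: max_length and min_length bound the summary length in tokens;
# do_sample=False turns off random sampling so the output is deterministic
summary = summarizer(text, max_length=30, min_length=5, do_sample=False)[0]['summary_text']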

print(summary)
Output (illustrative; the exact wording depends on the model version)
Machine learning automates analytical model building.
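
The same five steps can also be wrapped in a small reusable function. The sketch below is one possible arrangement, not part of the original program: the names `summarize_text` and `fake_summarizer` are hypothetical, and the stand-in exists only so the logic can be run without downloading a model. In real use, pass the object returned by `pipeline('summarization')` instead.

```python
from typing import Callable, Dict, List

# Anything that behaves like the summarization pipeline: called with a string
# (plus keyword arguments), it returns a list of dicts with a 'summary_text' key.
Summarizer = Callable[..., List[Dict[str, str]]]


def summarize_text(text: str, summarizer: Summarizer,
                   max_length: int = 30, min_length: int = 5) -> str:
    """Run the summarizer on `text` and return only the summary string."""
    if not text.strip():
        raise ValueError("text must not be empty")
    outputs = summarizer(text, max_length=max_length,
                         min_length=min_length, do_sample=False)
    # The output holds one dict per input text; take the first one.
    return outputs[0]['summary_text'].strip()


def fake_summarizer(text: str, **kwargs) -> List[Dict[str, str]]:
    """Hypothetical stand-in that mimics the pipeline's output shape.

    It simply keeps the first sentence; it is NOT a real summarization model.
    """
    first_sentence = text.split('. ')[0].rstrip('.') + '.'
    return [{'summary_text': first_sentence}]


sample = ("Natural language processing enables computers to understand "
          "human language. It is widely used in chatbots, translation, "
          "and sentiment analysis.")
print(summarize_text(sample, fake_summarizer))
# Natural language processing enables computers to understand human language.
```

Passing the summarizer in as an argument keeps the model load (the slow part) outside the function, so the pipeline is created once and reused for many texts.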

Dry Run

Let's trace the example text through the summarization code.

Step 1: Load summarization pipeline

summarizer is set to a model that can summarize text.

Step 2: Input text

text = "Machine learning is a method of data analysis that automates analytical model building."

Step 3: Generate summary

summarizer processes the text and returns [{'summary_text': 'Machine learning automates analytical model building.'}]

Step 4: Extract summary

summary = 'Machine learning automates analytical model building.'

Step 5: Print summary

Output: Machine learning automates analytical model building.

| Step | Action | Value |
| --- | --- | --- |
| 1 | Load model | summarizer pipeline ready |
| 2 | Input text | Machine learning is a method of data analysis that automates analytical model building. |
| 3 | Model output | [{'summary_text': 'Machine learning automates analytical model building.'}] |
| 4 | Extract summary | Machine learning automates analytical model building. |
| 5 | Print | Machine learning automates analytical model building. |

Why This Works

Step 1: Load summarization pipeline

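pipeline('summarization') loads a pre-trained model that has been fine-tuned to shorten text while keeping its meaning. Because no model name is given, the library picks its default summarization checkpoint and downloads it on first use.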

Step 2: Input text

We provide the full text we want to summarize as input to the model.

Step 3: Generate summary

The model processes the input and creates a shorter version that captures the main ideas.

Step 4: Output summary

We extract the summary text from the model's output and print it for the user.


Alternative Approaches

Extractive summarization with NLTK
python
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from collections import defaultdict

# Download tokenizer and stop-word data (skipped if already up to date).
# Newer NLTK releases look for 'punkt_tab' rather than 'punkt'.
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('stopwords')

text = "Machine learning is a method of data analysis that automates analytical model building."
sentences = sent_tokenize(text)
stop_words = set(stopwords.words('english'))
word_frequencies = defaultdict(int)

# Count every alphabetic word that is not a stop word
for word in word_tokenize(text.lower()):
    if word.isalpha() and word not in stop_words:
        word_frequencies[word] += 1

# Normalize counts so the most frequent word has weight 1.0
max_freq = max(word_frequencies.values())
for word in word_frequencies:
    word_frequencies[word] /= max_freq

# Score each sentence by summing the weights of the words it contains
sentence_scores = defaultdict(float)
for sent in sentences:
    for word in word_tokenize(sent.lower()):
        if word in word_frequencies:
            sentence_scores[sent] += word_frequencies[word]

# Keep the single highest-scoring sentence as the summary
summary_sentences = sorted(sentence_scores, key=sentence_scores.get, reverse=True)[:1]
summary = ' '.join(summary_sentences)
print(summary)
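This method picks the most important sentences based on word frequency, but it can only copy sentences and may miss context compared to neural models. With a single-sentence input like the one here, it simply returns that sentence.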
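
Because the sample text is a single sentence, the scoring has nothing to choose between. The sketch below applies the same frequency-scoring idea to multi-sentence input using only the standard library. The regex sentence splitter and the tiny `STOP_WORDS` set are simplifying assumptions standing in for NLTK's tokenizers and stop-word list, and `extractive_summary` is a hypothetical name.

```python
import re
from collections import Counter

# Assumption: a deliberately tiny stop-word list; NLTK's English list is much longer.
STOP_WORDS = {"a", "an", "and", "in", "is", "it", "of", "that", "the", "to"}


def extractive_summary(text: str, num_sentences: int = 1) -> str:
    """Return the highest-scoring sentences, kept in their original order."""
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    if not sentences:
        return ""

    def content_words(s: str):
        return [w for w in re.findall(r'[a-z]+', s.lower()) if w not in STOP_WORDS]

    freq = Counter(content_words(text))
    if not freq:
        # Nothing but stop words: fall back to the leading sentences.
        return ' '.join(sentences[:num_sentences])
    max_freq = max(freq.values())

    # Score = sum of normalized frequencies of the sentence's content words.
    scores = [sum(freq[w] / max_freq for w in content_words(s)) for s in sentences]

    # Pick the top-scoring sentence indices, then restore document order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return ' '.join(sentences[i] for i in sorted(top[:num_sentences]))


text = ("Natural language processing enables computers to understand human language. "
        "It is widely used in chatbots, translation, and sentiment analysis.")
print(extractive_summary(text))
# Natural language processing enables computers to understand human language.
```

Summing word weights favors longer sentences; dividing each score by the sentence's word count is a common adjustment if that bias is unwanted.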
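Using the OpenAI API for summarization
python
from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()

text = "Machine learning is a method of data analysis that automates analytical model building."
response = client.chat.completions.create(
    # The model name is an assumption; use any chat model available to your account
    model='gpt-4o-mini',
    messages=[{'role': 'user', 'content': f'Summarize this: {text}'}],
    max_tokens=50
)
summary = response.choices[0].message.content.strip()
print(summary)
This calls a hosted language model, so it requires an internet connection and an API key. It targets version 1.0 or later of the openai package; the text-davinci-003 model used in older GPT-3 examples has been shut down.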

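Complexity: O(n) time and space for extractive scoring; roughly O(n²) in input tokens for transformer models

Time Complexity

The frequency-based extractive method reads the text a fixed number of times, so its time grows linearly with text length. Transformer models rely on self-attention, whose cost grows roughly quadratically with the number of input tokens, and they accept only a bounded input length, so longer text must be truncated or split.

Space Complexity

The extractive method stores the text and a word-frequency table, so its space grows linearly with input size. A transformer also holds the model weights (a fixed cost) and attention states that grow roughly quadratically with the number of input tokens.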
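
Since a transformer summarizer reads only a bounded number of tokens per call, long documents are commonly split into pieces, summarized piece by piece, and the partial summaries joined. The sketch below illustrates that idea under stated assumptions: it counts whitespace-separated words as a rough proxy for tokens, the 400-word default is arbitrary, and `chunk_words`, `summarize_long`, and the `summarize_chunk` callback are hypothetical names rather than part of the transformers API.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int = 400) -> List[str]:
    """Split text into consecutive chunks of at most `max_words` words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [' '.join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def summarize_long(text: str, summarize_chunk: Callable[[str], str],
                   max_words: int = 400) -> str:
    """Summarize each chunk with the callback and join the partial summaries."""
    return ' '.join(summarize_chunk(chunk)
                    for chunk in chunk_words(text, max_words))


# Stand-in callback for demonstration: keep the first three words of each chunk.
demo = summarize_long("one two three four five six seven eight",
                      lambda chunk: ' '.join(chunk.split()[:3]),
                      max_words=4)
print(demo)
# one two three five six seven
```

With a real pipeline, the callback could be `lambda chunk: summarizer(chunk)[0]['summary_text']`. Splitting on word counts can cut a sentence in half, so splitting on sentence boundaries is often the gentler choice.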

Which Approach is Fastest?

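Extractive methods are the fastest and need no model download, but they can only copy existing sentences. Transformer models, such as those loaded through Hugging Face pipelines, are slower but usually produce more fluent summaries.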

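| Approach | Time | Space | Best For |
| --- | --- | --- | --- |
| Transformer summarization | Roughly O(n²) in input tokens, up to the model's input limit | Roughly O(n²) plus model weights | High-quality summaries, moderate text length |
| Extractive summarization | O(n) | O(n) | Fast summaries, simple use cases |
| API-based summarization | Depends on API latency | Minimal local | Very high-quality, requires internet and API key |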
Tip: Use the max_length and min_length parameters to control summary size. Both are measured in tokens, not words.
Warning: Summarizing very short text often returns the original text almost unchanged, because there is little left to remove.
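
One way to act on the tip and warning above is to derive the length limits from the input and to skip the model when the text is already short. The helper below is a heuristic sketch, not a rule from the transformers library: the name `length_bounds`, the 0.5 ratio, and the 20-word cutoff are assumptions, and it counts words even though `max_length` and `min_length` are measured in tokens.

```python
from typing import Dict, Optional


def length_bounds(text: str, ratio: float = 0.5,
                  min_words: int = 20) -> Optional[Dict[str, int]]:
    """Suggest max_length/min_length for a text, or None if it is too short.

    Word count is used as a rough stand-in for token count (assumption).
    """
    n_words = len(text.split())
    if n_words < min_words:
        return None  # too short to be worth summarizing
    max_length = max(2, int(n_words * ratio))
    min_length = max(1, max_length // 4)
    return {'max_length': max_length, 'min_length': min_length}


print(length_bounds("Too short to summarize."))
# None
print(length_bounds(' '.join(['word'] * 40)))
# {'max_length': 20, 'min_length': 5}
```

The returned dict can be unpacked into the pipeline call, for example `summarizer(text, do_sample=False, **bounds)`, while a `None` result signals that the original text can be used as-is.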