How to Do Text Summarization in Python for NLP

Text Summarization in Python for NLP: Simple Guide

You can summarize text in Python with the Hugging Face transformers library, which provides pre-trained summarization models such as facebook/bart-large-cnn. Load a model and its tokenizer (the pipeline helper does both in a single call), then pass your text to it to generate a concise summary.
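The pipeline wraps two steps: tokenizing the input and calling the model's generate() method. A minimal sketch of doing those steps by hand, assuming transformers and torch are installed and facebook/bart-large-cnn can be downloaded; the helper name summarize is illustrative, not part of the library:

```python
# Sketch: load the tokenizer and model explicitly instead of using pipeline().
# Assumes transformers + torch are installed; summarize() is a hypothetical helper.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "facebook/bart-large-cnn"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)


def summarize(text: str, max_length: int = 130, min_length: int = 30) -> str:
    # Tokenize into PyTorch tensors, truncating anything past 1024 tokens (BART's input limit)
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    # Deterministic decoding (no sampling); any other decoding settings
    # come from the model's own generation config
    summary_ids = model.generate(
        **inputs,
        max_length=max_length,
        min_length=min_length,
        do_sample=False,
    )
    # Turn token ids back into text, dropping special tokens such as <s> and </s>
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)


text = ("Machine learning is a method of data analysis that automates analytical model building. "
        "It is a branch of artificial intelligence based on the idea that systems can learn from data, "
        "identify patterns and make decisions with minimal human intervention.")
print(summarize(text, max_length=50, min_length=25))
```

The pipeline version used in the rest of this guide does the same work with less code; loading the pieces yourself mainly helps when you need control over tokenization or want to reuse the model elsewhere.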

Syntax

To summarize text using Hugging Face Transformers, you need to:

Import pipeline from transformers.
Create a summarization pipeline with a pre-trained model like bart-large-cnn.
Call the pipeline with your input text to get the summary.

python

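from transformers import pipeline

text = "Your long text here"  # placeholder: replace with the text to summarize
summarizer = pipeline('summarization', model='facebook/bart-large-cnn')
# max_length and min_length count tokens, not words
summary = summarizer(text, max_length=130, min_length=30, do_sample=False)
# The result is a list with one dict per input text
print(summary[0]['summary_text'])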

Example

This example summarizes a short paragraph with the bart-large-cnn model: it loads the model, generates the summary, and prints the result.

python

from transformers import pipeline

text = ("Machine learning is a method of data analysis that automates analytical model building. "
        "It is a branch of artificial intelligence based on the idea that systems can learn from data, "
        "identify patterns and make decisions with minimal human intervention.")

summarizer = pipeline('summarization', model='facebook/bart-large-cnn')
summary = summarizer(text, max_length=50, min_length=25, do_sample=False)
print(summary[0]['summary_text'])

Output (the exact wording may vary with the transformers and model versions)

Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data.

Common Pitfalls

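Input length: Each model accepts a limited number of input tokens (1024 for bart-large-cnn; other models differ). Longer texts must be shortened or split into chunks before summarizing.
Output length: max_length and min_length are counted in tokens, not words. A max_length that is too low can cut off important information; one that is too high allows long, less concise summaries.
Model choice: A model that was not fine-tuned for summarization can give poor results; facebook/bart-large-cnn is fine-tuned on the CNN/Daily Mail news dataset.
Dependencies: Make sure transformers and a backend such as torch are installed and up to date (for example, pip install --upgrade transformers torch).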
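For the input-length pitfall, one simple way to split a long text is to group whole sentences into chunks under a size budget, summarize each chunk, and join the results. A minimal sketch in plain Python; chunk_text, summarize_long and word_count are hypothetical helpers, the sentence splitter is a naive regex, and the default budget of 700 is an assumed safety margin, because the default counter counts words and a word is often split into several tokens:

```python
# Sketch: split a long text into sentence-aligned chunks and summarize each one.
# chunk_text, summarize_long and word_count are hypothetical helpers, not transformers APIs.
import re
from typing import Callable, List


def word_count(s: str) -> int:
    # Rough stand-in for a token count: number of whitespace-separated words
    return len(s.split())


def chunk_text(text: str, max_tokens: int = 700,
               count_tokens: Callable[[str], int] = word_count) -> List[str]:
    # Naive sentence split: break after ., ! or ? followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and count_tokens(candidate) > max_tokens:
            # Adding this sentence would exceed the budget: close the current chunk
            chunks.append(" ".join(current))
            current = [sentence]
        else:
            # Fits the budget (or current is empty, so an over-long sentence
            # still becomes its own chunk)
            current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


def summarize_long(text: str, summarizer, max_tokens: int = 700,
                   count_tokens: Callable[[str], int] = word_count, **gen_kwargs) -> str:
    # summarizer is any callable that behaves like pipeline('summarization', ...)
    parts = []
    for chunk in chunk_text(text, max_tokens=max_tokens, count_tokens=count_tokens):
        result = summarizer(chunk, **gen_kwargs)  # list with one dict per input
        parts.append(result[0]["summary_text"])
    return " ".join(parts)
```

With a summarizer created as in the examples above, a call could look like summarize_long(long_text, summarizer, max_length=130, min_length=30, do_sample=False). To budget real tokens instead of words, pass count_tokens=lambda s: len(summarizer.tokenizer.encode(s)) together with a max_tokens value a little under the model's 1024-token limit. Joining the chunk summaries keeps the sketch simple; if the joined result is still too long, it can be summarized again.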

python

from transformers import pipeline

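# Not recommended: with no model specified, transformers falls back to a default
# summarization model and logs a warning; that default can change between versions
# summarizer = pipeline('summarization')

# Recommended: name the summarization model explicitly
summarizer = pipeline('summarization', model='facebook/bart-large-cnn')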
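Because the input limit depends on the model you specify, it can be safer to read it from the loaded pipeline than to hard-code 1024. A minimal sketch, assuming a BART-style model whose config exposes max_position_embeddings (true for facebook/bart-large-cnn, where it is 1024); fits_model is a hypothetical helper:

```python
# Sketch: check whether a text fits the chosen model before summarizing it.
# fits_model is a hypothetical helper; assumes a BART-style model config.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Maximum number of input tokens the model's position embeddings support
limit = summarizer.model.config.max_position_embeddings


def fits_model(text: str) -> bool:
    # encode() adds the special tokens <s> and </s>, just as the pipeline's tokenizer call does
    return len(summarizer.tokenizer.encode(text)) <= limit


short_text = "Machine learning automates analytical model building."
long_text = " ".join([short_text] * 300)

print(limit)                   # 1024 for facebook/bart-large-cnn
print(fits_model(short_text))  # True
print(fits_model(long_text))   # False: shorten or split before summarizing
```

If dropping the end of a long text is acceptable, the summarizer call also accepts truncation=True; how far it cuts depends on the tokenizer's configured model_max_length, so an explicit check like this one makes the behavior easier to reason about.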

Quick Reference

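Key parameters for a summarization pipeline: model is passed to pipeline('summarization', ...); the others are arguments to the resulting summarizer call.

Parameter	Description	Example
model	Pre-trained model name or local path	'facebook/bart-large-cnn'
max_length	Maximum length of the summary, in tokens	50
min_length	Minimum length of the summary, in tokens	25
do_sample	Whether to use sampling; False gives deterministic output	False
text	Input text to summarize (first positional argument; a list of strings is also accepted)	'Your long text here'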
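The parameters in the table can be combined in one call. A short sketch, assuming the same facebook/bart-large-cnn model; the two input texts are placeholders. Passing a list of strings as text returns one result dict per input, in the same order:

```python
# Sketch: summarize several texts at once, using the parameters from the table.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Placeholder inputs; replace with your own documents
texts = [
    "Machine learning is a method of data analysis that automates analytical model building. "
    "It is a branch of artificial intelligence based on the idea that systems can learn from data, "
    "identify patterns and make decisions with minimal human intervention.",
    "Python is a high-level programming language known for its readable syntax. "
    "It supports several programming paradigms, including procedural, object-oriented and "
    "functional programming, and ships with a large standard library.",
]

# The same generation settings apply to every text in the list
results = summarizer(texts, max_length=50, min_length=25, do_sample=False)

# results is a list with one dict per input text, in input order
for result in results:
    print(result["summary_text"])
```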

Key Takeaways

Use the Hugging Face Transformers pipeline with a summarization model such as 'facebook/bart-large-cnn' for easy text summarization.

Adjust max_length and min_length (both measured in tokens) to control summary size and detail.

Split very long texts so that each piece stays within the model's input length limit.

Always specify the model when creating the summarization pipeline; otherwise transformers picks a default model with a warning, and results may change between versions.

Keep the transformers and torch libraries up to date to get bug fixes and support for newer models.