NLP · How-To · Beginner · 3 min read

How to Remove Numbers from Text in NLP: Simple Methods

To remove numbers from text in NLP, use a regular expression (regex) such as \d+, which matches runs of digits, and replace each match with an empty string. This deletes every digit in a single pass, although the spaces that surrounded each number are left behind.

Syntax

Use Python's re.sub() function with a regex pattern to remove numbers from text.

  • re.sub(pattern, replacement, text): replaces parts of text matching pattern with replacement.
  • pattern = r"\d+": matches one or more consecutive digits (the r prefix makes it a raw string, so the backslash reaches the regex engine intact).
  • replacement = "": replaces matched digits with nothing (removes them).
python
import re

text = "I have 2 apples and 10 bananas."
clean_text = re.sub(r"\d+", "", text)
print(clean_text)
Output
I have  apples and  bananas.
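If you also want to know how many numbers were removed, the standard library's re.subn() takes the same arguments as re.sub() but returns a (new_text, count) tuple. The sketch below reuses the sentence above; printing with repr() makes the double spaces left behind by re.sub() visible.

```python
import re

text = "I have 2 apples and 10 bananas."

# re.subn works like re.sub but also returns how many matches were replaced
clean_text, n_removed = re.subn(r"\d+", "", text)

print(repr(clean_text))  # prints: 'I have  apples and  bananas.'
print(n_removed)         # prints: 2 (the runs "2" and "10")
```

Note that "10" counts as one replacement, because \d+ matches the whole run of digits at once.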

Example

This example shows how to remove all numbers from a sentence using regex in Python. It demonstrates cleaning text by deleting digits while keeping other characters intact.

python
import re

def remove_numbers(text: str) -> str:
    return re.sub(r"\d+", "", text)

sample_text = "My phone number is 1234567890 and I was born in 1990."
result = remove_numbers(sample_text)
print(result)
Output
My phone number is  and I was born in .
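The remove_numbers() function above leaves a double space where the phone number was and a stray space before the final period. A minimal sketch of a tidier variant follows; remove_numbers_clean is a hypothetical helper name, and the punctuation rule assumes you only care about the common marks . , ! ? ; :.

```python
import re

def remove_numbers_clean(text: str) -> str:
    """Hypothetical helper: drop digit runs, then tidy the leftover spaces."""
    no_digits = re.sub(r"\d+", "", text)
    # Collapse any run of whitespace into a single space
    collapsed = re.sub(r"\s+", " ", no_digits)
    # Remove a space left directly before common punctuation marks
    no_space_before_punct = re.sub(r"\s+([.,!?;:])", r"\1", collapsed)
    # Trim leading and trailing whitespace
    return no_space_before_punct.strip()

sample_text = "My phone number is 1234567890 and I was born in 1990."
print(remove_numbers_clean(sample_text))
# prints: My phone number is and I was born in.
```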

Common Pitfalls

Common mistakes when removing numbers include:

  • Writing the pattern without its backslash: d+ matches the letter "d", not digits.
  • Removing numbers but leaving the surrounding spaces behind, which produces double spaces and stray spaces before punctuation.
  • Ignoring decimal points, thousands separators, and numbers inside words: \d+ turns "3.14" into "." and "mp3" into "mp".

Always test your regex and clean extra spaces if needed.
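To handle the decimal and embedded-number pitfalls, one option is to match only standalone numbers. The sketch below assumes a "number" is a run of digits optionally followed by "."- or ","-separated digit groups (such as 3.14 or 1,299); the \b word boundaries keep digits glued to letters, as in "mp3", untouched. It does not cover signs, exponents, or percentages, and STANDALONE_NUMBER and remove_standalone_numbers are hypothetical names.

```python
import re

# Assumed definition of a standalone number: digits with optional
# "." or "," groups, bounded by word boundaries on both sides
STANDALONE_NUMBER = re.compile(r"\b\d+(?:[.,]\d+)*\b")

def remove_standalone_numbers(text: str) -> str:
    without_numbers = STANDALONE_NUMBER.sub("", text)
    # Collapse the double spaces left behind and trim the ends
    return re.sub(r"\s+", " ", without_numbers).strip()

text = "Version 3.14 of the mp3 tool costs 1,299 dollars."
naive = re.sub(r"\d+", "", text)

print(f"Naive: {naive}")
# prints: Naive: Version . of the mp tool costs , dollars.
print(f"Standalone: {remove_standalone_numbers(text)}")
# prints: Standalone: Version of the mp3 tool costs dollars.
```

Whether "mp3" should keep its digit depends on your task; if you want it stripped as well, the plain \d+ pattern is the right choice.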

python
import re

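# Wrong way: without the backslash, "d+" matches the letter "d", not digits
text = "Price is 50 dollars"
wrong = re.sub(r"d+", "", text)  # Keeps "50" and deletes the "d" in "dollars"

# Right way: \d+ matches digits; then collapse the double space left behind
right = re.sub(r"\d+", "", text)
right = re.sub(r"\s+", " ", right).strip()
print(f"Wrong: {wrong}")
print(f"Right: {right}")
Output
Wrong: Price is 50 ollars
Right: Price is dollars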

Quick Reference

Tips for removing numbers from text in NLP:

  • Use re.sub(r"\d+", "", text) to remove digits.
  • Use re.sub(r"\s+", " ", text).strip() to collapse the extra spaces left after removal; str.strip() alone only trims leading and trailing whitespace.
  • Consider if you need to remove decimal numbers or numbers inside words and adjust regex accordingly.
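If you only need to strip the ASCII digits 0-9, a non-regex alternative is str.translate() with a deletion table built by str.maketrans(). This matters for multilingual text: in Python 3, \d in a str pattern matches any Unicode decimal digit (for example the Arabic-Indic digit ٣, U+0663), while passing flags=re.ASCII restricts it to 0-9. The sketch below compares the three behaviours; DIGIT_TABLE is a hypothetical name.

```python
import re
import string

# Translation table that deletes the ASCII digits "0123456789"
DIGIT_TABLE = str.maketrans("", "", string.digits)

text = "Room 42 \u0663"  # ends with ARABIC-INDIC DIGIT THREE

translated = text.translate(DIGIT_TABLE)                # ASCII digits only
regex_default = re.sub(r"\d+", "", text)                # any Unicode decimal digit
regex_ascii = re.sub(r"\d+", "", text, flags=re.ASCII)  # ASCII digits only

print(repr(translated))     # prints: 'Room  ٣'
print(repr(regex_default))  # prints: 'Room  '
print(repr(regex_ascii))    # prints: 'Room  ٣'
```

Both the translate() and re.ASCII versions leave the surrounding spaces in place, so the whitespace cleanup from the previous tip still applies.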

Key Takeaways

Use regex pattern \d+ with re.sub() to remove all numbers from text.
Always test your regex to avoid missing digits or removing wrong parts.
Clean extra spaces after number removal for neat text.
Adjust regex if you need to handle decimals or embedded numbers.
Removing numbers helps prepare text for NLP tasks in which digits add noise rather than meaning.