
How to Extract Phone Numbers Using Regex in NLP

To extract phone numbers in NLP, use a regex pattern that matches groups of digits together with the separators commonly found in phone numbers: spaces, dashes, and parentheses. Apply this pattern with Python's re module to find all phone numbers in your text data.

Syntax

A regex pattern for phone numbers typically includes digits, optional spaces, dashes, parentheses, and country codes. Use re.findall(pattern, text) in Python to extract all matches.

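  • \d: matches any digit (0-9); \d{3} matches exactly three digits
  • [\s\-\(\)]: matches a single space, dash, or parenthesis
  • *: means zero or more of the previous element
  • ?: means the previous element is optional
  • \+: matches a literal plus sign (an unescaped + means one or more of the previous element)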
python
import re

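# Optional country code (e.g. "+1 "), optional "(", then 3-3-4 digits.
# The (?:...)? group keeps a match from starting on a stray leading space.
pattern = r"(?:\+?\d[\s\-]*)?\(?\d{3}[\s\-\(\)]*\d{3}[\s\-]*\d{4}"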
text = "Call me at +1 (123) 456-7890 or 987-654-3210."
phone_numbers = re.findall(pattern, text)
print(phone_numbers)
Output
['+1 (123) 456-7890', '987-654-3210']
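
The pieces listed above can be easier to read when the pattern is spread over several lines. The following is a minimal sketch, assuming a pattern in the spirit of the one above and using Python's re.VERBOSE flag, which ignores whitespace and # comments inside the pattern. The name PHONE_RE is only an illustrative choice.

```python
import re

# Illustrative pattern written with re.VERBOSE so each piece can be commented.
# In verbose mode, whitespace in the pattern is ignored, so separators are
# matched with \s inside a character class rather than a literal space.
PHONE_RE = re.compile(
    r"""
    (?:\+?\d[\s\-]*)?   # optional country code such as +1 followed by a separator
    \(?                 # optional opening parenthesis
    \d{3}               # area code
    [\s\-\(\)]*         # separators after the area code
    \d{3}               # next three digits
    [\s\-]*             # optional space or dash
    \d{4}               # last four digits
    """,
    re.VERBOSE,
)

text = "Call me at +1 (123) 456-7890 or 987-654-3210."
print(PHONE_RE.findall(text))
# ['+1 (123) 456-7890', '987-654-3210']
```

Compiling the pattern once with re.compile also avoids repeating the pattern string when it is applied to many texts.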

Example

This example shows how to extract phone numbers from a text string using Python's re module and a regex pattern that matches common US phone number formats.

python
import re

def extract_phone_numbers(text):
    # Optional country code, optional "(", then 3-3-4 digits with separators
    pattern = r"(?:\+?\d[\s\-]*)?\(?\d{3}[\s\-\(\)]*\d{3}[\s\-]*\d{4}"
    return re.findall(pattern, text)

sample_text = "Contact us at +1 (800) 555-1234 or 415-555-9876 for support."
phones = extract_phone_numbers(sample_text)
print("Extracted phone numbers:", phones)
Output
Extracted phone numbers: ['+1 (800) 555-1234', '415-555-9876']
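
In NLP pipelines it is often useful to know where each number occurs, for example to mask it or attach an entity label. The sketch below is one possible way to do that with re.finditer, which yields match objects instead of plain strings. The helper name find_phone_spans and the pattern variant are assumptions made for this illustration.

```python
import re

# Illustrative pattern: optional country code, optional "(", then 3-3-4 digits.
PHONE_RE = re.compile(r"(?:\+?\d[\s\-]*)?\(?\d{3}[\s\-\(\)]*\d{3}[\s\-]*\d{4}")

def find_phone_spans(text):
    """Return (start, end, matched_text) for each phone-like match."""
    return [(m.start(), m.end(), m.group()) for m in PHONE_RE.finditer(text)]

sample_text = "Contact us at +1 (800) 555-1234 or 415-555-9876 for support."
for start, end, number in find_phone_spans(sample_text):
    # sample_text[start:end] is exactly the matched number
    print(start, end, repr(number))
```

Each tuple gives character offsets into the original string, so the text can be sliced or replaced at those positions.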

Common Pitfalls

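Common mistakes include patterns that are too strict, which miss valid numbers, and patterns that are too loose, which capture unrelated text. Unescaped parentheses are another source of trouble: ( and ) are grouping characters in regex, so an unbalanced one raises an error and a balanced pair creates a group instead of matching the literal characters. A further pitfall is not accounting for different phone formats or country codes.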

Always test your regex on sample texts and adjust for your specific phone number formats.

python
import re

# Wrong pattern: the unescaped "(" opens a group that is never closed
wrong_pattern = r"\+?\d?\s*(\d{3}\s*\d{3}\s*\d{4}"
text = "Call +1 (123) 456-7890"
try:
    print(re.findall(wrong_pattern, text))
except re.error as e:
    print(f"Regex error: {e}")

# Correct pattern: literal parentheses are escaped as \( and \)
correct_pattern = r"(?:\+?\d[\s\-]*)?\(?\d{3}[\s\-\(\)]*\d{3}[\s\-]*\d{4}"
print(re.findall(correct_pattern, text))
Output
Regex error: missing ), unterminated subpattern at position 9
['+1 (123) 456-7890']
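
The advice to test on sample texts can be sketched as a small check that compares each pattern's matches with the numbers you expect. This is a hedged illustration, not a complete validator: the pattern names, the test cases, and the helper failing_cases are assumptions. The "bounded" variant adds the lookarounds (?<!\d) and (?!\d) so a match cannot sit inside a longer run of digits.

```python
import re

# Illustrative patterns; the names and test cases are assumptions for this sketch.
CORE = r"(?:\+?\d[\s\-]*)?\(?\d{3}[\s\-\(\)]*\d{3}[\s\-]*\d{4}"
PATTERNS = {
    "too strict": r"\d{3}-\d{3}-\d{4}",        # only accepts the form 123-456-7890
    "too loose": CORE,                          # also matches inside long digit runs
    "bounded": r"(?<!\d)" + CORE + r"(?!\d)",   # match must not touch another digit
}

# Each case pairs a text with the numbers we expect to extract from it.
CASES = [
    ("Call +1 (123) 456-7890", ["+1 (123) 456-7890"]),
    ("Fax: 987-654-3210.", ["987-654-3210"]),
    ("Order 12345678901234 shipped", []),
]

def failing_cases(pattern):
    """Return the texts for which re.findall does not give the expected list."""
    return [text for text, expected in CASES if re.findall(pattern, text) != expected]

for name, pattern in PATTERNS.items():
    print(name, "->", failing_cases(pattern))
```

With these cases, the strict pattern misses the number written with a country code and parentheses, the loose pattern picks up part of the order ID, and the bounded pattern passes all three. Real data will likely need more cases than this.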

Quick Reference

Tips for extracting phone numbers with regex:

  • Use \d for digits and [\s\-\(\)] for separators.
  • Escape special characters like parentheses with \( and \).
  • Use ? to make parts such as the country code optional, for example \+? for a leading plus sign.
  • Test regex on varied phone formats to ensure coverage.
  • Use Python's re.findall() to get all matches in text.
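
To see why the escaping tip matters even when no error is raised, the short sketch below compares a balanced but unescaped pair of parentheses with an escaped pair. The variable names and sample text are made up for illustration. With a capturing group in the pattern, re.findall returns only the group's contents.

```python
import re

text = "Call (123) 456-7890 or 123 456-7890"

# Unescaped parentheses form a capturing group; they do not match "(" and ")".
grouped = r"(\d{3}) \d{3}-\d{4}"
# Escaped parentheses match the literal characters.
escaped = r"\(\d{3}\) \d{3}-\d{4}"

print(re.findall(grouped, text))  # ['123']
print(re.findall(escaped, text))  # ['(123) 456-7890']
```

The grouped pattern silently matches the second number and returns just the captured area code, which is easy to overlook because nothing fails.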

Key Takeaways

  • Use regex patterns with digits and optional separators to match phone numbers.
  • Escape special characters like parentheses in your regex pattern.
  • Test your regex on sample texts to avoid missing or wrong matches.
  • Use Python's re.findall() to extract all phone numbers from text.
  • Adjust your regex to fit the phone number formats relevant to your data.