How to Extract Phone Numbers Using Regex in NLP
Use a regex pattern that matches common phone number formats made up of digits, spaces, dashes, and parentheses. Apply this pattern with Python's re module to find all phone numbers in your text data.
Syntax
A regex pattern for phone numbers typically includes digits, optional spaces, dashes, parentheses, and country codes. Use re.findall(pattern, text) in Python to extract all matches.
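- \d: matches any digit (0-9)
- [\s\-\(\)]: matches whitespace, dashes, or parentheses
- +: means one or more of the previous element
- ?: means the previous element is optional
- *: means zero or more of the previous element
- {n}: means exactly n repetitions of the previous element (for example, \d{3} matches three digits)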
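import re

pattern = r"\+?\d?[\s\-\(\)]*\d{3}[\s\-\(\)]*\d{3}[\s\-]*\d{4}"
text = "Call me at +1 (123) 456-7890 or 987-654-3210."

# The leading separator class can also match a space in front of a number,
# so strip whitespace from each match.
phone_numbers = [m.strip() for m in re.findall(pattern, text)]
print(phone_numbers)  # ['+1 (123) 456-7890', '987-654-3210']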
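To see what each building block does on its own, here is a minimal sketch. The sample string and variable names are made up for illustration and are not part of the pattern above.

```python
import re

# Made-up sample string, used only to illustrate each building block.
sample = "Tel: (123) 456-7890"

# \d matches a single digit, so every digit is returned on its own.
single_digits = re.findall(r"\d", sample)

# \d+ matches runs of one or more digits.
digit_runs = re.findall(r"\d+", sample)

# \d{3} matches exactly three digits in a row (non-overlapping).
three_digit_runs = re.findall(r"\d{3}", sample)

# [\s\-\(\)]+ matches runs of whitespace, dashes, or parentheses.
separators = re.findall(r"[\s\-\(\)]+", sample)

# \+? makes a single leading plus sign optional.
optional_plus = [bool(re.fullmatch(r"\+?\d", s)) for s in ("+1", "1", "++1")]

print(single_digits)     # ['1', '2', '3', '4', '5', '6', '7', '8', '9', '0']
print(digit_runs)        # ['123', '456', '7890']
print(three_digit_runs)  # ['123', '456', '789']
print(separators)        # [' (', ') ', '-']
print(optional_plus)     # [True, True, False]
```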
Example
This example shows how to extract phone numbers from a text string using Python's re module and a regex pattern that matches common US phone number formats.
import re

def extract_phone_numbers(text):
    pattern = r"\+?\d?[\s\-\(\)]*\d{3}[\s\-\(\)]*\d{3}[\s\-]*\d{4}"
    # The leading separator class can also match a space in front of a
    # number, so strip whitespace from each match.
    return [m.strip() for m in re.findall(pattern, text)]

sample_text = "Contact us at +1 (800) 555-1234 or 415-555-9876 for support."
phones = extract_phone_numbers(sample_text)
print("Extracted phone numbers:", phones)
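In NLP pipelines it is often useful to know where each number sits in the text, not only what it is. The sketch below is one possible way to do that with re.finditer, assuming character offsets are what downstream code needs. The helper name extract_phone_spans is hypothetical.

```python
import re

# Same pattern as in the example above, compiled once for reuse.
PATTERN = re.compile(r"\+?\d?[\s\-\(\)]*\d{3}[\s\-\(\)]*\d{3}[\s\-]*\d{4}")

def extract_phone_spans(text):
    """Return (start, end, number) tuples, one per match (hypothetical helper)."""
    spans = []
    for match in PATTERN.finditer(text):
        raw = match.group()
        # The leading separator class can absorb whitespace in front of a
        # number, so trim it and shift the start offset accordingly.
        number = raw.lstrip()
        start = match.start() + (len(raw) - len(number))
        spans.append((start, match.end(), number))
    return spans

sample_text = "Contact us at +1 (800) 555-1234 or 415-555-9876 for support."
spans = extract_phone_spans(sample_text)
for start, end, number in spans:
    print(start, end, repr(number))
```

Each returned offset pair slices the original text back to the extracted number, which makes the spans usable as annotation boundaries.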
Common Pitfalls
Common mistakes include patterns that are too strict, which miss valid numbers, and patterns that are too loose, which capture unrelated text. Forgetting to escape literal parentheses ( and ) is another: Python then treats them as grouping syntax, which raises an error when they are unbalanced and changes what is matched when they are balanced. A further pitfall is not accounting for different phone formats or country codes.
Always test your regex on sample texts and adjust for your specific phone number formats.
import re

# Wrong pattern: missing escapes for parentheses
wrong_pattern = r"\+?\d?\s*(\d{3}\s*\d{3}\s*\d{4}"
text = "Call +1 (123) 456-7890"

try:
    print(re.findall(wrong_pattern, text))
except re.error as e:
    print(f"Regex error: {e}")

# Correct pattern with escaped parentheses
correct_pattern = r"\+?\d?[\s\-\(\)]*\d{3}[\s\-\(\)]*\d{3}[\s\-]*\d{4}"
print(re.findall(correct_pattern, text))
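Unescaped parentheses do not always raise an error. When they happen to be balanced, Python treats them as a capture group, and re.findall returns only the group contents instead of the whole match. A minimal sketch, using a made-up sample string and simplified patterns:

```python
import re

text = "Call (123) 456-7890 or 123 456-7890"

# Balanced but unescaped parentheses form a capture group. They no longer
# match the literal "(" and ")", and findall returns only the group.
unescaped = re.findall(r"(\d{3}) \d{3}-\d{4}", text)

# Escaped parentheses match the literal characters in the text.
escaped = re.findall(r"\(\d{3}\) \d{3}-\d{4}", text)

print(unescaped)  # ['123']
print(escaped)    # ['(123) 456-7890']
```

If grouping is needed without changing what findall returns, a non-capturing group (?:...) is one option.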
Quick Reference
Tips for extracting phone numbers with regex:
- Use \d for digits and [\s\-\(\)] for separators.
- Escape special characters like parentheses with \( and \).
- Use \+? and ? to handle optional parts like country codes.
- Test regex on varied phone formats to ensure coverage.
- Use Python's re.findall() to get all matches in text.
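The advice to test on varied formats can be sketched as a small table of sample strings checked against the pattern. The sample strings and the check helper are hypothetical, and the expected values only encode what one might want from an extractor. The sketch reports the cases where the pattern used in this article falls short.

```python
import re

PATTERN = re.compile(r"\+?\d?[\s\-\(\)]*\d{3}[\s\-\(\)]*\d{3}[\s\-]*\d{4}")

# Hypothetical samples: each text is paired with the numbers we hope to get.
CASES = [
    ("Call +1 (123) 456-7890", ["+1 (123) 456-7890"]),
    ("Dial 987-654-3210 now", ["987-654-3210"]),
    # Dots are not in the separator class, so this one is missed (too strict).
    ("Fax: 123.456.7890", ["123.456.7890"]),
    # A long ID is not a phone number, but part of it matches (too loose).
    ("Order 12345678901234", []),
]

def check(cases):
    """Return (text, expected, actual) for every case that does not match."""
    failures = []
    for text, expected in cases:
        actual = [m.strip() for m in PATTERN.findall(text)]
        if actual != expected:
            failures.append((text, expected, actual))
    return failures

failures = check(CASES)
for text, expected, actual in failures:
    print(f"{text!r}: expected {expected}, got {actual}")
```

Running it flags the dotted number as missed and the order ID as a false match, which is the kind of feedback that tells you whether to widen the separator class or add digit boundaries.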
