How to Split a String Using Regex in Python: Simple Guide
Use the re.split() function from Python's re module to split a string by a regex pattern. Pass the regex pattern and the string to re.split(), and it returns a list of substrings split where the pattern matches.
Syntax
The basic syntax to split a string using regex in Python is:
re.split(pattern, string, maxsplit=0, flags=0)
Where:
- pattern is the regex pattern to split on.
- string is the input string to split.
- maxsplit (optional) limits the number of splits; 0 means no limit.
- flags (optional) modify regex behavior (like case-insensitive matching).
python
import re

# The final '.' also matches \W+, so the list ends with an empty string.
result = re.split(r'\W+', 'Hello, world! Welcome to Python.')
print(result)
Output
['Hello', 'world', 'Welcome', 'to', 'Python', '']
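The two optional parameters are easier to see in action. The following is a minimal sketch; the sample strings and variable names are made up for illustration.

```python
import re

text = 'one, two; three, four'

# maxsplit=1: stop after the first split; the remainder stays in one piece.
first, rest = re.split(r'[;,]\s*', text, maxsplit=1)
print(first)  # one
print(rest)   # two; three, four

# flags=re.IGNORECASE: the separator word 'and' matches in any letter case.
parts = re.split(r'\s+and\s+', 'tea AND coffee and milk', flags=re.IGNORECASE)
print(parts)  # ['tea', 'coffee', 'milk']
```

Passing maxsplit and flags by keyword, as here, keeps the call readable and avoids mixing up their positions.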
Example
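This example splits a sentence into words by treating any run of non-word characters as the separator. Because the sentence ends with a period, the pattern also matches at the very end, so the result includes a trailing empty string.
python
import re

text = 'Hello, world! Welcome to Python.'
words = re.split(r'\W+', text)
print(words)
Output
['Hello', 'world', 'Welcome', 'to', 'Python', '']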
Common Pitfalls
One common mistake is using str.split() when you need regex splitting: str.split() only splits on a fixed separator string, not on a pattern. Another is forgetting that re.split() returns empty strings in the result when the pattern matches at the very start or end of the input.
Also, if your regex pattern contains capturing groups (parentheses), the text matched by each group is included in the result list alongside the split parts.
python
import re

# Wrong: using str.split() when regex needed
text = 'apple, banana; orange'
print(text.split(','))  # Only splits by comma

# Right: using re.split() to split by comma or semicolon
print(re.split(r'[;,]\s*', text))

# Capturing group example
print(re.split(r'(,|;)', text))  # Includes separators in output
Output
['apple', ' banana; orange']
['apple', 'banana', 'orange']
['apple', ',', ' banana', ';', ' orange']
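The empty-string and capturing-group pitfalls each have a simple workaround. The sketch below is one possible approach; the input string and variable names are illustrative.

```python
import re

text = ', apple, banana; orange;'

# The pattern matches at the start and at the end of the string,
# so an empty string appears at both ends of the result.
raw_parts = re.split(r'[;,]\s*', text)
print(raw_parts)  # ['', 'apple', 'banana', 'orange', '']

# Drop the empty strings with a list comprehension.
clean_parts = [part for part in raw_parts if part]
print(clean_parts)  # ['apple', 'banana', 'orange']

# A non-capturing group (?:...) groups alternatives without adding
# the matched separators to the result.
grouped = re.split(r'(?:,|;)\s*', 'apple, banana; orange')
print(grouped)  # ['apple', 'banana', 'orange']
```

Filtering after the split is usually simpler than trying to write a pattern that never matches at the edges of the string.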
Quick Reference
Tips for using re.split():
- Use raw strings (prefix r) for regex patterns to avoid escaping issues.
- Set maxsplit to limit splits if needed.
- Use flags=re.IGNORECASE for case-insensitive splitting.
- Remember capturing groups include separators in output.
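The raw-string tip matters because some regex escapes are also Python string escapes. As a small illustration (the sample text is made up), '\b' in a normal string is a backspace character, while r'\b' reaches the regex engine as a word boundary:

```python
import re

text = 'a cat b'

# Without the r prefix, Python turns '\b' into a backspace character,
# so the pattern looks for backspace + 'cat' + backspace and finds nothing.
plain = re.split('\bcat\b', text)
print(plain)  # ['a cat b']

# With the r prefix, \b is passed through as a regex word boundary.
raw = re.split(r'\bcat\b', text)
print(raw)  # ['a ', ' b']
```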
Key Takeaways
- Use re.split() to split strings by regex patterns in Python.
- Pass the regex pattern and string to re.split() to get a list of parts.
- Beware that capturing groups in the pattern include separators in the result.
- Use raw strings (r'pattern') to write regex patterns safely.
- Set maxsplit to control how many splits happen.