Extract Words from String Using Regex in Python: Simple Guide
Use the re.findall() function with the pattern \w+ to extract all words from a string in Python. The pattern matches runs of one or more word characters, and re.findall() returns every match as a list.
Syntax
The basic syntax to extract words using regex in Python is:
re.findall(pattern, string): Finds all non-overlapping substrings where the regex pattern matches and returns them as a list.
pattern = r'\w+': Matches one or more word characters (letters, digits, and underscore).
string: The text you want to extract words from.
```python
import re

# Find every run of word characters in the string
words = re.findall(r'\w+', 'Your text here')
```
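If you extract words from many strings, one option is to compile the pattern once with re.compile() and call findall() on the compiled object. The helper name extract_words and the constant WORD_RE below are hypothetical, chosen only for illustration. Because the re module already caches recently used patterns, this is mainly a readability choice rather than a large speed-up.

```python
import re

# Compile the word pattern once; the compiled object has its own
# findall() method, which takes only the string to search.
WORD_RE = re.compile(r'\w+')


def extract_words(text: str) -> list[str]:
    """Return every run of word characters in text (hypothetical helper)."""
    return WORD_RE.findall(text)


print(extract_words('Your text here'))  # ['Your', 'text', 'here']
print(extract_words('!!!'))             # [] (no word characters, so an empty list)
```

Note that findall() returns an empty list, not None, when nothing matches, so the result can always be iterated safely.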
Example
This example shows how to extract all words from a sentence using re.findall() with the pattern \w+. It prints the list of words found.
```python
import re

text = "Hello, world! Let's extract words123 from this string."

# Extract every run of word characters (letters, digits, underscore)
words = re.findall(r'\w+', text)
print(words)
```
Output
['Hello', 'world', 'Let', 's', 'extract', 'words123', 'from', 'this', 'string']
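The output splits "Let's" into 'Let' and 's' because the apostrophe is not a word character. If you would rather keep contractions whole, one possible pattern (an assumption about what should count as a word, not the only definition) allows an apostrophe between runs of word characters. The group is written as (?:...) so it is non-capturing: with a capturing group, findall() would return the group's text instead of the whole match.

```python
import re

text = "Hello, world! Let's extract words123 from this string."

# \w+        : a run of word characters
# (?:'\w+)*  : optionally followed by an apostrophe and more word characters,
#              e.g. the "'s" in "Let's"; non-capturing, so findall() still
#              returns the whole match
words = re.findall(r"\w+(?:'\w+)*", text)
print(words)
# ['Hello', 'world', "Let's", 'extract', 'words123', 'from', 'this', 'string']
```

This sketch only handles the straight apostrophe ('); text that uses typographic apostrophes (’) would need that character added to the pattern as well.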
Common Pitfalls
One common mistake is using a pattern that does not match words, such as \W+, which matches non-word characters and therefore returns the separators instead of the words. Another is forgetting the raw-string prefix r on regex patterns, which lets Python interpret backslash escapes before the regex engine ever sees the pattern.
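With \w alone, forgetting the r prefix happens to still work (Python leaves the unrecognized escape \w in place, although recent Python versions emit a warning for it), but \b shows the real danger: in a normal string, "\b" is the backspace character (\x08), not a word boundary. The sketch below compares the two forms; the non-raw pattern writes \w as "\\w" so that only \b is affected.

```python
import re

text = "Hello, world"

# Raw string: \b reaches the regex engine as a word-boundary assertion.
raw = re.findall(r'\b\w+\b', text)

# Normal string: Python turns each "\b" into a literal backspace character,
# so the regex now looks for backspaces around a word, which never occur.
not_raw = re.findall("\b\\w+\b", text)

print('Raw:', raw)          # Raw: ['Hello', 'world']
print('Not raw:', not_raw)  # Not raw: []
```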
Also, \w+ matches digits and underscores as well as letters, so if you want only ASCII letters, use [a-zA-Z]+ instead.
```python
import re

text = "Hello, world! 123_test"

# Wrong: matches non-word characters (separators)
wrong = re.findall(r'\W+', text)

# Right: matches words
right = re.findall(r'\w+', text)

print('Wrong:', wrong)
print('Right:', right)
```
Output
Wrong: [', ', '! ']
Right: ['Hello', 'world', '123_test']
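The letters-only advice can be checked directly. Note that [a-zA-Z]+ covers ASCII letters only, so accented letters end a match. As an alternative not covered above, the class [^\W\d_] ("a word character that is not a digit or underscore") is a common way to approximate Unicode letters; treat it as an approximation, since \w also includes a few non-letter characters.

```python
import re

text = "Hello, world! 123_test café"

# ASCII letters only: digits, underscores, and non-ASCII letters
# such as 'é' all end a match.
ascii_letters = re.findall(r'[a-zA-Z]+', text)

# Approximate Unicode letters: not a non-word character (\W),
# not a digit (\d), and not an underscore.
unicode_letters = re.findall(r'[^\W\d_]+', text)

print(ascii_letters)    # ['Hello', 'world', 'test', 'caf']
print(unicode_letters)  # ['Hello', 'world', 'test', 'café']
```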
Quick Reference
| Regex Pattern | Description | Example Match |
|---|---|---|
| `\w+` | Matches one or more word characters (letters, digits, underscore) | `Hello123`, `test_word` |
| `[a-zA-Z]+` | Matches one or more ASCII letters (uppercase and lowercase) | `Hello`, `test` |
| `\W+` | Matches one or more non-word characters (spaces, punctuation) | `, `, `! ` |
| `\b\w+\b` | Matches whole words using word boundaries | `Hello`, `world` |
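To see how the patterns in the table differ, the following sketch runs each one through re.findall() on the same sample string (the sample text is an arbitrary choice made for illustration).

```python
import re

# Sample text chosen so that the patterns from the table behave differently.
text = "Hello123, test_word!"

patterns = [r'\w+', r'[a-zA-Z]+', r'\W+', r'\b\w+\b']

# Map each pattern to the list of substrings re.findall() returns for it.
results = {pattern: re.findall(pattern, text) for pattern in patterns}

for pattern, matches in results.items():
    print(f'{pattern:<10} {matches}')
```

For this input, [a-zA-Z]+ splits Hello123 and test_word into letter-only pieces ('Hello', 'test', 'word'), \W+ returns only the separators (', ' and '!'), and \b\w+\b returns the same list as \w+. That last result is expected: because \w+ is greedy and findall() scans left to right, each \w+ match already starts and ends at a word boundary.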
Key Takeaways
Use re.findall(r'\w+', string) to extract all words from a string in Python.
Always use raw strings (prefix r) for regex patterns to avoid escape issues.
\w+ matches letters, digits, and underscores; use [a-zA-Z]+ to match only ASCII letters.
Avoid using \W+ if you want to extract words, as it matches separators instead.
re.findall returns a list of all non-overlapping matching substrings, which makes it a natural fit for word extraction.