
Extract Words from String Using Regex in Python: Simple Guide

Use the re.findall() function with the pattern \w+ to extract all words from a string in Python. This pattern matches sequences of word characters, returning them as a list.

Syntax

The basic syntax to extract words using regex in Python is:

  • re.findall(pattern, string): Finds all substrings where the regex pattern matches and returns them as a list.
  • pattern = r'\w+': Matches one or more word characters (letters, digits, underscore). The r prefix makes it a raw string, so the backslash reaches the regex engine unchanged.
  • string: The text you want to extract words from.
python
import re
words = re.findall(r'\w+', 'Your text here')

Example

This example shows how to extract all words from a sentence using re.findall() with the pattern \w+. It prints the list of words found.

python
import re
text = "Hello, world! Let's extract words123 from this string."
words = re.findall(r'\w+', text)
print(words)
Output
['Hello', 'world', 'Let', 's', 'extract', 'words123', 'from', 'this', 'string']
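Note that the output above splits "Let's" into 'Let' and 's', because the apostrophe is not a word character. If contractions should stay together, one possible variation is sketched below. The extended pattern is an illustrative assumption about what counts as a "word", not part of the basic recipe.

```python
import re

text = "Hello, world! Let's extract words123 from this string."

# Assumption: a "word" may contain internal apostrophes (Let's, don't).
# \w+ matches the first run of word characters; (?:'\w+)* optionally
# appends further apostrophe + word-character groups.
pattern = r"\w+(?:'\w+)*"

words = re.findall(pattern, text)
print(words)
# ['Hello', 'world', "Let's", 'extract', 'words123', 'from', 'this', 'string']
```

The group is non-capturing ((?:...)), so re.findall() still returns the full match rather than only the group contents.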

Common Pitfalls

One common mistake is using \W+ (uppercase W) instead of \w+. \W+ matches runs of non-word characters, so re.findall() returns the separators rather than the words. Another is forgetting the raw-string prefix r on the pattern: in a normal string literal, Python processes backslash escapes before the regex engine sees them, so '\b' becomes a backspace character instead of a word boundary.

Also, \w+ includes digits and underscores, so if you want only letters, use [a-zA-Z]+ instead.

python
import re
text = "Hello, world! 123_test"

# Wrong: matches non-word characters (separators)
wrong = re.findall(r'\W+', text)

# Right: matches words
right = re.findall(r'\w+', text)

print('Wrong:', wrong)
print('Right:', right)
Output
Wrong: [', ', '! ']
Right: ['Hello', 'world', '123_test']
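The letters-only alternative mentioned above can be compared with \w+ on the same input. This is a minimal sketch; the variable names and the extra "café" sample are illustrative assumptions.

```python
import re

text = "Hello, world! 123_test"

# \w+ keeps digits and underscores together with letters.
with_digits = re.findall(r'\w+', text)
# [a-zA-Z]+ keeps only runs of ASCII letters.
letters_only = re.findall(r'[a-zA-Z]+', text)

print(with_digits)   # ['Hello', 'world', '123_test']
print(letters_only)  # ['Hello', 'world', 'test']

# Caveat: [a-zA-Z] covers ASCII letters only, so accented letters are cut off,
# while \w in Python 3 str patterns matches Unicode letters by default.
accented = "caf\u00e9"  # "café", written with an escape to avoid encoding issues
print(re.findall(r'[a-zA-Z]+', accented))  # ['caf']
print(re.findall(r'\w+', accented))        # ['café']
```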

Quick Reference

| Regex Pattern | Description | Example Match |
| --- | --- | --- |
| \w+ | Matches one or more word characters (letters, digits, underscore) | Hello123, test_word |
| [a-zA-Z]+ | Matches only letters (uppercase and lowercase) | Hello, test |
| \W+ | Matches one or more non-word characters (spaces, punctuation) | , ! |
| \b\w+\b | Matches whole words using word boundaries | Hello, world |
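To see how the patterns in the table behave side by side, the sketch below runs each one over a single sample string. The sample text and the results dictionary are illustrative assumptions.

```python
import re

sample = "Hello123, test_word!"

# Run every pattern from the quick reference against the same input.
patterns = [r'\w+', r'[a-zA-Z]+', r'\W+', r'\b\w+\b']
results = {p: re.findall(p, sample) for p in patterns}

for pattern, matches in results.items():
    print(pattern, '->', matches)
# \w+ -> ['Hello123', 'test_word']
# [a-zA-Z]+ -> ['Hello', 'test', 'word']
# \W+ -> [', ', '!']
# \b\w+\b -> ['Hello123', 'test_word']

# Without the r prefix, '\b' is a backspace character rather than a word
# boundary, so the pattern silently matches nothing here.
no_raw = re.findall('\bHello123\b', sample)
with_raw = re.findall(r'\bHello123\b', sample)
print(no_raw)    # []
print(with_raw)  # ['Hello123']
```

With re.findall(), \b\w+\b returns the same matches as \w+ on this input, since \w+ is greedy and already stops at word boundaries.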

Key Takeaways

  • Use re.findall(r'\w+', string) to extract all words from a string in Python.
  • Always use raw strings (prefix r) for regex patterns to avoid escape issues.
  • \w+ matches letters, digits, and underscores; use [a-zA-Z]+ to match only letters.
  • Avoid \W+ if you want to extract words, as it matches separators instead.
  • re.findall() returns a list of all matching substrings, which makes it a good fit for word extraction.