Developing a regex

  1. Think of the PATTERN you want to capture in general terms. "I want three-letter words."
  2. Write pattern = "\w{3}" and then try it on a few practice strings. The goal is to BREAK your pattern, find out where it fails, and notice new parts of the pattern you missed.
In [1]:
import re
pattern = "\w{3}"
re.findall(pattern, "hey there guy") # whoops, "the" is just the start of "there", not a word on its own
Out[1]:
['hey', 'the', 'guy']
In [2]:
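# tried but failed:
#      "(\w{3}) "     <-- a trailing space: misses "guy", which has no space after it
#      "(\w{3})\b"    <-- a word boundary should work! why not?
#                         (in a normal string, "\b" is a backspace character, not a word boundary)
pattern = r"(\w{3})\b" # trying raw string notation, so `\b` reaches the regex engine as-is
re.findall(pattern, "hey there guy")
# the raw string made `\b` work, but the pattern is still failing:
# it grabs the last three letters of "there"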
Out[2]:
['hey', 'ere', 'guy']
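Why the raw string mattered: a quick check, as a sketch. In a normal Python string literal, "\b" is a single backspace character, so the regex engine never sees a word-boundary token. The variable names below (plain, raw, no_raw, with_raw) are made up for this illustration.

```python
import re

# In a normal string literal, "\b" is one character: backspace (U+0008).
plain = "\b"
# In a raw string literal, r"\b" is two characters: a backslash and "b".
raw = r"\b"

print(len(plain), len(raw))  # 1 2
print(plain == "\x08")       # True

text = "hey there guy"
# Without the raw prefix, the pattern asks for a literal backspace after
# three word characters, which never appears in the text.
no_raw = re.findall("(\\w{3})\b", text)
# With the raw prefix, the regex engine receives \b and treats it as a word boundary.
with_raw = re.findall(r"(\w{3})\b", text)

print(no_raw)    # []
print(with_raw)  # ['hey', 'ere', 'guy']
```

Writing regex patterns as raw strings by default avoids this kind of silent mismatch.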
In [3]:
pattern = r"\b(\w{3})\b"  # make sure the word has a boundary before it
re.findall(pattern,"hey there guy")  # got it!
Out[3]:
['hey', 'guy']
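The "try to BREAK your pattern" step can keep going from here. The sketch below runs the final pattern over a few extra practice strings; the strings and the letters-only variant are assumptions added for illustration, not part of the original walkthrough.

```python
import re

pattern = r"\b(\w{3})\b"

# Hypothetical practice strings chosen to stress the pattern.
tests = [
    "hey there guy",
    "abc 123 a_b",   # \w also matches digits and underscores
    "don't stop",    # the apostrophe creates a word boundary inside "don't"
]

results = {text: re.findall(pattern, text) for text in tests}
for text, found in results.items():
    print(repr(text), "->", found)
# 'hey there guy' -> ['hey', 'guy']
# 'abc 123 a_b' -> ['abc', '123', 'a_b']
# "don't stop" -> ['don']

# One possible refinement if "letters" should mean ASCII letters only.
letters_only = r"\b([A-Za-z]{3})\b"
letters_result = re.findall(letters_only, "abc 123 a_b")
print(letters_result)  # ['abc']
```

Whether "123" or "don" counts as a three-letter word depends on what you actually want to capture, which is why step 1 comes first.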