Regex tutorial

You write regular expressions (regex) to match patterns in strings. When you process text, you may want to extract a substring with a predictable structure: a phone number, an email address, or something more specific to your research or task. You may also want to clean junk out of your text, such as repetitive formatting errors left behind by a transcription process. In these cases, and in many others like them, writing the right regex is usually better than working by hand or relying on a "magical" third-party library or tool that claims to do what you want.

Refer back to the slides for the building blocks of regex.

Character classes

- Match any one character from a specific set
- Are defined using the [ and ] metacharacters
- Inside a character class, ^ and - can have special meaning (complement and range), depending on their position in the class

    import re  # the regex module in the Python standard library

    # strings to be searched for matching regex patterns
    str1 = "Aardvarks belong to the Captain"
    str2 = "Albert's famous equation, E = mc^2."
    str3 = "Located at 455 Serra Mall."
    str4 = "Beware of the shape-shifters!"

    test_strings = [str1, str2, str3, str4]  # collect the strings in a list
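The sketch below tries a few character classes on the same four test strings. It shows the position-dependent rules for ^ and - from the list above. The dictionary keys in `patterns` and the helper `find_all` are illustrative names, not part of the original notebook.

```python
import re

# Same sample strings as above, repeated so this cell runs on its own
test_strings = [
    "Aardvarks belong to the Captain",
    "Albert's famous equation, E = mc^2.",
    "Located at 455 Serra Mall.",
    "Beware of the shape-shifters!",
]

# Each character class matches exactly ONE character per match
patterns = {
    "vowel": r"[aeiou]",                   # any one lowercase vowel
    "digit_range": r"[0-9]",               # '-' between two chars forms a range
    "not_letter_or_space": r"[^a-zA-Z ]",  # '^' placed first means complement
    "caret_or_hyphen": r"[-^]",            # '-' first and '^' not first are literal
}


def find_all(pattern, strings):
    """Return a dict mapping each string to the list of its matches."""
    return {s: re.findall(pattern, s) for s in strings}


for name, pattern in patterns.items():
    print(f"--- {name}: {pattern}")
    for s, matches in find_all(pattern, test_strings).items():
        print(f"{s!r:40} {matches}")
```

Character classes are case-sensitive, so `[aeiou]` skips the capital A in "Aardvarks". To match both cases, use `[aeiouAEIOU]` or pass the `re.IGNORECASE` flag.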