#!/usr/bin/env python
# coding: utf-8

# # Splitting 'ref' into book, chapter, verse and word
# 
# The XML source data contains a tag called 'ref', which holds the book, chapter, verse and word number. See the following snippet:
# 
# ```
# ἐκλείσθη
# ```
# 
# This small Jupyter Notebook explains how the compound variable 'ref' is split into 4 values:

# In[1]:


import re

# A sample 'ref' value: book, chapter, verse and word number
ref = "MAT 25:10!18"
# Replace '!', ':' and ' ' with a space, then split on whitespace
x = re.sub(r'[!: ]', " ", ref).split()
print(x)


# ## Explanation of the code:
# 
# The code begins by importing the regular expression module re.
# 
# * The variable ref is initialized with the string "MAT 25:10!18". In this string a space, a colon (:) and an exclamation mark (!) separate the book, chapter, verse and word number. (The name ref is used rather than input, which would shadow Python's built-in input() function.)
# 
# * The re.sub() function replaces certain characters in the string with a space character (" "). The regular expression pattern [!: ] is a character class that matches a single exclamation mark (!), colon (:), or space. Each match is replaced with a space, which gives "MAT 25 10 18".
# 
# * The split() method is called directly on the string returned by re.sub(). Since no delimiter is passed to split(), it splits on runs of whitespace and returns a list of substrings.
# 
# * The resulting list is assigned to the variable x, so x holds the four values as strings: book, chapter, verse and word.
# 
# * Finally, the list is printed using the print() function. For the sample value this prints ['MAT', '25', '10', '18'].

# The following site can be used to build and verify a regular expression: [regex101.com](https://regex101.com/) (choose the 'Python' flavor)

# In[ ]:
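

# The split described above returns four strings. As a minimal sketch, the parts can be given names and converted to integers where that makes sense. This assumes every 'ref' value has exactly the form "BOOK chapter:verse!word" with numeric chapter, verse and word; the source only shows one example, so treat that as an assumption. The names `Ref` and `split_ref` are hypothetical and chosen for illustration only.

```python
import re
from typing import NamedTuple


class Ref(NamedTuple):
    """Hypothetical container for the four parts of a 'ref' value."""
    book: str
    chapter: int
    verse: int
    word: int


def split_ref(ref: str) -> Ref:
    """Split a string such as "MAT 25:10!18" into named parts.

    Assumes chapter, verse and word are always numeric.
    """
    # Same technique as above: replace '!', ':' and ' ' with a space, then split
    parts = re.sub(r'[!: ]', " ", ref).split()
    if len(parts) != 4:
        raise ValueError(f"expected 4 parts in {ref!r}, got {len(parts)}")
    book, chapter, verse, word = parts
    return Ref(book, int(chapter), int(verse), int(word))


print(split_ref("MAT 25:10!18"))
# prints: Ref(book='MAT', chapter=25, verse=10, word=18)
```

# Named fields such as `.chapter` can be easier to read in later cells than positional indices such as x[1], and the length check reports malformed values early instead of failing further down.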