Examples of regular expressions

Using the re module

Germain SALVATO-VALLVERDU

This notebook presents several practical examples of regular expressions. An online regex tester and debugger: regex101

In [1]:
import re

Some notes

  • quantifier * : matches the preceding element zero or more times
  • quantifier + : matches the preceding element one or more times
  • quantifier ? : matches the preceding element zero or one time
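
The three quantifiers can be compared on a toy pattern. This is a minimal sketch: the sample strings and the `results` dictionary are made up for illustration, and `fullmatch` is used so that the whole string has to fit the pattern.

```python
import re

# Hypothetical sample strings: "a" followed by zero, one or three "b".
samples = ["a", "ab", "abbb"]

results = {}
for quantifier in ("*", "+", "?"):
    # Build "ab*", "ab+" and "ab?": the quantifier applies to "b" only.
    pattern = re.compile("ab" + quantifier)
    # fullmatch succeeds only if the entire string matches the pattern.
    results[quantifier] = [s for s in samples if pattern.fullmatch(s)]
    print(quantifier, results[quantifier])
```

With `*` all three strings match, with `+` the bare "a" is rejected, and with `?` the string "abbb" is rejected.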

Simplest case

In [59]:
a = " SCF Done:  E(RB3LYP) =  -599.864175717     A.U. after   16 cycles"
# Raw string (r"...") so that \s reaches the regex engine without an invalid-escape warning
if re.search(r"^\sSCF Done:", a):
    print("ok")
ok
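
The anchor `^` makes `re.search` behave like `re.match` here, since `re.match` only tries the pattern at the beginning of the string. The following sketch contrasts the two functions on the same line; the variable names are chosen for illustration only.

```python
import re

line = " SCF Done:  E(RB3LYP) =  -599.864175717     A.U. after   16 cycles"

# Anchored search and match both test the start of the string.
anchored_search = re.search(r"^\sSCF Done:", line)
plain_match = re.match(r"\sSCF Done:", line)

# Without an anchor, search scans the whole string while match still
# only looks at the beginning.
unanchored_search = re.search(r"cycles", line)
unanchored_match = re.match(r"cycles", line)

print(anchored_search is not None, plain_match is not None)
print(unanchored_search is not None, unanchored_match is not None)
```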

Read floating-point numbers with a regular expression

In [40]:
a = "           CCl          1.70000   1.60000   1.60000   1.70000   1.80000\n"
b = "           D1         148.47288 140.38227\n"
c = "           D3        -116.60811-112.89609-109.06468-104.97240-100.75211\n"
d = "           D1         148.47288\n"
In [33]:
# One or more digits, a literal dot, one or more digits (raw string avoids escape warnings)
pattern = re.compile(r"(\d+\.\d+)")
res = pattern.search(b)
print(res.group(0))
148.47288
In [41]:
# The optional sign lets the pattern split numbers that are glued together, as in c
pattern = re.compile(r"\s*([+-]?\d+\.\d+)")
print("a ", pattern.findall(a))
print("b ", pattern.findall(b))
print("c ", pattern.findall(c))
print("d ", pattern.findall(d))
a  ['1.70000', '1.60000', '1.60000', '1.70000', '1.80000']
b  ['148.47288', '140.38227']
c  ['-116.60811', '-112.89609', '-109.06468', '-104.97240', '-100.75211']
d  ['148.47288']
In [64]:
eigentest = "     Eigenvalues --  -547.05077-547.86712-548.29237-548.49474-548.57146 "
pattern = re.compile(r"\s*([+-]?\d+\.\d+)")
pattern.findall(eigentest)
Out[64]:
['-547.05077', '-547.86712', '-548.29237', '-548.49474', '-548.57146']
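
`findall` returns strings, so a conversion step is usually needed before any numerical work. Below is a minimal sketch of such a step; the helper name `read_floats` is hypothetical and not part of the `re` module. It only handles plain fixed-point numbers: for a value in exponent notation such as 1.00D-06 it would only pick up the 1.00 part.

```python
import re

# Optional sign, digits, a literal dot, digits.
FLOAT_PATTERN = re.compile(r"[+-]?\d+\.\d+")


def read_floats(line):
    """Return every fixed-point number found in line as a float."""
    return [float(token) for token in FLOAT_PATTERN.findall(line)]


# Same kind of line as above: the values are not separated by whitespace.
c = "           D3        -116.60811-112.89609-109.06468-104.97240-100.75211\n"
values = read_floats(c)
print(values)
```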

Looking for a string

In [42]:
soup = """ Rotational constants (GHZ):    142.1344479      0.4743210      0.4743149
 Standard basis: 6-31+G(d) (6D, 7F)
 There are    67 symmetry adapted cartesian basis functions of A   symmetry.
 There are    67 symmetry adapted basis functions of A   symmetry.
    67 basis functions,   132 primitive gaussians,    67 cartesian basis functions
    18 alpha electrons       18 beta electrons
       nuclear repulsion energy        45.1237784897 Hartrees.
 NAtoms=    6 NActive=    6 NUniq=    6 SFac= 1.00D+00 NAtFMM=   60 NAOKFM=F Big=F
 Integral buffers will be    131072 words long.                                                          
 Raffenetti 2 integral format.
 Two-electron integral symmetry is turned on.
 One-electron integrals computed using PRISM.
 NBasis=    67 RedAO= T EigKep=  7.02D-03  NBF=    67
 NBsUse=    67 1.00D-06 EigRej= -1.00D+00 NBFU=    67
 Initial guess from the checkpoint file:  "/scratch/183547/Gau-16335.chk"
 B after Tr=     0.000000    0.000000    0.000000
         Rot=    1.000000    0.000065    0.000000    0.000022 Ang=   0.01 deg.
 ExpMin= 4.38D-02 ExpMax= 2.52D+04 ExpMxC= 3.78D+03 IAcc=2 IRadAn=         4 AccDes= 0.00D+00
 Harris functional with IExCor=  402 and IRadAn=       4 diagonalized for initial guess.
 HarFok:  IExCor=  402 AccDes= 0.00D+00 IRadAn=         4 IDoV= 1 UseB2=F ITyADJ=14
 ICtDFT=  3500011 ScaDFX=  1.000000  1.000000  1.000000  1.000000
 FoFCou: FMM=F IPFlag=           0 FMFlag=      100000 FMFlg1=           0
         NFxFlg=           0 DoJE=T BraDBF=F KetDBF=T FulRan=T
         wScrn=  0.000000 ICntrl=     500 IOpCl=  0 I1Cent=   200000004 NGrid=           0
         NMat0=    1 NMatS0=      1 NMatT0=    0 NMatD0=    1 NMtDS0=    0 NMtDT0=    0
 Petite list used in FoFCou.
 Keep R1 ints in memory in canonical form, NReq=3514379.
 Requested convergence on RMS density matrix=1.00D-08 within 128 cycles.
 Requested convergence on MAX density matrix=1.00D-06.
 Requested convergence on             energy=1.00D-06.
 No special actions if energy rises.
 EnCoef did     2 forward-backward iterations
 EnCoef did     2 forward-backward iterations
 EnCoef did   100 forward-backward iterations
 EnCoef did     2 forward-backward iterations
 SCF Done:  E(RB3LYP) =  -599.864175717     A.U. after   16 cycles
            NFock= 16  Conv=0.99D-09     -V/T= 2.0046
 Calling FoFJK, ICntrl=      2127 FMM=F ISym2X=0 I1Cent= 0 IOpClX= 0 NMat=1 NMatS=1 NMatT=0.
 ***** Axes restored to original set *****
 -------------------------------------------------------------------
 Center     Atomic                   Forces (Hartrees/Bohr)
 Number     Number              X              Y              Z
 -------------------------------------------------------------------
      1        6          -0.000049868    0.001084800   -0.000123630
      2        1           0.000003801    0.000176939    0.000127406
      3        1           0.000110364    0.000154941   -0.000056689
      4        1          -0.000090176    0.000143240   -0.000052491
      5        9          -0.000046616    0.003735651   -0.000031395
      6       17           0.000072495   -0.005295572    0.000136799
 -------------------------------------------------------------------"""
In [55]:
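# Group 1 is the label, group 2 the signed energy; the dot is escaped to match a literal "."
pattern = re.compile(r"^(\sSCF Done:).*([+-]\d+\.\d+)")
for line in soup.split("\n"):
    m = pattern.match(line)  # match once and reuse the result
    if m:
        for i in range(3):
            print(i, m.group(i))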
0  SCF Done:  E(RB3LYP) =  -599.864175717
1  SCF Done:
2 -599.864175717
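
As a variation, the whole text can be searched at once with the `re.MULTILINE` flag, which makes `^` match at the start of every line, and named groups can make the result easier to read. This is only a sketch: the group names `method` and `energy` are chosen here, the text is a short excerpt of the output above, and the method label is assumed to contain only word characters.

```python
import re

# Short excerpt of the output shown above, so that the block runs on its own.
text = """ EnCoef did     2 forward-backward iterations
 SCF Done:  E(RB3LYP) =  -599.864175717     A.U. after   16 cycles
            NFock= 16  Conv=0.99D-09     -V/T= 2.0046
"""

# re.MULTILINE lets ^ match at the beginning of each line of the text.
scf_pattern = re.compile(
    r"^\sSCF Done:\s+E\((?P<method>\w+)\)\s+=\s+(?P<energy>[+-]?\d+\.\d+)",
    re.MULTILINE,
)

m = scf_pattern.search(text)
if m:
    method = m.group("method")
    energy = float(m.group("energy"))
    print(method, energy)
```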
In [56]:
scan = """ SCF Done:  E(RB3LYP) =  -548.021019862     A.U. after   19 cycles
            NFock= 19  Conv=0.27D-08     -V/T= 2.0040
 Scan completed.

 Summary of the potential surface scan:
   N      DSO         SCF    
 ----  ---------  -----------
    1     1.0000   -547.05077
    2     1.1000   -547.86712
    3     1.2000   -548.29237
    4     1.3000   -548.49474
    5     1.4000   -548.57146
    6     1.5000   -548.57912
    7     1.6000   -548.55048
    8     1.7000   -548.50429
    9     1.8000   -548.45102
   10     1.9000   -548.39653
   11     2.0000   -548.34390
   12     2.1000   -548.29464
   13     2.2000   -548.24935
   14     2.3000   -548.20822
   15     2.4000   -548.17114
   16     2.5000   -548.13792
   17     2.6000   -548.10833
   18     2.7000   -548.08209
   19     2.8000   -548.05895
   20     2.9000   -548.03866
   21     3.0000   -548.02102
 ----  ---------  -----------
"""
In [57]:
scan_patt = re.compile(r"^\sSummary of the potential surface scan:")
for line in scan.split("\n"):
    if scan_patt.match(line):
        print(line.strip())
    
Summary of the potential surface scan:
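
Once the summary line is located, the rows of the table can be read with a second pattern. The sketch below assumes that every data row holds an integer index followed by two fixed-point numbers, as in the table above; the names `row_pattern` and `rows` are chosen for illustration, and only the first rows of the scan are repeated so that the block runs on its own.

```python
import re

# Excerpt of the scan summary shown above.
scan_excerpt = """ Summary of the potential surface scan:
   N      DSO         SCF
 ----  ---------  -----------
    1     1.0000   -547.05077
    2     1.1000   -547.86712
    3     1.2000   -548.29237
 ----  ---------  -----------
"""

# An integer index followed by two signed fixed-point numbers.
row_pattern = re.compile(r"^\s+(\d+)\s+([+-]?\d+\.\d+)\s+([+-]?\d+\.\d+)\s*$")

rows = []
in_table = False
for line in scan_excerpt.split("\n"):
    if line.startswith(" Summary of the potential surface scan:"):
        in_table = True  # data rows are only expected after this line
        continue
    if not in_table:
        continue
    m = row_pattern.match(line)
    if m:  # header and dashed lines do not match and are skipped
        rows.append((int(m.group(1)), float(m.group(2)), float(m.group(3))))

print(rows)
```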