from pyexpat import model
from moldrug import utils
from rdkit import Chem
from rdkit.Chem import AllChem, Lipinski, Draw
import multiprocessing as mp
import shutil, os, gzip, requests, tempfile
tmp_path = tempfile.TemporaryDirectory()
wd = tmp_path.name
# or
# os.makedirs('tutorial_mwr', exist_ok=True)
# wd = os.path.abspath('tutorial_mwr')
url = "http://www.qsar4u.com/files/cremdb/replacements02_sc2.db.gz"
r = requests.get(url, allow_redirects=True)
crem_dbgz_path = os.path.join(wd,'crem.db.gz')
crem_db_path = os.path.join(wd,'crem.db')
open(crem_dbgz_path, 'wb').write(r.content)
with gzip.open(crem_dbgz_path, 'rb') as f_in:
with open(crem_db_path, 'wb') as f_out:
shutil.copyfileobj(f_in, f_out)
Let's assume that our QSAR model give us that the best molecules are those ones that have more hydrogen bonds acceptor and donors. Because MolDrug minimize we must use the negative of this number.
def QSAR_model_cost(Individual:utils.Individual) -> utils.Individual:
"""Simple cost function
Parameters
----------
Individual : utils.Individual
An initialize Individual
Returns
-------
utils.Individual
The individual with cost sum of HBA and HBD with negative sign
"""
NumHAcceptors = Lipinski.NumHAcceptors(Individual.mol)
NumHDonors = Lipinski.NumHDonors(Individual.mol)
model = NumHAcceptors + NumHDonors
Individual.cost = -model
return Individual
The next code is a duplication for the implementation of QSAR_model_cost
fitness function.
Is written in this way due to some possible problems that might arise because the execution of multiprocessing inside GA
on an interactive Python like IPython-Notebook.
This will create an importable function in your working directory. In case you execute this code in a python script rather that in a Notebook, consider to put the execution of the class after the section if __name__ == '__main__:'
function_string = """
from rdkit.Chem import Lipinski
def QSAR_model_cost(Individual):
NumHAcceptors = Lipinski.NumHAcceptors(Individual.mol)
NumHDonors = Lipinski.NumHDonors(Individual.mol)
model = NumHAcceptors + NumHDonors
Individual.cost = -model
return Individual
"""
with open('my_cost.py', 'w') as f:
f.write(function_string)
from my_cost import QSAR_model_cost
ga = utils.GA(
Chem.MolFromSmiles('CCCC'),
crem_db_path = os.path.join(tmp_path.name, 'crem.db'),
maxiter = 10,
popsize = 50,
costfunc=QSAR_model_cost,
mutate_crem_kwargs={
'ncores': mp.cpu_count()
},
costfunc_kwargs={}
)
ga(12)
Creating the first population with 50 members:
100%|██████████| 50/50 [00:01<00:00, 43.08it/s]
Initial Population: Best Individual: Individual(idx = 40, smiles = CN(C)N(O)N=O, cost = -5) Accepted rate: 50 / 50
Evaluating generation 1 / 10:
100%|██████████| 48/48 [00:01<00:00, 34.57it/s]
Generation 1: Best Individual: Individual(idx = 63, smiles = O=P(O)(O)CCP(=O)(O)O, cost = -6). Accepted rate: 28 / 48
Evaluating generation 2 / 10:
100%|██████████| 45/45 [00:01<00:00, 34.59it/s]
Generation 2: Best Individual: Individual(idx = 63, smiles = O=P(O)(O)CCP(=O)(O)O, cost = -6). Accepted rate: 18 / 45
Evaluating generation 3 / 10:
100%|██████████| 48/48 [00:01<00:00, 41.87it/s]
Generation 3: Best Individual: Individual(idx = 175, smiles = N=C(N)NN=CC(=O)O, cost = -7). Accepted rate: 19 / 48
Note: The mutation on Individual(idx = 145, smiles = Nc1nc2ccccc2[n+]([O-])n1, cost = -5) did not work, it will be returned the same individual Evaluating generation 4 / 10:
100%|██████████| 48/48 [00:01<00:00, 37.42it/s]
Generation 4: Best Individual: Individual(idx = 200, smiles = COc1nc(N)nc(OC)c1N=O, cost = -8). Accepted rate: 12 / 48
Note: The mutation on Individual(idx = 145, smiles = Nc1nc2ccccc2[n+]([O-])n1, cost = -5) did not work, it will be returned the same individual Note: The mutation on Individual(idx = 145, smiles = Nc1nc2ccccc2[n+]([O-])n1, cost = -5) did not work, it will be returned the same individual Evaluating generation 5 / 10:
100%|██████████| 46/46 [00:01<00:00, 31.64it/s]
Generation 5: Best Individual: Individual(idx = 275, smiles = NC(C(=O)O)C(O)C(O)C(O)CO, cost = -12). Accepted rate: 12 / 46
Note: The mutation on Individual(idx = 145, smiles = Nc1nc2ccccc2[n+]([O-])n1, cost = -5) did not work, it will be returned the same individual Evaluating generation 6 / 10:
100%|██████████| 48/48 [00:01<00:00, 33.12it/s]
Generation 6: Best Individual: Individual(idx = 275, smiles = NC(C(=O)O)C(O)C(O)C(O)CO, cost = -12). Accepted rate: 14 / 48 Evaluating generation 7 / 10:
100%|██████████| 49/49 [00:01<00:00, 31.17it/s]
Generation 7: Best Individual: Individual(idx = 275, smiles = NC(C(=O)O)C(O)C(O)C(O)CO, cost = -12). Accepted rate: 7 / 49
Evaluating generation 8 / 10:
100%|██████████| 49/49 [00:01<00:00, 29.06it/s]
Generation 8: Best Individual: Individual(idx = 275, smiles = NC(C(=O)O)C(O)C(O)C(O)CO, cost = -12). Accepted rate: 13 / 49
Evaluating generation 9 / 10:
100%|██████████| 50/50 [00:01<00:00, 30.44it/s]
Generation 9: Best Individual: Individual(idx = 275, smiles = NC(C(=O)O)C(O)C(O)C(O)CO, cost = -12). Accepted rate: 13 / 50
Evaluating generation 10 / 10:
100%|██████████| 49/49 [00:01<00:00, 26.09it/s]
Generation 10: Best Individual: Individual(idx = 275, smiles = NC(C(=O)O)C(O)C(O)C(O)CO, cost = -12). Accepted rate: 13 / 49 =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ The simulation finished successfully after 10 generations with a population of 50 individuals. A total number of 529 Individuals were seen during the simulation. Initial Individual: Individual(idx = 0, smiles = CCCC, cost = 0) Final Individual: Individual(idx = 275, smiles = NC(C(=O)O)C(O)C(O)C(O)CO, cost = -12) The cost function dropped in 12 units. =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ Total time (10 generations): 480.95 (s). Finished at Mon Sep 4 19:54:16 2023.
Draw.MolsToGridImage([ga.InitIndividual.mol, ga.pop[0].mol])
As you can see, this simulation is really good: On every single carbon atom it is bound, at least, one HBD and/or HBA! (Your results might differ)
import matplotlib.pyplot as plt
fig, ax = plt.subplots(nrows=3, figsize = (16,9))
ax[0].plot([individual.cost for individual in ga.SawIndividuals], '-o')
ax[0].set(title = 'cost')
ax[1].plot([cost for cost in ga.avg_cost], '-o')
ax[1].set(title = 'avg_cost')
ax[2].plot([cost for cost in ga.best_cost], '-o')
ax[2].set(title = 'best_cost')
[Text(0.5, 1.0, 'best_cost')]