In this notebook we will cover how to map molecules in different ways and look at some of the things we can do with them. In the first part of the tutorial we will look at different ways to map water molecules and in the second, we will extend this to a more complex molecule system, the room temperature ionic liquid, BMIM-BF4.
In the first part of this demonstration we will load data from a GROMACS simulation and therefore, we need to define a set of units and a file reader object to use. For this reason, we have changed the imports a little bit to keep the code to minimum.
import mdsuite as mds
import mdsuite.file_io.chemfiles_read
from mdsuite.utils import Units
from zinchub import DataHub
import shutil
import h5py as hf
import numpy as np
2022-01-31 10:46:34.798887: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm/lib:/opt/slurm/lib: 2022-01-31 10:46:34.798913: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 2022-01-31 10:46:39.087811: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm/lib:/opt/slurm/lib: 2022-01-31 10:46:39.087996: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm/lib:/opt/slurm/lib: 2022-01-31 10:46:39.088161: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm/lib:/opt/slurm/lib: 2022-01-31 10:46:39.091381: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm/lib:/opt/slurm/lib: 2022-01-31 10:46:39.091545: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm/lib:/opt/slurm/lib: 2022-01-31 10:46:39.091700: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm/lib:/opt/slurm/lib: 2022-01-31 10:46:39.091718: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. Skipping registering GPU devices...
In this tutorial we are using 50 ns simulations of 14 water molecules in a continuum fluid performed with GROMACS. We will use pure atomistic naming as well as ligand naming, the topology files for which are contained on DataHub.
water = DataHub(url="https://github.com/zincware/DataHub/tree/main/Water_14_Gromacs")
water.get_file('./')
file_paths = [
f for f in water.file_raw
]
Here we create the project and define some custom units used by GROMACS.
project = mds.Project("Mapping_Molecules")
gmx_units = Units(
time=1e-15,
length=1e-10,
energy=1.6022e-19,
NkTV2p=1.6021765e6,
boltzmann=8.617343e-5,
temperature=1,
pressure=100000,
)
2022-01-31 10:46:40,563 - INFO: Creating new project Mapping_Molecules
In this section we take a look at how one can map molecules using SMILES strings.
traj_path = file_paths[2]
topol_path = file_paths[0]
file_reader = mdsuite.file_io.chemfiles_read.ChemfilesRead(
traj_file_path=traj_path, topol_file_path=topol_path
)
water_chemical = project.add_experiment(
name="water_chemical",
timestep=0.002,
temperature=300.0,
units=gmx_units,
simulation_data=file_reader,
)
water_chemical.run.CoordinateUnwrapper()
2022-01-31 10:46:40,727 - INFO: Creating a new experiment!
100%|███████████████████████████████████| 1/1 [00:00<00:00, 1.63it/s] 2022-01-31 10:46:41.649356: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. Applying transformation 'Unwrapped_Positions' to 'O': 100%|█| 1/1 [00: Applying transformation 'Unwrapped_Positions' to 'H': 100%|█| 1/1 [00:
The first thing we need to do is define the molecule that will be mapped using the in-built MDSuite molecule data-class
chemical_water = mds.Molecule(
name='water',
smiles="[H]O[H]",
cutoff=1.7,
amount=14,
mol_pbc=True
)
water_chemical.run.MolecularMap(
molecules=[chemical_water]
)
2022-01-31 10:46:42,462 - INFO: Building molecular graph from configuration for water
100%|████████████████████████████████| 42/42 [00:00<00:00, 557.49it/s]
2022-01-31 10:46:42,680 - INFO: Performing molecule number isomorphism test. 2022-01-31 10:46:42,681 - INFO: Amount of molecules test passed. 2022-01-31 10:46:42,682 - INFO: Performing group equality isomorphism test. 2022-01-31 10:46:42,682 - INFO: Group equality isomorphism test passed. 2022-01-31 10:46:42,787 - INFO: Mapping molecule graphs onto trajectory for water
100%|███████████████████████████████████| 1/1 [00:00<00:00, 4.78it/s] Applying transformation 'Unwrapped_Positions' to 'water': 100%|█| 1/1
water_chemical.molecules
{'water': {'n_particles': 14, 'mass': 18.015, 'groups': {'0': {'H': [0, 1], 'O': [0]}, '1': {'H': [2, 3], 'O': [1]}, '2': {'H': [4, 5], 'O': [2]}, '3': {'H': [6, 7], 'O': [3]}, '4': {'H': [8, 9], 'O': [4]}, '5': {'H': [10, 11], 'O': [5]}, '6': {'H': [12, 13], 'O': [6]}, '7': {'H': [14, 15], 'O': [7]}, '8': {'H': [16, 17], 'O': [8]}, '9': {'H': [18, 19], 'O': [9]}, '10': {'H': [20, 21], 'O': [10]}, '11': {'H': [22, 23], 'O': [11]}, '12': {'H': [24, 25], 'O': [12]}, '13': {'H': [26, 27], 'O': [13]}}}}
If you do not have particles with chemical names but you nonetheless wish to construct groups out of particles, this can be achieved by using a reference dict.
In this example, we use the ligand naming from GROMACS to construct water molecules.
traj_path = file_paths[2]
topol_path = file_paths[1]
file_reader = mdsuite.file_io.chemfiles_read.ChemfilesRead(
traj_file_path=traj_path, topol_file_path=topol_path
)
water_ligand = project.add_experiment(
name="water_ligand",
timestep=0.002,
temperature=300.0,
units=gmx_units,
simulation_data=file_reader,
)
water_ligand.run.CoordinateUnwrapper()
2022-01-31 10:46:43,502 - INFO: Creating a new experiment! 2022-01-31 10:46:44,364 - WARNING: WARNING element OW has been assigned mass=0.0
100%|███████████████████████████████████| 1/1 [00:00<00:00, 1.50it/s] Applying transformation 'Unwrapped_Positions' to 'OW': 100%|█| 1/1 [00 Applying transformation 'Unwrapped_Positions' to 'HW1': 100%|█| 1/1 [0 Applying transformation 'Unwrapped_Positions' to 'HW2': 100%|█| 1/1 [0
Keep in mind, as the particles are not named from the periodic tables, important properties such as mass will need to be filled in manually.
water_ligand.species['OW'].mass = [15.999]
water_ligand.species['HW1'].mass = [1.00784]
water_ligand.species['HW2'].mass = [1.00784]
In this case, the molecule will be defined a little bit differently.
ligand_water = mds.Molecule(
name='water',
cutoff=1.7,
amount=14,
species_dict={"HW1": 1, "OW": 1, "HW2": 1},
mol_pbc=True
)
water_ligand.run.MolecularMap(
molecules=[ligand_water]
)
2022-01-31 10:46:47,651 - INFO: Building molecular graph from configuration for water
100%|████████████████████████████████| 42/42 [00:00<00:00, 548.19it/s]
2022-01-31 10:46:47,883 - INFO: Performing molecule number isomorphism test. 2022-01-31 10:46:47,885 - INFO: Amount of molecules test passed. 2022-01-31 10:46:47,886 - INFO: Performing group equality isomorphism test. 2022-01-31 10:46:47,886 - INFO: Group equality isomorphism test passed. 2022-01-31 10:46:47,984 - INFO: Mapping molecule graphs onto trajectory for water
100%|███████████████████████████████████| 1/1 [00:00<00:00, 5.09it/s] Applying transformation 'Unwrapped_Positions' to 'water': 100%|█| 1/1
water_ligand.molecules
{'water': {'n_particles': 14, 'mass': 18.014680000000002, 'groups': {'0': {'HW1': [0], 'OW': [0], 'HW2': [0]}, '1': {'HW1': [1], 'OW': [1], 'HW2': [1]}, '2': {'HW1': [2], 'OW': [2], 'HW2': [2]}, '3': {'HW1': [3], 'OW': [3], 'HW2': [3]}, '4': {'HW1': [4], 'OW': [4], 'HW2': [4]}, '5': {'HW1': [5], 'OW': [5], 'HW2': [5]}, '6': {'HW1': [6], 'OW': [6], 'HW2': [6]}, '7': {'HW1': [7], 'OW': [7], 'HW2': [7]}, '8': {'HW1': [8], 'OW': [8], 'HW2': [8]}, '9': {'HW1': [9], 'OW': [9], 'HW2': [9]}, '10': {'HW1': [10], 'OW': [10], 'HW2': [10]}, '11': {'HW1': [11], 'OW': [11], 'HW2': [11]}, '12': {'HW1': [12], 'OW': [12], 'HW2': [12]}, '13': {'HW1': [13], 'OW': [13], 'HW2': [13]}}}}
So the molecule mapping itself was quick and easy, but what information has been stored along the way?
All meta-data about the molecules is stored in the experiment class under molecules. Let's take a look at what this contains.
water_chemical.molecules.keys()
dict_keys(['water'])
This dict will contain all of the molecules that have been mapped, but this is not the information about the molecules, for that, we need to look at the water molecule.
water_chemical.molecules['water'].keys()
dict_keys(['n_particles', 'mass', 'groups'])
Three of these are fairly trivial and we can look at them quickly, groups will require some more attention.
print(f"n_particles: {water_chemical.molecules['water']['n_particles']}")
print(f"mass: {water_chemical.molecules['water']['mass']}")
n_particles: 14 mass: 18.015
Now let's take a look at the groups key.
print(water_chemical.molecules['water']['groups'])
{'0': {'H': [0, 1], 'O': [0]}, '1': {'H': [2, 3], 'O': [1]}, '2': {'H': [4, 5], 'O': [2]}, '3': {'H': [6, 7], 'O': [3]}, '4': {'H': [8, 9], 'O': [4]}, '5': {'H': [10, 11], 'O': [5]}, '6': {'H': [12, 13], 'O': [6]}, '7': {'H': [14, 15], 'O': [7]}, '8': {'H': [16, 17], 'O': [8]}, '9': {'H': [18, 19], 'O': [9]}, '10': {'H': [20, 21], 'O': [10]}, '11': {'H': [22, 23], 'O': [11]}, '12': {'H': [24, 25], 'O': [12]}, '13': {'H': [26, 27], 'O': [13]}}
The groups key contains direct information about which atoms belong to which molecule, for example, the 10th molecule of water (id=9) consists of Hydrogen atoms 18 and 19 and oxygen atom 10.
print(water_chemical.molecules['water']['groups']['9'])
{'H': [18, 19], 'O': [9]}
With this information you can compute values, e.g. diffusion coefficients with only the atoms belonging to a single molecule using the atom_select arguments in the calculator.
Now that we have seen how we can build molecules and what information this gives is, let's look at what we can analyse using them.
First things first, let's confirm we are working with water by looking at the angular distribution function of the atoms.
water_chemical.run.AngularDistributionFunction(
number_of_configurations=5000, number_of_bins=500, norm_power=8
)
0%| | 0/1 [00:00<?, ?it/s]2022-01-31 10:47:05.284391: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 4877315008 exceeds 10% of free system memory. 100%|███████████████████████████████████| 1/1 [00:32<00:00, 32.20s/it]
Exp1_Angular_Distribution_Function_1
Looking at the O_H_H ADF in he top right we see a strong max peak at 109.591 degrees corresponding well with the bond angle of an SPCE model (109.47) as was used in the simulation. It is also worth noting that the oxygen triplet angle looks similar to that measured in QM and experimental studies.
When we want to study the molecular ADF we have two choices, we can either pass it as a species argument to the calculator if only one is desired, we we can call the calculator with the molecules=True
keyword as we will do here.
water_chemical.run.AngularDistributionFunction(
molecules=True, number_of_configurations=3000, number_of_bins=500, norm_power=8
)
In this case we have increased the norm power to suppress the noise floor and highlight only the most dominant peaks.
In the water molecule ADF it we do not see any clear stacking or structure suggesting there is not special organization of the molecules in these simulations.
Now let's look at the radial structure and distribution of particles in space of both the atomistic system and the molecules. This is where molecule mapping can be very helpful as often we are more interested in the positions of the molecules themselves and not necessarily those of the atoms.
water_chemical.run.RadialDistributionFunction(
number_of_configurations=4000, start=100, stop=5100, number_of_bins=500
)
In the case of the hydrogen-hydrogen and the oxygen-hydrogen we can see clear peaks where the bond distance is fixed. Using the cursor to hover over the points in the plot we can identify a bond distance between hydrogens of approximately 0.163 nm, in good agreement with experimental values. The oxygen-hydrogen bond sits around 0.09 nm, also in good agreement with experiment values.
water_ligand.run.RadialDistributionFunction(
number_of_configurations=5200, start=0, stop=5100, number_of_bins=500, molecules=True
)
Finally, let's start looking at the diffusion coefficients of the atoms and molecules.
water_chemical.run.EinsteinDiffusionCoefficients(data_range=500)
2022-01-31 10:47:26,425 - INFO: starting Einstein Diffusion Computation 2022-01-31 10:47:26,428 - INFO: starting Einstein Diffusion Computation
O: 100%|████████████████████████████████| 1/1 [00:09<00:00, 9.84s/it] H: 100%|████████████████████████████████| 1/1 [00:12<00:00, 12.56s/it]
Exp1_Einstein Self-Diffusion Coefficients_5
water_chemical.run.EinsteinDiffusionCoefficients(molecules=True, data_range=500)
Say we want to study a specific molecule. We only want the diffusion coefficients, ADFs, and RDFs of the atoms in that one molecule. This can be achieved with the MDSuite atom-selection command and is included here as a demonstration of the flexibility of the software.
First things first, let's select a molecule group to study, say the first water molecule.
water_group = water_chemical.molecules['water']['groups']['0']
print(water_group)
{'H': [0, 1], 'O': [0]}
water_chemical.run.RadialDistributionFunction(atom_selection={'H': [0, 1], 'O': [0]}, number_of_configurations=2517)
water_chemical.run.AngularDistributionFunction(atom_selection=water_group, number_of_configurations=100)
water_chemical.run.EinsteinDiffusionCoefficients(atom_selection=water_group, data_range=2500)
2022-01-31 10:48:01,289 - INFO: starting Einstein Diffusion Computation 2022-01-31 10:48:01,292 - INFO: starting Einstein Diffusion Computation
O: 100%|████████████████████████████████| 1/1 [00:06<00:00, 6.38s/it] H: 100%|████████████████████████████████| 1/1 [00:06<00:00, 6.10s/it]
Exp1_Einstein Self-Diffusion Coefficients_9
bmim_bf4 = DataHub(
url="https://github.com/zincware/DataHub/tree/main/Bmim_BF4"
)
bmim_bf4.get_file()
bmim_file = bmim_bf4.file_raw
project.add_experiment("bmim_bf4", simulation_data=bmim_file)
project.experiments.bmim_bf4.run.CoordinateUnwrapper()
2022-01-31 10:48:21,384 - INFO: Creating a new experiment!
100%|███████████████████████████████████| 1/1 [00:09<00:00, 9.19s/it] Applying transformation 'Unwrapped_Positions' to 'N': 100%|█| 1/1 [00: Applying transformation 'Unwrapped_Positions' to 'C': 100%|█| 1/1 [00: Applying transformation 'Unwrapped_Positions' to 'H': 100%|█| 1/1 [00: Applying transformation 'Unwrapped_Positions' to 'B': 100%|█| 1/1 [00: Applying transformation 'Unwrapped_Positions' to 'F': 100%|█| 1/1 [00:
bmim_molecule = mdsuite.Molecule(
name="bmim",
species_dict={"C": 8, "N": 2, "H": 15},
amount=50,
cutoff=1.9,
reference_configuration=100,
)
bf_molecule = mdsuite.Molecule(
name="bf4",
smiles="[B-](F)(F)(F)F",
amount=50,
cutoff=2.4,
reference_configuration=100,
)
project.experiments["bmim_bf4"].run.MolecularMap(
molecules=[bmim_molecule, bf_molecule]
)
2022-01-31 10:48:33,915 - INFO: Building molecular graph from configuration for bmim
100%|█████████████████████████████| 1250/1250 [00:15<00:00, 80.88it/s]
2022-01-31 10:49:19,499 - INFO: Performing molecule number isomorphism test. 2022-01-31 10:49:19,501 - INFO: Amount of molecules test passed. 2022-01-31 10:49:19,502 - INFO: Performing group equality isomorphism test. 2022-01-31 10:49:19,502 - INFO: Group equality isomorphism test passed. 2022-01-31 10:49:19,682 - INFO: Mapping molecule graphs onto trajectory for bmim
23it [00:08, 2.64it/s] Applying transformation 'Positions' to 'bmim': 100%|█| 1/1 [00:00<00:0
2022-01-31 10:49:28,935 - INFO: Building molecular graph from configuration for bf4
100%|██████████████████████████████| 250/250 [00:02<00:00, 114.48it/s]
2022-01-31 10:49:33,606 - INFO: Performing molecule number isomorphism test. 2022-01-31 10:49:33,607 - INFO: Amount of molecules test passed. 2022-01-31 10:49:33,608 - INFO: Performing group equality isomorphism test. 2022-01-31 10:49:33,610 - INFO: Group equality isomorphism test passed. 2022-01-31 10:49:33,726 - INFO: Mapping molecule graphs onto trajectory for bf4
100%|███████████████████████████████████| 1/1 [00:00<00:00, 7.68it/s] Applying transformation 'Positions' to 'bf4': 100%|█| 1/1 [00:00<00:00
project.experiments.bmim_bf4.run.RadialDistributionFunction(
number_of_configurations=300, number_of_bins=100
)