Mapping Molecules¶

In this notebook we will cover how to map molecules in different ways and look at some of the things we can do with them. In the first part of the tutorial we will look at different ways to map water molecules and in the second, we will extend this to a more complex molecule system, the room temperature ionic liquid, BMIM-BF4.

Water Molecules¶

In the first part of this demonstration we will load data from a GROMACS simulation and therefore, we need to define a set of units and a file reader object to use. For this reason, we have changed the imports a little bit to keep the code to minimum.

In [1]:

import mdsuite as mds
import mdsuite.file_io.chemfiles_read
from mdsuite.utils import Units

from zinchub import DataHub
import shutil

import h5py as hf
import numpy as np

2022-01-31 10:46:34.798887: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm/lib:/opt/slurm/lib:
2022-01-31 10:46:34.798913: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-01-31 10:46:39.087811: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm/lib:/opt/slurm/lib:
2022-01-31 10:46:39.087996: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm/lib:/opt/slurm/lib:
2022-01-31 10:46:39.088161: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm/lib:/opt/slurm/lib:
2022-01-31 10:46:39.091381: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm/lib:/opt/slurm/lib:
2022-01-31 10:46:39.091545: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm/lib:/opt/slurm/lib:
2022-01-31 10:46:39.091700: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm/lib:/opt/slurm/lib:
2022-01-31 10:46:39.091718: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

Loading the data¶

In this tutorial we are using 50 ns simulations of 14 water molecules in a continuum fluid performed with GROMACS. We will use pure atomistic naming as well as ligand naming, the topology files for which are contained on DataHub.

In [2]:

water = DataHub(url="https://github.com/zincware/DataHub/tree/main/Water_14_Gromacs")
water.get_file('./')
file_paths = [
        f for f in water.file_raw
    ]

Project definition¶

Here we create the project and define some custom units used by GROMACS.

In [3]:

project = mds.Project("Mapping_Molecules")

gmx_units = Units(
        time=1e-15,
        length=1e-10,
        energy=1.6022e-19,
        NkTV2p=1.6021765e6,
        boltzmann=8.617343e-5,
        temperature=1,
        pressure=100000,
    )

2022-01-31 10:46:40,563 - INFO: Creating new project Mapping_Molecules

Mapping molecules with SMILES¶

In this section we take a look at how one can map molecules using SMILES strings.

In [4]:

traj_path = file_paths[2]
topol_path = file_paths[0]

file_reader = mdsuite.file_io.chemfiles_read.ChemfilesRead(
    traj_file_path=traj_path, topol_file_path=topol_path
)

water_chemical = project.add_experiment(
    name="water_chemical",
    timestep=0.002,
    temperature=300.0,
    units=gmx_units,
    simulation_data=file_reader,
)
water_chemical.run.CoordinateUnwrapper()

2022-01-31 10:46:40,727 - INFO: Creating a new experiment!

100%|███████████████████████████████████| 1/1 [00:00<00:00,  1.63it/s]
2022-01-31 10:46:41.649356: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Applying transformation 'Unwrapped_Positions' to 'O': 100%|█| 1/1 [00:
Applying transformation 'Unwrapped_Positions' to 'H': 100%|█| 1/1 [00:

The first thing we need to do is define the molecule that will be mapped using the in-built MDSuite molecule data-class

In [5]:

chemical_water = mds.Molecule(
    name='water',
    smiles="[H]O[H]", 
    cutoff=1.7, 
    amount=14, 
    mol_pbc=True
)

In [6]:

water_chemical.run.MolecularMap(
    molecules=[chemical_water]
)

2022-01-31 10:46:42,462 - INFO: Building molecular graph from configuration for water

100%|████████████████████████████████| 42/42 [00:00<00:00, 557.49it/s]

2022-01-31 10:46:42,680 - INFO: Performing molecule number isomorphism test.
2022-01-31 10:46:42,681 - INFO: Amount of molecules test passed.
2022-01-31 10:46:42,682 - INFO: Performing group equality isomorphism test.
2022-01-31 10:46:42,682 - INFO: Group equality isomorphism test passed.
2022-01-31 10:46:42,787 - INFO: Mapping molecule graphs onto trajectory for water

100%|███████████████████████████████████| 1/1 [00:00<00:00,  4.78it/s]
Applying transformation 'Unwrapped_Positions' to 'water': 100%|█| 1/1

In [7]:

water_chemical.molecules

Out[7]:

{'water': {'n_particles': 14,
  'mass': 18.015,
  'groups': {'0': {'H': [0, 1], 'O': [0]},
   '1': {'H': [2, 3], 'O': [1]},
   '2': {'H': [4, 5], 'O': [2]},
   '3': {'H': [6, 7], 'O': [3]},
   '4': {'H': [8, 9], 'O': [4]},
   '5': {'H': [10, 11], 'O': [5]},
   '6': {'H': [12, 13], 'O': [6]},
   '7': {'H': [14, 15], 'O': [7]},
   '8': {'H': [16, 17], 'O': [8]},
   '9': {'H': [18, 19], 'O': [9]},
   '10': {'H': [20, 21], 'O': [10]},
   '11': {'H': [22, 23], 'O': [11]},
   '12': {'H': [24, 25], 'O': [12]},
   '13': {'H': [26, 27], 'O': [13]}}}}

Mapping Molecules with a reference dict¶

If you do not have particles with chemical names but you nonetheless wish to construct groups out of particles, this can be achieved by using a reference dict.

In this example, we use the ligand naming from GROMACS to construct water molecules.

In [8]:

traj_path = file_paths[2]
topol_path = file_paths[1]

file_reader = mdsuite.file_io.chemfiles_read.ChemfilesRead(
    traj_file_path=traj_path, topol_file_path=topol_path
)

water_ligand = project.add_experiment(
    name="water_ligand",
    timestep=0.002,
    temperature=300.0,
    units=gmx_units,
    simulation_data=file_reader,
)
water_ligand.run.CoordinateUnwrapper()

2022-01-31 10:46:43,502 - INFO: Creating a new experiment!
2022-01-31 10:46:44,364 - WARNING: WARNING element OW has been assigned mass=0.0

100%|███████████████████████████████████| 1/1 [00:00<00:00,  1.50it/s]
Applying transformation 'Unwrapped_Positions' to 'OW': 100%|█| 1/1 [00
Applying transformation 'Unwrapped_Positions' to 'HW1': 100%|█| 1/1 [0
Applying transformation 'Unwrapped_Positions' to 'HW2': 100%|█| 1/1 [0

Keep in mind, as the particles are not named from the periodic tables, important properties such as mass will need to be filled in manually.

In [9]:

water_ligand.species['OW'].mass = [15.999]
water_ligand.species['HW1'].mass = [1.00784]
water_ligand.species['HW2'].mass = [1.00784]

In this case, the molecule will be defined a little bit differently.

In [10]:

ligand_water = mds.Molecule(
    name='water', 
    cutoff=1.7, 
    amount=14, 
    species_dict={"HW1": 1, "OW": 1, "HW2": 1},
    mol_pbc=True
)

In [11]:

water_ligand.run.MolecularMap(
    molecules=[ligand_water]
)

2022-01-31 10:46:47,651 - INFO: Building molecular graph from configuration for water

100%|████████████████████████████████| 42/42 [00:00<00:00, 548.19it/s]

2022-01-31 10:46:47,883 - INFO: Performing molecule number isomorphism test.
2022-01-31 10:46:47,885 - INFO: Amount of molecules test passed.
2022-01-31 10:46:47,886 - INFO: Performing group equality isomorphism test.
2022-01-31 10:46:47,886 - INFO: Group equality isomorphism test passed.
2022-01-31 10:46:47,984 - INFO: Mapping molecule graphs onto trajectory for water

100%|███████████████████████████████████| 1/1 [00:00<00:00,  5.09it/s]
Applying transformation 'Unwrapped_Positions' to 'water': 100%|█| 1/1

In [12]:

water_ligand.molecules

Out[12]:

{'water': {'n_particles': 14,
  'mass': 18.014680000000002,
  'groups': {'0': {'HW1': [0], 'OW': [0], 'HW2': [0]},
   '1': {'HW1': [1], 'OW': [1], 'HW2': [1]},
   '2': {'HW1': [2], 'OW': [2], 'HW2': [2]},
   '3': {'HW1': [3], 'OW': [3], 'HW2': [3]},
   '4': {'HW1': [4], 'OW': [4], 'HW2': [4]},
   '5': {'HW1': [5], 'OW': [5], 'HW2': [5]},
   '6': {'HW1': [6], 'OW': [6], 'HW2': [6]},
   '7': {'HW1': [7], 'OW': [7], 'HW2': [7]},
   '8': {'HW1': [8], 'OW': [8], 'HW2': [8]},
   '9': {'HW1': [9], 'OW': [9], 'HW2': [9]},
   '10': {'HW1': [10], 'OW': [10], 'HW2': [10]},
   '11': {'HW1': [11], 'OW': [11], 'HW2': [11]},
   '12': {'HW1': [12], 'OW': [12], 'HW2': [12]},
   '13': {'HW1': [13], 'OW': [13], 'HW2': [13]}}}}

What information is stored?¶

So the molecule mapping itself was quick and easy, but what information has been stored along the way?

All meta-data about the molecules is stored in the experiment class under molecules. Let's take a look at what this contains.

In [13]:

water_chemical.molecules.keys()

Out[13]:

dict_keys(['water'])

This dict will contain all of the molecules that have been mapped, but this is not the information about the molecules, for that, we need to look at the water molecule.

In [14]:

water_chemical.molecules['water'].keys()

Out[14]:

dict_keys(['n_particles', 'mass', 'groups'])

Three of these are fairly trivial and we can look at them quickly, groups will require some more attention.

In [15]:

print(f"n_particles: {water_chemical.molecules['water']['n_particles']}")
print(f"mass: {water_chemical.molecules['water']['mass']}")

n_particles: 14
mass: 18.015

Now let's take a look at the groups key.

In [16]:

print(water_chemical.molecules['water']['groups'])

{'0': {'H': [0, 1], 'O': [0]}, '1': {'H': [2, 3], 'O': [1]}, '2': {'H': [4, 5], 'O': [2]}, '3': {'H': [6, 7], 'O': [3]}, '4': {'H': [8, 9], 'O': [4]}, '5': {'H': [10, 11], 'O': [5]}, '6': {'H': [12, 13], 'O': [6]}, '7': {'H': [14, 15], 'O': [7]}, '8': {'H': [16, 17], 'O': [8]}, '9': {'H': [18, 19], 'O': [9]}, '10': {'H': [20, 21], 'O': [10]}, '11': {'H': [22, 23], 'O': [11]}, '12': {'H': [24, 25], 'O': [12]}, '13': {'H': [26, 27], 'O': [13]}}

The groups key contains direct information about which atoms belong to which molecule, for example, the 10th molecule of water (id=9) consists of Hydrogen atoms 18 and 19 and oxygen atom 10.

In [17]:

print(water_chemical.molecules['water']['groups']['9'])

{'H': [18, 19], 'O': [9]}

With this information you can compute values, e.g. diffusion coefficients with only the atoms belonging to a single molecule using the atom_select arguments in the calculator.

Analysis with molecules¶

Now that we have seen how we can build molecules and what information this gives is, let's look at what we can analyse using them.

Angular Distribution Functions (ADFs)¶

First things first, let's confirm we are working with water by looking at the angular distribution function of the atoms.

In [18]:

water_chemical.run.AngularDistributionFunction(
    number_of_configurations=5000, number_of_bins=500, norm_power=8
)

  0%|                                           | 0/1 [00:00<?, ?it/s]2022-01-31 10:47:05.284391: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 4877315008 exceeds 10% of free system memory.
100%|███████████████████████████████████| 1/1 [00:32<00:00, 32.20s/it]

Loading BokehJS ...

Out[18]:

Exp1_Angular_Distribution_Function_1

Looking at the O_H_H ADF in he top right we see a strong max peak at 109.591 degrees corresponding well with the bond angle of an SPCE model (109.47) as was used in the simulation. It is also worth noting that the oxygen triplet angle looks similar to that measured in QM and experimental studies.

When we want to study the molecular ADF we have two choices, we can either pass it as a species argument to the calculator if only one is desired, we we can call the calculator with the molecules=True keyword as we will do here.

In [19]:

water_chemical.run.AngularDistributionFunction(
    molecules=True, number_of_configurations=3000, number_of_bins=500, norm_power=8
)

100%|███████████████████████████████████| 1/1 [00:00<00:00,  1.04it/s]

Loading BokehJS ...

Out[19]:

Exp1_Angular_Distribution_Function_2

In this case we have increased the norm power to suppress the noise floor and highlight only the most dominant peaks.

In the water molecule ADF it we do not see any clear stacking or structure suggesting there is not special organization of the molecules in these simulations.

Radial Distribution Functions (RDFs)¶

Now let's look at the radial structure and distribution of particles in space of both the atomistic system and the molecules. This is where molecule mapping can be very helpful as often we are more interested in the positions of the molecules themselves and not necessarily those of the atoms.

In [20]:

water_chemical.run.RadialDistributionFunction(
    number_of_configurations=4000, start=100, stop=5100, number_of_bins=500
)

Running mini batch loop 1 / 1: 100%|████| 1/1 [00:01<00:00,  1.79s/it]

Loading BokehJS ...

Out[20]:

Exp1_Radial_Distribution_Function_3

In the case of the hydrogen-hydrogen and the oxygen-hydrogen we can see clear peaks where the bond distance is fixed. Using the cursor to hover over the points in the plot we can identify a bond distance between hydrogens of approximately 0.163 nm, in good agreement with experimental values. The oxygen-hydrogen bond sits around 0.09 nm, also in good agreement with experiment values.

In [21]:

water_ligand.run.RadialDistributionFunction(
    number_of_configurations=5200, start=0, stop=5100, number_of_bins=500, molecules=True
)

Running mini batch loop 1 / 1: 100%|████| 1/1 [00:00<00:00,  4.09it/s]

Loading BokehJS ...

Out[21]:

Exp2_Radial_Distribution_Function_4

Diffusion Coeffients¶

Finally, let's start looking at the diffusion coefficients of the atoms and molecules.

In [22]:

water_chemical.run.EinsteinDiffusionCoefficients(data_range=500)

2022-01-31 10:47:26,425 - INFO: starting Einstein Diffusion Computation
2022-01-31 10:47:26,428 - INFO: starting Einstein Diffusion Computation

O: 100%|████████████████████████████████| 1/1 [00:09<00:00,  9.84s/it]
H: 100%|████████████████████████████████| 1/1 [00:12<00:00, 12.56s/it]

Loading BokehJS ...

Out[22]:

Exp1_Einstein Self-Diffusion Coefficients_5

In [23]:

water_chemical.run.EinsteinDiffusionCoefficients(molecules=True, data_range=500)

2022-01-31 10:47:49,452 - INFO: starting Einstein Diffusion Computation
2022-01-31 10:47:49,453 - INFO: starting Einstein Diffusion Computation

water: 100%|████████████████████████████| 1/1 [00:09<00:00,  9.71s/it]

Loading BokehJS ...

Out[23]:

Exp1_Einstein Self-Diffusion Coefficients_6

Group-wise analysis¶

Say we want to study a specific molecule. We only want the diffusion coefficients, ADFs, and RDFs of the atoms in that one molecule. This can be achieved with the MDSuite atom-selection command and is included here as a demonstration of the flexibility of the software.

First things first, let's select a molecule group to study, say the first water molecule.

In [24]:

water_group = water_chemical.molecules['water']['groups']['0']
print(water_group)

{'H': [0, 1], 'O': [0]}

In [25]:

water_chemical.run.RadialDistributionFunction(atom_selection={'H': [0, 1], 'O': [0]}, number_of_configurations=2517)

Running mini batch loop 1 / 1: 100%|████| 1/1 [00:00<00:00,  4.04it/s]

Loading BokehJS ...

Out[25]:

Exp1_Radial_Distribution_Function_7

In [26]:

water_chemical.run.AngularDistributionFunction(atom_selection=water_group, number_of_configurations=100)

100%|███████████████████████████████████| 1/1 [00:00<00:00, 10.40it/s]

Loading BokehJS ...

Out[26]:

Exp1_Angular_Distribution_Function_8

In [27]:

water_chemical.run.EinsteinDiffusionCoefficients(atom_selection=water_group, data_range=2500)

2022-01-31 10:48:01,289 - INFO: starting Einstein Diffusion Computation
2022-01-31 10:48:01,292 - INFO: starting Einstein Diffusion Computation

O: 100%|████████████████████████████████| 1/1 [00:06<00:00,  6.38s/it]
H: 100%|████████████████████████████████| 1/1 [00:06<00:00,  6.10s/it]

Loading BokehJS ...

Out[27]:

Exp1_Einstein Self-Diffusion Coefficients_9

BMIM-BF4 RTIL¶

In [28]:

bmim_bf4 = DataHub(
    url="https://github.com/zincware/DataHub/tree/main/Bmim_BF4"
)
bmim_bf4.get_file()
bmim_file = bmim_bf4.file_raw

In [29]:

project.add_experiment("bmim_bf4", simulation_data=bmim_file)
project.experiments.bmim_bf4.run.CoordinateUnwrapper()

2022-01-31 10:48:21,384 - INFO: Creating a new experiment!

100%|███████████████████████████████████| 1/1 [00:09<00:00,  9.19s/it]
Applying transformation 'Unwrapped_Positions' to 'N': 100%|█| 1/1 [00:
Applying transformation 'Unwrapped_Positions' to 'C': 100%|█| 1/1 [00:
Applying transformation 'Unwrapped_Positions' to 'H': 100%|█| 1/1 [00:
Applying transformation 'Unwrapped_Positions' to 'B': 100%|█| 1/1 [00:
Applying transformation 'Unwrapped_Positions' to 'F': 100%|█| 1/1 [00:

In [30]:

bmim_molecule = mdsuite.Molecule(
            name="bmim",
            species_dict={"C": 8, "N": 2, "H": 15},
            amount=50,
            cutoff=1.9,
            reference_configuration=100,
        )
bf_molecule = mdsuite.Molecule(
    name="bf4",
    smiles="[B-](F)(F)(F)F",
    amount=50,
    cutoff=2.4,
    reference_configuration=100,
)
project.experiments["bmim_bf4"].run.MolecularMap(
    molecules=[bmim_molecule, bf_molecule]
)

2022-01-31 10:48:33,915 - INFO: Building molecular graph from configuration for bmim

100%|█████████████████████████████| 1250/1250 [00:15<00:00, 80.88it/s]

2022-01-31 10:49:19,499 - INFO: Performing molecule number isomorphism test.
2022-01-31 10:49:19,501 - INFO: Amount of molecules test passed.
2022-01-31 10:49:19,502 - INFO: Performing group equality isomorphism test.
2022-01-31 10:49:19,502 - INFO: Group equality isomorphism test passed.
2022-01-31 10:49:19,682 - INFO: Mapping molecule graphs onto trajectory for bmim

23it [00:08,  2.64it/s]                                               
Applying transformation 'Positions' to 'bmim': 100%|█| 1/1 [00:00<00:0

2022-01-31 10:49:28,935 - INFO: Building molecular graph from configuration for bf4

100%|██████████████████████████████| 250/250 [00:02<00:00, 114.48it/s]

2022-01-31 10:49:33,606 - INFO: Performing molecule number isomorphism test.
2022-01-31 10:49:33,607 - INFO: Amount of molecules test passed.
2022-01-31 10:49:33,608 - INFO: Performing group equality isomorphism test.
2022-01-31 10:49:33,610 - INFO: Group equality isomorphism test passed.
2022-01-31 10:49:33,726 - INFO: Mapping molecule graphs onto trajectory for bf4

100%|███████████████████████████████████| 1/1 [00:00<00:00,  7.68it/s]
Applying transformation 'Positions' to 'bf4': 100%|█| 1/1 [00:00<00:00

In [31]:

project.experiments.bmim_bf4.run.RadialDistributionFunction(
    number_of_configurations=300, number_of_bins=100
)

Running mini batch loop 1 / 2: 100%|████| 5/5 [00:03<00:00,  1.32it/s]
Running mini batch loop 2 / 2: 100%|████| 5/5 [00:03<00:00,  1.34it/s]

Loading BokehJS ...

Out[31]:

Exp3_Radial_Distribution_Function_10

In [32]:

project.experiments.bmim_bf4.run.RadialDistributionFunction(
    number_of_configurations=500, number_of_bins=100, molecules=True
)

Running mini batch loop 1 / 1: 100%|████| 1/1 [00:00<00:00, 13.05it/s]

Loading BokehJS ...

Out[32]:

Exp3_Radial_Distribution_Function_11