Uranium-oxygen bond length analysis

authors: Evgeny Blokhin, Joseph Montoya

note: This notebook uses the MPDS data retrieval tool, which requires an MPDS account. Please inquire about access to the database at the MPDS website.

Note that in order to get the in-line plotting to work, you might need to start Jupyter notebook with a higher data rate, e.g., jupyter notebook --NotebookApp.iopub_data_rate_limit=1.0e10. We recommend you do this before starting.

This notebook was last updated 11/15/18 for version 0.4.5 of matminer.

We use the MPDSDataRetrieval tool to download crystalline structures from the MPDS database. Let's say we want to study the chemical bond between uranium and oxygen. Which length of this bond is reported most frequently in the world's scientific literature? The MPDS contains many crystalline structures with uranium and oxygen, so let's perform a quick data investigation to answer this question:

In [1]:
# Enter your MPDS API key below!
API_KEY = None
In [2]:
import pandas as pd
import numpy as np
import tqdm
from matminer.data_retrieval.retrieve_MPDS import MPDSDataRetrieval
from matminer.figrecipes.plot import PlotlyFig

Here we use a naive brute-force approach to calculate bond lengths between particular atom types in a crystalline environment. We are interested in neighboring atoms only, so we discard interatomic distances of 4 Angstroms or more. We represent a crystalline structure with ase's Atoms class and calculate distances using its get_all_distances method. Note that the distances are rounded to two decimals (marked by the comment NB in the code), which determines how individual values are grouped when we count occurrences later:

In [3]:
def calculate_lengths(ase_obj, elA, elB, limit=4):
    """
    Short helper function to get bond lengths between element A
    and element B.
    """
    assert elA != elB
    lengths = []
    all_lengths = ase_obj.get_all_distances()
    for n, atom in enumerate(ase_obj):
        if atom.symbol == elA:
            for m, neighbor in enumerate(ase_obj):
                if neighbor.symbol == elB:
                    dist = round(all_lengths[n][m], 2) # NB occurrence <-> rounding
                    if dist < limit:
                        lengths.append(dist)  # keep only neighbors within the cutoff
    return lengths
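
To try the brute-force idea without an MPDS account, here is a minimal pure-Python sketch of the same double loop on a hand-made set of Cartesian coordinates. The name pair_lengths and the toy linear O-U-O geometry are illustrative assumptions, not MPDS data, and the sketch does not consider periodic images of the cell:

```python
import math

def pair_lengths(sites, el_a, el_b, limit=4.0, ndigits=2):
    """Rounded distances between every el_a site and every el_b site below `limit`.

    `sites` is a list of (symbol, (x, y, z)) tuples, coordinates in Angstroms.
    """
    assert el_a != el_b
    lengths = []
    for sym_a, pos_a in sites:
        if sym_a != el_a:
            continue
        for sym_b, pos_b in sites:
            if sym_b != el_b:
                continue
            dist = round(math.dist(pos_a, pos_b), ndigits)
            if dist < limit:
                lengths.append(dist)
    return lengths

# Toy linear O-U-O unit with made-up coordinates (not taken from the MPDS)
toy_sites = [
    ("U", (0.0, 0.0, 0.0)),
    ("O", (0.0, 0.0, 1.8)),
    ("O", (0.0, 0.0, -1.8)),
    ("O", (5.0, 0.0, 0.0)),  # beyond the 4 Angstrom cutoff, so it is skipped
]
toy_lengths = pair_lengths(toy_sites, "U", "O")
print(toy_lengths)
```

Only the two oxygen sites within the cutoff contribute a distance; the third is filtered out by the limit check.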

Note that the crystalline structures are not retrieved from the MPDS by default, so we need to request four additional fields:

  • cell_abc
  • sg_n
  • basis_noneq
  • els_noneq

On top of that, we also obtain the crystalline phase_ids, MPDS entry numbers, and chemical formulae. Note that the get_data client method returns an ordinary Python list, whereas the get_dataframe client method returns a Pandas dataframe. We use the former below:

In [4]:
client = MPDSDataRetrieval(api_key=API_KEY)

answer = client.get_data(criteria={"elements": "U-O",
                                 "props": "atomic structure", "classes": "binary"},
                         fields={'S':['phase_id', 'entry', 'chemical_formula', 'cell_abc', 
                                      'sg_n', 'basis_noneq', 'els_noneq']})
Got 172 hits
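
When working with the plain list, it can help to attach field names to each hit. The sketch below assumes that every hit is a flat list whose values follow the order of the requested fields; that ordering, the helper name label_row, and the shape of the nested values are assumptions, and the sample row holds placeholder values rather than a real MPDS record:

```python
# Field names in the order they were requested from the client
FIELD_NAMES = ["phase_id", "entry", "chemical_formula", "cell_abc",
               "sg_n", "basis_noneq", "els_noneq"]

def label_row(row, names=FIELD_NAMES):
    """Pair each value of one hit with its field name (assumes matching order)."""
    if len(row) != len(names):
        raise ValueError("row length does not match the number of fields")
    return dict(zip(names, row))

# Placeholder row with made-up values, only to show the shape of the data
sample_row = [0, "S0000000", "UO2", [5.47, 5.47, 5.47, 90, 90, 90],
              225, [[0.0, 0.0, 0.0], [0.25, 0.25, 0.25]], ["U", "O"]]
labeled = label_row(sample_row)
print(labeled["chemical_formula"], labeled["sg_n"])
```

If labeled columns are needed throughout, get_dataframe is probably the more convenient route.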

The MPDSDataRetrieval.compile_crystal client method converts each entry into a crystalline structure in the form of an ase Atoms object. We then call the calculate_lengths function defined earlier.

In [5]:
lengths = []

for item in tqdm.tqdm(answer):
    crystal = MPDSDataRetrieval.compile_crystal(item, 'ase')
    if not crystal: continue
    lengths.extend(calculate_lengths(crystal, 'U', 'O'))
100%|██████████| 172/172 [00:24<00:00,  6.96it/s]

That runs a little slowly, since the double Python loop over ase Atoms objects is not expected to perform well on hundreds of bond length calculations. We could use ase's neighbor_list function or employ a C extension here, but this is outside the scope of this exercise. The popular Pymatgen library can also be used instead of ase. (Mind, however, that Pymatgen and ase structure objects are not directly interchangeable.) In any case, we now have a flat list, lengths. Let's convert it into a Pandas dataframe and find which U-O distances occur more often than the others.

In [6]:
dfrm = pd.DataFrame(sorted(lengths), columns=['length'])
dfrm['occurrence'] = dfrm.groupby('length')['length'].transform('count')
dfrm.drop_duplicates('length', inplace=True)

What did we do here? We counted the occurrences of each particular U-O length and stored them in a new column, occurrence, of our dataframe dfrm, keeping one row per distinct length. The resulting distribution of bond lengths is rendered below using matminer.figrecipes.plot.PlotlyFig.
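
The same counting step can be sketched with the standard library alone, which also shows why the earlier rounding matters: values that differ only in the third decimal end up in the same group. The distances below are made-up numbers for illustration, not MPDS results:

```python
from collections import Counter

# Made-up distances in Angstroms, before rounding (not MPDS results)
raw = [2.001, 1.999, 2.502, 2.498, 2.501, 2.25]

# Rounding to two decimals merges nearly identical values into one group
rounded = [round(d, 2) for d in raw]
occurrence = Counter(rounded)

# One (length, occurrence) pair per distinct length, sorted by length
rows = sorted(occurrence.items())
print(rows)

# The single most frequent length and its count
print(occurrence.most_common(1))
```

Without the rounding, every value in this toy list would be unique and each would have an occurrence of one.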

We can see below that the most frequent bond lengths between neighboring uranium and oxygen atoms are 1.78 and 2.35 Angstroms. This agrees with the well-known 1997 study of Burns et al. [1]. However, Burns et al. considered only 105 structures, whereas our query returned more than 170 entries, which supports their findings on the uranyl ion geometry even more thoroughly.

In [7]:
pf = PlotlyFig(dfrm, mode='notebook', x_title="Bond lengths (A)")
pf.histogram(cols=['length'], n_bins=50)