Basic MPDS API usage: machine-learning and peer-reviewed data

  • Complexity level: beginner
  • Requirements: understanding how APIs work

Let's play a bit with the MPDS API by fetching different kinds of data.

Important! Before you proceed: notebooks running on third-party servers are not secure. Using this notebook assumes that you authenticate at the MPDS server with your own API key. Please run this notebook only if you have an open-access account (i.e. the access section of your MPDS account reads: Programmatic data access: only open data).

Please do not run this notebook on third-party servers if you have elevated API access to the MPDS, since there is a nonzero probability of key leakage!

Be sure to always invalidate (revoke) your API key in your MPDS account after using the notebooks.

Now let's proceed with the authentication part. First, apply for an MPDS account if you don't have one. Then copy your API key, run the next cell, paste the key into the prompt that appears, and hit Enter.

In [ ]:
import os, getpass
os.environ['MPDS_KEY'] = getpass.getpass()

OK, now you may talk to the MPDS server programmatically from this notebook on your behalf.
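Since the key must not leak, it can be useful to confirm that it was stored without ever printing it in full. The following is only a minimal sketch: the helper name mask_key is made up for this illustration and is not part of any MPDS tooling.

```python
import os

def mask_key(key, visible=4):
    """Hide all but the last `visible` characters (hypothetical helper)."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]

# Read the key back from the environment variable set in the cell above
key = os.environ.get('MPDS_KEY', '')

if key:
    print("MPDS_KEY is set: %s" % mask_key(key))
else:
    print("MPDS_KEY is empty, please re-run the authentication cell")
```

Only the masked form ends up in the notebook output, so a saved or shared copy of the notebook does not expose the key.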

In [ ]:
!pip install mpds_client
In [ ]:
from mpds_client import MPDSDataRetrieval, MPDSDataTypes, APIError
In [ ]:
[x for x in dir(MPDSDataTypes) if not x.startswith('__')]

The peer-reviewed data type is (and will remain) the default.

In [ ]:
example_props = [ # NB these props support the machine-learning data type
    'isothermal bulk modulus',
    'enthalpy of formation',
    'heat capacity at constant pressure',
    'Seebeck coefficient',
    'values of electronic band gap', # NB both direct + indirect gaps
    'temperature for congruent melting',
    'Debye temperature',
    'linear thermal expansion coefficient'
]

Let's customize the returned data fields (that's optional):

In [ ]:
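desired_fields = {
    # NB field paths other than the two constants are assumed, check the MPDS API docs
    'P':[ # *P*hysical property entries
        'sample.material.entry', 'sample.material.phase',
        'sample.material.chemical_elements', 'sample.material.chemical_formula'],
    'S':[ # Crystalline *S*tructure entries
        'entry', 'phase', 'chemical_elements', 'chemical_formula'],
    'C':[ # Phase diagrams, i.e. *C*onstitution entries
        'entry',
        lambda: 'MANY-PHASE', # constants are given like this (on purpose)
        'chemical_elements',
        lambda: 'MANY-FORMULAE']
    # NB. P-S-C are interconnected by means of the distinct phases
}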
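To see why constants are passed as lambdas, it may help to imagine how a field list could be applied to a single entry. The sketch below is an assumption made for illustration, not the actual mpds_client implementation: the helpers resolve_field and resolve_row and the entry fake_entry are all made up.

```python
def resolve_field(entry, field):
    """Resolve one field spec against a nested dict (hypothetical helper)."""
    if callable(field):
        # A zero-argument callable stands for a constant column value
        return field()
    value = entry
    for part in field.split('.'):
        # Walk the dotted path one key at a time
        if not isinstance(value, dict) or part not in value:
            return None  # path is missing in this entry
        value = value[part]
    return value

def resolve_row(entry, fields):
    """Build one output row, keeping the order of the field list."""
    return [resolve_field(entry, f) for f in fields]

# A made-up entry, NOT real MPDS data
fake_entry = {
    'entry': 'X000001',
    'sample': {'material': {'chemical_elements': ['Fe', 'S']}}
}

row = resolve_row(fake_entry, [
    'entry',
    lambda: 'MANY-PHASE',
    'sample.material.chemical_elements'
])
print(row)
```

Under this reading, a plain string is always treated as a path into the entry, so a callable is an unambiguous way to say "put this literal value into the column". It also keeps the rows for P, S and C entries the same width.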

Note that if the key is invalid, the API returns HTTP error 403.

In [ ]:
client = MPDSDataRetrieval(dtype=MPDSDataTypes.MACHINE_LEARNING)
In [ ]:
for prop in example_props:

    print("*" * 100)
    print("Considering %s" % prop)

    try:
        for card in client.get_data({
            "props": prop,
            # we defined our props above

            "classes": "transitional, superconductor",
            # a transition metal atom must be present,
            # and the compound must be classified as a superconductor in the original publication

            "aetypes": "all 7-vertex",
            # atomic environment type e.g. hexagonal pyramid, pentagonal bipyramid etc.

            "aeatoms": "X-S",
            # atomic environment atoms: any atom in the center, sulphur in the vertices (ligands)

            "years": "2010-2019"
            # only recent results (has no effect for MACHINE_LEARNING, as all such data are dated 2018)
        }, fields=desired_fields):

            print("%s %s %s" % (card[0], "-".join(card[2]), card[3]))

    except APIError as ex:

        if ex.code == 1:
            print("No matches.")
        else:
            print("Error %s: %s" % (ex.code, ex.msg))
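The loop above treats error code 1 ("no matches") differently from all other errors. The same pattern can be tried offline with stand-ins. Everything in this sketch is made up for illustration: FakeAPIError only imitates an exception carrying msg and code attributes, fake_get_data performs no network access, and the use of 403 as an error code is an assumption.

```python
class FakeAPIError(Exception):
    """Stand-in error type, assumed to carry .msg and .code attributes."""
    def __init__(self, msg, code=0):
        super().__init__(msg)
        self.msg = msg
        self.code = code

def fake_get_data(prop):
    """Stand-in for a query: made-up behaviour, no network access."""
    if prop == 'Debye temperature':
        raise FakeAPIError("No hits", code=1)
    if prop == 'Seebeck coefficient':
        raise FakeAPIError("Forbidden", code=403)  # assumed code value
    return [[prop, 'row-1'], [prop, 'row-2']]

def collect(props, fetch):
    """Gather rows per property, keeping real failures separate."""
    results, failures = {}, {}
    for prop in props:
        try:
            results[prop] = fetch(prop)
        except FakeAPIError as ex:
            if ex.code == 1:
                results[prop] = []  # an empty result is not a failure
            else:
                failures[prop] = (ex.code, ex.msg)
    return results, failures

results, failures = collect(
    ['enthalpy of formation', 'Debye temperature', 'Seebeck coefficient'],
    fake_get_data
)
print(results)
print(failures)
```

Keeping "no matches" out of the failures means that a long series of queries is not interrupted by properties that simply have no data for the given criteria.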
In [ ]:
client.dtype = MPDSDataTypes.PEER_REVIEWED

print(client.get_data({"elements": "O", "classes": "binary", "sgs": "I4/mmm"}))
In [ ]:
import random
prop = random.choice(example_props)

print(client.get_data({"props": prop, "elements": "O", "classes": "binary, lanthanoid, non-disordered"}))

Were you able to follow everything? Please explain (tentatively) what happens under the hood when we call client.get_data or client.get_dataframe.

PS don't forget to invalidate (revoke) your API key.