#!/usr/bin/env python
# coding: utf-8

# # Data mining using pyiron tables

# In this example, the data mining capabilities of pyiron's `PyironTables` class are demonstrated by computing and comparing the ground state properties of fcc Al obtained with several force fields.

# In[1]:


from pyiron import Project
import numpy as np


# In[2]:


pr = Project("potential_scan")


# ## Creating a dummy job to get list of potentials
# 
# To get the list of LAMMPS potentials available for Al, a dummy job with an Al bulk structure is created.

# In[3]:


dummy_job = pr.create_job(pr.job_type.Lammps, "dummy_job")
dummy_job.structure = pr.create_ase_bulk("Al")
# Choose only a few potentials to run (you can change this number)
num_potentials = 5
potential_list = dummy_job.list_potentials()[:num_potentials]


# ## Creating a Murnaghan job for each potential in their respective subprojects
# 
# A separate Murnaghan job (to compute the equilibrium lattice constant and the bulk modulus) is created and run for every potential.
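
# The `Murnaghan` job fits an equation of state to the energy-volume curve of its LAMMPS child jobs. The cell below is a simplified, numpy-only sketch of that idea. It is not pyiron's implementation.
# 
# The sketch first generates synthetic E(V) data from the Murnaghan equation. The parameters are illustrative and are not computed results. It then fits a cubic polynomial in V, which is an assumed fit choice. Finally it reads off the equilibrium volume and the bulk modulus B = V d^2E/dV^2 at the energy minimum.

# In[ ]:


import numpy as np

# Conversion factor: 1 eV/A^3 expressed in GPa
EV_PER_A3_TO_GPA = 160.2176634


def murnaghan_energy(v, e0, v0, b0, b0_prime):
    """Murnaghan equation of state E(V), with b0 given in eV/A^3."""
    return (
        e0
        + b0 * v / b0_prime * ((v0 / v) ** b0_prime / (b0_prime - 1.0) + 1.0)
        - b0 * v0 / (b0_prime - 1.0)
    )


def fit_equilibrium(volumes, energies, order=3):
    """Polynomial fit of E(V); returns (equilibrium volume, bulk modulus in GPa)."""
    v_mid = volumes.mean()
    x = volumes - v_mid  # centering the volumes improves the conditioning of the fit
    coeffs = np.polyfit(x, energies, order)
    d1 = np.polyder(coeffs, 1)  # dE/dV
    d2 = np.polyder(coeffs, 2)  # d^2E/dV^2
    # Keep real stationary points inside the sampled range with positive curvature
    roots = np.roots(d1)
    real_roots = roots[np.isreal(roots)].real
    minima = [r for r in real_roots if x.min() <= r <= x.max() and np.polyval(d2, r) > 0]
    if not minima:
        raise ValueError("no energy minimum inside the sampled volume range")
    x_eq = minima[0]
    v_eq = v_mid + x_eq
    # B = V * d^2E/dV^2 at the minimum, converted from eV/A^3 to GPa
    b_eq = v_eq * np.polyval(d2, x_eq) * EV_PER_A3_TO_GPA
    return v_eq, b_eq


# Synthetic E(V) data for a 4-atom cell; the parameters are illustrative only
v0_true, b0_true_gpa, b0_prime = 66.4, 76.0, 4.5
volumes = np.linspace(0.95 * v0_true, 1.05 * v0_true, 11)
energies = murnaghan_energy(volumes, -13.4, v0_true, b0_true_gpa / EV_PER_A3_TO_GPA, b0_prime)

v_eq, b_eq = fit_equilibrium(volumes, energies)
print(f"V_eq = {v_eq:.3f} A^3, a_eq = {v_eq ** (1 / 3):.4f} A, B = {b_eq:.1f} GPa")
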

# In[4]:


for pot in potential_list:
    pot_str = pot.replace("-", "_")
    # open a subproject within a project
    with pr.open(pot_str) as pr_sub:
        # job names only need to be unique within a subproject
        job_name = "murn_Al"
        # Use the subproject to create the jobs
        murn = pr_sub.create_job(pr.job_type.Murnaghan, job_name)
        job_ref = pr_sub.create_job(pr.job_type.Lammps, "Al_ref")
        job_ref.structure = pr.create_ase_bulk("Al", cubic=True)
        job_ref.potential = pot
        job_ref.calc_minimize()
        murn.ref_job = job_ref
        # Some potentials may not work with certain LAMMPS compilations.
        # Therefore, we need to have a little exception handling
        try:
            murn.run()
        except RuntimeError:
            pass
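

# The loop above silently skips any potential whose run raises a `RuntimeError`. The variant below records which potentials failed and why, so they can be reported later. It is a sketch only. `run_job` is a hypothetical callable standing in for creating and running one Murnaghan job, and both the runner and the potential names are made up for illustration.

# In[ ]:


def run_all(potentials, run_job):
    """Call run_job(subproject_name) for every potential and collect failures.

    run_job is a hypothetical callable standing in for creating and running
    the Murnaghan job of one potential; it is expected to raise RuntimeError
    when the run fails.
    """
    failed = {}
    for pot in potentials:
        subproject_name = pot.replace("-", "_")  # same sanitization as in the loop above
        try:
            run_job(subproject_name)
        except RuntimeError as err:
            failed[pot] = str(err)  # keep the reason instead of discarding it
    return failed


def fake_runner(subproject_name):
    """Hypothetical runner that fails for one made-up potential."""
    if subproject_name.startswith("broken"):
        raise RuntimeError("pair style not available in this LAMMPS build")


failed = run_all(["pot-A", "broken-pot"], fake_runner)
print(failed)
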


# If you inspect the job table, you will find that each Murnaghan job generates several small LAMMPS jobs (see the column `hamilton`). Some of these jobs may have failed with the status `aborted`.

# In[5]:


pr.job_table()


# ## Analysis using `PyironTables`
# 
# The idea now is to go over all finished Murnaghan jobs, extract the equilibrium lattice parameter and bulk modulus, and classify them based on the potential used.

# ### Defining filter functions
# 
# Since a project can contain thousands if not millions of jobs, it is necessary to "filter" the data and apply the functions (some of which can be computationally expensive) only to the filtered jobs. In this example, we need to select jobs that are finished and of type `Murnaghan`. This can be done in two ways: using the job table, i.e., the entries in the database, or using the job itself, i.e., the entries in its stored HDF5 file. Below are examples of filter functions acting on the job table and on the job, respectively.

# In[6]:


# Filtering using the database entries (which are obtained as a pandas DataFrame)
def db_filter_function(job_table):
    # Returns a pandas Series of boolean values: True for entries with status
    # "finished" and hamilton type "Murnaghan"
    return (job_table.status == "finished") & (job_table.hamilton == "Murnaghan")

# Filtering based on the job
def job_filter_function(job):
    # Returns True if the job has finished and "murn" is in its job name
    return job.status.finished and "murn" in job.job_name
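

# `db_filter_function` works on the job table as a pandas DataFrame, so the same condition can be tried out on a small hand-made table before running it on a real project. The rows and job names below are hypothetical. The table contains only the columns the filter uses, plus `job`; a real `pr.job_table()` has more columns.

# In[ ]:


import pandas as pd


def example_db_filter(job_table):
    # Same condition as db_filter_function: finished Murnaghan jobs only
    return (job_table.status == "finished") & (job_table.hamilton == "Murnaghan")


# Hand-made stand-in for pr.job_table(); rows and job names are hypothetical
toy_table = pd.DataFrame(
    {
        "job": ["murn_Al", "strain_0_95", "murn_Al", "strain_1_0"],
        "status": ["finished", "finished", "aborted", "aborted"],
        "hamilton": ["Murnaghan", "Lammps", "Murnaghan", "Lammps"],
    }
)

mask = example_db_filter(toy_table)  # boolean Series aligned with the rows
print(toy_table[mask])
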


# Filtering on the database is faster here, because it does not require loading each job. Sometimes, however, it is necessary to filter on data stored in the job's HDF5 file. When both filters are set, the database filter is applied first, followed by the job-based filter.
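
# The ordering matters because only jobs that pass the database filter have to be loaded. The sketch below mimics this in plain Python and is not pyiron's internal code. Dictionaries stand in for job-table rows and for loaded jobs. `load_job` is a hypothetical loader that only records how often it is called.

# In[ ]:


def two_stage_filter(rows, db_filter, load_job, job_filter):
    """Apply a cheap row filter first, then an expensive per-job filter."""
    survivors = [row for row in rows if db_filter(row)]  # cheap: no job is loaded
    jobs = [load_job(row) for row in survivors]  # expensive: one load per survivor
    return [job for job in jobs if job_filter(job)]


loads = []


def load_job(row):
    """Hypothetical loader; records each call instead of reading an HDF5 file."""
    loads.append(row["id"])
    return {"id": row["id"], "job_name": row["job"]}


rows = [
    {"id": 1, "job": "murn_Al", "status": "finished", "hamilton": "Murnaghan"},
    {"id": 2, "job": "strain_0_95", "status": "finished", "hamilton": "Lammps"},
    {"id": 3, "job": "strain_1_0", "status": "aborted", "hamilton": "Lammps"},
    {"id": 4, "job": "murn_Al", "status": "finished", "hamilton": "Murnaghan"},
]
result = two_stage_filter(
    rows,
    db_filter=lambda r: r["status"] == "finished" and r["hamilton"] == "Murnaghan",
    load_job=load_job,
    job_filter=lambda j: "murn" in j["job_name"],
)
print(len(result), "jobs kept,", len(loads), "jobs loaded out of", len(rows))
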

# ### Defining functions that act on jobs
# 
# Now we define a set of functions, each of which is applied to a job and returns a single value. The filtered jobs are loaded once and then passed to every one of these functions to construct the table. As a result, a job does not have to be reloaded for each quantity that is extracted from it.
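
# The "load once, apply many" idea can be sketched in plain Python as follows. This is not pyiron's table implementation. The jobs are dictionaries with made-up values keyed like the HDF5 paths used below, and `load_fake_job` is a hypothetical loader that counts its calls.

# In[ ]:


def build_table(job_ids, load_job_fn, functions):
    """Load each job once and apply every labelled function to it."""
    rows = []
    for job_id in job_ids:
        job = load_job_fn(job_id)  # a single load per job
        rows.append({label: func(job) for label, func in functions.items()})
    return rows


load_calls = []


def load_fake_job(job_id):
    """Hypothetical loader returning a dict that stands in for a Murnaghan job."""
    load_calls.append(job_id)  # record each (expensive) load
    return {
        "output/equilibrium_volume": 64.0 + job_id,  # made-up values
        "output/equilibrium_bulk_modulus": 70.0 + job_id,
    }


columns = {
    "a_eq": lambda job: job["output/equilibrium_volume"] ** (1 / 3),
    "bulk_modulus": lambda job: job["output/equilibrium_bulk_modulus"],
}
table_rows = build_table([0, 1, 2], load_fake_job, columns)
print(table_rows)
print(len(load_calls), "loads for", len(table_rows) * len(columns), "values")
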

# In[7]:


# Getting the equilibrium lattice parameter from Murnaghan jobs (a = V**(1/3) holds for the cubic 4-atom reference cell)
def get_lattice_parameter(job):
    return job["output/equilibrium_volume"] ** (1/3)

# Getting equilibrium bulk modulus from Murnaghan jobs
def get_bm(job):
    return job["output/equilibrium_bulk_modulus"]

# Getting the potential used in each Murnaghan job (read from its first child LAMMPS job)
def get_pot(job):
    child = job.project.inspect(job["output/id"][0])
    return child["input/potential/Name"]
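

# `get_lattice_parameter` takes the cube root of the equilibrium volume. This gives the lattice constant only because the reference structure was created with `cubic=True`, i.e., the conventional 4-atom fcc cell whose volume is a^3. This assumes that `output/equilibrium_volume` refers to the whole reference cell. For a 1-atom primitive cell the volume would be a^3/4. The helper below makes the atom count explicit; its name is hypothetical and not part of pyiron.

# In[ ]:


def fcc_lattice_parameter(volume, n_atoms):
    """Cubic lattice parameter of fcc from the volume of an n_atoms cell.

    The conventional fcc cell holds 4 atoms and has volume a**3, so
    a = (4 * volume / n_atoms) ** (1/3). (Hypothetical helper, not pyiron API.)
    """
    return (4.0 * volume / n_atoms) ** (1.0 / 3.0)


print(fcc_lattice_parameter(66.430125, n_atoms=4))  # conventional cubic cell: V = a^3
print(fcc_lattice_parameter(16.60753125, n_atoms=1))  # primitive cell: V = a^3 / 4
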


# ### Creating a pyiron table
# 
# Now that all the functions are defined, the pyiron table called "table" is created in the following way. This works like a job and can be reloaded at any time.

# In[8]:


import time

start = time.perf_counter()  # replaces the IPython %%time cell magic of the notebook
# creating a pyiron table
table = pr.create_table("table")

# assigning a database filter function
table.db_filter_function = db_filter_function

# Alternatively/additionally, a job-based filter function can be applied
# (it does the same thing in this case).
# table.filter_function = job_filter_function

# Adding the functions using the labels you like
table.add["a_eq"] = get_lattice_parameter
table.add["bulk_modulus"] = get_bm
table.add["potential"] = get_pot
# Running the table to generate the data
table.run(run_again=True)
print(f"Wall time: {time.perf_counter() - start:.1f} s")


# The output can now be obtained as a pandas DataFrame.

# In[9]:


table.get_dataframe()


# You can now compare the computed equilibrium lattice constants for each potential with those listed in the NIST Interatomic Potentials Repository for Al (fcc phase): https://www.ctcms.nist.gov/potentials/system/Al/#Al
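
# One way to make this comparison is to add a column with the relative deviation from a reference lattice constant. In the sketch below, the potential names, lattice constants and reference value are placeholders, not results of this notebook. In practice, the DataFrame would come from `table.get_dataframe()`, and the reference value from the NIST page or from experiment.

# In[ ]:


import pandas as pd


def add_deviation(df, reference_a, column="a_eq"):
    """Return a copy of df with the deviation of `column` from reference_a in percent,
    sorted by the absolute deviation (smallest first)."""
    out = df.copy()
    out["a_dev_percent"] = 100.0 * (out[column] - reference_a) / reference_a
    return out.sort_values("a_dev_percent", key=abs)


# Placeholder rows and reference value, for illustration only
example_df = pd.DataFrame({"potential": ["pot_A", "pot_B", "pot_C"], "a_eq": [4.05, 4.02, 4.10]})
ranked = add_deviation(example_df, reference_a=4.05)
print(ranked)
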
