authors: Alireza Faghaninia, Alex Dunn, Joseph Montoya, Daniel Dopp
This notebook was last updated 11/15/18 for version 0.4.5 of matminer.
Note that for in-line plotting to work, you may need to start Jupyter Notebook with a higher IOPub data rate limit, e.g., jupyter notebook --NotebookApp.iopub_data_rate_limit=1.0e10. We recommend doing this before starting.
A Citrine API key is required to load the data for this notebook (it can be found under your Citrine account settings). Set the CITRINE_API environment variable or pass the API key as an argument to CitrineDataRetrieval(). (See the data retrieval notebook for reference.)
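Because a missing key only surfaces once the retrieval runs, it can help to check for it up front. The sketch below uses a hypothetical helper, get_citrine_api_key (not part of matminer), that prefers an explicitly passed key and otherwise falls back to the CITRINE_API environment variable named above. matminer performs its own lookup inside CitrineDataRetrieval, and the variable name it reads may differ between versions, so treat this only as an early, clearer failure.

```python
import os


def get_citrine_api_key(api_key=None, env_var="CITRINE_API"):
    """Return an explicit API key, or fall back to an environment variable.

    Hypothetical helper for illustration only; it is not part of matminer.
    The default env_var follows this notebook's text and is an assumption.
    """
    # An explicitly passed key takes precedence over the environment
    key = api_key or os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"No Citrine API key found: pass one explicitly or set ${env_var}"
        )
    return key
```

Assuming CitrineDataRetrieval accepts the key as an argument, as the note above says, usage would look like cdr = CitrineDataRetrieval(get_citrine_api_key()).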
This notebook illustrates a few more advanced examples of matminer's visualization features. Note that these examples and a few additional ones are included in script form in the matminer_examples repository.
import pprint
import pandas as pd
from pymatgen.core.composition import Composition
from figrecipes import PlotlyFig
from matminer.datasets import load_dataset
from matminer.data_retrieval.retrieve_Citrine import CitrineDataRetrieval
This example generates a scatter plot of the properties of thermoelectric materials based on the data available at http://www.mrl.ucsb.edu:8080/datamine/thermoelectric.jsp. The data is extracted via Citrine data retrieval tools; the dataset ID on Citrine is 150557.
# GET DATA
# Note that your Citrine API key must be set as the CITRINE_API
# environment variable or as an argument to the CitrineDataRetrieval() constructor
cdr = CitrineDataRetrieval()
df_te = cdr.get_dataframe(criteria={'data_type': 'experimental', 'data_set_id': 150557},
properties=['Seebeck coefficient'], secondary_fields=True)
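# CLEAN AND PRUNE DATA
# Keep only the columns we need; chemicalFormula is text, the rest are numeric
cols = ['chemicalFormula', 'Electrical resistivity', 'Seebeck coefficient',
'Thermal conductivity', 'Thermoelectric figure of merit (zT)']
numeric_cols = cols[1:]
df_te = df_te[cols].copy()
# Convert the property columns to numbers; unparseable entries become NaN
df_te[numeric_cols] = df_te[numeric_cols].apply(pd.to_numeric, errors='coerce')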
# Keep rows with 0.0005 < resistivity < 0.1 and |Seebeck coefficient| < 500,
# then shorten the zT column name
df_te = df_te[(5e-4 < df_te['Electrical resistivity']) & (df_te['Electrical resistivity'] < 0.1)]
df_te = df_te[abs(df_te['Seebeck coefficient']) < 500]
df_te = df_te.rename(columns={'Thermoelectric figure of merit (zT)': 'zT'})
# GENERATE PLOTS
pf = PlotlyFig(df_te, x_scale='log', fontfamily='Times New Roman',
hovercolor='white', x_title='Electrical Resistivity (cm/S)',
y_title='Seebeck Coefficient (uV/K)',
colorbar_title='Thermal Conductivity (W/m.K)',
mode='notebook')
pf.xy((df_te['Electrical resistivity'], df_te['Seebeck coefficient']),
labels='chemicalFormula',
sizes='zT',
colors='Thermal conductivity',
color_range=[0, 5])
all available fields: ['Crystallinity', 'Electrical resistivity-units', 'Space group', 'Seebeck coefficient', 'Thermoelectric figure of merit (zT)-conditions', 'Electrical conductivity', 'Thermal conductivity-units', 'Electrical resistivity-conditions', 'Thermal conductivity-conditions', 'Electrical resistivity', 'uid', 'Power factor-dataType', 'Thermal conductivity', 'Preparation method', 'Seebeck coefficient-units', 'Electrical conductivity-conditions', 'references', 'Electrical resistivity-dataType', 'category', 'Power factor', 'Power factor-units', 'Thermoelectric figure of merit (zT)-dataType', 'Electrical conductivity-units', 'Seebeck coefficient-conditions', 'Electrical conductivity-dataType', 'Seebeck coefficient-dataType', 'Thermoelectric figure of merit (zT)', 'Power factor-conditions', 'Thermal conductivity-dataType', 'chemicalFormula'] suggested common fields: ['references', 'chemicalFormula', 'Crystallinity', 'Preparation method', 'Space group', 'Electrical resistivity', 'Electrical resistivity-units', 'Electrical resistivity-conditions', 'Electrical resistivity-dataType', 'Seebeck coefficient', 'Seebeck coefficient-units', 'Seebeck coefficient-conditions', 'Seebeck coefficient-dataType', 'Power factor', 'Power factor-units', 'Power factor-conditions', 'Power factor-dataType', 'Thermoelectric figure of merit (zT)', 'Thermoelectric figure of merit (zT)-conditions', 'Thermoelectric figure of merit (zT)-dataType', 'Thermal conductivity', 'Thermal conductivity-units', 'Thermal conductivity-conditions', 'Thermal conductivity-dataType', 'Electrical conductivity', 'Electrical conductivity-units', 'Electrical conductivity-conditions', 'Electrical conductivity-dataType']
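To make the cleaning step concrete, here is a small sketch on a made-up DataFrame (the rows are invented for illustration and are not taken from Citrine dataset 150557). It applies the same resistivity and Seebeck filters as the cell above, and uses errors='coerce' so that a non-numeric entry such as 'n/a' becomes NaN, which then fails every comparison and is dropped by the filters.

```python
import pandas as pd

# Made-up rows for illustration only (not from Citrine dataset 150557).
# Values arrive as strings, as they may from a JSON-based retrieval.
raw = pd.DataFrame({
    'chemicalFormula': ['A2B3', 'CD', 'EF2', 'GH'],
    'Electrical resistivity': ['1e-3', '0.5', '2e-3', 'n/a'],
    'Seebeck coefficient': ['-200', '150', '600', '100'],
    'Thermal conductivity': ['1.5', '2.0', '0.8', '1.1'],
    'Thermoelectric figure of merit (zT)': ['0.9', '0.1', '0.4', '0.3'],
})

numeric_cols = ['Electrical resistivity', 'Seebeck coefficient',
                'Thermal conductivity', 'Thermoelectric figure of merit (zT)']
df = raw.copy()
# errors='coerce' turns unparseable entries such as 'n/a' into NaN
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors='coerce')

# Same filters as in the cell above; NaN fails every comparison
rho = df['Electrical resistivity']
df = df[(5e-4 < rho) & (rho < 0.1)]               # drops CD (too resistive) and GH (NaN)
df = df[df['Seebeck coefficient'].abs() < 500]    # drops EF2 (|S| = 600)
df = df.rename(columns={'Thermoelectric figure of merit (zT)': 'zT'})

print(df[['chemicalFormula', 'zT']])
```

Only the A2B3 row survives both filters. Note that the Seebeck filter uses the absolute value, so large negative (n-type) coefficients are excluded as well.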
PlotlyFig may use a number of plotting modes, which are illustrated in the following examples. First we set up the figure:
# Note, if this is your first time loading this dataset it will be downloaded from an external repository
df = load_dataset("elastic_tensor_2015")
pf = PlotlyFig(df, title='Elastic data', mode='offline',
x_scale='log', y_scale='log')
# Let's plot offline (the default) first. An html file will be created.
pf.xy((df['poisson_ratio'], df['elastic_anisotropy']), labels='formula')
# Plot and save figure without showing offline plot and specifying filename
pf.set_arguments(show_offline_plot=False, filename="myplot.html")
pf.xy((df['poisson_ratio'], df['elastic_anisotropy']), labels='formula')
# Uncomment and set your Plotly API information to plot in static or online mode
# pf.set_arguments(mode='static', api_key=YOUR_API_KEY,
# username=YOUR_USERNAME,
# filename="my_PlotlyFig_plot.jpeg")
# pf.xy([('poisson_ratio', 'elastic_anisotropy')], labels='formula')
# pf.set_arguments(mode='online')
# pf.xy([('poisson_ratio', 'elastic_anisotropy')], labels='formula')
pf.set_arguments(mode='notebook')
pf.xy((df['poisson_ratio'], df['elastic_anisotropy']), labels='formula')
fig = pf.xy((df['poisson_ratio'], df['elastic_anisotropy']), labels='formula',
return_plot=True)
print("Here's our returned figure!")
pprint.pprint(fig)
Here's our returned figure! {'data': [Scatter({ 'hoverinfo': 'x+y+text', 'hoverlabel': {'font': {'family': 'Courier', 'size': 25.0}}, 'line': {'dash': 'solid', 'width': 2}, 'marker': {'colorscale': [[0.0, '#440154'], [0.1111111111111111, '#482878'], [0.2222222222222222, '#3e4989'], [0.3333333333333333, '#31688e'], [0.4444444444444444, '#26828e'], [0.5555555555555556, '#1f9e89'], [0.6666666666666666, '#35b779'], [0.7777777777777778, '#6ece58'], [0.8888888888888888, '#b5de2b'], [1.0, '#fde725']], 'line': {'color': 'black', 'width': 1}, 'showscale': False, 'size': 10.0, 'symbol': 'circle'}, 'mode': 'markers', 'name': 'elastic_anisotropy', 'text': [Nb4CoSi, Al(CoSi)2, SiOs, ..., YSi, Al2Cu, VCu3Se4], 'x': array([0.28570074, 0.26810535, 0.30778029, ..., 0.20684971, 0.32173802, 0.25852284]), 'y': array([0.03068792, 0.26690966, 0.75648918, ..., 0.45469128, 0.73544949, 0.20571802]) })], 'layout': {'hoverlabel': {'font': {'family': 'Courier', 'size': 25.0}}, 'hovermode': 'closest', 'legend': {'font': {'family': 'Courier', 'size': 25.0}}, 'margin': {'b': 150.0, 'l': 150.0, 'pad': 0, 'r': 100, 't': 100}, 'paper_bgcolor': 'white', 'plot_bgcolor': 'white', 'title': 'Elastic data', 'titlefont': {'family': 'Courier', 'size': 25.0}, 'xaxis': {'tickfont': {'family': 'Courier', 'size': 25.0}, 'title': '', 'titlefont': {'family': 'Courier', 'size': 25.0}, 'type': 'log'}, 'yaxis': {'tickfont': {'family': 'Courier', 'size': 25.0}, 'title': '', 'titlefont': {'family': 'Courier', 'size': 25.0}, 'type': 'log'}}}
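The printout shows that, with return_plot=True, the figure comes back as nested dictionaries with 'data' and 'layout' keys, so editing it is ordinary key access. If a key you want to set is not present yet, a small helper can create the intermediate levels. The set_nested function below is hypothetical (it is not part of figrecipes or plotly), and the sketch assumes the layout behaves like a plain dict, as the printout suggests.

```python
def set_nested(fig, keys, value):
    """Set fig[k0][k1]...[kn] = value, creating missing intermediate dicts.

    Hypothetical helper for illustration; not part of figrecipes or plotly.
    """
    node = fig
    for key in keys[:-1]:
        # Reuse the existing sub-dict, or insert an empty one if missing
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return fig


# A trimmed-down stand-in for the figure printed above
fig = {'data': [], 'layout': {'title': 'Elastic data', 'hovermode': 'closest'}}
set_nested(fig, ['layout', 'hoverlabel', 'bgcolor'], 'pink')  # creates 'hoverlabel'
set_nested(fig, ['layout', 'title'], 'My Custom Elastic Data Figure')
print(fig['layout'])
```

In the real returned figure 'hoverlabel' already exists, so direct assignment (as in the next cell) works just as well.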
# Edit the figure and plot it with the current plot mode (notebook):
fig['layout']['hoverlabel']['bgcolor'] = 'pink'
fig['layout']['title'] = 'My Custom Elastic Data Figure'
pf.set_arguments(mode='notebook')
pf.create_plot(fig)
PlotlyFig provides a set of arguments that make setting up good-looking Plotly templates quicker and easier.
Most formatting options can be set through the initializer of PlotlyFig. These options remain the same for all figures produced, but you can change some common formatting options after instantiating a PlotlyFig object using set_arguments.
Chart-specific formatting options can be passed to plotting methods.
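One way to keep initializer options consistent across several PlotlyFig objects is to store them in a dict and unpack it. The sketch below is only an assumption about how you might organize this, not a figrecipes feature: the keyword names and values are copied from the PlotlyFig call in the next cell, and with_overrides is a hypothetical helper that merges per-figure settings over the shared ones without mutating the shared dict.

```python
# Formatting shared by every figure (keyword names taken from the example below)
base_style = {
    'mode': 'notebook',
    'fontfamily': 'Raleway',
    'fontscale': 0.75,
    'hovercolor': 'white',
    'bgcolor': '#F4F6F6',
}


def with_overrides(style, **overrides):
    """Return a new kwargs dict: the shared style plus per-figure overrides.

    Later keys win, so an override replaces the shared value for that figure only.
    """
    return {**style, **overrides}


bulk_vs_shear = with_overrides(base_style,
                               title='Comparison of Bulk Modulus and Shear Modulus',
                               x_title='Shear modulus (GPa)',
                               y_title='Bulk modulus (GPa)')
# With figrecipes installed, this would become: pf = PlotlyFig(df=df, **bulk_vs_shear)
```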
pf = PlotlyFig(df=df,
# api_key=api_key,
# username=username,
mode='notebook',
title='Comparison of Bulk Modulus and Shear Modulus',
x_title='Shear modulus (GPa)',
y_title='Bulk modulus (GPa)',
colorbar_title='Poisson Ratio',
fontfamily='Raleway',
fontscale=0.75,
fontcolor='#283747',
ticksize=30,
colorscale="Reds",
hovercolor='white',
hoverinfo='text',
bgcolor='#F4F6F6',
margins=110,
pad=10)
pf.xy((df['G_VRH'], df['K_VRH']), labels='material_id', colors='poisson_ratio')
LaTeX labels are also supported, but only in online and static modes.
# We can also use LaTeX if we use Plotly online/static
# pf.set_arguments(title="$\\text{Origin of Poisson Ratio } \\nu $",
# y_title='$K_{VRH} \\text{(GPa)}$',
# x_title='$G_{VRH} \\text{(GPa)}$',
# colorbar_title='$\\nu$',
# api_key=YOUR_API_KEY, username=YOUR_USERNAME, mode='online')
# pf.xy(('G_VRH', 'K_VRH'), labels='material_id', colors='poisson_ratio')
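The LaTeX strings above double every backslash because "\n" or "\t" in an ordinary Python string would be read as a newline or a tab. Python raw strings (r"...") avoid this. The sketch below, which reuses the labels from the commented-out cell, checks that both spellings produce identical strings; which one to pass to set_arguments is purely a readability choice.

```python
# The escaped labels as written in the cell above
escaped = {
    'title': "$\\text{Origin of Poisson Ratio } \\nu $",
    'y_title': '$K_{VRH} \\text{(GPa)}$',
    'x_title': '$G_{VRH} \\text{(GPa)}$',
    'colorbar_title': '$\\nu$',
}
# The same labels as raw strings: backslashes are kept literally
raw = {
    'title': r"$\text{Origin of Poisson Ratio } \nu $",
    'y_title': r'$K_{VRH} \text{(GPa)}$',
    'x_title': r'$G_{VRH} \text{(GPa)}$',
    'colorbar_title': r'$\nu$',
}
print(escaped == raw)  # True

# Pitfall: without r or doubled backslashes, "\nu" contains a newline
print('$\nu$' == raw['colorbar_title'])  # False
```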