Cookbook: RAxML analyses in a notebook

As part of the ipyrad.analysis toolkit we've created convenience functions for running common RAxML commands. These are useful when you want to run all of your analyses in a clean, streamlined way in a Jupyter notebook to create a completely reproducible study.

Install software

There are many ways to install raxml, the simplest of which is to use conda. This will install several raxml binaries into your conda path. If you want to call a different raxml binary, you can do so by changing the 'binary' parameter.

In [1]:

## conda install ipyrad -c ipyrad
## conda install toytree -c eaton-lab
## conda install raxml -c bioconda
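Since conda installs several raxml binaries, it can help to check which ones are actually visible on your PATH before picking one. The following is a minimal sketch using only the Python 3 standard library. The first candidate name is the binary that appears in this notebook's command string; the other names are assumptions about what your install may provide, and `find_raxml_binaries` is a hypothetical helper, not part of ipyrad.

```python
import shutil

# Candidate binary names. The first appears in this notebook's command
# string; the others are assumptions about what conda may have installed.
CANDIDATES = (
    "raxmlHPC-PTHREADS-SSE3",
    "raxmlHPC-PTHREADS",
    "raxmlHPC",
)


def find_raxml_binaries(names=CANDIDATES):
    """Return {name: full path or None} for each candidate binary name."""
    return {name: shutil.which(name) for name in names}


found = find_raxml_binaries()
for name, path in found.items():
    print(name, "->", path if path else "not found on PATH")
```

A name reported as found here is a reasonable value to try for the 'binary' parameter.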

Create a raxml Class object

Create a raxml object, which comes with a set of default parameters. The only required argument to initialize the object is a phylip-formatted sequence file. In this example I provide a name and working directory as well.

In [2]:

import ipyrad.analysis as ipa
import toyplot
import toytree

In [3]:

rax = ipa.raxml(
    data="./analysis-ipyrad/aligntest_outfiles/aligntest.phy",
    name="aligntest", 
    workdir="analysis-raxml",
    );

Additional options

You can also modify many of the other command line arguments to raxml by changing values in the params dictionary of your raxml object. These values could also have been set when you initialized the object.

In [4]:

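## set some other params
rax.params.N = 10     ## -N: number of bootstrap replicates
rax.params.T = 2      ## -T: number of threads
rax.params.o = None   ## -o: outgroup (None = no outgroup set)
#rax.params.o = ["32082_przewalskii", "33588_przewalskii"]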

Print the command string

It is good practice to always print the command string so that you know exactly what was called for your analysis and have it documented.

In [5]:

print(rax.command)

raxmlHPC-PTHREADS-SSE3 -f a -T 2 -m GTRGAMMA -N 10 -x 12345 -p 54321 -n aligntest -w /home/deren/Documents/ipyrad/tests/analysis-raxml -s /home/deren/Documents/ipyrad/tests/analysis-ipyrad/aligntest_outfiles/aligntest.phy
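Once the command string is printed, it can also be checked programmatically, for example to confirm that the number of replicates and threads are what you intended. Below is a minimal sketch that splits a command string into its flags with the standard library. `parse_raxml_command` is a hypothetical helper, not an ipyrad function, and the absolute paths from the output above are shortened here for illustration.

```python
import shlex


def parse_raxml_command(command):
    """Split a raxml command string into (binary, {flag: value})."""
    tokens = shlex.split(command)
    binary, args = tokens[0], tokens[1:]
    flags = {}
    i = 0
    while i < len(args):
        token = args[i]
        if token.startswith("-"):
            # a flag takes the next token as its value unless that
            # token is itself another flag
            if i + 1 < len(args) and not args[i + 1].startswith("-"):
                flags[token] = args[i + 1]
                i += 2
                continue
            flags[token] = None
        i += 1
    return binary, flags


# the command printed above, with absolute paths shortened for illustration
command = (
    "raxmlHPC-PTHREADS-SSE3 -f a -T 2 -m GTRGAMMA -N 10 "
    "-x 12345 -p 54321 -n aligntest -w analysis-raxml -s aligntest.phy"
)
binary, flags = parse_raxml_command(command)
print(binary)
print(flags["-N"], flags["-T"], flags["-m"])
```

With a live object, the string from rax.command could be passed in place of the literal.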

Run the job

This will start the job running. We haven't made a progress bar yet but we will add one soon.

In [6]:

rax.run(force=True)

job aligntest finished successfully

Access results

One of the reasons it is so convenient to run your raxml jobs this way is that the result files are easily accessible from your raxml object.

In [7]:

rax.trees

Out[7]:

bestTree                   ~/Documents/ipyrad/tests/analysis-raxml/RAxML_bestTree.aligntest
bipartitions               ~/Documents/ipyrad/tests/analysis-raxml/RAxML_bipartitions.aligntest
bipartitionsBranchLabels   ~/Documents/ipyrad/tests/analysis-raxml/RAxML_bipartitionsBranchLabels.aligntest
bootstrap                  ~/Documents/ipyrad/tests/analysis-raxml/RAxML_bootstrap.aligntest
info                       ~/Documents/ipyrad/tests/analysis-raxml/RAxML_info.aligntest
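Because the listing shows paths abbreviated with `~`, a quick check that the files exist and are non-empty can catch a failed or partial run before plotting. This is a minimal sketch using the standard library. `check_result_files` is a hypothetical helper, and the dictionary below copies two paths from the listing above rather than reading them from a live object.

```python
import os


def check_result_files(paths):
    """Return {label: (expanded_path, size_in_bytes or None if missing)}."""
    report = {}
    for label, path in paths.items():
        full = os.path.expanduser(path)  # expand a leading "~"
        size = os.path.getsize(full) if os.path.isfile(full) else None
        report[label] = (full, size)
    return report


# two paths copied from the listing above, for illustration
paths = {
    "bestTree": "~/Documents/ipyrad/tests/analysis-raxml/RAxML_bestTree.aligntest",
    "bipartitions": "~/Documents/ipyrad/tests/analysis-raxml/RAxML_bipartitions.aligntest",
}
for label, (full, size) in check_result_files(paths).items():
    status = "missing" if size is None else "{} bytes".format(size)
    print(label, status)
```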

Plot the results

Here we use toytree to plot the bootstrap results.

In [8]:

tre = toytree.tree(rax.trees.bipartitions)
tre.root(wildcard="3")
tre.draw(
    height=300,
    width=300,
    node_labels=tre.get_node_values("support"),
);
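The support values shown as node labels can also be inspected as numbers, without plotting. The sketch below pulls them out of a Newick string with a regular expression, assuming support is stored as the label that follows each closing parenthesis. The example tree is made up for illustration and is not output from this analysis; `support_values` is a hypothetical helper.

```python
import re


def support_values(newick):
    """Return numeric internal-node labels in the order they appear."""
    # an internal node label directly follows a closing parenthesis
    return [float(x) for x in re.findall(r"\)(\d+(?:\.\d+)?)", newick)]


# a made-up tree for illustration, not output from this analysis
example = "((a:0.1,b:0.2)100:0.05,(c:0.1,d:0.3)62:0.02,e:0.4);"
values = support_values(example)
print(values)
print(min(values))
```

To apply this to a real result, one could read the text of the file at rax.trees.bipartitions and pass it in.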

[optional] Submit raxml jobs to run on a cluster

Using the ipyparallel library you can submit raxml jobs to run in parallel on a cluster in a load-balanced fashion. You can then tell the notebook to wait until all jobs are finished before moving on to draw trees, etc.

Start an ipyparallel cluster

In a separate terminal start an ipcluster instance and tell it how many engines to start.

In [9]:

##
##  ipcluster start --n=20
##

Create a Client connected to the cluster

In [10]:

import ipyparallel as ipp
ipyclient = ipp.Client()
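Engines can take a few seconds to register after ipcluster starts, so a client created right away may see fewer engines than requested. The sketch below polls until the expected number is reached or a timeout passes. `wait_for_engines` is a hypothetical helper, and `FakeClient` is a stand-in so the example runs without a cluster. With a real client, the callable would be something like `lambda: ipyclient.ids`.

```python
import time


def wait_for_engines(get_ids, expected, timeout=60.0, poll=1.0):
    """Poll get_ids() until it reports `expected` engines or time runs out.

    Returns the number of engines seen at the last poll.
    """
    deadline = time.time() + timeout
    while True:
        n = len(get_ids())
        if n >= expected or time.time() >= deadline:
            return n
        time.sleep(poll)


class FakeClient(object):
    """Stand-in for a real client: reports one more engine on each call."""

    def __init__(self):
        self._n = 0

    @property
    def ids(self):
        self._n += 1
        return list(range(self._n))


fake = FakeClient()
n_engines = wait_for_engines(lambda: fake.ids, expected=3, poll=0.01)
print(n_engines)
```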

Create several raxml objects for different data sets

In [11]:

rax1 = ipa.raxml(
    data="~/Documents/ipyrad/tests/analysis-ipyrad/pedic_outfiles/pedic.phy", 
    name="rax1", T=4, N=100)

rax2 = ipa.raxml(
    data="~/Documents/ipyrad/tests/analysis-ipyrad/aligntest_outfiles/aligntest.phy", 
    name="rax2", T=4, N=100)

Submit jobs to run on the cluster queue.

In [12]:

rax1.run(ipyclient=ipyclient, force=True)
rax2.run(ipyclient=ipyclient, force=True)

job rax1 submitted to cluster
job rax2 submitted to cluster

Wait for jobs to finish

In [14]:

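## you can query each job while it's running
## (note: `async` is a reserved word in Python 3.7+, so this attribute
## access only parses on older Python versions)
rax1.async.ready()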

Out[14]:

True

In [13]:

## or just block until all jobs on ipyclient are finished
ipyclient.wait()

Out[13]:

True
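Between querying a single job and blocking on everything, a middle option is to poll several jobs and report each one as it finishes. The sketch below does this for any objects that expose a `.ready()` method, as the async results queried above do. `wait_for_jobs` is a hypothetical helper, and `FakeJob` is a stand-in so the example runs without a cluster.

```python
import time


def wait_for_jobs(jobs, timeout=3600.0, poll=5.0):
    """Poll objects exposing .ready(); return names still unfinished."""
    deadline = time.time() + timeout
    pending = dict(jobs)
    while pending and time.time() < deadline:
        finished = [name for name, job in pending.items() if job.ready()]
        for name in finished:
            print("job {} finished".format(name))
            del pending[name]
        if pending:
            time.sleep(poll)
    return sorted(pending)


class FakeJob(object):
    """Stand-in that reports ready after a fixed number of polls."""

    def __init__(self, polls_needed):
        self.polls_left = polls_needed

    def ready(self):
        self.polls_left -= 1
        return self.polls_left <= 0


unfinished = wait_for_jobs(
    {"rax1": FakeJob(1), "rax2": FakeJob(3)}, timeout=5, poll=0.01
)
print(unfinished)
```

An empty list means every job finished within the timeout; any names returned are jobs that were still running.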

Plot trees when jobs are finished

Here we draw a slightly more complex figure showing the trees from both jobs. As written, each draw() call renders its tree on a separate canvas.

In [15]:

## load each tree, root it, and draw it
tre1 = toytree.tree(rax1.trees.bipartitions)
tre1.root(wildcard="prz")
tre1.draw(width=300);

tre2 = toytree.tree(rax2.trees.bipartitions)
tre2.root(wildcard="3")
tre2.draw(width=300);