RAxML is one of the most popular tools for inferring phylogenetic trees using maximum likelihood. It is fast even for very large data sets. The documentation for raxml is huge, and there are many options. However, I tend to use the same small number of options very frequently, which motivated me to write the ipa.raxml()
tool to automate the process of generating RAxml command line strings, running them, and accessing the resulting tree files. The simplicity of this tool makes it easy to incorporate into other more complex tools, for example, to infer tress in sliding windows along the genome using the ipa.treeslider
tool.
# conda install ipyrad -c conda-forge -c bioconda
# conda install raxml -c conda-forge -c bioconda
# conda install ipcoal -c conda-forge
import ipyrad.analysis as ipa
import toytree
import ipcoal
# the true tree
tree = toytree.rtree.imbtree(ntips=10, treeheight=1e7)
tree.draw(ts='p');
# setup simulator
subst = {
"state_frequencies": [0.3, 0.2, 0.3, 0.2],
"kappa": 0.25,
"gamma": 0.20,
"gamma_categories": 4,
}
mod = ipcoal.Model(tree=tree, Ne=1e5, nsamples=2, mut=1e-8, substitution_model=subst)
mod.sim_loci(nloci=1, nsites=10000)
mod.write_concat_to_phylip(name="raxtest", outdir="/tmp", diploid=True)
wrote concat locus (10 x 10000bp) to /tmp/raxtest.phy
# init raxml object with input data and (optional) parameter options
rax = ipa.raxml(data="/tmp/raxtest.phy", T=4, N=100)
# print the raxml command string for prosperity
print(rax.command)
# run the command, (options: block until finishes; overwrite existing)
rax.run(block=True, force=True)
/home/deren/miniconda3/envs/scratch/bin/raxmlHPC-PTHREADS-AVX2 -f a -T 4 -m GTRGAMMA -n test -w /home/deren/Documents/ipyrad/testdocs/analysis/analysis-raxml -s /tmp/raxtest.phy -p 54321 -N 100 -x 12345 job test finished successfully
After inferring a tree you can then visualize it in a notebook using toytree
.
# load from the .trees attribute of the raxml object, or from the saved tree file
tre = toytree.tree(rax.trees.bipartitions)
# draw the tree
rtre = tre.root("r9")
rtre.draw(tip_labels_align=True, node_sizes=18, node_labels="support");
By default several parameters are pre-set in the raxml object. To remove those parameters from the command string you can set them to None. Additionally, you can build complex raxml command line strings by adding almost any parameter to the raxml object init, like below. You probably can't do everythin in raxml using this tool, it's only meant as a convenience. You can always of course just write the raxml command line string by hand instead.
# init raxml object
rax = ipa.raxml(data="/tmp/raxtest.phy", T=4, N=10)
# parameter dictionary for a raxml object
rax.params
N 10 T 4 binary ~/miniconda3/envs/scratch/bin/raxmlHPC-PTHREADS-AVX2 f a m GTRGAMMA n test p 54321 s /tmp/raxtest.phy w ~/Documents/ipyrad/testdocs/analysis/analysis-raxml x 12345
# paths to output files produced by raxml inference
rax.trees
bestTree ~/Documents/ipyrad/testdocs/analysis/analysis-raxml/RAxML_bestTree.test bipartitions ~/Documents/ipyrad/testdocs/analysis/analysis-raxml/RAxML_bipartitions.test bipartitionsBranchLabels ~/Documents/ipyrad/testdocs/analysis/analysis-raxml/RAxML_bipartitionsBranchLabels.test bootstrap ~/Documents/ipyrad/testdocs/analysis/analysis-raxml/RAxML_bootstrap.test info ~/Documents/ipyrad/testdocs/analysis/analysis-raxml/RAxML_info.test
Most frequently used: perform 100 rapid bootstrap analyses followed by 10 rapid hill-climbing ML searches from random starting trees under the GTRGAMMA substitution model.
rax = ipa.raxml(
data="/tmp/raxtest.phy",
name="test-1",
workdir="analysis-raxml",
m="GTRGAMMA",
T=8,
f="a",
N=50,
)
print(rax.command)
rax.run(force=True)
/home/deren/miniconda3/envs/scratch/bin/raxmlHPC-PTHREADS-AVX2 -f a -T 8 -m GTRGAMMA -n test-1 -w /home/deren/Documents/ipyrad/testdocs/analysis/analysis-raxml -s /tmp/raxtest.phy -p 54321 -N 50 -x 12345 job test-1 finished successfully
Another common option: Perform N rapid hill-climbing ML analyses from random starting trees, with no bootstrap replicates. Be sure to use the BestTree
output from this analysis since it does not produce a bipartitions
output file.
rax = ipa.raxml(
data="/tmp/raxtest.phy",
name="test-2",
workdir="analysis-raxml",
m="GTRGAMMA",
T=8,
f="d",
N=10,
x=None,
)
print(rax.command)
rax.run(force=True)
/home/deren/miniconda3/envs/scratch/bin/raxmlHPC-PTHREADS-AVX2 -f d -T 8 -m GTRGAMMA -n test-2 -w /home/deren/Documents/ipyrad/testdocs/analysis/analysis-raxml -s /tmp/raxtest.phy -p 54321 -N 10 job test-2 finished successfully
The .info and related log files will be stored in the workdir
. Be sure to look at these for further details of your analyses.
! cat ./analysis-raxml/RAxML_info.test-1
Using BFGS method to optimize GTR rate parameters, to disable this specify "--no-bfgs" This is RAxML version 8.2.12 released by Alexandros Stamatakis on May 2018. With greatly appreciated code contributions by: Andre Aberer (HITS) Simon Berger (HITS) Alexey Kozlov (HITS) Kassian Kobert (HITS) David Dao (KIT and HITS) Sarah Lutteropp (KIT and HITS) Nick Pattengale (Sandia) Wayne Pfeiffer (SDSC) Akifumi S. Tanabe (NRIFS) Charlie Taylor (UF) Alignment has 738 distinct alignment patterns Proportion of gaps and completely undetermined characters in this alignment: 0.00% RAxML rapid bootstrapping and subsequent ML search Using 1 distinct models/data partitions with joint branch length optimization Executing 50 rapid bootstrap inferences and thereafter a thorough ML search All free model parameters will be estimated by RAxML GAMMA model of rate heterogeneity, ML estimate of alpha-parameter GAMMA Model parameters will be estimated up to an accuracy of 0.1000000000 Log Likelihood units Partition: 0 Alignment Patterns: 738 Name: No Name Provided DataType: DNA Substitution Matrix: GTR RAxML was called as follows: /home/deren/miniconda3/envs/scratch/bin/raxmlHPC-PTHREADS-AVX2 -f a -T 8 -m GTRGAMMA -n test-1 -w /home/deren/Documents/ipyrad/testdocs/analysis/analysis-raxml -s /tmp/raxtest.phy -p 54321 -N 50 -x 12345 Time for BS model parameter optimization 0.012243 Bootstrap[0]: Time 0.028424 seconds, bootstrap likelihood -31256.799109, best rearrangement setting 13 Bootstrap[1]: Time 0.012425 seconds, bootstrap likelihood -31561.478478, best rearrangement setting 13 Bootstrap[2]: Time 0.015826 seconds, bootstrap likelihood -31384.937422, best rearrangement setting 12 Bootstrap[3]: Time 0.015474 seconds, bootstrap likelihood -31232.418170, best rearrangement setting 15 Bootstrap[4]: Time 0.013286 seconds, bootstrap likelihood -31727.424206, best rearrangement setting 10 Bootstrap[5]: Time 0.012674 seconds, bootstrap likelihood -31794.856117, best rearrangement setting 10 Bootstrap[6]: Time 0.012284 seconds, bootstrap likelihood -31364.095102, best rearrangement setting 6 Bootstrap[7]: Time 0.012507 seconds, bootstrap likelihood -30987.465040, best rearrangement setting 14 Bootstrap[8]: Time 0.025328 seconds, bootstrap likelihood -31661.559274, best rearrangement setting 5 Bootstrap[9]: Time 0.020408 seconds, bootstrap likelihood -31624.263244, best rearrangement setting 7 Bootstrap[10]: Time 0.019384 seconds, bootstrap likelihood -31113.558007, best rearrangement setting 7 Bootstrap[11]: Time 0.025762 seconds, bootstrap likelihood -31346.812031, best rearrangement setting 5 Bootstrap[12]: Time 0.013538 seconds, bootstrap likelihood -31485.855305, best rearrangement setting 8 Bootstrap[13]: Time 0.030762 seconds, bootstrap likelihood -31847.504928, best rearrangement setting 7 Bootstrap[14]: Time 0.012672 seconds, bootstrap likelihood -31979.948429, best rearrangement setting 12 Bootstrap[15]: Time 0.015875 seconds, bootstrap likelihood -31359.113575, best rearrangement setting 15 Bootstrap[16]: Time 0.013703 seconds, bootstrap likelihood -31511.570437, best rearrangement setting 12 Bootstrap[17]: Time 0.012595 seconds, bootstrap likelihood -31356.443157, best rearrangement setting 12 Bootstrap[18]: Time 0.012587 seconds, bootstrap likelihood -31516.410335, best rearrangement setting 5 Bootstrap[19]: Time 0.012513 seconds, bootstrap likelihood -31715.502621, best rearrangement setting 8 Bootstrap[20]: Time 0.030856 seconds, bootstrap likelihood -31330.596079, best rearrangement setting 10 Bootstrap[21]: Time 0.013577 seconds, bootstrap likelihood -31526.335199, best rearrangement setting 6 Bootstrap[22]: Time 0.013242 seconds, bootstrap likelihood -31723.116305, best rearrangement setting 5 Bootstrap[23]: Time 0.013744 seconds, bootstrap likelihood -31067.948839, best rearrangement setting 7 Bootstrap[24]: Time 0.012607 seconds, bootstrap likelihood -31838.730422, best rearrangement setting 8 Bootstrap[25]: Time 0.012560 seconds, bootstrap likelihood -31331.553790, best rearrangement setting 6 Bootstrap[26]: Time 0.013312 seconds, bootstrap likelihood -31453.197341, best rearrangement setting 12 Bootstrap[27]: Time 0.027944 seconds, bootstrap likelihood -31274.307232, best rearrangement setting 10 Bootstrap[28]: Time 0.017660 seconds, bootstrap likelihood -31571.543196, best rearrangement setting 11 Bootstrap[29]: Time 0.012387 seconds, bootstrap likelihood -31579.417834, best rearrangement setting 12 Bootstrap[30]: Time 0.014712 seconds, bootstrap likelihood -31520.046802, best rearrangement setting 15 Bootstrap[31]: Time 0.014092 seconds, bootstrap likelihood -32161.649217, best rearrangement setting 15 Bootstrap[32]: Time 0.012034 seconds, bootstrap likelihood -30990.065111, best rearrangement setting 6 Bootstrap[33]: Time 0.030149 seconds, bootstrap likelihood -31475.093176, best rearrangement setting 15 Bootstrap[34]: Time 0.013112 seconds, bootstrap likelihood -31857.142659, best rearrangement setting 11 Bootstrap[35]: Time 0.012832 seconds, bootstrap likelihood -31694.368466, best rearrangement setting 8 Bootstrap[36]: Time 0.013167 seconds, bootstrap likelihood -31576.845410, best rearrangement setting 9 Bootstrap[37]: Time 0.012264 seconds, bootstrap likelihood -30872.766864, best rearrangement setting 11 Bootstrap[38]: Time 0.011929 seconds, bootstrap likelihood -30900.983196, best rearrangement setting 8 Bootstrap[39]: Time 0.012578 seconds, bootstrap likelihood -30945.725549, best rearrangement setting 13 Bootstrap[40]: Time 0.013417 seconds, bootstrap likelihood -31387.327714, best rearrangement setting 13 Bootstrap[41]: Time 0.028272 seconds, bootstrap likelihood -31207.229215, best rearrangement setting 10 Bootstrap[42]: Time 0.016108 seconds, bootstrap likelihood -31669.192821, best rearrangement setting 5 Bootstrap[43]: Time 0.011827 seconds, bootstrap likelihood -31791.920849, best rearrangement setting 14 Bootstrap[44]: Time 0.013031 seconds, bootstrap likelihood -31714.252273, best rearrangement setting 10 Bootstrap[45]: Time 0.012172 seconds, bootstrap likelihood -31266.164753, best rearrangement setting 15 Bootstrap[46]: Time 0.011897 seconds, bootstrap likelihood -31159.607935, best rearrangement setting 14 Bootstrap[47]: Time 0.011418 seconds, bootstrap likelihood -30865.823867, best rearrangement setting 8 Bootstrap[48]: Time 0.030119 seconds, bootstrap likelihood -31970.680485, best rearrangement setting 6 Bootstrap[49]: Time 0.026086 seconds, bootstrap likelihood -31512.708811, best rearrangement setting 7 Overall Time for 50 Rapid Bootstraps 0.841106 seconds Average Time per Rapid Bootstrap 0.016822 seconds Starting ML Search ... Fast ML optimization finished Fast ML search Time: 0.483205 seconds Slow ML Search 0 Likelihood: -31496.126250 Slow ML Search 1 Likelihood: -31496.126250 Slow ML Search 2 Likelihood: -31496.126250 Slow ML Search 3 Likelihood: -31496.126250 Slow ML Search 4 Likelihood: -31496.126250 Slow ML Search 5 Likelihood: -31496.126250 Slow ML Search 6 Likelihood: -31496.126250 Slow ML Search 7 Likelihood: -31496.126250 Slow ML Search 8 Likelihood: -31496.126250 Slow ML Search 9 Likelihood: -31496.126250 Slow ML optimization finished Slow ML search Time: 0.505230 seconds Thorough ML search Time: 0.056678 seconds Final ML Optimization Likelihood: -31496.126248 Model Information: Model Parameters of Partition 0, Name: No Name Provided, Type of Data: DNA alpha: 0.331162 Tree-Length: 0.383559 rate A <-> C: 1.035981 rate A <-> G: 0.374649 rate A <-> T: 1.025646 rate C <-> G: 1.090484 rate C <-> T: 0.363051 rate G <-> T: 1.000000 freq pi(A): 0.303095 freq pi(C): 0.196877 freq pi(G): 0.304732 freq pi(T): 0.195296 ML search took 1.045715 secs or 0.000290 hours Combined Bootstrap and ML search took 1.886871 secs or 0.000524 hours Drawing Bootstrap Support Values on best-scoring ML tree ... Found 1 tree in File /home/deren/Documents/ipyrad/testdocs/analysis/analysis-raxml/RAxML_bestTree.test-1 Found 1 tree in File /home/deren/Documents/ipyrad/testdocs/analysis/analysis-raxml/RAxML_bestTree.test-1 Program execution info written to /home/deren/Documents/ipyrad/testdocs/analysis/analysis-raxml/RAxML_info.test-1 All 50 bootstrapped trees written to: /home/deren/Documents/ipyrad/testdocs/analysis/analysis-raxml/RAxML_bootstrap.test-1 Best-scoring ML tree written to: /home/deren/Documents/ipyrad/testdocs/analysis/analysis-raxml/RAxML_bestTree.test-1 Best-scoring ML tree with support values written to: /home/deren/Documents/ipyrad/testdocs/analysis/analysis-raxml/RAxML_bipartitions.test-1 Best-scoring ML tree with support values as branch labels written to: /home/deren/Documents/ipyrad/testdocs/analysis/analysis-raxml/RAxML_bipartitionsBranchLabels.test-1 Overall execution time for full ML analysis: 1.887431 secs or 0.000524 hours or 0.000022 days
! cat ./analysis-raxml/RAxML_info.test-2
Using BFGS method to optimize GTR rate parameters, to disable this specify "--no-bfgs" This is RAxML version 8.2.12 released by Alexandros Stamatakis on May 2018. With greatly appreciated code contributions by: Andre Aberer (HITS) Simon Berger (HITS) Alexey Kozlov (HITS) Kassian Kobert (HITS) David Dao (KIT and HITS) Sarah Lutteropp (KIT and HITS) Nick Pattengale (Sandia) Wayne Pfeiffer (SDSC) Akifumi S. Tanabe (NRIFS) Charlie Taylor (UF) Alignment has 738 distinct alignment patterns Proportion of gaps and completely undetermined characters in this alignment: 0.00% RAxML rapid hill-climbing mode Using 1 distinct models/data partitions with joint branch length optimization Executing 10 inferences on the original alignment using 10 distinct randomized MP trees All free model parameters will be estimated by RAxML GAMMA model of rate heterogeneity, ML estimate of alpha-parameter GAMMA Model parameters will be estimated up to an accuracy of 0.1000000000 Log Likelihood units Partition: 0 Alignment Patterns: 738 Name: No Name Provided DataType: DNA Substitution Matrix: GTR RAxML was called as follows: /home/deren/miniconda3/envs/scratch/bin/raxmlHPC-PTHREADS-AVX2 -f d -T 8 -m GTRGAMMA -n test-2 -w /home/deren/Documents/ipyrad/testdocs/analysis/analysis-raxml -s /tmp/raxtest.phy -p 54321 -N 10 Partition: 0 with name: No Name Provided Base frequencies: 0.303 0.197 0.305 0.195 Inference[0]: Time 0.111880 GAMMA-based likelihood -31496.125560, best rearrangement setting 5 alpha[0]: 0.331155 rates[0] ac ag at cg ct gt: 1.036484 0.373779 1.026351 1.090405 0.363302 1.000000 Inference[1]: Time 0.162115 GAMMA-based likelihood -31496.125561, best rearrangement setting 5 alpha[0]: 0.331155 rates[0] ac ag at cg ct gt: 1.036484 0.373778 1.026351 1.090405 0.363302 1.000000 Inference[2]: Time 0.127516 GAMMA-based likelihood -31496.125560, best rearrangement setting 5 alpha[0]: 0.331155 rates[0] ac ag at cg ct gt: 1.036484 0.373779 1.026352 1.090405 0.363302 1.000000 Inference[3]: Time 0.130142 GAMMA-based likelihood -31496.125560, best rearrangement setting 5 alpha[0]: 0.331155 rates[0] ac ag at cg ct gt: 1.036484 0.373779 1.026352 1.090405 0.363302 1.000000 Inference[4]: Time 0.166387 GAMMA-based likelihood -31496.125561, best rearrangement setting 5 alpha[0]: 0.331155 rates[0] ac ag at cg ct gt: 1.036483 0.373778 1.026352 1.090405 0.363302 1.000000 Inference[5]: Time 0.182148 GAMMA-based likelihood -31496.125560, best rearrangement setting 5 alpha[0]: 0.331155 rates[0] ac ag at cg ct gt: 1.036483 0.373779 1.026351 1.090405 0.363302 1.000000 Inference[6]: Time 0.131267 GAMMA-based likelihood -31496.125559, best rearrangement setting 5 alpha[0]: 0.331155 rates[0] ac ag at cg ct gt: 1.036484 0.373780 1.026352 1.090405 0.363302 1.000000 Inference[7]: Time 0.134231 GAMMA-based likelihood -31496.125559, best rearrangement setting 5 alpha[0]: 0.331155 rates[0] ac ag at cg ct gt: 1.036484 0.373780 1.026352 1.090405 0.363302 1.000000 Inference[8]: Time 0.130028 GAMMA-based likelihood -31496.125558, best rearrangement setting 5 alpha[0]: 0.331155 rates[0] ac ag at cg ct gt: 1.036484 0.373781 1.026352 1.090406 0.363302 1.000000 Inference[9]: Time 0.129243 GAMMA-based likelihood -31496.125561, best rearrangement setting 5 alpha[0]: 0.331155 rates[0] ac ag at cg ct gt: 1.036483 0.373778 1.026351 1.090405 0.363302 1.000000 Conducting final model optimizations on all 10 trees under GAMMA-based models .... Inference[0] final GAMMA-based Likelihood: -31496.125174 tree written to file /home/deren/Documents/ipyrad/testdocs/analysis/analysis-raxml/RAxML_result.test-2.RUN.0 Inference[1] final GAMMA-based Likelihood: -31496.125174 tree written to file /home/deren/Documents/ipyrad/testdocs/analysis/analysis-raxml/RAxML_result.test-2.RUN.1 Inference[2] final GAMMA-based Likelihood: -31496.125174 tree written to file /home/deren/Documents/ipyrad/testdocs/analysis/analysis-raxml/RAxML_result.test-2.RUN.2 Inference[3] final GAMMA-based Likelihood: -31496.125174 tree written to file /home/deren/Documents/ipyrad/testdocs/analysis/analysis-raxml/RAxML_result.test-2.RUN.3 Inference[4] final GAMMA-based Likelihood: -31496.125174 tree written to file /home/deren/Documents/ipyrad/testdocs/analysis/analysis-raxml/RAxML_result.test-2.RUN.4 Inference[5] final GAMMA-based Likelihood: -31496.125174 tree written to file /home/deren/Documents/ipyrad/testdocs/analysis/analysis-raxml/RAxML_result.test-2.RUN.5 Inference[6] final GAMMA-based Likelihood: -31496.125174 tree written to file /home/deren/Documents/ipyrad/testdocs/analysis/analysis-raxml/RAxML_result.test-2.RUN.6 Inference[7] final GAMMA-based Likelihood: -31496.125174 tree written to file /home/deren/Documents/ipyrad/testdocs/analysis/analysis-raxml/RAxML_result.test-2.RUN.7 Inference[8] final GAMMA-based Likelihood: -31496.125174 tree written to file /home/deren/Documents/ipyrad/testdocs/analysis/analysis-raxml/RAxML_result.test-2.RUN.8 Inference[9] final GAMMA-based Likelihood: -31496.125174 tree written to file /home/deren/Documents/ipyrad/testdocs/analysis/analysis-raxml/RAxML_result.test-2.RUN.9 Starting final GAMMA-based thorough Optimization on tree 8 likelihood -31496.125174 .... Final GAMMA-based Score of best tree -31496.125174 Program execution info written to /home/deren/Documents/ipyrad/testdocs/analysis/analysis-raxml/RAxML_info.test-2 Best-scoring ML tree written to: /home/deren/Documents/ipyrad/testdocs/analysis/analysis-raxml/RAxML_bestTree.test-2 Overall execution time: 1.473922 secs or 0.000409 hours or 0.000017 days