#!/usr/bin/env python
# coding: utf-8

# # Some sourmash command line examples!
#
# [sourmash](https://sourmash.readthedocs.io/en/latest/) is research software from the Lab for Data Intensive Biology at UC Davis. It implements the MinHash and modulo hash sketching techniques.
#
# Below are some examples of using sourmash. They are computed in a Jupyter Notebook, so you can run them yourself if you like!
#
# Sourmash works on *signature files*, which are just saved collections of hashes.
#
# Let's try it out!
#
# ### Running this notebook.
#
# You can run this notebook interactively via mybinder; click on this button:
# [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/dib-lab/sourmash/master?filepath=doc%2Fsourmash-examples.ipynb)
#
# A rendered version of this notebook is available at [sourmash.readthedocs.io](https://sourmash.readthedocs.io) under "Tutorials and notebooks".
#
# You can also get this notebook from the [doc/ subdirectory of the sourmash GitHub repository](https://github.com/dib-lab/sourmash/tree/master/doc). See [binder/environment.yml](https://github.com/dib-lab/sourmash/blob/master/binder/environment.yml) for installation dependencies.
#
# ### What is this?
#
# This is a Jupyter Notebook using Python 3. If you are running this via [binder](https://mybinder.org), you can press Shift-Enter to run cells, and double-click on code cells to edit them.
#
# Contact: C. Titus Brown, ctbrown@ucdavis.edu. Please [file issues on GitHub](https://github.com/dib-lab/sourmash/issues/) if you have any questions or comments!

# ## Compute scaled signatures

# In[1]:


get_ipython().system('rm -f *.sig')
get_ipython().system('sourmash compute -k 21,31,51 --scaled=1000 genomes/*.fa --name-from-first -f')


# This outputs three signature files, one per genome, each containing three signatures (one calculated at k=21, one at k=31, and one at k=51).

# In[2]:


get_ipython().system('ls *.sig')


# We can now use these signature files for various comparisons.
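# The scaled signatures computed above keep only a fraction of each genome's k-mer hashes. As a rough, hypothetical illustration (not sourmash's actual code), the cell below keeps every canonical k-mer whose 64-bit hash falls below `2**64 // scaled`, so about 1 in `scaled` distinct k-mers survives. It assumes `hashlib.blake2b` as the hash function, whereas sourmash uses MurmurHash, so these hash values will not match real signatures. The names `scaled_sketch`, `hash_kmer`, and `canonical` are made up for this example.
#
# Because the cutoff is a fixed hash threshold rather than a fixed number of hashes, the sketch of any subsequence is a subset of the sketch of the whole sequence. That property is what lets scaled sketches estimate containment as well as similarity.

# In[ ]:


# Hypothetical sketch of "scaled" (modulo hash) sketching on plain Python sets.
# Assumption: blake2b stands in for the MurmurHash function sourmash really uses.
import hashlib

MAX_HASH = 2**64
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hash_kmer(kmer):
    """Hash a k-mer string to an unsigned 64-bit integer."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer[::-1].translate(COMPLEMENT))


def scaled_sketch(seq, ksize=31, scaled=1000):
    """Keep the hashes of all k-mers below MAX_HASH // scaled (about 1 in `scaled`)."""
    max_hash = MAX_HASH // scaled
    seq = seq.upper()
    hashes = set()
    for i in range(len(seq) - ksize + 1):
        h = hash_kmer(canonical(seq[i:i + ksize]))
        if h < max_hash:
            hashes.add(h)
    return hashes


# Usage (hypothetical): one sketch per k-mer size, mirroring `-k 21,31,51`.
# sketches = {k: scaled_sketch(genome_sequence, ksize=k) for k in (21, 31, 51)}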
# ## Search multiple signatures with a query
#
# The command below searches all of the signature files in the directory, using the `shew_os223` signature as the query, and reports the best matches by Jaccard similarity:

# In[3]:


get_ipython().system('sourmash search -k 31 shew_os223.fa.sig *.sig')


# The command below uses Jaccard containment instead of Jaccard similarity:

# In[4]:


get_ipython().system('sourmash search -k 31 shew_os223.fa.sig *.sig --containment')


# ## Performing all-by-all queries
#
# We can also compare all three signatures:

# In[5]:


get_ipython().system('sourmash compare -k 31 *.sig')


# ...and save the similarity matrix to a file so that we can plot it:

# In[6]:


get_ipython().system('sourmash compare -k 31 *.sig -o genome_compare.mat')


# In[7]:


get_ipython().system('sourmash plot genome_compare.mat')

from IPython.display import Image
Image(filename='genome_compare.mat.matrix.png')


# And for the R aficionados, you can output a CSV version of the matrix:

# In[8]:


get_ipython().system('sourmash compare -k 31 *.sig --csv genome_compare.csv')


# In[9]:


get_ipython().system('cat genome_compare.csv')


# This is now a file that you can load into R and examine; see [our documentation](https://sourmash.readthedocs.io/en/latest/other-languages.html) on that.
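# The `search` and `compare` commands above boil down to set comparisons between sketches. Jaccard similarity is the size of the intersection divided by the size of the union, and is symmetric. The containment of A in B is the size of the intersection divided by the size of A, so it depends on direction. On scaled sketches these ratios approximate the values for the full k-mer sets.
#
# The cell below is a minimal, hypothetical sketch on plain Python sets of hashes. The helper names are not part of the sourmash API. The CSV layout (a header row of names, then one row of values per signature) is an assumption for illustration; the file sourmash writes may differ in detail.

# In[ ]:


# Hypothetical helpers; these are not part of the sourmash API.
import csv
import io


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a, b):
    """Fraction of the hashes in `a` that are also in `b`."""
    return len(a & b) / len(a) if a else 0.0


def compare_matrix(sketches):
    """All-by-all Jaccard matrix over a {name: hash set} dict, in insertion order."""
    names = list(sketches)
    matrix = [[jaccard(sketches[r], sketches[c]) for c in names] for r in names]
    return names, matrix


def matrix_to_csv(names, matrix):
    """Render the matrix as CSV text: a header row of names, then one row per name."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(names)
    writer.writerows(matrix)
    return buf.getvalue()


# Example: containment is asymmetric, Jaccard similarity is not.
small, large = {1, 2}, {1, 2, 3, 4}
print(jaccard(small, large), containment(small, large), containment(large, small))
# → 0.5 1.0 0.5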
# ## Working with metagenomes
#
# Let's make a fake metagenome by concatenating all of the genomes:

# In[10]:


get_ipython().system('cat genomes/*.fa > fake-metagenome.fa')
get_ipython().system('sourmash compute -k 31 --scaled=1000 fake-metagenome.fa')


# We can use the `sourmash gather` command to see which of the genomes it contains:

# In[11]:


get_ipython().system('sourmash gather fake-metagenome.fa.sig shew*.sig akker*.sig')


# ## Other pointers
#
# [Sourmash: a practical guide](https://sourmash.readthedocs.io/en/latest/using-sourmash-a-guide.html)
#
# [Classifying signatures taxonomically](https://sourmash.readthedocs.io/en/latest/classifying-signatures.html)
#
# [Pre-built search databases](https://sourmash.readthedocs.io/en/latest/databases.html)
#
# ## A full list of notebooks
#
# [An introduction to k-mers for genome comparison and analysis](kmers-and-minhash.ipynb)
#
# [Some sourmash command line examples!](sourmash-examples.ipynb)
#
# [Working with private collections of signatures.](sourmash-collections.ipynb)
#
# [Using the LCA_Database API.](using-LCA-database-API.ipynb)
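# The `sourmash gather` command in the metagenome section above breaks a query into its components. It repeatedly chooses the reference that shares the most not-yet-explained hashes with the query, which is a greedy approximation to minimum set cover.
#
# The cell below is a simplified, hypothetical sketch of that idea on plain sets of hashes. `greedy_gather` and its `threshold` parameter are made up here. The real command reports more statistics and applies its own thresholds.

# In[ ]:


# Hypothetical, simplified greedy decomposition in the spirit of `sourmash gather`.
def greedy_gather(query, references, threshold=1):
    """Explain `query` hashes with reference hash sets, largest remaining overlap first.

    Returns a list of (name, overlap_with_remaining, fraction_of_original_query).
    """
    remaining = set(query)
    results = []
    while remaining:
        best_name, best_overlap = None, 0
        for name, ref in references.items():
            overlap = len(remaining & ref)
            if overlap > best_overlap:  # ties keep the earlier reference
                best_name, best_overlap = name, overlap
        if best_name is None or best_overlap < threshold:
            break  # nothing left that any reference can explain
        results.append((best_name, best_overlap, best_overlap / len(query)))
        remaining -= references[best_name]  # shared hashes go to the first pick
    return results


# Example: a "metagenome" built from two overlapping toy genomes.
toy_refs = {"genome_a": set(range(0, 60)), "genome_b": set(range(50, 100))}
print(greedy_gather(set(range(0, 100)), toy_refs))
# → [('genome_a', 60, 0.6), ('genome_b', 40, 0.4)]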