Biopython is already installed in sessions launched from this repo.

In [1]:
# Get example sequences
!mv ../data/S288C_YMR054W_STV1_protein.fsa .
!curl -o S288C_YOR270C_VPH1_protein.fsa https://gist.githubusercontent.com/fomightez/f46b0624f1d8e3abb6ff908fc447e63b/raw/7ef7cfdaa2c9f9974f22fd60be3cfe7d1935cd86/ux_S288C_YOR270C_VPH1_protein.fsa
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   886  100   886    0     0   4789      0 --:--:-- --:--:-- --:--:--  4789
In [2]:
# concatenate the FASTA files into one multi-entry FASTA
!cat S288C_YMR054W_STV1_protein.fsa <(echo) S288C_YOR270C_VPH1_protein.fsa > seqs.fasta
# inserting a newline between the files with <(echo) is based on https://stackoverflow.com/a/23549826/8508004 ; contrast with
#!cat S288C_YMR054W_STV1_protein.fsa S288C_YOR270C_VPH1_protein.fsa > seqs.fasta
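The same concatenation can be sketched in plain Python for environments without a bash-style shell. This is a minimal sketch, not part of the original notebook: the function name `concatenate_fasta` is hypothetical. Unlike `<(echo)`, which always inserts a newline between the files, it adds one only when a file does not already end with a newline.

```python
from pathlib import Path


def concatenate_fasta(inputs, output):
    """Join FASTA files into one file, making sure each input ends with a newline.

    Hypothetical helper for illustration; not part of Biopython.
    """
    chunks = []
    for path in inputs:
        text = Path(path).read_text()
        # Without a trailing newline, the last sequence line of one file
        # would run into the ">" header line of the next file.
        if text and not text.endswith("\n"):
            text += "\n"
        chunks.append(text)
    Path(output).write_text("".join(chunks))
    return output
```

In this notebook it would be called as `concatenate_fasta(["S288C_YMR054W_STV1_protein.fsa", "S288C_YOR270C_VPH1_protein.fsa"], "seqs.fasta")`.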
In [3]:
# install clustalo
%conda install -c bioconda clustalo
Collecting package metadata (current_repodata.json): done
Solving environment: done


==> WARNING: A newer version of conda exists. <==
  current version: 4.9.2
  latest version: 4.11.0

Please update conda by running

    $ conda update -n base conda



## Package Plan ##

  environment location: /srv/conda/envs/notebook

  added / updated specs:
    - clustalo


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    argtable2-2.13             |    h14c3975_1001         2.7 MB  conda-forge
    ca-certificates-2021.10.8  |       ha878542_0         139 KB  conda-forge
    certifi-2021.10.8          |   py37h89c1867_1         145 KB  conda-forge
    clustalo-1.2.4             |       h1b792b2_4         313 KB  bioconda
    ------------------------------------------------------------
                                           Total:         3.3 MB

The following NEW packages will be INSTALLED:

  argtable2          conda-forge/linux-64::argtable2-2.13-h14c3975_1001
  clustalo           bioconda/linux-64::clustalo-1.2.4-h1b792b2_4

The following packages will be UPDATED:

  ca-certificates                      2021.5.30-ha878542_0 --> 2021.10.8-ha878542_0
  certifi                          2021.5.30-py37h89c1867_0 --> 2021.10.8-py37h89c1867_1



Downloading and Extracting Packages
certifi-2021.10.8    | 145 KB    | ##################################### | 100% 
argtable2-2.13       | 2.7 MB    | ##################################### | 100% 
ca-certificates-2021 | 139 KB    | ##################################### | 100% 
clustalo-1.2.4       | 313 KB    | ##################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

Note: you may need to restart the kernel to use updated packages.
In [4]:
from Bio import SeqIO
from Bio.Align.Applications import ClustalOmegaCommandline
# Build the clustalo command line (constructed only; the command itself is run in a later cell)
clustalomega_cline = ClustalOmegaCommandline(
    infile="seqs.fasta", outfile="new_alignment.aln",
    verbose=True, outfmt="clu", auto=True,
)
In [5]:
print(clustalomega_cline)
clustalo -i seqs.fasta -o new_alignment.aln --outfmt clu --auto -v
In [6]:
!clustalo -i seqs.fasta -o new_alignment.aln --outfmt clu --auto -v
Using 8 threads
Read 2 sequences (type: Protein) from seqs.fasta
not more sequences (2) than cluster-size (100), turn off mBed
Setting options automatically based on input sequence characteristics (might overwrite some of your options).
Auto settings: Enabling mBed.
Auto settings: Setting iteration to 1.
Progressive alignment progress done. CPU time: 0.11u 0.02s 00:00:00.13 Elapsed: 00:00:00
Iteration step 1 out of 1
Computing new guide tree (iteration step 1)
Computing HMM from alignment
Progressive alignment progress done. CPU time: 0.40u 0.03s 00:00:00.43 Elapsed: 00:00:00
Alignment written to new_alignment.aln
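The command above can also be launched from Python without the `!` shell escape. The following is a sketch under stated assumptions: `build_clustalo_args` and `run_clustalo` are hypothetical helper names, and the flags are the ones printed by the Biopython wrapper above, plus an optional `--force` for overwriting an existing output file.

```python
import shutil
import subprocess


def build_clustalo_args(infile, outfile, outfmt="clu", force=False):
    """Return the argument list matching the clustalo call shown above."""
    args = ["clustalo", "-i", infile, "-o", outfile,
            "--outfmt", outfmt, "--auto", "-v"]
    if force:
        # Ask clustalo to overwrite an already existing output file.
        args.append("--force")
    return args


def run_clustalo(infile, outfile, outfmt="clu", force=False):
    """Run clustalo and return everything it printed (stdout plus stderr)."""
    if shutil.which("clustalo") is None:
        raise RuntimeError("clustalo was not found on PATH")
    result = subprocess.run(
        build_clustalo_args(infile, outfile, outfmt, force),
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,
        text=True,
    )
    return result.stdout + result.stderr
```

Passing the arguments as a list avoids shell quoting issues; `run_clustalo("seqs.fasta", "new_alignment.aln")` would correspond to the cell above.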
In [7]:
!cat new_alignment.aln
CLUSTAL O(1.2.4) multiple sequence alignment


STV1      -MNQEEAIFRSADMTYVQLYIPLEVIREVTFLLGKMSVFMVMDLNKDLTAFQRGYVNQLR
VPH1      MAEKEEAIFRSAEMALVQFYIPQEISRDSAYTLGQLGLVQFRDLNSKVRAFQRTFVNEIR
            ::********:*: **:*** *: *: :: **::.:. . ***..: **** :**::*

STV1      RFDEVERMVGFLNEVVEKHAAETWKYILHIDDEGNDIAQPDMADLINTMEPLSLENVNDM
VPH1      RLDNVERQYRYFYSLLKKHDIKLYEGDTDKYLDGS----------GELYVPPSGSVIDDY
          *:*:***   :: .:::**  : ::   .   :*.           :   * * . ::* 

STV1      VKEITDCESRARQLDESLDSLRSKLNDLLEQRQVIFECSKFIEVNPGIAGRATNPEIEQE
VPH1      VRNASYLEERLIQMEDATDQIEVQKNDLEQYRFILQSGDE--------------------
          *:: :  *.*  *:::: *.:. : *** : * :: . .:                    

STV1      ERDVDEFRMTPDDISETLSDAFSFDDETPQDRGALGNDLTRNQSVEDLSFLEQGYQHRYM
VPH1      ------FFLKGDN-----TDSTSYMDEDMIDAN--GENIAA-----------AIGASVNY
                * :. *:     :*: *: **   * .  *::::                    

STV1      ITGSIRRTKVDILNRILWRLLRGNLIFQNFPIEEPLLEGK--EKVEKDCFIIFTHGETLL
VPH1      VTGVIARDKVATLEQILWRVLRGNLFFKTVEIEQPVYDVKTREYKHKNAFIVFSHGDLII
          :** * * **  *::****:*****:*:.. **:*: : *  *  .*:.**:*:**: ::

STV1      KKVKRVIDSLNGKIVSLNT---RSSELVDTLNRQIDDLQRILDTTEQTLHTELLVIHDQL
VPH1      KRIRKIAESLDANLYDVDSSNEGRSQQLAKVNKNLSDLYTVLKTTSTTLESELYAIAKEL
          *::::: :**:.:: .:::     *: : .:*:::.**  :*.**. **.:** .* .:*

STV1      PVWSAMTKREKYVYTTLNK--FQQESQGLIAEGWVPSTELIHLQDSLKDYIETLGSEYST
VPH1      DSWFQDVTREKAIFEILNKSNYDTNRKILIAEGWIPRDELATLQARLGEMIARLGIDVPS
            *   ..*** ::  ***  :: : : ******:*  **  **  * : *  ** :  :

STV1      VFNVILTNKLPPTYHRTNKFTQAFQSIVDAYGIATYKEINAGLATVVTFPFMFAIMFGDM
VPH1      IIQVLDTNHTPPTFHRTNKFTAGFQSICDCYGIAQYREINAGLPTIVTFPFMFAIMFGDM
          :::*: **: ***:******* .**** *.**** *:****** *:**************

STV1      GHGFILFLMALFLVLNERKFGAMHRDEIFDMAFTGRYVLLLMGAFSVYTGLLYNDIFSKS
VPH1      GHGFLMTLAALSLVLNEKKINKMKRGEIFDMAFTGRYIILLMGVFSMYTGFLYNDIFSKT
          ****:: * ** *****:*:. *:*.***********::****.**:***:********:

STV1      MTIFKSGWQWPSTFRKGESIEAKKTGVYPFGLDFAWHGTDNGLLFSNSYKMKLSILMGYA
VPH1      MTIFKSGWKWPDHWKKGESITATSVGTYPIGLDWAWHGTENALLFSNSYKMKLSILMGFI
          ********:**. ::***** *...*.**:***:*****:*.****************: 

STV1      HMTYSFMFSYINYRAKNSKVDIIGNFIPGLVFMQSIFGYLSWAIVYKWSKDWIKDDKPAP
VPH1      HMTYSYFFSLANHLYFNSMIDIIGNFIPGLLFMQGIFGYLSVCIVYKWAVDWVKDGKPAP
          *****::**  *:   ** :**********:***.****** .*****: **:**.****

STV1      GLLNMLINMFLAPGTIDDQLYSGQAKLQVVLLLAALVCVPWLLLYKPLTLRRLNKNGGGG
VPH1      GLLNMLINMFLSPGTIDDELYPHQAKVQVFLLLMALVCIPWLLLVKPLHFKFTHKKKSHE
          ***********:******:**  ***:**.*** ****:***** *** ::  :*: .  

STV1      RPHGYQSVGNIEHEEQIAQQRHSAEGFQGMIISDVASVADSINESVGGGEQGPFNFGDVM
VPH1      PLPSTEA-------------DASSEDLEAQQLISAMDADDAEEEEVGSGSHG-EDFGDIM
             . ::               *:*.::.  : .. .. *: :*.**.*.:*  :***:*

STV1      IHQVIHTIEFCLNCISHTASYLRLWALSLAHAQLSSVLWDMTISNAFSSKNSGSPLAVMK
VPH1      IHQVIHTIEFCLNCVSHTASYLRLWALSLAHAQLSSVLWTMTIQIAFGFRGF---VGVFM
          **************:************************ ***. **. :.    :.*: 

STV1      VVFLFAMWFVLTVCILVFMEGTSAMLHALRLHWVEAMSKFFEGEGYAYEPFSFRAIIE*-
VPH1      TVALFAMWFALTCAVLVLMEGTSAMLHSLRLHWVESMSKFFVGEGLPYEPFAFEYKDMEV
          .* ******.** .:**:*********:*******:***** ***  ****:*.      

STV1      ------------
VPH1      AVASASSSASS*
                      
In [8]:
# show Biopython version present
!python -c "import Bio; print(Bio.__version__)"
1.79
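As a possible follow-up, the Clustal-format text shown in the `cat new_alignment.aln` output can be reduced to one number, the percent identity between the two proteins. This is a standard-library sketch, not something the notebook itself does: `read_clustal_rows` and `percent_identity` are hypothetical names, and identity is computed here over columns where neither sequence has a gap, which is only one of several common definitions.

```python
def read_clustal_rows(text):
    """Collect the full aligned row for each sequence name in Clustal-format text."""
    rows = {}
    for line in text.splitlines():
        # Skip blank lines, the "CLUSTAL ..." header, and the conservation
        # lines, which start with whitespace instead of a sequence name.
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        name, chunk = line.split()[:2]
        rows[name] = rows.get(name, "") + chunk
    return rows


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if res_a == res_b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0
```

With the file from this notebook, the rows would come from `read_clustal_rows(open("new_alignment.aln").read())` and be compared with `percent_identity(rows["STV1"], rows["VPH1"])`. Note that the terminal `*` symbols carried over from the input files are treated as ordinary characters in this sketch.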