Reply to https://www.biostars.org/p/9500386/
Biopython is already installed in sessions launched from this repo.
# Get example sequences
!mv ../data/S288C_YMR054W_STV1_protein.fsa .
!curl -o S288C_YOR270C_VPH1_protein.fsa https://gist.githubusercontent.com/fomightez/f46b0624f1d8e3abb6ff908fc447e63b/raw/7ef7cfdaa2c9f9974f22fd60be3cfe7d1935cd86/ux_S288C_YOR270C_VPH1_protein.fsa
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   886  100   886    0     0   4789      0 --:--:-- --:--:-- --:--:--  4789
# concatenate the FASTA files into one multi-entry FASTA
!cat S288C_YMR054W_STV1_protein.fsa <(echo) S288C_YOR270C_VPH1_protein.fsa > seqs.fasta
# adding a newline between them with <(echo) is based on https://stackoverflow.com/a/23549826/8508004 ; contrast with
#!cat S288C_YMR054W_STV1_protein.fsa S288C_YOR270C_VPH1_protein.fsa > seqs.fasta
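The same concatenation can be sketched in plain Python for cases where the shell process substitution is not available. This is a minimal sketch, not what the notebook ran: `concat_fasta` is a hypothetical helper name, and unlike `<(echo)` it only adds a newline when an input file lacks one, so no blank line is left between entries.

```python
from pathlib import Path


def concat_fasta(paths, out_path):
    """Write the given FASTA files into one multi-entry FASTA file.

    A newline is appended to any input that lacks one, so the next
    '>' header always starts on its own line.
    """
    with open(out_path, "w") as out:
        for path in paths:
            text = Path(path).read_text()
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)


# File names below are the ones used in this notebook.
# concat_fasta(
#     ["S288C_YMR054W_STV1_protein.fsa", "S288C_YOR270C_VPH1_protein.fsa"],
#     "seqs.fasta",
# )
```

The call is left commented out so that running the cell does not overwrite an existing seqs.fasta.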
# install clustalo
%conda install -c bioconda clustalo
Collecting package metadata (current_repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <==
  current version: 4.9.2
  latest version: 4.11.0

Please update conda by running

    $ conda update -n base conda

## Package Plan ##

  environment location: /srv/conda/envs/notebook

  added / updated specs:
    - clustalo

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    argtable2-2.13             |    h14c3975_1001         2.7 MB  conda-forge
    ca-certificates-2021.10.8  |       ha878542_0         139 KB  conda-forge
    certifi-2021.10.8          |   py37h89c1867_1         145 KB  conda-forge
    clustalo-1.2.4             |       h1b792b2_4         313 KB  bioconda
    ------------------------------------------------------------
                                           Total:         3.3 MB

The following NEW packages will be INSTALLED:

  argtable2          conda-forge/linux-64::argtable2-2.13-h14c3975_1001
  clustalo           bioconda/linux-64::clustalo-1.2.4-h1b792b2_4

The following packages will be UPDATED:

  ca-certificates    2021.5.30-ha878542_0 --> 2021.10.8-ha878542_0
  certifi            2021.5.30-py37h89c1867_0 --> 2021.10.8-py37h89c1867_1

Downloading and Extracting Packages
certifi-2021.10.8    | 145 KB    | ##################################### | 100%
argtable2-2.13       | 2.7 MB    | ##################################### | 100%
ca-certificates-2021 | 139 KB    | ##################################### | 100%
clustalo-1.2.4       | 313 KB    | ##################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

Note: you may need to restart the kernel to use updated packages.
from Bio import SeqIO
from Bio.Align.Applications import ClustalOmegaCommandline
# Build the Clustal Omega command; printing it shows the command line without running it
clustalomega_cline = ClustalOmegaCommandline(
    infile="seqs.fasta", outfile="new_alignment.aln", outfmt="clu", auto=True, verbose=True
)
print(clustalomega_cline)
clustalo -i seqs.fasta -o new_alignment.aln --outfmt clu --auto -v
!clustalo -i seqs.fasta -o new_alignment.aln --outfmt clu --auto -v
Using 8 threads
Read 2 sequences (type: Protein) from seqs.fasta
not more sequences (2) than cluster-size (100), turn off mBed
Setting options automatically based on input sequence characteristics (might overwrite some of your options).
Auto settings: Enabling mBed.
Auto settings: Setting iteration to 1.
Progressive alignment progress done. CPU time: 0.11u 0.02s 00:00:00.13 Elapsed: 00:00:00
Iteration step 1 out of 1
Computing new guide tree (iteration step 1)
Computing HMM from alignment
Progressive alignment progress done. CPU time: 0.40u 0.03s 00:00:00.43 Elapsed: 00:00:00
Alignment written to new_alignment.aln
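Outside a notebook there is no `!` magic, so the same command can be launched with the standard library. The sketch below is one way to do that, using only the flags already shown above (`-i`, `-o`, `--outfmt`, `--auto`, `-v`). The helper names `build_clustalo_args` and `run_clustalo` are hypothetical, and as far as I know newer Biopython releases mark the `Bio.Align.Applications` wrappers as deprecated, which is another reason to have a plain `subprocess` route.

```python
import shutil
import subprocess


def build_clustalo_args(infile, outfile, outfmt="clu", auto=True, verbose=True):
    """Return the clustalo argument list for the options used in this notebook."""
    args = ["clustalo", "-i", infile, "-o", outfile, "--outfmt", outfmt]
    if auto:
        args.append("--auto")
    if verbose:
        args.append("-v")
    return args


def run_clustalo(infile, outfile, **options):
    """Run clustalo and return its captured output (stdout followed by stderr)."""
    if shutil.which("clustalo") is None:
        raise FileNotFoundError("clustalo is not on PATH")
    args = build_clustalo_args(infile, outfile, **options)
    # check=True raises CalledProcessError if clustalo exits with an error
    result = subprocess.run(args, check=True, capture_output=True, text=True)
    return result.stdout + result.stderr


# Only builds and prints the command; nothing is executed here.
print(" ".join(build_clustalo_args("seqs.fasta", "new_alignment.aln")))
# To actually run it: print(run_clustalo("seqs.fasta", "new_alignment.aln"))
```

Passing the arguments as a list avoids shell quoting problems if a file name contains spaces.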
!cat new_alignment.aln
CLUSTAL O(1.2.4) multiple sequence alignment


STV1      -MNQEEAIFRSADMTYVQLYIPLEVIREVTFLLGKMSVFMVMDLNKDLTAFQRGYVNQLR
VPH1      MAEKEEAIFRSAEMALVQFYIPQEISRDSAYTLGQLGLVQFRDLNSKVRAFQRTFVNEIR
            ::********:*: **:*** *: *: :: **::.:. . ***..: **** :**::*

STV1      RFDEVERMVGFLNEVVEKHAAETWKYILHIDDEGNDIAQPDMADLINTMEPLSLENVNDM
VPH1      RLDNVERQYRYFYSLLKKHDIKLYEGDTDKYLDGS----------GELYVPPSGSVIDDY
          *:*:***   :: .:::**  : ::   .   :*.           :   * * . ::*

STV1      VKEITDCESRARQLDESLDSLRSKLNDLLEQRQVIFECSKFIEVNPGIAGRATNPEIEQE
VPH1      VRNASYLEERLIQMEDATDQIEVQKNDLEQYRFILQSGDE--------------------
          *:: :  *.*  *:::: *.:. : *** : * :: . .:

STV1      ERDVDEFRMTPDDISETLSDAFSFDDETPQDRGALGNDLTRNQSVEDLSFLEQGYQHRYM
VPH1      ------FFLKGDN-----TDSTSYMDEDMIDAN--GENIAA-----------AIGASVNY
                * :. *:     :*: *: **   * .  *::::

STV1      ITGSIRRTKVDILNRILWRLLRGNLIFQNFPIEEPLLEGK--EKVEKDCFIIFTHGETLL
VPH1      VTGVIARDKVATLEQILWRVLRGNLFFKTVEIEQPVYDVKTREYKHKNAFIVFSHGDLII
          :** * * **  *::****:*****:*:.. **:*: : *  *  .*:.**:*:**: ::

STV1      KKVKRVIDSLNGKIVSLNT---RSSELVDTLNRQIDDLQRILDTTEQTLHTELLVIHDQL
VPH1      KRIRKIAESLDANLYDVDSSNEGRSQQLAKVNKNLSDLYTVLKTTSTTLESELYAIAKEL
          *::::: :**:.:: .:::     *: : .:*:::.**  :*.**. **.:** .* .:*

STV1      PVWSAMTKREKYVYTTLNK--FQQESQGLIAEGWVPSTELIHLQDSLKDYIETLGSEYST
VPH1      DSWFQDVTREKAIFEILNKSNYDTNRKILIAEGWIPRDELATLQARLGEMIARLGIDVPS
            *   ..*** ::  ***  :: : : ******:*  **  **  * : *  ** :  :

STV1      VFNVILTNKLPPTYHRTNKFTQAFQSIVDAYGIATYKEINAGLATVVTFPFMFAIMFGDM
VPH1      IIQVLDTNHTPPTFHRTNKFTAGFQSICDCYGIAQYREINAGLPTIVTFPFMFAIMFGDM
          :::*: **: ***:******* .**** *.**** *:****** *:**************

STV1      GHGFILFLMALFLVLNERKFGAMHRDEIFDMAFTGRYVLLLMGAFSVYTGLLYNDIFSKS
VPH1      GHGFLMTLAALSLVLNEKKINKMKRGEIFDMAFTGRYIILLMGVFSMYTGFLYNDIFSKT
          ****:: * ** *****:*:. *:*.***********::****.**:***:********:

STV1      MTIFKSGWQWPSTFRKGESIEAKKTGVYPFGLDFAWHGTDNGLLFSNSYKMKLSILMGYA
VPH1      MTIFKSGWKWPDHWKKGESITATSVGTYPIGLDWAWHGTENALLFSNSYKMKLSILMGFI
          ********:**. ::***** *...*.**:***:*****:*.****************:

STV1      HMTYSFMFSYINYRAKNSKVDIIGNFIPGLVFMQSIFGYLSWAIVYKWSKDWIKDDKPAP
VPH1      HMTYSYFFSLANHLYFNSMIDIIGNFIPGLLFMQGIFGYLSVCIVYKWAVDWVKDGKPAP
          *****::**  *:   ** :**********:***.****** .*****: **:**.****

STV1      GLLNMLINMFLAPGTIDDQLYSGQAKLQVVLLLAALVCVPWLLLYKPLTLRRLNKNGGGG
VPH1      GLLNMLINMFLSPGTIDDELYPHQAKVQVFLLLMALVCIPWLLLVKPLHFKFTHKKKSHE
          ***********:******:**  ***:**.*** ****:***** *** ::  :*: .

STV1      RPHGYQSVGNIEHEEQIAQQRHSAEGFQGMIISDVASVADSINESVGGGEQGPFNFGDVM
VPH1      PLPSTEA-------------DASSEDLEAQQLISAMDADDAEEEEVGSGSHG-EDFGDIM
             . ::               *:*.::.  : .. .. *: :*.**.*.:*  :***:*

STV1      IHQVIHTIEFCLNCISHTASYLRLWALSLAHAQLSSVLWDMTISNAFSSKNSGSPLAVMK
VPH1      IHQVIHTIEFCLNCVSHTASYLRLWALSLAHAQLSSVLWTMTIQIAFGFRGF---VGVFM
          **************:************************ ***. **. :.    :.*:

STV1      VVFLFAMWFVLTVCILVFMEGTSAMLHALRLHWVEAMSKFFEGEGYAYEPFSFRAIIE*-
VPH1      TVALFAMWFALTCAVLVLMEGTSAMLHSLRLHWVESMSKFFVGEGLPYEPFAFEYKDMEV
          .* ******.** .:**:*********:*******:***** ***  ****:*.

STV1      ------------
VPH1      AVASASSSASS*
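The CLUSTAL output above is interleaved: each block holds one chunk per sequence followed by a conservation line, so the full aligned rows have to be stitched back together. A minimal standard-library sketch of that stitching follows. `parse_clustal` is a hypothetical helper and the `sample` text is a made-up toy alignment, not data from this run. It assumes conservation lines start with whitespace, as they do in the file clustalo writes.

```python
def parse_clustal(text):
    """Return {name: full aligned sequence} from CLUSTAL-formatted text."""
    rows = {}
    lines = text.splitlines()
    for line in lines[1:]:  # skip the 'CLUSTAL ...' header line
        if not line.strip() or line[0].isspace():
            continue  # blank line or conservation (*, :, .) line
        fields = line.split()
        if len(fields) < 2:
            continue  # not a 'name chunk' line
        name, chunk = fields[0], fields[1]
        rows[name] = rows.get(name, "") + chunk
    return rows


# Toy two-block alignment, for illustration only.
sample = """CLUSTAL O(1.2.4) multiple sequence alignment


seqA      MKT-AY
seqB      MRTQAY
          *:* **

seqA      IAK
seqB      IG-
          *.
"""

for name, seq in parse_clustal(sample).items():
    print(name, seq)
```

For the real file, `parse_clustal(open("new_alignment.aln").read())` should give one gapped string per sequence. With Biopython available, `AlignIO.read("new_alignment.aln", "clustal")` is the library route to the same rows.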
# show Biopython version present
!python -c "import Bio; print(Bio.__version__)"
1.79