!date
Fri Feb 14 08:10:37 PST 2014
Updates - blast is now called by its full path
subsequently removed use of the 'blast' variable, as the full path is now used
--
The SQLShare ID has to be changed manually in the code (for now).
The concept is that you can take a FASTA file in a working directory and end up with GO Slim information, all within a single automated notebook. Currently this works by writing (and overwriting) a scratch file to SQLShare. The assumptions are that you are working in a directory containing a FASTA file named query.fa and that you have the SQLShare Python client installed.
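The first assumption above can be checked before any BLAST run. This is a minimal sketch, not part of the original notebook: check_inputs is a hypothetical helper, and it only verifies that query.fa exists and that its first non-empty line looks like a FASTA header.

```python
import os


def check_inputs(workdir, fasta_name="query.fa"):
    """Return the path to the query FASTA, or raise if it is missing or malformed.

    check_inputs is a hypothetical helper, not part of the original notebook.
    """
    path = os.path.join(workdir, fasta_name)
    if not os.path.isfile(path):
        raise IOError("expected FASTA file not found: {0}".format(path))
    with open(path) as handle:
        for line in handle:
            if line.strip():
                # The first non-empty line of a FASTA file is a '>' header.
                if not line.startswith(">"):
                    raise ValueError("{0} does not look like FASTA".format(path))
                break
        else:
            # The loop never hit a non-empty line, so the file is empty.
            raise ValueError("{0} is empty".format(path))
    return path
```

Calling check_inputs(wd) right after the working directory is set would stop the notebook early instead of failing partway through the pipeline.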
#allows plots to be shown inline
%pylab inline
Populating the interactive namespace from numpy and matplotlib
#Setting Working Directory
wd="/Users/Mackenzie/Desktop/FISH546/wd"
#Setting directory of Blast Databases !!! make sure to include the trailing '/'
dbd="/Users/Mackenzie/Desktop/FISH546/db/"
#Database name
dbn="spdb"
#Blast algorithm complete path
ba="/Users/Shared/Apps/ncbi-blast-2.2.29\+/bin/blastx"
#Location of SQLShare python tools: you can leave this empty ("") if the tools are in PATH !!! make sure to include the trailing '/'
spd="/Users/Mackenzie/sqlshare-pythonclient/tools/"
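Because dbd and spd are concatenated directly with file names later on, a missing trailing '/' silently produces a wrong path. The sketch below shows one way to normalize them; with_trailing_sep is a hypothetical helper, not part of the original notebook, and it leaves an empty string alone so that spd="" (tools in PATH) keeps working.

```python
import os


def with_trailing_sep(directory):
    """Return directory ending in a path separator (hypothetical helper).

    An empty string is returned unchanged, so an empty tools directory
    still means "look the tools up in PATH".
    """
    if directory == "":
        return ""
    # Joining with an empty component appends a separator only if needed.
    return os.path.join(directory, "")
```

For example, dbd = with_trailing_sep(dbd) could be run once after the settings cell.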
cd {wd}
[Errno 13] Permission denied: '/Users/Mackenzie/Desktop/FISH546/wd' /Users/Steven/Dropbox/Steven/ipython_nb/tools
#the max hsp option produced an error for some reason, so it was removed
!{ba} -query query.fa -db {dbd}{dbn} -out {dbn}_blast_out.tab -evalue 1E-50 -num_threads 4 -max_target_seqs 1 -outfmt 6
Selenocysteine (U) at position 52 replaced by X
Selenocysteine (U) at position 49 replaced by X
Selenocysteine (U) at position 47 replaced by X
Selenocysteine (U) at position 47 replaced by X
Selenocysteine (U) at position 47 replaced by X
Selenocysteine (U) at position 47 replaced by X
Selenocysteine (U) at position 52 replaced by X
Selenocysteine (U) at position 47 replaced by X
Selenocysteine (U) at position 47 replaced by X
Selenocysteine (U) at position 40 replaced by X
Selenocysteine (U) at position 40 replaced by X
Selenocysteine (U) at position 40 replaced by X
Selenocysteine (U) at position 40 replaced by X
Selenocysteine (U) at position 40 replaced by X
Selenocysteine (U) at position 40 replaced by X
Selenocysteine (U) at position 40 replaced by X
Selenocysteine (U) at position 40 replaced by X
Selenocysteine (U) at position 690 replaced by X
Selenocysteine (U) at position 690 replaced by X
Selenocysteine (U) at position 667 replaced by X
Selenocysteine (U) at position 667 replaced by X
Selenocysteine (U) at position 665 replaced by X
Selenocysteine (U) at position 665 replaced by X
^C
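The blastx call above can also be assembled as an argument list in Python, which makes the options easier to change per run. This is a sketch under assumptions: build_blast_command is a hypothetical helper, and it only reproduces the flags already used in the shell command, without running anything.

```python
def build_blast_command(blast_path, db_dir, db_name, query="query.fa",
                        evalue="1E-50", threads=4):
    """Build the blastx argument list used in this notebook (hypothetical helper)."""
    # Output file name follows the notebook's {dbn}_blast_out.tab convention.
    out_file = "{0}_blast_out.tab".format(db_name)
    return [
        blast_path,
        "-query", query,
        "-db", db_dir + db_name,  # db_dir must end with '/'
        "-out", out_file,
        "-evalue", evalue,
        "-num_threads", str(threads),
        "-max_target_seqs", "1",
        "-outfmt", "6",  # tabular output
    ]
```

The list could be passed to subprocess.check_call(build_blast_command(ba, dbd, dbn)). Without a shell in between, the backslash in the ba path would be taken literally, so the path would need to be written with a plain '+' in that case.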
!head -1 {dbn}_blast_out.tab
ConsensusfromContig5 sp|Q9JHQ5|LZTL1_MOUSE 74.40 125 31 1 7 378 24 148 1e-59 192
#Translate pipes to tab so SPID is in separate column for Joining
!tr '|' "\t" < {dbn}_blast_out.tab > {dbn}_blast_out2.tab
!head -1 {dbn}_blast_out2.tab
ConsensusfromContig5 sp Q9JHQ5 LZTL1_MOUSE 74.40 125 31 1 7 378 24 148 1e-59 192
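The same pipe-to-tab translation can be done in Python where tr is not available. A minimal sketch, assuming the BLAST output is the tab-delimited outfmt 6 file produced earlier; pipes_to_tabs is a hypothetical helper, not part of the original notebook.

```python
def pipes_to_tabs(in_path, out_path):
    """Copy in_path to out_path, replacing every '|' with a tab (hypothetical helper)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            # 'sp|Q9JHQ5|LZTL1_MOUSE' becomes three tab-separated fields,
            # so the SPID lands in its own (third) column.
            dst.write(line.replace("|", "\t"))
```

For example: pipes_to_tabs(dbn + "_blast_out.tab", dbn + "_blast_out2.tab").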
#Uploads formatted blast table to SQLShare; it currently has a generic name and is meant to be temporary. Warning: this will overwrite any existing table of the same name.
!python {spd}singleupload.py -d scratchblast_out {dbn}_blast_out2.tab
processing chunk line 0 to 153 (0.000229120254517 s elapsed) pushing spdb_blast_out2.tab... parsing 40DB86D8... finished scratchblast_out
!python {spd}fetchdata.py -s "SELECT * FROM [mgavery@washington.edu].[scratchblast_out]blast Left Join [sr320@washington.edu].[uniprot-reviewed_wGO_010714]unp ON blast.Column3 = unp.Entry Left Join [sr320@washington.edu].[SPID and GO Numbers]go ON unp.Entry = go.SPID Left Join [sr320@washington.edu].[GO_to_GOslim]slim ON slim.GO_id = go.GOID" -f tsv -o {dbn}_join2goslim.txt
!head -2 {dbn}_join2goslim.txt
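Since the SQLShare ID currently has to be edited by hand inside the query, one option is to build the join query from a variable. This is only a sketch: build_join_query is a hypothetical helper, and the table names and join conditions are copied unchanged from the query above.

```python
def build_join_query(user, blast_table="scratchblast_out"):
    """Return the BLAST-to-GO-Slim join query for a given SQLShare account.

    build_join_query is a hypothetical helper; only the owner of the
    scratch table is parameterized, the reference tables stay as they are.
    """
    return (
        "SELECT * FROM [{user}].[{table}]blast "
        "Left Join [sr320@washington.edu].[uniprot-reviewed_wGO_010714]unp "
        "ON blast.Column3 = unp.Entry "
        "Left Join [sr320@washington.edu].[SPID and GO Numbers]go "
        "ON unp.Entry = go.SPID "
        "Left Join [sr320@washington.edu].[GO_to_GOslim]slim "
        "ON slim.GO_id = go.GOID"
    ).format(user=user, table=blast_table)
```

In the notebook, q = build_join_query("mgavery@washington.edu") could then be passed to fetchdata.py as -s "{q}", since IPython expands Python variables in curly braces on ! lines.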
!python {spd}singleupload.py -d scratchjoin_slim {dbn}_join2goslim.txt
processing chunk line 0 to 1978 (0.00637292861938 s elapsed) pushing spdb_join2goslim.txt... parsing 94DDEBBA... finished scratchjoin_slim
#Sets GO aspect
!python {spd}fetchdata.py -s "SELECT Distinct Column1 as query, Column3 as SPID, GOSlim_bin FROM [mgavery@washington.edu].[scratchjoin_slim] Where aspect = 'P'" -f tsv -o justslim.txt
!head justslim.txt
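The aspect filter can be parameterized in the same way. The sketch below assumes the standard single-letter GO aspect codes (P for biological process, F for molecular function, C for cellular component); build_slim_query is a hypothetical helper, and the column and table names are taken from the query above.

```python
VALID_ASPECTS = ("P", "F", "C")  # biological process, molecular function, cellular component


def build_slim_query(user, aspect="P", table="scratchjoin_slim"):
    """Return the distinct query/SPID/GOSlim_bin query for one GO aspect (hypothetical helper)."""
    if aspect not in VALID_ASPECTS:
        raise ValueError("aspect must be one of {0}".format(", ".join(VALID_ASPECTS)))
    return (
        "SELECT Distinct Column1 as query, Column3 as SPID, GOSlim_bin "
        "FROM [{user}].[{table}] Where aspect = '{aspect}'"
    ).format(user=user, table=table, aspect=aspect)
```

Restricting aspect to a fixed set of values also keeps arbitrary text out of the SQL string.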
from pandas import *
jslim = read_table("justslim.txt", # name of the data file
#sep=",", # what character separates each column?
na_values=["", " "]) # what values should be considered "blank" values?
---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
<ipython-input-3-f6b9dbe27bfa> in <module>()
      3 jslim = read_table("justslim.txt", # name of the data file
      4 #sep=",", # what character separates each column?
----> 5 na_values=["", " "]) # what values should be considered "blank" values?

//anaconda/lib/python2.7/site-packages/pandas/io/parsers.pyc in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, nrows, iterator, chunksize, verbose, encoding, squeeze)
    399 buffer_lines=buffer_lines)
    400
--> 401 return _read(filepath_or_buffer, kwds)
    402
    403 parser_f.__name__ = name

//anaconda/lib/python2.7/site-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
    207
    208 # Create the parser.
--> 209 parser = TextFileReader(filepath_or_buffer, **kwds)
    210
    211 if nrows is not None:

//anaconda/lib/python2.7/site-packages/pandas/io/parsers.pyc in __init__(self, f, engine, **kwds)
    507 self.options['has_index_names'] = kwds['has_index_names']
    508
--> 509 self._make_engine(self.engine)
    510
    511 def _get_options_with_defaults(self, engine):

//anaconda/lib/python2.7/site-packages/pandas/io/parsers.pyc in _make_engine(self, engine)
    609 def _make_engine(self, engine='c'):
    610 if engine == 'c':
--> 611 self._engine = CParserWrapper(self.f, **self.options)
    612 else:
    613 if engine == 'python':

//anaconda/lib/python2.7/site-packages/pandas/io/parsers.pyc in __init__(self, src, **kwds)
    891 # #2442
    892 kwds['allow_leading_cols'] = self.index_col is not False
--> 893 self._reader = _parser.TextReader(src, **kwds)
    894
    895 # XXX

//anaconda/lib/python2.7/site-packages/pandas/_parser.so in pandas._parser.TextReader.__cinit__ (pandas/src/parser.c:2771)()

//anaconda/lib/python2.7/site-packages/pandas/_parser.so in pandas._parser.TextReader._setup_parser_source (pandas/src/parser.c:4803)()

IOError: File justslim.txt does not exist
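#select the column by name: attribute access (.query) can collide with the DataFrame.query method
jslim.groupby('GOSlim_bin')['query'].count().plot(kind='bar')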
!say "hash tag winning"
#could also upload again to get a simple table
#could be done in pandas
#!python {spd}singleupload.py -d scratchpie justslim.txt
processing chunk line 0 to 2538 (0.00250601768494 s elapsed) pushing justslim.txt... parsing 87B0B7A8... finished scratchpie
#fetching data grouped by GObin
#!python {spd}fetchdata.py -s "SELECT GOSlim_bin, COUNT(GOSlim_bin) as termcount from [sr320@washington.edu].[scratchpie] Group by GOSlim_bin" -f tsv -o justpie.txt
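The grouped count in the commented-out SQL can also be computed locally, without a second upload. This is a dependency-free sketch that assumes justslim.txt is tab-delimited with a header row containing GOSlim_bin; count_terms is a hypothetical helper. In pandas, value_counts() on the same column gives an equivalent tally.

```python
import csv
from collections import Counter


def count_terms(tsv_path, column="GOSlim_bin"):
    """Count rows per value of column in a tab-delimited file (hypothetical helper).

    Empty values are skipped, loosely mirroring how SQL COUNT(column)
    ignores NULLs.
    """
    with open(tsv_path) as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return Counter(row[column] for row in reader if row[column])
```

count_terms("justslim.txt").most_common() would list the GO Slim bins from most to least frequent.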