Notebook
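The sed command worked. No more spaces between contig names. Quick code explanation for the expression s/ //g: s means substitute; the space between the first two slashes is the pattern to match; the empty string between the second and third slashes is the replacement (no space); the trailing g applies the substitution globally, i.e., throughout the entire line, not just at the first instance of the match.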
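To see the effect of the g flag, here is a minimal sketch of that substitution on a single made-up header line. The header text is hypothetical and not taken from the actual contig file.

```shell
# Hypothetical FASTA header containing spaces (illustrative only).
header='>contig 12 length 3045'

# s/ //g : replace every space on the line with nothing.
cleaned=$(printf '%s\n' "$header" | sed 's/ //g')

# Without the trailing g, only the first space on the line is removed.
first_only=$(printf '%s\n' "$header" | sed 's/ //')

echo "$cleaned"
echo "$first_only"
```

With the g flag every space disappears; without it, only the space after "contig" is removed and the rest of the line is left alone.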
Need to restart the Terminal in order for the changes to PATH to be loaded. UPDATE: I think this got written to the wrong directory!
Yes! It works! Can now just run BLAST without changing directories in Linux!
Looks like I need to change permissions again. Can't do this in IPython. Be right back...
Moved that file to /home/samb/BioinformaticsTools/ncbi-blast-2.2.29+
Hmmmm... Everything looks correct, as far as I can tell from looking at the BLAST configuration documentation and other internet resources. Not sure why this isn't working.
Well, I've added "BLASTDB=/home/samb/BioinformaticsTools/ncbi-blast-2.2.29+/dbs" to the file in /etc/profile.d that I used earlier to append to the PATH. We'll see if that works. Restarting the computer.
Sweet! That worked! No more specifying full directories to databases or BLAST executables!!!
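For reference, a rough sketch of what such a profile script might contain. Assumptions on my part: the bin subfolder for the executables, the use of export (the note above only records the plain assignment), and writing to a temporary file instead of /etc/profile.d so the example is safe to run anywhere.

```shell
# Sketch of a profile script such as the one kept in /etc/profile.d.
# Written to a temporary file here so that nothing under /etc is touched.
blast_home=/home/samb/BioinformaticsTools/ncbi-blast-2.2.29+
envfile=$(mktemp)

cat > "$envfile" <<EOF
# Append the BLAST executables to the search path (bin/ is an assumed subfolder).
export PATH="\$PATH:$blast_home/bin"
# Tell BLAST where to look for databases.
export BLASTDB="$blast_home/dbs"
EOF

# Login shells read the scripts in /etc/profile.d at startup;
# sourcing the file by hand has the same effect for the current shell.
. "$envfile"
echo "$BLASTDB"
rm -f "$envfile"
```

Sourcing the file by hand is also a quick way to test a change without restarting the Terminal or the computer.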
OK, starting this again because the fasta file (2014_RickettsiaGBnt) count should've indicated 11,000+ entries but the awk count (see In[7] above) indicated only 8000+ entries. Have re-downloaded all Rickettsia nucleotide (nt) entries from NCBI as a fasta file. There should be 11414 entries.
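A quick way to double-check the number of entries is to count header lines, since each FASTA record starts with exactly one line beginning with ">". The sketch below uses a tiny made-up file as a stand-in for the real download; the sequences and names are hypothetical.

```shell
# Build a tiny stand-in FASTA file (made-up records, not the Rickettsia download).
fasta=$(mktemp)
printf '>seq1 example\nACGT\nACGT\n>seq2 example\nGGCC\n>seq3 example\nTTAA\n' > "$fasta"

# Count lines that start with ">" using grep.
grep_count=$(grep -c '^>' "$fasta")

# The same count with awk, as a cross-check (n+0 prints 0 for an empty file).
awk_count=$(awk '/^>/ {n++} END {print n+0}' "$fasta")

echo "grep: $grep_count  awk: $awk_count"
rm -f "$fasta"
```

Anchoring the pattern with ^ matters: it counts only lines that begin with ">", not lines that merely contain the character somewhere in a description.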
Perfect. Don't know what was wrong with the last one. Will now make BLAST database.
Well, that's weird. Time to check permissions on the input/output locations. Ugh.
OK, there are no read permissions for anyone else. However, since I'm (samb) running the commands, it seems weird that it won't work. Going to change permissions to 744. BRB... Changed permissions to the "dbs" folder using: sudo chmod -R 744 /media/B0FE4B1FFE4ADD6A/BioinformaticsTools/ncbi-blast-2.2.29+/dbs Let's see if that worked.
It didn't work! Well, that's weird. Tried this instead: sudo chmod 744 -R /media/B0FE4B1FFE4ADD6A/BioinformaticsTools/ncbi-blast-2.2.29+/dbs
Seems like the problem is possibly related to having moved the "BioinformaticsTools" folder to my larger partition (which is actually a Windows partition). Might have to modify how it is mounted in Linux in order to enable changes to the read/write permissions.
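One way to test this idea is to check which filesystem a folder lives on and whether a chmod actually changes the reported mode. The sketch below assumes GNU stat (as found on most Linux systems) and uses a scratch directory as a placeholder for the real "dbs" folder. My understanding, not confirmed in these notes, is that on filesystems without native Unix permissions the mode comes from the mount options, so chmod can leave it unchanged.

```shell
# Report the type of the filesystem holding a path (GNU stat assumed).
fs_type() {
    stat -f -c '%T' "$1"
}

# Report the octal permission bits of a path, e.g. 744.
mode_of() {
    stat -c '%a' "$1"
}

# Try a chmod on a scratch directory and read the mode back.
scratch=$(mktemp -d)
chmod 744 "$scratch"
fs=$(fs_type "$scratch")
mode=$(mode_of "$scratch")
echo "filesystem: $fs  mode after chmod: $mode"
rmdir "$scratch"
```

Pointing fs_type and mode_of at the folder on the large partition, before and after the chmod, would show whether the permission change is being applied at all.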
OK, I'll deal with this later. Moved "BioinformaticsTools" folder back to original location (/home/samb) and updated myenvvars.sh (in /etc/profile.d). Thinking about it some more, the problem might be related simply to me moving the BLAST folder to a different directory, instead of re-installing it in the new, desired location. Will test this out later.
Um, weird that there are entries for Arabidopsis and Zea mays... Will re-download Rickettsia nucleotides from GenBank. Ugh! Never mind! I'm an idiot! Didn't filter the initial NCBI search by Taxonomy! Doh! Restricted to bacteria only, there should be 10788 sequences.
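A simple check for this kind of contamination is to search the header lines for organism names that should not be there. The sketch below runs on a made-up file; the headers are hypothetical and much simpler than real GenBank headers.

```shell
# Stand-in FASTA file with two expected and two unexpected organisms (made-up records).
fasta=$(mktemp)
printf '>a1 Rickettsia prowazekii gene\nACGT\n>a2 Arabidopsis thaliana clone\nGGCC\n>a3 Rickettsia typhi gene\nTTAA\n>a4 Zea mays mRNA\nCCGG\n' > "$fasta"

# Pull out the header lines, then count those naming an unexpected organism.
off_target=$(grep '^>' "$fasta" | grep -c -E 'Arabidopsis|Zea mays')

echo "off-target headers: $off_target"
rm -f "$fasta"
```

Dropping the -c flag from the second grep prints the offending headers themselves instead of just counting them.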