#!/usr/bin/env python
# coding: utf-8

# ### Running in Docker container on Ostrich
#
# #### Started Docker container with the following command:
#
# ```docker run -p 8888:8888 -v /Users/sam/data/:/data -v /Users/sam/owl_home/:/owl_home -v /Users/sam/owl_web/:/owl_web -v /Users/sam/gitrepos/LabDocs/jupyter_nbs/sam/:/jupyter_nbs -it f99537d7e06a```
#
# The command allows access to Jupyter Notebook over port 8888 and makes my Jupyter Notebook GitHub repo and my data files on Owl/home and Owl/web accessible to the Docker container.
#
# Once the container was started, I started Jupyter Notebook with the following command inside the Docker container:
#
# ```jupyter notebook```
#
# This is configured in the Docker container to launch a Jupyter Notebook without a browser on port 8888.
#
# The Docker container is running on an image created from this [Dockerfile (Git commit 443bc42)](https://github.com/sr320/LabDocs/blob/443bc425cd36d23a07cf12625f38b7e3a397b9be/code/dockerfiles/Dockerfile.bio)

# In[1]:

get_ipython().run_cell_magic('bash', '', 'date\n')


# ### Check computer specs

# In[2]:

get_ipython().run_cell_magic('bash', '', 'hostname\n')


# In[3]:

get_ipython().run_cell_magic('bash', '', 'lscpu\n')


# ### Bloated notebook analysis

# In[5]:

get_ipython().run_cell_magic('bash', '', 'ls -lh /gitrepos/LabDocs/jupyter_nbs/sam/20161206_docker_BGI_genome_downloads.ipynb\n')


# #### That notebook is over 100MB in size, which is too large for hosting on GitHub. Additionally, the notebook crashes the browser (and sometimes the computer) due to the ridiculous number of output lines generated by the ```wget``` command. Let's look at some more details.

# #### Line count

# In[6]:

get_ipython().run_cell_magic('bash', '', 'wc -l /gitrepos/LabDocs/jupyter_nbs/sam/20161206_docker_BGI_genome_downloads.ipynb\n')


# #### In order to preserve some of the information in the original notebook before I strip the output, we'll look at the file in a bit more depth...
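# #### Before digging in line by line, it can help to see exactly which cells hold the bulk of the output. A notebook file is just JSON, so a short Python sketch can tally output lines per code cell. (The helper name ```output_lines_per_cell``` is mine for illustration, not part of any library.)

```python
import json

def output_lines_per_cell(nb):
    """Return (cell_index, line_count) pairs for each code cell,
    counting the text lines stored in that cell's outputs."""
    counts = []
    for i, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        n = 0
        for out in cell.get("outputs", []):
            # Stream outputs (e.g. wget's progress) keep text under "text",
            # either as one string or as a list of lines.
            text = out.get("text", [])
            if isinstance(text, str):
                text = text.splitlines()
            n += len(text)
        counts.append((i, n))
    return counts

# Usage (path assumed):
# nb = json.load(open("20161206_docker_BGI_genome_downloads.ipynb"))
# sorted(output_lines_per_cell(nb), key=lambda t: -t[1])[:5]  # worst offenders
```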
# #### How long did the wget command for the Ostrea lurida files take?
#
# #### First, let's find the line that has the output of the ```time``` command that I ran. The ```grep``` command includes the ```-n``` flag to identify line number(s) of search results.

# In[7]:

get_ipython().run_cell_magic('bash', '', 'grep -n real /gitrepos/LabDocs/jupyter_nbs/sam/20161206_docker_BGI_genome_downloads.ipynb\n')


# #### Whoa! That's a LONG time! Let's try to pull the full time output.
#
# #### Using ```head``` and ```tail``` to pull out a specific range of lines from the file. Making a rough guess...

# In[9]:

get_ipython().run_cell_magic('bash', '', 'head -1197020 /gitrepos/LabDocs/jupyter_nbs/sam/20161206_docker_BGI_genome_downloads.ipynb | tail -20\n')


# #### So, to download all of the Ostrea lurida files, it took a little over 37hrs!

# #### Let's see what the time frame was for the Panopea generosa files...

# In[10]:

get_ipython().run_cell_magic('bash', '', 'grep -n wget /gitrepos/LabDocs/jupyter_nbs/sam/20161206_docker_BGI_genome_downloads.ipynb\n')


# #### Since the total number of lines in the file is 1197134, I'll just use the ```tail``` command to look at the last 100 lines (because the ```wget``` command for the Panopea generosa files is at line 1197080).

# In[11]:

get_ipython().run_cell_magic('bash', '', 'tail -100 /gitrepos/LabDocs/jupyter_nbs/sam/20161206_docker_BGI_genome_downloads.ipynb\n')


# #### Well, what we see (and should've realized when we ran ```grep -n real``` in cell 7) is that there is no output from that ```wget``` command.
#
# #### So, let's see if the files got downloaded or not...

# In[12]:

get_ipython().run_cell_magic('bash', '', 'ls -lhr /owl_web/P_generosa_genome_assemblies_BGI/20161201/cdts-hk.genomics.cn/Panopea_generosa/\n')


# In[13]:

get_ipython().run_cell_magic('bash', '', 'ls -lhr /owl_web/P_generosa_genome_assemblies_BGI/20161201/cdts-hk.genomics.cn/Panopea_generosa/clean_data/\n')


# #### OK, the files got downloaded.
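# #### As an aside, the ```head -N <file> | tail -M``` pattern used above extracts lines N-M+1 through N of a file. The same slice in Python might look like the following sketch (the ```line_range``` helper is hypothetical, written here just to mirror the shell pipeline):

```python
def line_range(path, last, count):
    """Return lines (last - count + 1) .. last of a file,
    mirroring `head -<last> <path> | tail -<count>`."""
    with open(path) as fh:
        lines = fh.readlines()[:last]  # head -<last>
    return lines[-count:]              # tail -<count>

# e.g. line_range("notebook.ipynb", 1197020, 20) would mirror
# `head -1197020 notebook.ipynb | tail -20` from the cell above.
```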
# I'm guessing the enormous output from the Ostrea lurida ```wget``` command crashed the browser, but the notebook commands still proceeded to completion.

# ### Stripping cell output

# #### Use nbconvert to convert from "notebook" format to "notebook" format. A [Jupyter Google Group post provided the use of ```--ClearOutputPreprocessor.enabled=True```](https://groups.google.com/forum/#!topic/jupyter/z6ODiJ6VUzI) to strip output from cells.

# In[15]:

get_ipython().run_cell_magic('bash', '', 'jupyter nbconvert \\\n--to notebook \\\n/gitrepos/LabDocs/jupyter_nbs/sam/20161206_docker_BGI_genome_downloads.ipynb \\\n--ClearOutputPreprocessor.enabled=True \\\n--output /gitrepos/LabDocs/jupyter_nbs/sam/20161206_docker_BGI_genome_downloads.ipynb\n')


# #### Let's see if it worked by doing another line count on the notebook file

# In[16]:

get_ipython().run_cell_magic('bash', '', 'wc -l /gitrepos/LabDocs/jupyter_nbs/sam/20161206_docker_BGI_genome_downloads.ipynb\n')


# #### Indeed it did! Will get the notebook (and this notebook) pushed to GitHub!
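# #### For context, the effect of ```--ClearOutputPreprocessor.enabled=True``` is essentially to empty each code cell's ```outputs``` list and reset its ```execution_count```. A rough, dependency-free approximation of that idea in plain Python (a sketch, not the actual nbconvert implementation):

```python
import json

def strip_outputs(nb):
    """Clear outputs and execution counts from every code cell in a
    notebook dict, roughly what ClearOutputPreprocessor does."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb

# Usage (in-place overwrite, like the nbconvert command above):
# nb = json.load(open("notebook.ipynb"))
# with open("notebook.ipynb", "w") as fh:
#     json.dump(strip_outputs(nb), fh, indent=1)
```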