demonstration of plot_expression_across_chromosomes.py script

Based on here.

If you'd like an active Jupyter session to run this notebook, launch one by clicking here, and then upload this notebook to the session that starts up.
Otherwise, the static version is rendered more nicely via here.


If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!.

Some tips:

  • Code cells have boxes around them. When you hover over them a icon appears.
  • To run a code cell either click the icon, or click on the cell and then hit Shift+Enter. The Shift+Enter combo will also move you to the next cell, so it's a quick way to work through the notebook.
  • While a cell is running a * appears in the square brackets next to the cell. Once the cell has finished running the asterix will be replaced with a number.
  • In most cases you'll want to start from the top of notebook and work your way down running each cell in turn. Later cells might depend on the results of earlier ones.
  • To edit a code cell, just click on it and type stuff. Remember to run the cell once you've finished editing.


You'll need the current version of the script to run this notebook, and the next cell will get that. (Remember if you want to make things more reproducible when you use the script with your own data, you'll want to edit calls such as this to fetch a specific version of the script. How to do this is touched upon in the comment below here.

In [1]:
!curl -O https://raw.githubusercontent.com/fomightez/sequencework/master/plot_expression_across_chromosomes/plot_expression_across_chromosomes.py
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 48390  100 48390    0     0   256k      0 --:--:-- --:--:-- --:--:--  256k

Display Usage / Help Block

In [2]:
%run plot_expression_across_chromosomes -h
usage: plot_expression_across_chromosomes.py [-h] [-cols COLUMNS] [-l]
                                             [-chr CHRS] [-nl] [-nlim] [-s]
                                             [-ed EXP_DESIG] [-bd BASE_DESIG]
                                             [-svg] [-ndh] [-ac ADVANCE_COLOR]
                                             ANNOTATION_FILE DATA_FILE

plot_expression_across_chromosomes.py plots a ratio of expression values
across chromosomes or scaffolds of a genome to highlight regions of deviation.
Besides the options listed here, there are several `USER ADJUSTABLE VALUES`
inside the script that can be edited for easy customization. A similar plot is
called a Manhattan plot and this implementation borrows the plotting approach
and some of the features from Brent Pedersen's awesome `manhattan-plot.py`
script. **** Script by Wayne Decatur (fomightez @ github) ***

positional arguments:
  ANNOTATION_FILE       Name of file containing the genome annotation.
                        REQUIRED. This is needed to determine the order of
                        individual data points along the chromosome and how to
                        display the data across chromosomes or scaffolds.
  DATA_FILE             Name of file containing the summarized data to plot,
                        such as mean TPM or RPKM, etc. in tab-delimited form.
                        REQUIRED. See my script
                        `plot_expression_across_chromosomes_from_raw.py` if
                        you want supply the individual `raw` data files with
                        the level metric for each sample and/or replicate.

optional arguments:
  -h, --help            show this help message and exit
  -cols COLUMNS, --columns COLUMNS
                        columns for gene, wild-type (baseline state)
                        expression value, experimental condition expression
                        value, in that order. This flag is used to specify the
                        data in the summary file to be plotted. Separate the
                        column identifiers by commas, without spaces. Default
                        is `1,2,3`, where `1` indicates the first column,
                        i.e., how you'd refer to the columns in natural
                        language (no zero-indexing).
  -l, --lines           add this flag to plot the expression level ratio value
                        as lines extending from the x-axis rather than points
                        in space. (The resulting aesthetic may resemble a city
                        skyline for which the `manhattan plot` is named.)
  -chr CHRS, --chrs CHRS
                        use this flag to limit plotting of the data to
                        particular chromosomes or scaffolds you specify
                        immediately following this flag. Separate the
                        chromosome or scaffold identifiers by commas, without
                        spaces. Example use in a command is `--chrs I,IV,XVI`.
                        Default when this optional flag is not called is to
                        plot that data for all chromosomes or scaffolds.
  -nl, --no_log         add this flag to keep the expression level ratio to be
                        plotted in the common base 10 instead of converting to
                        log2.
  -nlim, --no_limits    add this flag to not impose a limit of above and below
                        4 in plot window when converting to log2. The cutoff
                        can also be adjusted under `user-adjustable settings`
                        in the script. Issuing this flag has no effect if all
                        values are within +/- the cutoff interval or
                        `--no_log` is used.
  -s, --smooth          add this flag to display a smoothing curve fit to the
                        data points (LOWESS) on a per chromosome basis. This
                        option can enhance visualization of deviations
                        characteristic of aneuploidy and copy number variation
                        across the genome, both within and between
                        chromosomes. Additionally, a simplistically-based
                        assesment will be made for aneuploidy at the
                        chromosome or scaffold level and a notice will be made
                        as the program is running if aneuploidy at the
                        chromosome or scaffold level seems indicated by this
                        simple metric. Further examination is warranted
                        regardless of the result this automated assessment.
  -ed EXP_DESIG, --exp_desig EXP_DESIG
                        Allows changing the text used in y-axis label to
                        reference experimental sample. Following `--exp_desig`
                        type what you'd like to read there instead of
                        `experimental`.
  -bd BASE_DESIG, --base_desig BASE_DESIG
                        Allows changing the text used in y-axis label to
                        reference wild-type or baseline sample. Following
                        `--base_desig` type what you'd like to read there
                        instead of `wild-type`.
  -svg, --save_vg       add this flag to save as vector graphics
                        (**RECOMMENDED FOR PUBLICATION***) instead of default
                        png. Not default or saved alongside default because
                        file size can get large due to the large number of
                        points.
  -ndh, --no_data_header
                        add this flag if there is no data header or no first
                        line of column names in the data file. Otherwise, it
                        is assumed there is and any item read as the first
                        gene identifier from the first line won't be
                        highlighted as missing from annotation. IMPORTANTLY,
                        this only affects feedback provided as script is run.
                        If the first line resembles data, i.e., numbers in
                        specified columns, it will be automagically parsed as
                        if data. Remove the header or column labels line from
                        your summary data file on the off-chance this causes
                        issues in your resulting plot.
  -ac ADVANCE_COLOR, --advance_color ADVANCE_COLOR
                        **FOR ADVANCED USE.*** Allows for advancing the color
                        selection iterator the specified number of times. The
                        idea is it allows the ability to control the color of
                        the chromosome when specifying a chromosome or
                        scaffolds to plot so you could make the color match
                        the one used when all chromsome plotted if needed.
                        Supply the number to advance after the flag on the
                        command line. For example, `-ac 4`.

Preparing for use of the script

Get data/input files.

Along with expression data, information on the genes' names and locations in form of GTF or GFF is needed to use the plot_expression_across_chromosomes.py script. (BED could be possible but not developed as of yet.)

For convenience, we will generate use a different script to generate mock expression data to be used in this demo. If you are curious of the output produced with real biological data, then you can examine the plots here because the present version of the documentation page for plot_expression_across_chromosomes.py only features resuls from actual yeast data.)

Get GTF input files

For the demo notebook, both yeast (S. cerevisiae) and human genetic information will be used.

Both of these genetic information files require FTP use for retrieval and that is not allowed using a Binder session, and so they have been baked into the image that gets spun up when the Jupyter session is launched from here. Use the commands in the cell below to move them into the location of the active analysis. In the case of the human data there is also an unpacking step as it is a compressed archive.

The yeast GTF file retrieval is described here.

The human GTF data is from Ensembl. To get the specific version asked about here I adjusted the URL obtained from link in the question here. Adjust the links and commands accordingly to match your favorite version.

In [3]:
#!curl -OL ftp://ftp.ensembl.org/pub/release-88/gtf/homo_sapiens/Homo_sapiens.GRCh38.88.chr.gtf.gz
# source: ftp://ftp.ensembl.org/pub/release-88/gtf/homo_sapiens/ , to try to address https://github.com/fomightez/sequencework/issues/1
!gunzip -k ../data/Homo_sapiens.GRCh38.88.chr.gtf.gz
!mv ../data/Homo_sapiens.GRCh38.88.chr.gtf .
!mv ../data/R64-1-1genes.gtf .

You should now be able to see Homo_sapiens.GRCh38.88.chr.gtf and R64-1-1genes.gtf listed among the files in the current working directory when you run the next cell.

In [4]:
ls
'demo get_specified_length_of_end_of_seq_from_FASTA.ipynb'
'Demo of script to get sequence from multiFASTA file when description contains matching text.ipynb'
'Demo of script to get specific subsequence from FASTA file.ipynb'
'demo plot_expression_across_chromosomes.ipynb'
 GSD/
 Homo_sapiens.GRCh38.88.chr.gtf
 plot_expression_across_chromosomes.py
 R64-1-1genes.gtf
'Use biopython to make valid CLUSTAL formatted MSAs, check sequence of manually edited alignment, and add consensus line.ipynb'

Making mock data

Get the script for generating mock data.

In [5]:
!curl -OL https://raw.githubusercontent.com/fomightez/simulated_data/master/gene_expression/mock_expression_ratio_generator.py
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 30725  100 30725    0     0   182k      0 --:--:-- --:--:-- --:--:--  182k

This script can use a GTF or GFF file to extract a list of genes and produce crude simulated expression data for them.

It has two advanced features, both of which this demo will use. They are described briefly here; however, you are referred to the official documentation for the script for more information.

The two features to be taken advatange of here are:

  • the ability to specify chromosomes or regions of chromosomes that don't have a expression ratio that averages around 1:1 for baseline vs. experiemental.

  • the mock data generator script can be supplied an optional flag to only use a fraction of the genes. This is helpful when dealing with large genetic information files, like the 1.4 Gb human file. This is simply for convenience here to make the demonstration steps occur in a timely manner. (You would not want to only use a fraction of your own data.) Although for now, we will holding off on generating the human data as it can take some time,

The next cells will generate the mock data in tab-separated tabular text form. We'll do it for yeast alone for now since dealing with the human genetic data is on a much larger scale.

In [6]:
%run mock_expression_ratio_generator.py R64-1-1genes.gtf
Reading annotation file and getting data on genes and chromosomes...Information for 7126 genes parsed...
Filling in the mock values for each gene...
Mock data saved as: R64-1-1genes_mock_expression_ratios.tsv

To make things easier, we'll rename the file to something more convenient by running the next cell.

In [7]:
!mv R64-1-1genes_mock_expression_ratios.tsv sc_data.tsv #yeast S. cerevisiae

Finally we are ready to use the script.

Basic use examples set #1: Using from the command line (or equivalent / similar)

Run the script

We'll use some yeast data for the first few examples. Human will follow.

In [8]:
%%bash
python plot_expression_across_chromosomes.py R64-1-1genes.gtf sc_data.tsv
Reading annotation file and getting data on genes and chromosomes...Information for 7126 genes parsed...The chromosomes appear to be in roman numeral form....Parsing data file...7127 lines of data parsed and relative levels for 7126 genes recorded.../srv/conda/lib/python3.6/site-packages/matplotlib/axes/_base.py:3604: MatplotlibDeprecationWarning: 
The `ymin` argument was deprecated in Matplotlib 3.0 and will be removed in 3.2. Use `bottom` instead.
  alternative='`bottom`', obj_type='argument')
/srv/conda/lib/python3.6/site-packages/matplotlib/axes/_base.py:3610: MatplotlibDeprecationWarning: 
The `ymax` argument was deprecated in Matplotlib 3.0 and will be removed in 3.2. Use `top` instead.
  alternative='`top`', obj_type='argument')

***Notice:***The 38 points beyond the bounds of y-axis are drawn as open triangles at the edge; a limit was imposed to avoid extreme values compressing the typically important range; run with `--no_limits` or `--no_log` to see these accurately.

Plot image saved to: sc_data_across_chr.png

In the above cell and elsewhere in this notebook, %%bash cell magic is used to send this to the shell to run as if on the command line.

You could simply run something like python plot_expression_across_chromosomes.py R64-1-1genes.gtf sc_data.tsv if you are working on the command line directly. In fact, the terminal is available from the Jupyter dashboard (or from the JupyterLab launcher) and you can feel free to try running the command below in a terminal in this Jupyter session if you'd like. (The Jupyter dashboard is available in the classic notebook mode by clicking on the Jupyter logo int he upper left corner. If your are in a JupyterLab session, you should have a file navigator panel available and you can click there.)

python plot_expression_across_chromosomes.py R64-1-1genes.gtf sc_data.tsv

The script ran because the only things essential are to indicate the genetic information file and the expression data file. As the Usage block describes, the script defaults to thinking the first three columns of your data are your gene names and expression levels. You can specify them using the --columns option like the command below. Run that and you seel it works. In fact, it produces the same results.

In [9]:
%%bash
python plot_expression_across_chromosomes.py R64-1-1genes.gtf sc_data.tsv --columns 1,2,3
Reading annotation file and getting data on genes and chromosomes...Information for 7126 genes parsed...The chromosomes appear to be in roman numeral form....Parsing data file...7127 lines of data parsed and relative levels for 7126 genes recorded.../srv/conda/lib/python3.6/site-packages/matplotlib/axes/_base.py:3604: MatplotlibDeprecationWarning: 
The `ymin` argument was deprecated in Matplotlib 3.0 and will be removed in 3.2. Use `bottom` instead.
  alternative='`bottom`', obj_type='argument')
/srv/conda/lib/python3.6/site-packages/matplotlib/axes/_base.py:3610: MatplotlibDeprecationWarning: 
The `ymax` argument was deprecated in Matplotlib 3.0 and will be removed in 3.2. Use `top` instead.
  alternative='`top`', obj_type='argument')

***Notice:***The 38 points beyond the bounds of y-axis are drawn as open triangles at the edge; a limit was imposed to avoid extreme values compressing the typically important range; run with `--no_limits` or `--no_log` to see these accurately.

Plot image saved to: sc_data_across_chr.png

We'll look at the plot produced in a minute, but first let's expand more on the --columns option.

The --columns option is particularly important if your results are differently arranged. You'd change the provided numbers to match your scheme. For example, --columns 1,9,4 is what I typically used with DESeq2-generated data as described here and here.

Here is an exmaple of trying that. However, note that it will throw an error because there is no data matching that column.

In [10]:
%%bash
python plot_expression_across_chromosomes.py R64-1-1genes.gtf data.tsv --columns 1,9,4
usage: plot_expression_across_chromosomes.py [-h] [-cols COLUMNS] [-l]
                                             [-chr CHRS] [-nl] [-nlim] [-s]
                                             [-ed EXP_DESIG] [-bd BASE_DESIG]
                                             [-svg] [-ndh] [-ac ADVANCE_COLOR]
                                             ANNOTATION_FILE DATA_FILE
plot_expression_across_chromosomes.py: error: argument DATA_FILE: can't open 'data.tsv': [Errno 2] No such file or directory: 'data.tsv'
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
<ipython-input-10-e9bb614f189b> in <module>
----> 1 get_ipython().run_cell_magic('bash', '', 'python plot_expression_across_chromosomes.py R64-1-1genes.gtf data.tsv --columns 1,9,4\n')

/srv/conda/lib/python3.6/site-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
   2321             magic_arg_s = self.var_expand(line, stack_depth)
   2322             with self.builtin_trap:
-> 2323                 result = fn(magic_arg_s, cell)
   2324             return result
   2325 

/srv/conda/lib/python3.6/site-packages/IPython/core/magics/script.py in named_script_magic(line, cell)
    140             else:
    141                 line = script
--> 142             return self.shebang(line, cell)
    143 
    144         # write a basic docstring:

</srv/conda/lib/python3.6/site-packages/decorator.py:decorator-gen-109> in shebang(self, line, cell)

/srv/conda/lib/python3.6/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    185     # but it's overkill for just that one bit of state.
    186     def magic_deco(arg):
--> 187         call = lambda f, *a, **k: f(*a, **k)
    188 
    189         if callable(arg):

/srv/conda/lib/python3.6/site-packages/IPython/core/magics/script.py in shebang(self, line, cell)
    243             sys.stderr.flush()
    244         if args.raise_error and p.returncode!=0:
--> 245             raise CalledProcessError(p.returncode, cell, output=out, stderr=err)
    246 
    247     def _run_script(self, p, cell, to_close):

CalledProcessError: Command 'b'python plot_expression_across_chromosomes.py R64-1-1genes.gtf data.tsv --columns 1,9,4\n'' returned non-zero exit status 2.

So always make sure your data matches what you are indicating are values to use. If you see something like above, that is the first thing to check. While we are broaching the subject of debugging, I'd like to point out that this Jupyter environment is set up to empower you to more easily test things are working. It gets the current version of the script and uses it. And so if things are working here and your data is set up correct, through some tinkering you should be able to get things working. Keep in mind that the GTF files and the data files can be large. You may wish to make sure you aren't exceeding the compuational/memory capacity here (or where you are running this script with your data) by using a fraction of your data to test. Remember, the mock_expression_ratio_generator.py allows youto generate mock data from a fraction of your genes.

Let's look at the plot we generated earlier using the yeast data.

You could use the Jupyter dashboard or file navigator to find the file sc_data_across_chr.png and open it. In fact, if you wanted to download a 'real' result made in a Jupyter environment, you'd also use that route but use the 'download' options. For convenience / encapsulation, we'll use the following code to rename the file from earlier so that it won't be clobbered below and to view the renamed image file directly in this notebook.

In [11]:
!mv sc_data_across_chr.png early_sc_plot.png
from IPython.display import Image
Image("early_sc_plot.png")
Out[11]:

Besides the --columns option for designating which columns in the tab-separated input file hold the data, the script has other useful options and the rest of the notebook will be focused on illustrating those while also demonstrating using the script with the large (1.4 Gb) human GTF file.

First up will be using options to adjust the default y-axis labels.

Specifying text to use in y-axis labels.

As can be seen in the above image, the script defaults to putting experimental and wild-type as text in the labels of the y-axis in the plot. Using a couple of options, you can control the text used for both of these. The option --base_desig followed by the desired word to use can be used to specify the baseline/ wild-type designation to show in the plot. Similary, for the experimental text using --exp_desig followed by text to use in its place.

The next two cells will run that and then show it in an approach like used earlier. We'll use 'KO' for the knockout and 'WT' for the other. (Note that we also use the shorter form of the --columns flag to illustrate shorter forms exist for most flags as detailed in the USAGE block.)

In [12]:
%%bash
python plot_expression_across_chromosomes.py R64-1-1genes.gtf sc_data.tsv -cols 1,2,3 --exp_desig KO --base_desig WT
Reading annotation file and getting data on genes and chromosomes...Information for 7126 genes parsed...The chromosomes appear to be in roman numeral form....Parsing data file...7127 lines of data parsed and relative levels for 7126 genes recorded.../srv/conda/lib/python3.6/site-packages/matplotlib/axes/_base.py:3604: MatplotlibDeprecationWarning: 
The `ymin` argument was deprecated in Matplotlib 3.0 and will be removed in 3.2. Use `bottom` instead.
  alternative='`bottom`', obj_type='argument')
/srv/conda/lib/python3.6/site-packages/matplotlib/axes/_base.py:3610: MatplotlibDeprecationWarning: 
The `ymax` argument was deprecated in Matplotlib 3.0 and will be removed in 3.2. Use `top` instead.
  alternative='`top`', obj_type='argument')

***Notice:***The 38 points beyond the bounds of y-axis are drawn as open triangles at the edge; a limit was imposed to avoid extreme values compressing the typically important range; run with `--no_limits` or `--no_log` to see these accurately.

Plot image saved to: sc_data_across_chr.png
In [13]:
!mv sc_data_across_chr.png designation_example.png
from IPython.display import Image
Image("designation_example.png")
Out[13]:

This next examples shows symbols like the delta sign used for deletions can be used in the y-axis text.

In [14]:
%%bash
python plot_expression_across_chromosomes.py R64-1-1genes.gtf sc_data.tsv --columns 1,2,3 --exp_desig ΔYDR190C --base_desig WT
Reading annotation file and getting data on genes and chromosomes...Information for 7126 genes parsed...The chromosomes appear to be in roman numeral form....Parsing data file...7127 lines of data parsed and relative levels for 7126 genes recorded.../srv/conda/lib/python3.6/site-packages/matplotlib/axes/_base.py:3604: MatplotlibDeprecationWarning: 
The `ymin` argument was deprecated in Matplotlib 3.0 and will be removed in 3.2. Use `bottom` instead.
  alternative='`bottom`', obj_type='argument')
/srv/conda/lib/python3.6/site-packages/matplotlib/axes/_base.py:3610: MatplotlibDeprecationWarning: 
The `ymax` argument was deprecated in Matplotlib 3.0 and will be removed in 3.2. Use `top` instead.
  alternative='`top`', obj_type='argument')

***Notice:***The 38 points beyond the bounds of y-axis are drawn as open triangles at the edge; a limit was imposed to avoid extreme values compressing the typically important range; run with `--no_limits` or `--no_log` to see these accurately.

Plot image saved to: sc_data_across_chr.png
In [15]:
!mv sc_data_across_chr.png symbol_example.png
from IPython.display import Image
Image("symbol_example.png")
Out[15]:

Illustrating the use of 'lines' from the origin and limiting to certain chromosomes

Using the same input files you can limit the plot to specific chromosomes. The chromosomes can be specified as one or a few of them separated by a comma after the --chrs option flag.

Additionally, instead of points, you can specify with the --lines flag that the data points be lines from zero to give more of skyline reflected in water aesthetic to the Manhattan plot-like nature of these plots. (The zero indicator being the waterline.) (This aesthetic came from seeing the lines option in Brent Pederson's Manhattan plot script.)

The next two plots will utilize those options.

In [16]:
%%bash
python plot_expression_across_chromosomes.py R64-1-1genes.gtf sc_data.tsv --columns 1,2,3 --chrs II,III,IV,V --lines --exp_desig ΔYDR190C --base_desig WT
Reading annotation file and getting data on genes and chromosomes...Information for 7126 genes parsed...The chromosomes appear to be in roman numeral form....Parsing data file...7127 lines of data parsed and relative levels for 7126 genes recorded.../srv/conda/lib/python3.6/site-packages/matplotlib/axes/_base.py:3604: MatplotlibDeprecationWarning: 
The `ymin` argument was deprecated in Matplotlib 3.0 and will be removed in 3.2. Use `bottom` instead.
  alternative='`bottom`', obj_type='argument')
/srv/conda/lib/python3.6/site-packages/matplotlib/axes/_base.py:3610: MatplotlibDeprecationWarning: 
The `ymax` argument was deprecated in Matplotlib 3.0 and will be removed in 3.2. Use `top` instead.
  alternative='`top`', obj_type='argument')

***Notice:***The 8 points beyond the bounds of y-axis are drawn as open triangles at the edge; a limit was imposed to avoid extreme values compressing the typically important range; run with `--no_limits` or `--no_log` to see these accurately.

Plot image saved to: sc_data_across_chr_II_III_IV_V.png
In [17]:
!mv sc_data_across_chr_II_III_IV_V.png few_chrs_example.png
from IPython.display import Image
Image("few_chrs_example.png")
Out[17]:
In [18]:
%%bash
python plot_expression_across_chromosomes.py R64-1-1genes.gtf sc_data.tsv --columns 1,2,3 --chrs XVI --exp_desig ΔYDR190C --base_desig WT
Reading annotation file and getting data on genes and chromosomes...Information for 7126 genes parsed...The chromosomes appear to be in roman numeral form....Parsing data file...7127 lines of data parsed and relative levels for 7126 genes recorded.../srv/conda/lib/python3.6/site-packages/matplotlib/axes/_base.py:3604: MatplotlibDeprecationWarning: 
The `ymin` argument was deprecated in Matplotlib 3.0 and will be removed in 3.2. Use `bottom` instead.
  alternative='`bottom`', obj_type='argument')
/srv/conda/lib/python3.6/site-packages/matplotlib/axes/_base.py:3610: MatplotlibDeprecationWarning: 
The `ymax` argument was deprecated in Matplotlib 3.0 and will be removed in 3.2. Use `top` instead.
  alternative='`top`', obj_type='argument')

***Notice:***The 2 points beyond the bounds of y-axis are drawn as open triangles at the edge; a limit was imposed to avoid extreme values compressing the typically important range; run with `--no_limits` or `--no_log` to see these accurately.

Plot image saved to: sc_data_across_chr_XVI.png
In [19]:
!mv sc_data_across_chr_XVI.png one_chr_example.png
from IPython.display import Image
Image("one_chr_example.png")
Out[19]:

There are a couple things to note.

  • The name of the plot image files gets appended with the chromosome designations automatically to aid in organization of files.
  • The color of chromosome XVI in single chromosome plot doesn't match the one in the plot with all the chromosomes. We can address this for the case of the single chromosomes using the --advance_color flag.

Illustrating the use of 'advance_color' flag in conjunction with limiting to one chromosome

We can tell the script not to settle for the first color in the color series, and instead advance 15 more times beyond the first in the series to match what is seen for chromosome XVI when the entire genome is shown.

In [20]:
%%bash
python plot_expression_across_chromosomes.py R64-1-1genes.gtf sc_data.tsv --columns 1,2,3 --chrs XVI --advance_color 15 --exp_desig ΔYDR190C --base_desig WT
Reading annotation file and getting data on genes and chromosomes...Information for 7126 genes parsed...The chromosomes appear to be in roman numeral form....Parsing data file...7127 lines of data parsed and relative levels for 7126 genes recorded.../srv/conda/lib/python3.6/site-packages/matplotlib/axes/_base.py:3604: MatplotlibDeprecationWarning: 
The `ymin` argument was deprecated in Matplotlib 3.0 and will be removed in 3.2. Use `bottom` instead.
  alternative='`bottom`', obj_type='argument')
/srv/conda/lib/python3.6/site-packages/matplotlib/axes/_base.py:3610: MatplotlibDeprecationWarning: 
The `ymax` argument was deprecated in Matplotlib 3.0 and will be removed in 3.2. Use `top` instead.
  alternative='`top`', obj_type='argument')

***Notice:***The 2 points beyond the bounds of y-axis are drawn as open triangles at the edge; a limit was imposed to avoid extreme values compressing the typically important range; run with `--no_limits` or `--no_log` to see these accurately.

Plot image saved to: sc_data_across_chr_XVI.png
In [21]:
!mv sc_data_across_chr_XVI.png advancing_color_example.png
from IPython.display import Image
Image("advancing_color_example.png")
Out[21]: