Using iSet in Python

In this tutorial we show how to use iSet in Pyhton. The tutorial requires sample data at http://www.ebi.ac.uk/~casale/data.zip, which can be obtained running the following commands

wget http://www.ebi.ac.uk/~casale/data.zip
unzip data.zip

As we show in this tutorial, iSet can be applied for interaction analysis in two data designs:

  • complete design, where all individuals have been phenotyped in each context
  • stratified design, where each individual has been phenotyped in only one of the two contexts

Complete design

iSet is an extension of mtSet that allows to test for polygenic interactions with environment or other contexts. iSet can be applied in designs where all individuals have been phenotyped in each context (complete design) as well as for populations that have been stratified by a context variable (stratified design). We will here see first an application of iSet for analysis of complete designs, and then an application for the analysis of stratified design in the next section.

Setting up

In [4]:
# activiate inline plotting
%matplotlib inline

import scipy as sp
import scipy.linalg
import limix
from limix.iSet.iset import fit_iSet
import pandas as pd
In [1]:
# base name for bed, bim and fam
bfile = 'data/chrom22_subsample20_maf0.10'

Split genotypes into regions

In [4]:
from limix.mtSet.core import plink_reader
In [5]:
# import genotype positions
bim = plink_reader.readBIM(bfile,usecols=(0,1,2,3))
chrom = bim[:, 0].astype(float)
pos = bim[:, -1].astype(float)
In [6]:
# uses splitter to split the genotypes
from limix.mtSet.core.splitter import Splitter
split = Splitter(pos=pos,chrom=chrom)

The method splitGeno allows to define the regions that will then considered for the analysis with iSet. Information relative to the calculated regions can be cached in an external file by activating the cache option (see below).

Argument Default Datatype Explanation
method 'slidingWindow' str Uses a sliding window approach to define regions (a region-based approach will be availabe soon)
size 5E+04 (50kb) float Window size. Pace is set at half the size of the window
minSnps 1 int Windows with number of SNPs lower that this threshold are not considered
maxSnps sp.inf int Windows with number of SNPs higher that this threshold are not considered
cache False bool If true, it activates the caching
out_dir './cache' str outdir of the cache file
fname None str Name of the file
rewrite False bool If true and the cache file already exists, the cache file is overwritten
In [7]:
split.splitGeno(cache=True, fname='regions.h5', minSnps=4)
print '%d windows' % split.nWindows
1380 windows

Apply

In [8]:
# import phenotype and sample relatedness
pheno_file = './data/pheno.phe'
sample_relatedness_file = './data/chrom22.cov'
sample_relatedness_file = './data/chrom22.cov'
Y = sp.loadtxt(pheno_file)[:,:2]
R = sp.loadtxt(sample_relatedness_file)
In [9]:
# load fixed effect covariates (10 PCs + intercept term)
sp.loadtxt()
In [10]:
# read fam
bim = plink_reader.readBIM(bfile,usecols=(0,1,2,3))
fam = plink_reader.readFAM(bfile,usecols=(0,1))
In [11]:
n_wnds = 10 # only 10 windows are considered
LLR = sp.zeros(n_wnds) # vector with test statistics of the n_wnds regions
df = pd.DataFrame()
df0 = pd.DataFrame()
import time

for wnd_i in range(n_wnds):
    t0 = time.time()
    wnd_pos = split.wnd_pos[wnd_i]
    nSnps = split.nSnps[wnd_i]
    idx_wnd_start = split.idx_wnd_start[wnd_i]
    print '.. window %d - (%d, %d-%d) - %d snps' % (wnd_i, wnd_pos[0], wnd_pos[1], wnd_pos[2], nSnps)

    Xr = plink_reader.readBED(bfile, useMAFencoding=True, start = idx_wnd_start, nSNPs = nSnps, bim=bim , fam=fam)['snps']
    xr = sp.dot(sp.rand(Xr.shape[0]), Xr)
    idxs_u = sp.sort(sp.unique(xr, return_index=True)[1])
    Xr = Xr[:,idxs_u]
    Xr-= Xr.mean(0)
    Xr/= Xr.std(0)
    Xr/= sp.sqrt(Xr.shape[1])

    _df, _df0 = fit_iSet(Y, F=F, Xr=Xr, n_nulls=10)
    df  = df.append(_df)
    df0 = df0.append(_df0)
    print 'Elapsed:', time.time()-t0
.. window 0 - (22, 16025000-16075000) - 21 snps
/Users/casale/anaconda/lib/python2.7/site-packages/limix-0.8.0.dev0-py2.7-macosx-10.6-x86_64.egg/limix/mtSet/core/plink_reader.py:127: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  SNPs = SP.zeros(((SP.ceil(0.25*N)*4),nSNPs),order=order)
/Users/casale/anaconda/lib/python2.7/site-packages/limix-0.8.0.dev0-py2.7-macosx-10.6-x86_64.egg/limix/mtSet/core/plink_reader.py:142: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  bytes = SP.array(bytearray(f.read(nbyte))).reshape((SP.ceil(0.25*N),Sblock),order='F')
Elapsed: 11.3809649944
.. window 1 - (22, 16050000-16100000) - 23 snps
Elapsed: 12.6584079266
.. window 2 - (22, 16125000-16175000) - 7 snps
Elapsed: 16.8145720959
.. window 3 - (22, 16225000-16275000) - 9 snps
Elapsed: 104.873666048
.. window 4 - (22, 16250000-16300000) - 16 snps
Elapsed: 20.669064045
.. window 5 - (22, 16275000-16325000) - 12 snps
Elapsed: 14.1111199856
.. window 6 - (22, 16325000-16375000) - 5 snps
Elapsed: 12.9142489433
.. window 7 - (22, 16350000-16400000) - 5 snps
Elapsed: 13.7915680408
.. window 8 - (22, 16475000-16525000) - 8 snps
Elapsed: 16.0733628273
.. window 9 - (22, 16500000-16550000) - 7 snps
Elapsed: 19.4930050373

The dataframe df contains log likelihood ratio scores for the different models, variance component attributable to persistent rescaling-GxC and heterogeneity-GxC and information about convergence

In [12]:
df
Out[12]:
Heterogeneity-GxC var Persistent Var Rescaling-GxC Var iSet LLR iSet-het LLR mtSet LLR
0 0.005392 0.010507 7.101712e-03 5.746295e-01 2.224021e-01 1.655404e+00
0 0.007778 0.010084 5.555618e-03 5.572166e-01 3.393949e-01 1.605843e+00
0 0.000050 0.000050 4.840134e-10 7.581852e-03 -8.268391e-07 -8.269062e-07
0 0.000050 0.006019 1.255343e-02 2.000499e+00 4.871453e-09 2.296836e+00
0 0.000050 0.004385 1.233389e-02 1.820602e+00 -8.327561e-11 1.820689e+00
0 0.000050 0.000050 3.900443e-12 6.427395e-07 -2.638672e-10 -3.454943e-10
0 0.000050 0.000935 2.033645e-04 1.718735e-02 1.829665e-08 1.718657e-02
0 0.000050 0.000050 2.224766e-13 7.225904e-03 1.202191e-07 -2.722800e-10
0 0.000827 0.003191 8.814909e-03 8.597470e-01 1.172438e-02 8.970728e-01
0 0.000050 0.001833 8.049454e-03 5.902992e-01 1.579309e-08 5.902869e-01

The dataframe df0 contains log likelihood ratios when data are from the null. These are necessary to iSet to calculate P values.

In [13]:
df0
Out[13]:
iSet LLR0 iSet-het LLR0 mtSet LLR0
0 1.555267e-02 -1.114967e-08 6.763972e-01
1 5.138475e-03 1.666820e-08 9.962754e-02
2 1.322256e+00 3.478742e-07 3.593482e-01
3 5.303700e-03 3.838863e-09 -1.591616e-12
4 2.649577e-02 1.236790e-07 4.307782e-01
5 2.371773e+00 1.979566e-08 -2.283286e-09
6 1.140588e-01 8.015625e-07 -3.249224e-07
7 7.300552e-05 -1.030742e-09 5.007758e-03
8 1.931133e-01 -8.095640e-10 -3.333867e-10
9 1.024240e-03 2.546921e-08 6.409358e-01
0 1.538946e+00 6.648315e-08 4.238569e-03
1 2.684850e-01 1.207556e-07 3.899477e-01
2 1.982851e-01 2.740084e-07 6.468065e-01
3 5.448258e-02 2.304423e-07 2.953999e-01
4 1.040587e-01 3.410205e-07 2.598717e-02
5 5.039921e-01 4.222368e-07 9.883369e-02
6 1.169442e-01 -1.964736e-09 4.058230e-01
7 4.981047e+00 -1.250402e-08 1.874179e-01
8 1.376885e-01 5.323929e-02 5.414452e-01
9 1.713601e-02 -9.514019e-08 -2.625995e-09
0 7.129975e-01 -7.318970e-07 -8.095083e-08
1 1.773877e-02 8.319699e-08 3.107571e-01
2 5.700531e-07 -1.459642e-08 -1.425803e-09
3 5.083909e-03 4.496542e-09 1.231549e-01
4 2.108239e-01 2.066857e-01 -2.949758e-08
5 2.454567e+00 8.748543e-09 -1.677336e-09
6 8.920570e-02 -2.443301e-09 1.382382e-01
7 3.982626e-01 1.240664e-09 3.098874e-01
8 6.941323e-01 3.221816e-07 6.301227e-02
9 5.363783e-03 1.982642e-09 1.342524e+00
... ... ... ...
0 6.824015e-03 -2.870593e-12 1.024917e-01
1 1.227777e-03 1.075570e-06 -8.662937e-11
2 2.025293e-02 3.847382e-07 5.108097e-01
3 8.013020e-01 8.046220e-08 3.585721e+00
4 7.454272e-03 5.885283e-10 -2.145043e-09
5 2.169429e-01 5.314251e-08 1.388684e-01
6 6.550467e-03 -4.160765e-08 6.302979e-01
7 1.268165e-01 -1.860755e-04 1.117286e+00
8 8.665353e-01 5.419619e-08 -1.076216e-09
9 1.980706e-03 8.179478e-08 7.312406e-01
0 1.366847e-03 1.222673e-07 -1.336459e-06
1 4.231934e-02 6.916301e-07 -1.545544e-08
2 6.360202e-01 6.561397e-07 2.556833e-01
3 3.579892e-01 1.243274e-06 -8.588472e-10
4 1.162855e-01 -4.903882e-10 1.918087e-01
5 2.841780e-01 -2.251682e-08 3.612482e-02
6 4.007141e-01 6.620144e-07 6.667211e-01
7 1.253339e+00 9.091066e-04 -2.502361e-09
8 3.956992e-03 8.407589e-07 -5.046115e-08
9 2.841871e-03 6.264202e-09 -5.674963e-09
0 8.987440e-02 -1.561793e-07 -5.309630e-09
1 1.802102e-06 1.770561e-07 5.294796e-02
2 1.271400e-02 2.876845e-09 2.589745e+00
3 1.451155e-02 -2.205797e-07 6.947324e-01
4 1.653160e+00 7.874888e-09 2.198925e+00
5 2.299080e-01 9.061552e-08 9.677878e-01
6 2.180528e-03 3.503377e-07 2.677990e-01
7 6.993240e-03 -9.168471e-08 3.555292e-02
8 3.878006e-01 -3.444711e-11 6.384556e-03
9 7.838907e-03 1.011358e-08 -8.930101e-11

100 rows × 3 columns

Variance Component Plot

In [14]:
import pylab as pl
tot_var = df['Persistent Var'].values + df['Rescaling-GxC Var'].values + df['Heterogeneity-GxC var'].values
nohet_var = df['Persistent Var'].values + df['Rescaling-GxC Var'].values
pl.fill_between(split.wnd_pos[:n_wnds,1], 0, df['Persistent Var'].values, color='gray')
pl.fill_between(split.wnd_pos[:n_wnds,1], df['Persistent Var'].values, nohet_var, color='Orange')
pl.fill_between(split.wnd_pos[:n_wnds,1], nohet_var, tot_var, color='Gold')
Out[14]:
<matplotlib.collections.PolyCollection at 0x115777490>

P-values

Empirical P values are obtained from a relatively small number of genome-wide permutations by pooling across all conisdered steps.

In [15]:
from limix.mtSet.core.iset_utils import calc_emp_pv_eff
#calculate P values for the three tests
for test in ['mtSet', 'iSet', 'iSet-het']:
    df[test+' pv'] = calc_emp_pv_eff(df[test+' LLR'].values, df0[test+' LLR0'].values)
In [16]:
#makes a manhattan plot
wnd_start = split.wnd_pos[:n_wnds,1]
wnd_end = split.wnd_pos[:n_wnds,2]
import pylab as pl
pl.plot(wnd_start, -sp.log10(df['mtSet pv'].values), 'o', color='DarkGreen')
pl.plot(wnd_start, -sp.log10(df['iSet pv'].values), 'o', color='DarkRed')
pl.plot(wnd_start, -sp.log10(df['iSet-het pv'].values), 'o', color='Gold')
Out[16]:
[<matplotlib.lines.Line2D at 0x1163e2450>]

Stratified design

iSet is an extension of mtSet that allows to test for polygenic interactions with environment or other contexts. iSet can be applied in designs where all individuals have been phenotyped in each context (complete design) as well as for populations that have been stratified by a context variable (stratified design). We will here see application of iSet for analysis of stratified designs.

Setting up

In [18]:
# activiate inline plotting
%matplotlib inline

from download_examples import get_1000G_mtSet
import scipy as sp
import scipy.linalg
import limix
from limix.iSet.iset import fit_iSet
import pandas as pd
In [19]:
# loading 1000G genotypes for mtSet demo
get_1000G_mtSet()
In [20]:
# base name for bed, bim and fam
bfile = './data/1000g/chrom22_subsample20_maf0.10'

Split genotypes into regions

In [21]:
from limix.mtSet.core import plink_reader
In [22]:
# import genotype positions
bim = plink_reader.readBIM(bfile,usecols=(0,1,2,3))
chrom = bim[:, 0].astype(float)
pos = bim[:, -1].astype(float)
In [23]:
# uses splitter to split the genotypes
from limix.mtSet.core.splitter import Splitter
split = Splitter(pos=pos,chrom=chrom)

The method splitGeno allows to define the regions that will then considered for the analysis with iSet. Information relative to the calculated regions can be cached in an external file by activating the cache option (see below).

Argument Default Datatype Explanation
method 'slidingWindow' str Uses a sliding window approach to define regions (a region-based approach will be availabe soon)
size 5E+04 (50kb) float Window size. Pace is set at half the size of the window
minSnps 1 int Windows with number of SNPs lower that this threshold are not considered
maxSnps sp.inf int Windows with number of SNPs higher that this threshold are not considered
cache False bool If true, it activates the caching
out_dir './cache' str outdir of the cache file
fname None str Name of the file
rewrite False bool If true and the cache file already exists, the cache file is overwritten
In [24]:
split.splitGeno(cache=True, fname='regions.h5', minSnps=4)
print '%d windows' % split.nWindows
1380 windows

Apply

In [25]:
# import phenotype and sample relatedness
pheno_file = './data/1000g/pheno.phe'
sample_relatedness_file = './data/1000g/chrom22.cov'
Y = sp.loadtxt(pheno_file)[:,:1]
R = sp.loadtxt(sample_relatedness_file)
print(Y.shape)
(274, 1)
In [26]:
# let's suppose the first half of the individuals are phenotyped in context A and
# the second half on context B
Ie = sp.arange(R.shape[0])<0.5*R.shape[0]
In [27]:
# corrects for population structure using the first 10 PCs of the relatedness matrix
S_R, U_R = scipy.linalg.eigh(R+1e-4*sp.eye(R.shape[0])) # these are needed for the full mtSet model
F = sp.concatenate([U_R[:,-10:], sp.ones([U_R.shape[0], 1])], 1)
In [28]:
# read fam
bim = plink_reader.readBIM(bfile,usecols=(0,1,2,3))
fam = plink_reader.readFAM(bfile,usecols=(0,1))
In [29]:
n_wnds = 10 # only 10 windows are considered
LLR = sp.zeros(n_wnds) # vector with test statistics of the n_wnds regions
df = pd.DataFrame()
df0 = pd.DataFrame()
import time

for wnd_i in range(n_wnds):
    t0 = time.time()
    wnd_pos = split.wnd_pos[wnd_i]
    nSnps = split.nSnps[wnd_i]
    idx_wnd_start = split.idx_wnd_start[wnd_i]
    print '.. window %d - (%d, %d-%d) - %d snps' % (wnd_i, wnd_pos[0], wnd_pos[1], wnd_pos[2], nSnps)

    Xr = plink_reader.readBED(bfile, useMAFencoding=True, start = idx_wnd_start, nSNPs = nSnps, bim=bim , fam=fam)['snps']
    xr = sp.dot(sp.rand(Xr.shape[0]), Xr)
    idxs_u = sp.sort(sp.unique(xr, return_index=True)[1])
    Xr = Xr[:,idxs_u]
    Xr-= Xr.mean(0)
    Xr/= Xr.std(0)
    Xr/= sp.sqrt(Xr.shape[1])

    _df, _df0 = fit_iSet(Y, F=F, Xr=Xr, Ie=Ie, n_nulls=10)
    df  = df.append(_df)
    df0 = df0.append(_df0)
    print 'Elapsed:', time.time()-t0
.. window 0 - (22, 16025000-16075000) - 21 snps
Elapsed: 5.46386289597
.. window 1 - (22, 16050000-16100000) - 23 snps
Elapsed: 5.34179997444
.. window 2 - (22, 16125000-16175000) - 7 snps
Elapsed: 4.30499100685
.. window 3 - (22, 16225000-16275000) - 9 snps
Elapsed: 5.6210091114
.. window 4 - (22, 16250000-16300000) - 16 snps
Elapsed: 5.8319299221
.. window 5 - (22, 16275000-16325000) - 12 snps
Elapsed: 5.72252416611
.. window 6 - (22, 16325000-16375000) - 5 snps
Elapsed: 5.94124102592
.. window 7 - (22, 16350000-16400000) - 5 snps
Elapsed: 4.80616307259
.. window 8 - (22, 16475000-16525000) - 8 snps
Elapsed: 4.15918207169
.. window 9 - (22, 16500000-16550000) - 7 snps
Elapsed: 4.45016479492

The dataframe df contains log likelihood ratio scores for the different models, variance component attributable to persistent rescaling-GxC and heterogeneity-GxC and information about convergence.

In [30]:
df
Out[30]:
Heterogeneity-GxC var Persistent Var Rescaling-GxC Var iSet LLR iSet-het LLR mtSet LLR
0 -0.025807 0.059489 1.620445e-02 2.536120e-01 1.431980e-01 1.253018e+00
0 -0.017351 0.051698 8.162345e-03 1.151319e-01 9.235339e-02 1.046717e+00
0 -0.000100 0.000100 4.437414e-11 1.728208e-07 -4.211657e-09 -4.237307e-09
0 -0.017640 0.002922 3.255356e-02 5.018203e-01 1.145868e-08 5.018202e-01
0 -0.003076 0.001255 4.877466e-03 2.303760e-02 -1.327507e-09 2.301929e-02
0 -0.000100 0.000100 7.126069e-14 7.600009e-07 2.849561e-10 -2.643219e-12
0 -0.013473 0.013947 1.353159e-02 3.469951e-01 -2.905769e-04 3.541518e-01
0 -0.000100 0.000100 1.501852e-07 4.185099e-07 -6.807419e-07 -6.841037e-07
0 -0.001059 0.050339 7.098318e-04 1.430663e+00 5.423559e-01 2.143354e+00
0 -0.014258 0.025472 1.755192e-02 9.496528e-01 5.507864e-02 9.827788e-01

The dataframe df0 contains log likelihood ratios when data are from the null. These are necessary to iSet to calculate P values.

In [31]:
df0
Out[31]:
iSet LLR0 iSet-het LLR0 mtSet LLR0
0 8.701465e-01 4.699373e-09 7.641269e-01
1 1.326539e+00 1.775959e-08 7.268617e-01
2 1.682668e-01 4.048806e-08 4.109243e-01
3 6.993162e-01 3.979626e-02 6.094640e-02
4 4.454416e-01 9.676313e-09 1.360218e+00
5 3.325469e-01 2.781690e-08 -2.262937e-10
6 3.882787e-03 3.794824e-08 -1.680718e-09
7 1.536725e+00 4.458499e-08 -1.468123e-09
8 4.738676e-02 1.543978e-08 -2.570744e-11
9 1.615154e+00 4.746059e-08 1.436369e+00
0 2.896440e+00 -3.513286e-07 7.101769e-01
1 5.469898e-01 1.827495e-08 -2.736243e-09
2 6.700332e-02 1.055858e-01 -1.757883e-11
3 9.861550e-02 3.519875e-08 -1.529388e-07
4 6.305888e-01 3.242017e-08 4.032068e-01
5 1.184665e+00 -7.830135e-08 9.479611e-01
6 7.165629e-01 1.139197e-02 -9.854688e-09
7 5.446231e-02 2.438270e-09 1.109642e-01
8 3.084690e-01 5.544223e-08 -3.404310e-09
9 8.050786e-02 1.705288e-09 1.399125e-02
0 6.840853e-04 5.868465e-01 3.604761e-01
1 9.845385e-08 1.434927e-09 1.186774e-02
2 1.190925e-03 -2.374596e-07 -5.115908e-13
3 2.072626e+00 1.207957e-08 1.874106e-03
4 2.848547e-01 -2.602064e-09 6.681083e-01
5 5.402823e-01 -2.528708e-09 -6.086225e-10
6 7.867916e-01 3.331607e-09 8.866508e-02
7 3.144970e-02 4.968342e-09 -6.268286e-08
8 9.240636e-06 6.960661e-09 -3.770140e-11
9 1.408660e-06 8.478040e-09 5.024917e-01
... ... ... ...
0 3.618376e-01 1.625580e-10 -9.757457e-10
1 1.296286e-02 -9.717652e-09 -3.811223e-09
2 2.637526e-01 -8.319745e-10 1.760752e-01
3 5.198513e-01 3.027425e-08 8.526513e-14
4 2.249842e-01 -1.295462e-09 -2.532204e-09
5 3.049868e-06 -1.456584e-09 7.744523e-01
6 1.695221e-01 8.543087e-08 2.239574e-03
7 7.297701e-01 1.763445e-08 1.270094e+00
8 2.980555e-01 7.361109e-09 1.076726e-02
9 6.107584e-01 1.563791e-09 5.718380e-01
0 3.746989e-01 8.190568e-10 -2.868344e-08
1 2.557176e-04 -9.237351e-08 -3.477284e-07
2 9.510620e-01 -7.084111e-11 8.870894e-02
3 7.999111e-02 1.019976e-08 4.138307e-01
4 3.914442e-02 1.043787e-08 -2.800391e-10
5 1.115584e+00 1.265378e+00 2.756960e-04
6 6.104677e-02 5.224194e-09 1.050632e-01
7 1.526689e+00 6.251213e-10 5.871766e-02
8 4.908732e-05 -6.544099e-11 1.440395e-01
9 1.342923e-01 6.910514e-08 1.195091e+00
0 2.524510e-01 1.705915e+00 7.643556e-03
1 2.279359e-01 2.201907e-08 1.626311e+00
2 2.878298e-01 1.038200e+00 1.202775e-01
3 1.121573e-06 3.969859e-09 4.472819e-01
4 1.029227e-01 -6.920686e-10 -2.700062e-13
5 1.121706e-05 1.542335e-07 6.712342e-01
6 1.083405e-01 3.995524e-10 6.778384e-02
7 7.616543e-01 1.344246e-08 7.651055e-01
8 2.134231e-02 9.555620e-09 3.038097e-01
9 3.509122e-02 3.415337e-09 1.191820e+00

100 rows × 3 columns

Variance Component Plot

In [32]:
import pylab as pl
tot_var = df['Persistent Var'].values + df['Rescaling-GxC Var'].values + df['Heterogeneity-GxC var'].values
nohet_var = df['Persistent Var'].values + df['Rescaling-GxC Var'].values
pl.fill_between(split.wnd_pos[:n_wnds,1], 0, df['Persistent Var'].values, color='gray')
pl.fill_between(split.wnd_pos[:n_wnds,1], df['Persistent Var'].values, nohet_var, color='Orange')
pl.fill_between(split.wnd_pos[:n_wnds,1], nohet_var, tot_var, color='Gold')
Out[32]:
<matplotlib.collections.PolyCollection at 0x11aea0ad0>

P-values

Empirical P values are obtained from a relatively small number of genome-wide permutations by pooling across all considered steps.

In [33]:
from limix.mtSet.core.iset_utils import calc_emp_pv_eff
#calculate P values for the three tests
for test in ['mtSet', 'iSet', 'iSet-het']:
    df[test+' pv'] = calc_emp_pv_eff(df[test+' LLR'].values, df0[test+' LLR0'].values)
In [34]:
#makes a manhattan plot
wnd_start = split.wnd_pos[:n_wnds,1]
wnd_end = split.wnd_pos[:n_wnds,2]
import pylab as pl
pl.plot(wnd_start, -sp.log10(df['mtSet pv'].values), 'o', color='DarkGreen')
pl.plot(wnd_start, -sp.log10(df['iSet pv'].values), 'o', color='DarkRed')
pl.plot(wnd_start, -sp.log10(df['iSet-het pv'].values), 'o', color='Gold')
Out[34]:
[<matplotlib.lines.Line2D at 0x115bbbb90>]

Developers