Notebook

Introduction¶

Set tests are a powerful approach for association testing between groups of genetic variants and quantitative traits. In this tutorial we demonstrate how to use set tests within the LIMIX framework to test for association (mtSet).

mtSet¶

Multi Trait Set test is an implementation of efficient set test algorithms for testing for associations between multiple genetic variants and multiple traits. mtSet can account for confounding factors such as relatedness and can be used for analysis of single traits. mtSet can be used both with the command line interface using the limix scripts (mtSet_preprocess, mtSet_analyze, mtSet_postprocess, mtSet_simPheno) or within python.

Quick Start¶

In the following, we give a brief example on how to use mtSet. As a case study, we use a subset of the genotypes from the 1000 project [1] and simulated phenotypes.

All commands can be found in _demos/runmtSet.sh. In the following, we give a short summary of the individual steps. A demo for running mtSet-PC can be found in _demos/runmtSetPC.sh and it is not showcased here.

Our software depends on Plink version 1.9 (or greater) for preprocessing. Please, make sure you have it before proceeding.
Download and install Limix

git clone --depth 1 https://github.com/PMBio/limix.git
pushd limix
python setup.py install
popd

Download tutorial

git clone --depth 1 https://github.com/PMBio/limix-tutorials.git
cd limix-tutorials/mtSet
python download_examples.py
cd data
mkdir out
ls 1000g/

Set some handy shell variables

BFILE=1000g/chrom22_subsample20_maf0.10
CFILE=out/chrom22
PFILE=1000g/pheno
WFILE=out/windows
NFILE=out/null
WSIZE=30000
RESDIR=out/results
OUTFILE=out/final

Preprocess and phenotype simulation

# Kinship matrix estimation
mtSet_preprocess --compute_covariance --bfile $BFILE --cfile $CFILE 
# Fitting the null model and assigning the markers to windows
mtSet_preprocess --precompute_windows --fit_null --bfile $BFILE --cfile $CFILE --pfile $PFILE --wfile $WFILE --nfile $NFILE --window_size $WSIZE --plot_windows

Analysing true genotypes

mtSet_analyze --bfile $BFILE --cfile $CFILE --pfile $PFILE --nfile $NFILE --wfile $WFILE --minSnps 4 --resdir $RESDIR --start_wnd 0 --end_wnd 100

Analysing permuted genotypes

for i in `seq 0 10`; do
    mtSet_analyze --bfile $BFILE --cfile $CFILE --pfile $PFILE --nfile $NFILE --wfile $WFILE --minSnps 4 --resdir $RESDIR --start_wnd 0 --end_wnd 100 --perm $i
done

Postprocess

mtSet_postprocess --resdir $RESDIR --outfile $OUTFILE --manhattan_plot

Processing¶

Before getting started, we have to compute the sample-to-sample genetic covariance matrix, assign the markers to windows and estimate the trait-to-trait covariance matrices on the null model.

Computing the Covariance Matrix¶

The covariance matrix can be pre-computed as follows:

mtSet_preprocess --compute_covariance --plink_path plink_path  --bfile bfile  --cfile cfile

where

plink_path (default: plink) is a pointer to the plink software (Version 1.9 or greater must be installed). If not set, a python covariance reader is employed. We strongly recommend using the plink reader for large datasets.
bfile is the base name of of the binary bed file (bfile.bed, bfile.bim, bfile.fam are required).
cfile is the base name of the output file. The relatedness matrix will be written to cfile.cov while the identifiers of the individuals are written to the file cfile.cov.id. The eigenvalue decomposition of the matrix is saved in the files cfile.cov.eval (eigenvalues) and cfile.cov.evec (eigenvectors). If cfile is not specified, the files will be exported to the current directory with the following filenames bfile.cov, bfile.cov.id, bfile.cov.eval, bfile.cov.evec.

Precomputing the Principal Components¶

The principal components can be pre-computed as follows:

mtSet_preprocess --compute_PCs k --plink_path plink_path --ffile ffile  --bfile bfile

where

k is the number of top principal components that are saved
plink_path (default: plink) is a pointer to the plink software (Version 1.9 or greater must be installed). If not set, a python genotype reader is employed. We strongly recommend using the plink reader for large datasets.
ffile is the name of the fixed effects file, to which the principal components are written to.
bfile is the base name of of the binary bed file (bfile.bed, bfile.bim, bfile.fam are required).

Fitting the null model¶

To efficiently apply mtSet, it is neccessary to compute the null model beforehand. This can be done with the following command:

mtSet_preprocess --fit_null --bfile bfile --cfile cfile --nfile nfile --pfile pfile --ffile ffile --trait_idx trait_idx

where

bfile is the base name of of the binary bed file (bfile.bed,bfile.bim,bfile.fam are required).
cfile is the base name of the covariance file and its eigen decomposition (cfile.cov, cfile.cov.eval and cfile.cov.evec).

If cfile is not set, the relatedness component is omitted from the model.

nfile is the base name of the output file.

The estimated parameters are saved in nfile.p0, the negative log likelihood ratio in nfile.nll0, the trait-to-trait genetic covariance matrix in nfile.cg0 and the trait-to-trait residual covariance matrix in nfile.cn0.

pfile is the base name of the phenotype file.
ffile is the name of the file containing the covariates. Each covariate is saved in one column
trait_idx can be used to specify a subset of the phenotypes. If more than one phenotype is selected, the phenotypes have to be seperated by commas. For instance, --trait_idx 3,4 selects the phenotypes saved in the forth and fifth column (indexing starts with zero).

Notice that phenotypes are standardized prior to model fitting.

Precomputing the windows¶

For applying our set test, the markers have to be assigned to windows. We provide a method that splits the genome in windows of fixed sizes:

mtSet_preprocess --precompute_windows --bfile bfile --wfile wfile --window_size window_size --plot_windows

where

bfile is the base name of of the binary bed file (bfile.bim is required).
window_size is the size of the window (in basepairs). The default value is 30kb.
wfile is the base name of the output file. If not specified, the file is saved as bfile.window_size.wnd in the current folder. Each window is stored in one line having the following format: index, chromosome, start position, stop position, index of startposition and number of SNPs.
plot_windows if the flag is set, a histogram over the number of markers within a window is generated and saved as wfile.pdf.

Merging the preprocessing steps¶

Here, we provided the commands to execute the three preprocessing operations individually. However, it is also possible to combine all steps in a single command:

mtSet_preprocess --compute_covariance --fit_null --precompute_windows ...

Phenotype simulation¶

Our software package also includes a command-line simulator that allows to generate phenotypes with a wide range of different genetic architectures. In brief, the simulator assumes a linear-additive model, considering the contribution of a randomly selected (causal) genetic region for the set component, polygenic background effects from all remaining genome-wide variants, a contribution from unmeasured factors and iid observation noise. For a detailed description of the simulation procedure, we refer to the Supplementary Methods.

The simulator requires as input the genotypes and the relatedness component:

mtSet_simPheno --bfile bfile --cfile cfile --pfile pfile

where

bfile is the name of of the binary bed file (bfile.bed, bfile.bim, bfile.fam are required).
cfile is the name of the covariance matrix file (cfile.cov, cfile.cov.id are required).

If none is specified, the covariance matrix is expected to be in the current folder, having the same filename as the bed file.

pfile is the name of the output file (pfile.phe, pfile.region).

The file pfile.phe contains the phenotypic values (each sample is saved in one row, each trait in one column). The file pfile.region contains the randomly selected causal region (chromsom, start position, end position). If pfile is not specified, the files are saved in the current folder having an automatic generated filename containing the bed filename and the values of all simulation parameters.

By changing the following parameters different genetic architectures can be simulated and, in particular, the simulation experiments of our paper can be reproduced.

Option	Default	Datatype	Explanation
`--seed`	0	int	seed for random number generator
`--nTraits`	4	int	number of simulated phenotypes
`--windowSize`	1.5e4	int	size of causal region
`--vTotR`	0.05	float	variance explained by the causal region
`--nCausalR`	10	int	number of causal variants in the region
`--pCommonR`	0.8	float	percentage of shared causal variants
`--vTotBg`	0.4	float	variance explained by the polygenic background effects
`--pHidden`	0.6	float	residual variance explained by hidden confounders (in %)
`--pCommon`	0.8	float	background and residual signal that is shared across traits (in %)
`--chrom`	None	int	specifies the chromosome of the causal region
`--minPos`	None	int	specifies the min. chromosomal position of the causal region (in basepairs)
`--maxPos`	None	int	specifies the max. chromosomal position of the causal region (in basepairs)

Running analysis¶

Once the preprocessing step has been used to obtain the genetic relatedness matrix, to fit the null model and to identify the genetic regions to be considered in the analysis, the set test can be run by the following analysis script:

mtSet_analyze --bfile bfile --cfile cfile --pfile pfile --nfile nfile --wfile wfile --ffile ffile --minSnps minSnps --start_wnd start_wnd --end_wnd end_wnd --resdir rdir --trait_idx traitIdx

where

bfile is the base name of of the binary bed file (bfile.bed, bfile.bim, bfile.fam are required).
cfile is the base name of the covariance matrix file. The script requires the files: cfile.cov containing the the genetic relatedness matrix, cfile.cov.id containing the corresponding sample identifiers, cfile.cov.eval and cfile.cov.evec containing the eigenvalues and eigenvectors of the matrix. If cfile is not set, the relatedness component is omitted from the model.
pfile is the base name of the phenotype file. The script requires the file pfile.phe containing the phenotype data.
nfile is the base name of the null model file. The script requires the file nfile.p0 containing the optimal null model parameters. If covariates are set, it also requires the file nfile.f0.
wfile is the base name of the file containing the windows to be considered in the set test. The script requires the file wfile.wnd.
ffile is the name of the file containing the covariates. Each covariate is saved in one column.
perm is the seed used when permuting the genotypes. If the option is not specified then no permutation is considered.
start_wnd is the index of the start window
end_wnd is the index of the end window
minSnps if set only windows containing at least minSnps are considered in the analysis

rdir is the directory to which the results are exported. The results are exported in the folder rdir/perm if a permutation seed has been set, otherwise in the folder rdir/test. The output file is named - start_wnd_endwnd.res and contains results in the following format: window index, chromosome, start position, stop position, index of start position, number of SNPs and log likelihood ratio.

rdir is the directory to which the results are exported. The results are exported in the folder rdir/perm if a permutation seed has been set, otherwise in the folder rdir/test. The output file is named start_wnd_end_wnd.res and contains results in the following format: window index, chromosome, start position, stop position, index of startposition, number of SNPs and log likelihood ratio.
trait_idx can be used to specify a subset of the phenotypes. If more than one phenotype is selected, the phenotypes have to be seperated by commas. For instance trait_idx 3,4 selects the phenotypes saved in the forth and fifth column (indexing starts with zero).

Notice that phenotypes are standardized prior to model fitting.

Postprocessing¶

After running mtSet, the following script can be used to merge the result files and estimate the p-values (p-values are obtained by a parametric fit of the test statistics):

mtSet_postprocess --resdir resdir --outfile outfile --manhattan_plot

where

resdir is a pointer to the folder containing the result files of the analysis.
outfile is the prefix of the two output files.

outfile.perm lists the test statistics (first column) and p-values (second column) of the permutated windows outfile.test contains the (index, chromosome, start position, stop position, SNP index, number of SNPs, test statistics and p-value) of each window. Each window is saved in one row.

manhattan_plot is a flag. If set, a manhattan plot is saved in outfile.manhattan.jpg (default: False).

Within Python¶

This part of the tutorial shows how to use mtSet within python. For a tutorial on how to use mtSet from the command line using the limix scripts (mtSet_preprocess, mtSet_analyze, mtSet_postprocess, mtSet_simPheno) please refer to sections Processing, Phenotype simulation, Running analysis, and Postprocessing.

Setting up¶

In [1]:

# activiate inline plotting
%matplotlib inline

from download_examples import get_1000G_mtSet
import scipy as sp
import scipy.linalg
import limix

In [2]:

# loading 1000G genotypes for mtSet demo
get_1000G_mtSet()

In [3]:

# base name for bed, bim and fam
bfile = './data/1000g/chrom22_subsample20_maf0.10'

Split genotypes into regions¶

In [4]:

from limix.mtSet.core import plink_reader

In [5]:

# import genotype positions
bim = plink_reader.readBIM(bfile,usecols=(0,1,2,3))
chrom = bim[:, 0].astype(float)
pos = bim[:, -1].astype(float)

In [6]:

# uses splitter to split the genotypes
from limix.mtSet.core.splitter import Splitter
split = Splitter(pos=pos,chrom=chrom)

The method splitGeno allows to define the regions that will then considered for the analysis with mtSet. Information relative to the calculated regions can be cached in an external file by activating the cache option (see below).

Argument	Default	Datatype	Explanation
method	'slidingWindow'	str	Uses a sliding window approach to define regions (a region-based approach will be availabe soon)
size	5E+04 (50kb)	float	Window size. Pace is set at half the size of the window
minSnps	1	int	Windows with number of SNPs lower that this threshold are not considered
maxSnps	sp.inf	int	Windows with number of SNPs higher that this threshold are not considered
cache	False	bool	If true, it activates the caching
out_dir	'./cache'	str	outdir of the cache file
fname	None	str	Name of the file
rewrite	False	bool	If true and the cache file already exists, the cache file is overwritten

In [7]:

split.splitGeno(cache=True, fname='regions.h5', minSnps=4)
print '%d windows' % split.nWindows

1380 windows

Apply¶

In this section we showcase how to construct the mtSet class that will then be used for the set test analysis. We showcase both the full mtSet that models relatedness as random effect by the means of an individual-to-individual covariance matrix and the approximated model (mtSetPC) that models relatedness as fixed effect using principal component from the covariance.

In [8]:

# import phenotype and sample relatedness
pheno_file = './data/1000g/pheno.phe'
sample_relatedness_file = './data/1000g/chrom22.cov'
Y = sp.loadtxt(pheno_file)
R = sp.loadtxt(sample_relatedness_file)

In [9]:

# compute eigenvalues and eigenvectors of the sample relatedness matrix
S_R, U_R = scipy.linalg.eigh(R) # these are needed for the full mtSet model

In [10]:

# caculate fixed effects with rel
F = U_R[:, ::-1][:, :10] # it considered the first 10 PCs
F = sp.concatenate([F, sp.ones((F.shape[0], 1))], 1) # add an intercept term

In [11]:

if 0:
    # use full mtSet implementation
    # (relatedness is modeled as random effect by the means of
    # an individual-to-individual covariance matrix)
    mtSet = limix.MTSet(Y, S_R=S_R, U_R=U_R)
else:
    # use mtSetPC
    # (relatedness s modelled as fixed effect
    # using principal component from the covariance)
    mtSet = limix.MTSet(Y, F=F)

Null model¶

If the analysis is parallelized across different set of regions and permutations, it might be convenient to cache the results from the optimization of the null model (as the null model need to be optimized only once).

Argument	Default	Datatype	Explanation
cache	False	bool	If true, it activates the caching
out_dir	'./cache'	str	outdir of the cache file
fname	None	str	Name of the file
rewrite	False	bool	If true and the cache file already exists, the cache file is overwritten

In [12]:

RV = mtSet.fitNull()

The returned dictionary contains:

B: value of the optimized effect sizes
Cg: value of the genetic trait-to-trait covariance
Cn: value of the residual trait-to-trait covariance
conv: bool that indicates convergence of the optimization
time: time elpased for optimizing the parameters
NLL0: negative log likelihood of the null model
LMLgrad: norm of the gradient of the negative log likelihood dividived by the number of parameters

Test¶

In [13]:

# read fam
bim = plink_reader.readBIM(bfile,usecols=(0,1,2,3))
fam = plink_reader.readFAM(bfile,usecols=(0,1))

In [14]:

n_wnds = 100 # only hundred windows are considered
LLR = sp.zeros(n_wnds) # vector with test statistics of the n_wnds regions
for wnd_i in range(n_wnds):
    wnd_pos = split.wnd_pos[wnd_i]
    nSnps = split.nSnps[wnd_i]
    idx_wnd_start = split.idx_wnd_start[wnd_i]
    print('.. window %d - (%d, %d-%d) - %d snps' % (wnd_i, wnd_pos[0], wnd_pos[1], wnd_pos[2], nSnps))
    
    Xr = plink_reader.readBED(bfile, useMAFencoding=True, start = idx_wnd_start, nSNPs = nSnps, bim=bim , fam=fam)['snps']

    # multi trait set test fit
    RV = mtSet.optimize(Xr)
    LLR[wnd_i] = RV['LLR'][0]

.. window 0 - (22, 16025000-16075000) - 21 snps
.. window 1 - (22, 16050000-16100000) - 23 snps
.. window 2 - (22, 16125000-16175000) - 7 snps
.. window 3 - (22, 16225000-16275000) - 9 snps
.. window 4 - (22, 16250000-16300000) - 16 snps
.. window 5 - (22, 16275000-16325000) - 12 snps
.. window 6 - (22, 16325000-16375000) - 5 snps
.. window 7 - (22, 16350000-16400000) - 5 snps
.. window 8 - (22, 16475000-16525000) - 8 snps
.. window 9 - (22, 16500000-16550000) - 7 snps
.. window 10 - (22, 16550000-16600000) - 5 snps
.. window 11 - (22, 16600000-16650000) - 13 snps
.. window 12 - (22, 16625000-16675000) - 18 snps
.. window 13 - (22, 16650000-16700000) - 10 snps
.. window 14 - (22, 16825000-16875000) - 46 snps
.. window 15 - (22, 16850000-16900000) - 75 snps
.. window 16 - (22, 16875000-16925000) - 68 snps
.. window 17 - (22, 16900000-16950000) - 62 snps
.. window 18 - (22, 16925000-16975000) - 43 snps
.. window 19 - (22, 16950000-17000000) - 22 snps
.. window 20 - (22, 16975000-17025000) - 34 snps
.. window 21 - (22, 17000000-17050000) - 73 snps
.. window 22 - (22, 17025000-17075000) - 80 snps
.. window 23 - (22, 17050000-17100000) - 71 snps
.. window 24 - (22, 17075000-17125000) - 61 snps
.. window 25 - (22, 17100000-17150000) - 64 snps
.. window 26 - (22, 17125000-17175000) - 42 snps
.. window 27 - (22, 17150000-17200000) - 9 snps
.. window 28 - (22, 17175000-17225000) - 24 snps
.. window 29 - (22, 17200000-17250000) - 39 snps
.. window 30 - (22, 17225000-17275000) - 26 snps
.. window 31 - (22, 17250000-17300000) - 55 snps
.. window 32 - (22, 17275000-17325000) - 109 snps
.. window 33 - (22, 17300000-17350000) - 115 snps
.. window 34 - (22, 17325000-17375000) - 92 snps
.. window 35 - (22, 17350000-17400000) - 77 snps
.. window 36 - (22, 17375000-17425000) - 92 snps
.. window 37 - (22, 17400000-17450000) - 97 snps
.. window 38 - (22, 17425000-17475000) - 68 snps
.. window 39 - (22, 17450000-17500000) - 54 snps
.. window 40 - (22, 17475000-17525000) - 61 snps
.. window 41 - (22, 17500000-17550000) - 83 snps
.. window 42 - (22, 17525000-17575000) - 81 snps
.. window 43 - (22, 17550000-17600000) - 72 snps
.. window 44 - (22, 17575000-17625000) - 88 snps
.. window 45 - (22, 17600000-17650000) - 99 snps
.. window 46 - (22, 17625000-17675000) - 158 snps
.. window 47 - (22, 17650000-17700000) - 156 snps
.. window 48 - (22, 17675000-17725000) - 96 snps
.. window 49 - (22, 17700000-17750000) - 89 snps
.. window 50 - (22, 17725000-17775000) - 65 snps
.. window 51 - (22, 17750000-17800000) - 77 snps
.. window 52 - (22, 17775000-17825000) - 111 snps
.. window 53 - (22, 17800000-17850000) - 102 snps
.. window 54 - (22, 17825000-17875000) - 73 snps
.. window 55 - (22, 17850000-17900000) - 69 snps
.. window 56 - (22, 17875000-17925000) - 129 snps
.. window 57 - (22, 17900000-17950000) - 156 snps
.. window 58 - (22, 17925000-17975000) - 120 snps
.. window 59 - (22, 17950000-18000000) - 91 snps
.. window 60 - (22, 17975000-18025000) - 93 snps
.. window 61 - (22, 18000000-18050000) - 88 snps
.. window 62 - (22, 18025000-18075000) - 68 snps
.. window 63 - (22, 18050000-18100000) - 112 snps
.. window 64 - (22, 18075000-18125000) - 113 snps
.. window 65 - (22, 18100000-18150000) - 73 snps
.. window 66 - (22, 18125000-18175000) - 71 snps
.. window 67 - (22, 18150000-18200000) - 58 snps
.. window 68 - (22, 18175000-18225000) - 74 snps
.. window 69 - (22, 18200000-18250000) - 75 snps
.. window 70 - (22, 18225000-18275000) - 56 snps
.. window 71 - (22, 18250000-18300000) - 77 snps
.. window 72 - (22, 18275000-18325000) - 103 snps
.. window 73 - (22, 18300000-18350000) - 88 snps
.. window 74 - (22, 18325000-18375000) - 67 snps
.. window 75 - (22, 18350000-18400000) - 63 snps
.. window 76 - (22, 18375000-18425000) - 65 snps
.. window 77 - (22, 18400000-18450000) - 116 snps
.. window 78 - (22, 18425000-18475000) - 153 snps
.. window 79 - (22, 18450000-18500000) - 124 snps
.. window 80 - (22, 18475000-18525000) - 125 snps
.. window 81 - (22, 18500000-18550000) - 111 snps
.. window 82 - (22, 18525000-18575000) - 65 snps
.. window 83 - (22, 18550000-18600000) - 50 snps
.. window 84 - (22, 18575000-18625000) - 45 snps
.. window 85 - (22, 18600000-18650000) - 80 snps
.. window 86 - (22, 18625000-18675000) - 74 snps
.. window 87 - (22, 18650000-18700000) - 17 snps
.. window 88 - (22, 18675000-18725000) - 11 snps
.. window 89 - (22, 18700000-18750000) - 19 snps
.. window 90 - (22, 18725000-18775000) - 14 snps
.. window 91 - (22, 18750000-18800000) - 11 snps
.. window 92 - (22, 18775000-18825000) - 8 snps
.. window 93 - (22, 18800000-18850000) - 11 snps
.. window 94 - (22, 18825000-18875000) - 14 snps
.. window 95 - (22, 18850000-18900000) - 42 snps
.. window 96 - (22, 18875000-18925000) - 102 snps
.. window 97 - (22, 18900000-18950000) - 101 snps
.. window 98 - (22, 18925000-18975000) - 93 snps
.. window 99 - (22, 18950000-19000000) - 78 snps

The returned dictionary from mtSet.optimize contains:

Cr: value of the region-term trait-to-trait covariance
Cg: value of the genetic trait-to-trait covariance
Cn: value of the residual trait-to-trait covariance
variances: n_traits-by-3 matrix of variance explained by the three contributions (region, background, noise) for the traits
conv: bool that indicates convergence of the optimization
time: time elpased for optimizing the parameters
NLLAlt: negative log likelihood of the alternative model
LLR: test statistics
LMLgrad: norm of the gradient of the negative log likelihood dividived by the number of parameters

P-values¶

P values are obtained from a relatively small number of genome-wide permutations, fitting a parametric model to the null distribution. Here we showcase the permutation procedure by considering 10 permutations for the 10 regions analyzed.

In [15]:

n_perms = 10
LLR_null = [] # in this list test statistics from permutations will be stored
for perm_i in range(n_perms):
    
    #1. generate permutation
    print 'permutation %d' % perm_i
    sp.random.seed(perm_i)
    perm_idxs = sp.random.permutation(Y.shape[0])
    
    #2. scan on the 100 regions
    for wnd_i in range(n_wnds):
        wnd_pos = split.wnd_pos[wnd_i]
        nSnps = split.nSnps[wnd_i]
        idx_wnd_start = split.idx_wnd_start[wnd_i]
        Xr = plink_reader.readBED(bfile, useMAFencoding=True, start = idx_wnd_start, nSNPs = nSnps, bim=bim , fam=fam)['snps']
        Xr = Xr[perm_idxs, :] # permute samples in region term
        RV = mtSet.optimize(Xr)
        LLR_null.append(RV['LLR'][0])
LLR_null = sp.array(LLR_null)

permutation 0
permutation 1
permutation 2
permutation 3
permutation 4
permutation 5
permutation 6
permutation 7
permutation 8
permutation 9

The parametric fit to the ditribution of the test statistics under the null and the consequent conversion of the observed test statistics in P values is performed by the module limix.stats.chi2mixture as shown below.

In [16]:

from limix.stats.chi2mixture import Chi2mixture
c2m = Chi2mixture(tol=4e-3)
c2m.estimate_chi2mixture(LLR_null)
pv = c2m.sf(LLR)

In [17]:

#makes a manhattan plot
wnd_chrom = split.wnd_pos[:n_wnds,0]
wnd_start = split.wnd_pos[:n_wnds,1]
wnd_end = split.wnd_pos[:n_wnds,2]
import pylab as pl
pl.plot(wnd_start, -sp.log10(pv), 'o', color='MidnightBlue')

Out[17]:

[<matplotlib.lines.Line2D at 0x115fe4fd0>]

Developers¶

Francesco Paolo Casale (casale@ebi.ac.uk)
Barbara Rakitsch (rakitsch@ebi.ac.uk)
Danilo Horta (horta@ebi.ac.uk)
Oliver Stegle (stegle@ebi.ac.uk)

References¶

[1] Genomes Project, C. et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56-65 (2012).

In [1]:

%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')

Table of Contents

Introduction¶

mtSet¶

Quick Start¶

Processing¶

Computing the Covariance Matrix¶

Precomputing the Principal Components¶

Fitting the null model¶

Precomputing the windows¶

Merging the preprocessing steps¶

Phenotype simulation¶

Running analysis¶

Postprocessing¶

Within Python¶

Setting up¶

Split genotypes into regions¶

Apply¶

Null model¶

Test¶

P-values¶

Developers¶

References¶