Table of Contents

Introduction

Set tests are a powerful approach for association testing between groups of genetic variants and quantitative traits. In this tutorial we demonstrate how to use set tests within the LIMIX framework to test for association (mtSet).

mtSet

Multi Trait Set test is an implementation of efficient set test algorithms for testing for associations between multiple genetic variants and multiple traits. mtSet can account for confounding factors such as relatedness and can be used for analysis of single traits. mtSet can be used both with the command line interface using the limix scripts (mtSet_preprocess, mtSet_analyze, mtSet_postprocess, mtSet_simPheno) or within python.

Quick Start

In the following, we give a brief example on how to use mtSet. As a case study, we use a subset of the genotypes from the 1000 project [1] and simulated phenotypes.

All commands can be found in _demos/runmtSet.sh. In the following, we give a short summary of the individual steps. A demo for running mtSet-PC can be found in _demos/runmtSetPC.sh and it is not showcased here.

  1. Our software depends on Plink version 1.9 (or greater) for preprocessing. Please, make sure you have it before proceeding.

  2. Download and install Limix

    git clone --depth 1 https://github.com/PMBio/limix.git
    pushd limix
    python setup.py install
    popd
    
  3. Download tutorial

    git clone --depth 1 https://github.com/PMBio/limix-tutorials.git
    cd limix-tutorials/mtSet
    python download_examples.py
    cd data
    mkdir out
    ls 1000g/
    
  4. Set some handy shell variables

    BFILE=1000g/chrom22_subsample20_maf0.10
    CFILE=out/chrom22
    PFILE=1000g/pheno
    WFILE=out/windows
    NFILE=out/null
    WSIZE=30000
    RESDIR=out/results
    OUTFILE=out/final
    
  5. Preprocess and phenotype simulation

    # Kinship matrix estimation
    mtSet_preprocess --compute_covariance --bfile $BFILE --cfile $CFILE 
    # Fitting the null model and assigning the markers to windows
    mtSet_preprocess --precompute_windows --fit_null --bfile $BFILE --cfile $CFILE --pfile $PFILE --wfile $WFILE --nfile $NFILE --window_size $WSIZE --plot_windows
    
  6. Analysing true genotypes

    mtSet_analyze --bfile $BFILE --cfile $CFILE --pfile $PFILE --nfile $NFILE --wfile $WFILE --minSnps 4 --resdir $RESDIR --start_wnd 0 --end_wnd 100
    
  7. Analysing permuted genotypes

    for i in `seq 0 10`; do
     mtSet_analyze --bfile $BFILE --cfile $CFILE --pfile $PFILE --nfile $NFILE --wfile $WFILE --minSnps 4 --resdir $RESDIR --start_wnd 0 --end_wnd 100 --perm $i
    done
    
  8. Postprocess

    mtSet_postprocess --resdir $RESDIR --outfile $OUTFILE --manhattan_plot
    

Processing

Before getting started, we have to compute the sample-to-sample genetic covariance matrix, assign the markers to windows and estimate the trait-to-trait covariance matrices on the null model.

Computing the Covariance Matrix

The covariance matrix can be pre-computed as follows:

mtSet_preprocess --compute_covariance --plink_path plink_path  --bfile bfile  --cfile cfile

where

  • plink_path (default: plink) is a pointer to the plink software (Version 1.9 or greater must be installed). If not set, a python covariance reader is employed. We strongly recommend using the plink reader for large datasets.
  • bfile is the base name of of the binary bed file (bfile.bed, bfile.bim, bfile.fam are required).
  • cfile is the base name of the output file. The relatedness matrix will be written to cfile.cov while the identifiers of the individuals are written to the file cfile.cov.id. The eigenvalue decomposition of the matrix is saved in the files cfile.cov.eval (eigenvalues) and cfile.cov.evec (eigenvectors). If cfile is not specified, the files will be exported to the current directory with the following filenames bfile.cov, bfile.cov.id, bfile.cov.eval, bfile.cov.evec.

Precomputing the Principal Components

The principal components can be pre-computed as follows:

mtSet_preprocess --compute_PCs k --plink_path plink_path --ffile ffile  --bfile bfile

where

  • k is the number of top principal components that are saved
  • plink_path (default: plink) is a pointer to the plink software (Version 1.9 or greater must be installed). If not set, a python genotype reader is employed. We strongly recommend using the plink reader for large datasets.
  • ffile is the name of the fixed effects file, to which the principal components are written to.
  • bfile is the base name of of the binary bed file (bfile.bed, bfile.bim, bfile.fam are required).

Fitting the null model

To efficiently apply mtSet, it is neccessary to compute the null model beforehand. This can be done with the following command:

mtSet_preprocess --fit_null --bfile bfile --cfile cfile --nfile nfile --pfile pfile --ffile ffile --trait_idx trait_idx

where

  • bfile is the base name of of the binary bed file (bfile.bed,bfile.bim,bfile.fam are required).
  • cfile is the base name of the covariance file and its eigen decomposition (cfile.cov, cfile.cov.eval and cfile.cov.evec). If cfile is not set, the relatedness component is omitted from the model.
  • nfile is the base name of the output file. The estimated parameters are saved in nfile.p0, the negative log likelihood ratio in nfile.nll0, the trait-to-trait genetic covariance matrix in nfile.cg0 and the trait-to-trait residual covariance matrix in nfile.cn0.
  • pfile is the base name of the phenotype file.
  • ffile is the name of the file containing the covariates. Each covariate is saved in one column
  • trait_idx can be used to specify a subset of the phenotypes. If more than one phenotype is selected, the phenotypes have to be seperated by commas. For instance, --trait_idx 3,4 selects the phenotypes saved in the forth and fifth column (indexing starts with zero).

Notice that phenotypes are standardized prior to model fitting.

Precomputing the windows

For applying our set test, the markers have to be assigned to windows. We provide a method that splits the genome in windows of fixed sizes:

mtSet_preprocess --precompute_windows --bfile bfile --wfile wfile --window_size window_size --plot_windows

where

  • bfile is the base name of of the binary bed file (bfile.bim is required).
  • window_size is the size of the window (in basepairs). The default value is 30kb.
  • wfile is the base name of the output file. If not specified, the file is saved as bfile.window_size.wnd in the current folder. Each window is stored in one line having the following format: index, chromosome, start position, stop position, index of startposition and number of SNPs.
  • plot_windows if the flag is set, a histogram over the number of markers within a window is generated and saved as wfile.pdf.

Merging the preprocessing steps

Here, we provided the commands to execute the three preprocessing operations individually. However, it is also possible to combine all steps in a single command:

mtSet_preprocess --compute_covariance --fit_null --precompute_windows ...

Phenotype simulation

Our software package also includes a command-line simulator that allows to generate phenotypes with a wide range of different genetic architectures. In brief, the simulator assumes a linear-additive model, considering the contribution of a randomly selected (causal) genetic region for the set component, polygenic background effects from all remaining genome-wide variants, a contribution from unmeasured factors and iid observation noise. For a detailed description of the simulation procedure, we refer to the Supplementary Methods.

The simulator requires as input the genotypes and the relatedness component:

mtSet_simPheno --bfile bfile --cfile cfile --pfile pfile

where

  • bfile is the name of of the binary bed file (bfile.bed, bfile.bim, bfile.fam are required).
  • cfile is the name of the covariance matrix file (cfile.cov, cfile.cov.id are required). If none is specified, the covariance matrix is expected to be in the current folder, having the same filename as the bed file.
  • pfile is the name of the output file (pfile.phe, pfile.region). The file pfile.phe contains the phenotypic values (each sample is saved in one row, each trait in one column). The file pfile.region contains the randomly selected causal region (chromsom, start position, end position). If pfile is not specified, the files are saved in the current folder having an automatic generated filename containing the bed filename and the values of all simulation parameters.

By changing the following parameters different genetic architectures can be simulated and, in particular, the simulation experiments of our paper can be reproduced.

Option Default Datatype Explanation
--seed 0 int seed for random number generator
--nTraits 4 int number of simulated phenotypes
--windowSize 1.5e4 int size of causal region
--vTotR 0.05 float variance explained by the causal region
--nCausalR 10 int number of causal variants in the region
--pCommonR 0.8 float percentage of shared causal variants
--vTotBg 0.4 float variance explained by the polygenic background effects
--pHidden 0.6 float residual variance explained by hidden confounders (in %)
--pCommon 0.8 float background and residual signal that is shared across traits (in %)
--chrom None int specifies the chromosome of the causal region
--minPos None int specifies the min. chromosomal position of the causal region (in basepairs)
--maxPos None int specifies the max. chromosomal position of the causal region (in basepairs)

Running analysis

Once the preprocessing step has been used to obtain the genetic relatedness matrix, to fit the null model and to identify the genetic regions to be considered in the analysis, the set test can be run by the following analysis script:

mtSet_analyze --bfile bfile --cfile cfile --pfile pfile --nfile nfile --wfile wfile --ffile ffile --minSnps minSnps --start_wnd start_wnd --end_wnd end_wnd --resdir rdir --trait_idx traitIdx

where

  • bfile is the base name of of the binary bed file (bfile.bed, bfile.bim, bfile.fam are required).
  • cfile is the base name of the covariance matrix file. The script requires the files: cfile.cov containing the the genetic relatedness matrix, cfile.cov.id containing the corresponding sample identifiers, cfile.cov.eval and cfile.cov.evec containing the eigenvalues and eigenvectors of the matrix. If cfile is not set, the relatedness component is omitted from the model.
  • pfile is the base name of the phenotype file. The script requires the file pfile.phe containing the phenotype data.
  • nfile is the base name of the null model file. The script requires the file nfile.p0 containing the optimal null model parameters. If covariates are set, it also requires the file nfile.f0.
  • wfile is the base name of the file containing the windows to be considered in the set test. The script requires the file wfile.wnd.
  • ffile is the name of the file containing the covariates. Each covariate is saved in one column.
  • perm is the seed used when permuting the genotypes. If the option is not specified then no permutation is considered.
  • start_wnd is the index of the start window
  • end_wnd is the index of the end window
  • minSnps if set only windows containing at least minSnps are considered in the analysis rdir is the directory to which the results are exported. The results are exported in the folder rdir/perm if a permutation seed has been set, otherwise in the folder rdir/test. The output file is named - start_wnd_endwnd.res and contains results in the following format: window index, chromosome, start position, stop position, index of start position, number of SNPs and log likelihood ratio.
  • rdir is the directory to which the results are exported. The results are exported in the folder rdir/perm if a permutation seed has been set, otherwise in the folder rdir/test. The output file is named start_wnd_end_wnd.res and contains results in the following format: window index, chromosome, start position, stop position, index of startposition, number of SNPs and log likelihood ratio.
  • trait_idx can be used to specify a subset of the phenotypes. If more than one phenotype is selected, the phenotypes have to be seperated by commas. For instance trait_idx 3,4 selects the phenotypes saved in the forth and fifth column (indexing starts with zero). Notice that phenotypes are standardized prior to model fitting.

Postprocessing

After running mtSet, the following script can be used to merge the result files and estimate the p-values (p-values are obtained by a parametric fit of the test statistics):

mtSet_postprocess --resdir resdir --outfile outfile --manhattan_plot

where

  • resdir is a pointer to the folder containing the result files of the analysis.
  • outfile is the prefix of the two output files. outfile.perm lists the test statistics (first column) and p-values (second column) of the permutated windows outfile.test contains the (index, chromosome, start position, stop position, SNP index, number of SNPs, test statistics and p-value) of each window. Each window is saved in one row.
  • manhattan_plot is a flag. If set, a manhattan plot is saved in outfile.manhattan.jpg (default: False).

Within Python

This part of the tutorial shows how to use mtSet within python. For a tutorial on how to use mtSet from the command line using the limix scripts (mtSet_preprocess, mtSet_analyze, mtSet_postprocess, mtSet_simPheno) please refer to sections Processing, Phenotype simulation, Running analysis, and Postprocessing.

Setting up

In [1]:
# activiate inline plotting
%matplotlib inline

from download_examples import get_1000G_mtSet
import scipy as sp
import scipy.linalg
import limix
In [2]:
# loading 1000G genotypes for mtSet demo
get_1000G_mtSet()
In [3]:
# base name for bed, bim and fam
bfile = './data/1000g/chrom22_subsample20_maf0.10'

Split genotypes into regions

In [4]:
from limix.mtSet.core import plink_reader
In [5]:
# import genotype positions
bim = plink_reader.readBIM(bfile,usecols=(0,1,2,3))
chrom = bim[:, 0].astype(float)
pos = bim[:, -1].astype(float)
In [6]:
# uses splitter to split the genotypes
from limix.mtSet.core.splitter import Splitter
split = Splitter(pos=pos,chrom=chrom)

The method splitGeno allows to define the regions that will then considered for the analysis with mtSet. Information relative to the calculated regions can be cached in an external file by activating the cache option (see below).

Argument Default Datatype Explanation
method 'slidingWindow' str Uses a sliding window approach to define regions (a region-based approach will be availabe soon)
size 5E+04 (50kb) float Window size. Pace is set at half the size of the window
minSnps 1 int Windows with number of SNPs lower that this threshold are not considered
maxSnps sp.inf int Windows with number of SNPs higher that this threshold are not considered
cache False bool If true, it activates the caching
out_dir './cache' str outdir of the cache file
fname None str Name of the file
rewrite False bool If true and the cache file already exists, the cache file is overwritten
In [7]:
split.splitGeno(cache=True, fname='regions.h5', minSnps=4)
print '%d windows' % split.nWindows
1380 windows

Apply

In this section we showcase how to construct the mtSet class that will then be used for the set test analysis. We showcase both the full mtSet that models relatedness as random effect by the means of an individual-to-individual covariance matrix and the approximated model (mtSetPC) that models relatedness as fixed effect using principal component from the covariance.

In [8]:
# import phenotype and sample relatedness
pheno_file = './data/1000g/pheno.phe'
sample_relatedness_file = './data/1000g/chrom22.cov'
Y = sp.loadtxt(pheno_file)
R = sp.loadtxt(sample_relatedness_file)
In [9]:
# compute eigenvalues and eigenvectors of the sample relatedness matrix
S_R, U_R = scipy.linalg.eigh(R) # these are needed for the full mtSet model
In [10]:
# caculate fixed effects with rel
F = U_R[:, ::-1][:, :10] # it considered the first 10 PCs
F = sp.concatenate([F, sp.ones((F.shape[0], 1))], 1) # add an intercept term
In [11]:
if 0:
    # use full mtSet implementation
    # (relatedness is modeled as random effect by the means of
    # an individual-to-individual covariance matrix)
    mtSet = limix.MTSet(Y, S_R=S_R, U_R=U_R)
else:
    # use mtSetPC
    # (relatedness s modelled as fixed effect
    # using principal component from the covariance)
    mtSet = limix.MTSet(Y, F=F)

Null model

If the analysis is parallelized across different set of regions and permutations, it might be convenient to cache the results from the optimization of the null model (as the null model need to be optimized only once).

Argument Default Datatype Explanation
cache False bool If true, it activates the caching
out_dir './cache' str outdir of the cache file
fname None str Name of the file
rewrite False bool If true and the cache file already exists, the cache file is overwritten
In [12]:
RV = mtSet.fitNull()

The returned dictionary contains:

  • B: value of the optimized effect sizes
  • Cg: value of the genetic trait-to-trait covariance
  • Cn: value of the residual trait-to-trait covariance
  • conv: bool that indicates convergence of the optimization
  • time: time elpased for optimizing the parameters
  • NLL0: negative log likelihood of the null model
  • LMLgrad: norm of the gradient of the negative log likelihood dividived by the number of parameters

Test

In [13]:
# read fam
bim = plink_reader.readBIM(bfile,usecols=(0,1,2,3))
fam = plink_reader.readFAM(bfile,usecols=(0,1))
In [14]:
n_wnds = 100 # only hundred windows are considered
LLR = sp.zeros(n_wnds) # vector with test statistics of the n_wnds regions
for wnd_i in range(n_wnds):
    wnd_pos = split.wnd_pos[wnd_i]
    nSnps = split.nSnps[wnd_i]
    idx_wnd_start = split.idx_wnd_start[wnd_i]
    print('.. window %d - (%d, %d-%d) - %d snps' % (wnd_i, wnd_pos[0], wnd_pos[1], wnd_pos[2], nSnps))
    
    Xr = plink_reader.readBED(bfile, useMAFencoding=True, start = idx_wnd_start, nSNPs = nSnps, bim=bim , fam=fam)['snps']

    # multi trait set test fit
    RV = mtSet.optimize(Xr)
    LLR[wnd_i] = RV['LLR'][0]
.. window 0 - (22, 16025000-16075000) - 21 snps
.. window 1 - (22, 16050000-16100000) - 23 snps
.. window 2 - (22, 16125000-16175000) - 7 snps
.. window 3 - (22, 16225000-16275000) - 9 snps
.. window 4 - (22, 16250000-16300000) - 16 snps
.. window 5 - (22, 16275000-16325000) - 12 snps
.. window 6 - (22, 16325000-16375000) - 5 snps
.. window 7 - (22, 16350000-16400000) - 5 snps
.. window 8 - (22, 16475000-16525000) - 8 snps
.. window 9 - (22, 16500000-16550000) - 7 snps
.. window 10 - (22, 16550000-16600000) - 5 snps
.. window 11 - (22, 16600000-16650000) - 13 snps
.. window 12 - (22, 16625000-16675000) - 18 snps
.. window 13 - (22, 16650000-16700000) - 10 snps
.. window 14 - (22, 16825000-16875000) - 46 snps
.. window 15 - (22, 16850000-16900000) - 75 snps
.. window 16 - (22, 16875000-16925000) - 68 snps
.. window 17 - (22, 16900000-16950000) - 62 snps
.. window 18 - (22, 16925000-16975000) - 43 snps
.. window 19 - (22, 16950000-17000000) - 22 snps
.. window 20 - (22, 16975000-17025000) - 34 snps
.. window 21 - (22, 17000000-17050000) - 73 snps
.. window 22 - (22, 17025000-17075000) - 80 snps
.. window 23 - (22, 17050000-17100000) - 71 snps
.. window 24 - (22, 17075000-17125000) - 61 snps
.. window 25 - (22, 17100000-17150000) - 64 snps
.. window 26 - (22, 17125000-17175000) - 42 snps
.. window 27 - (22, 17150000-17200000) - 9 snps
.. window 28 - (22, 17175000-17225000) - 24 snps
.. window 29 - (22, 17200000-17250000) - 39 snps
.. window 30 - (22, 17225000-17275000) - 26 snps
.. window 31 - (22, 17250000-17300000) - 55 snps
.. window 32 - (22, 17275000-17325000) - 109 snps
.. window 33 - (22, 17300000-17350000) - 115 snps
.. window 34 - (22, 17325000-17375000) - 92 snps
.. window 35 - (22, 17350000-17400000) - 77 snps
.. window 36 - (22, 17375000-17425000) - 92 snps
.. window 37 - (22, 17400000-17450000) - 97 snps
.. window 38 - (22, 17425000-17475000) - 68 snps
.. window 39 - (22, 17450000-17500000) - 54 snps
.. window 40 - (22, 17475000-17525000) - 61 snps
.. window 41 - (22, 17500000-17550000) - 83 snps
.. window 42 - (22, 17525000-17575000) - 81 snps
.. window 43 - (22, 17550000-17600000) - 72 snps
.. window 44 - (22, 17575000-17625000) - 88 snps
.. window 45 - (22, 17600000-17650000) - 99 snps
.. window 46 - (22, 17625000-17675000) - 158 snps
.. window 47 - (22, 17650000-17700000) - 156 snps
.. window 48 - (22, 17675000-17725000) - 96 snps
.. window 49 - (22, 17700000-17750000) - 89 snps
.. window 50 - (22, 17725000-17775000) - 65 snps
.. window 51 - (22, 17750000-17800000) - 77 snps
.. window 52 - (22, 17775000-17825000) - 111 snps
.. window 53 - (22, 17800000-17850000) - 102 snps
.. window 54 - (22, 17825000-17875000) - 73 snps
.. window 55 - (22, 17850000-17900000) - 69 snps
.. window 56 - (22, 17875000-17925000) - 129 snps
.. window 57 - (22, 17900000-17950000) - 156 snps
.. window 58 - (22, 17925000-17975000) - 120 snps
.. window 59 - (22, 17950000-18000000) - 91 snps
.. window 60 - (22, 17975000-18025000) - 93 snps
.. window 61 - (22, 18000000-18050000) - 88 snps
.. window 62 - (22, 18025000-18075000) - 68 snps
.. window 63 - (22, 18050000-18100000) - 112 snps
.. window 64 - (22, 18075000-18125000) - 113 snps
.. window 65 - (22, 18100000-18150000) - 73 snps
.. window 66 - (22, 18125000-18175000) - 71 snps
.. window 67 - (22, 18150000-18200000) - 58 snps
.. window 68 - (22, 18175000-18225000) - 74 snps
.. window 69 - (22, 18200000-18250000) - 75 snps
.. window 70 - (22, 18225000-18275000) - 56 snps
.. window 71 - (22, 18250000-18300000) - 77 snps
.. window 72 - (22, 18275000-18325000) - 103 snps
.. window 73 - (22, 18300000-18350000) - 88 snps
.. window 74 - (22, 18325000-18375000) - 67 snps
.. window 75 - (22, 18350000-18400000) - 63 snps
.. window 76 - (22, 18375000-18425000) - 65 snps
.. window 77 - (22, 18400000-18450000) - 116 snps
.. window 78 - (22, 18425000-18475000) - 153 snps
.. window 79 - (22, 18450000-18500000) - 124 snps
.. window 80 - (22, 18475000-18525000) - 125 snps
.. window 81 - (22, 18500000-18550000) - 111 snps
.. window 82 - (22, 18525000-18575000) - 65 snps
.. window 83 - (22, 18550000-18600000) - 50 snps
.. window 84 - (22, 18575000-18625000) - 45 snps
.. window 85 - (22, 18600000-18650000) - 80 snps
.. window 86 - (22, 18625000-18675000) - 74 snps
.. window 87 - (22, 18650000-18700000) - 17 snps
.. window 88 - (22, 18675000-18725000) - 11 snps
.. window 89 - (22, 18700000-18750000) - 19 snps
.. window 90 - (22, 18725000-18775000) - 14 snps
.. window 91 - (22, 18750000-18800000) - 11 snps
.. window 92 - (22, 18775000-18825000) - 8 snps
.. window 93 - (22, 18800000-18850000) - 11 snps
.. window 94 - (22, 18825000-18875000) - 14 snps
.. window 95 - (22, 18850000-18900000) - 42 snps
.. window 96 - (22, 18875000-18925000) - 102 snps
.. window 97 - (22, 18900000-18950000) - 101 snps
.. window 98 - (22, 18925000-18975000) - 93 snps
.. window 99 - (22, 18950000-19000000) - 78 snps

The returned dictionary from mtSet.optimize contains:

  • Cr: value of the region-term trait-to-trait covariance
  • Cg: value of the genetic trait-to-trait covariance
  • Cn: value of the residual trait-to-trait covariance
  • variances: n_traits-by-3 matrix of variance explained by the three contributions (region, background, noise) for the traits
  • conv: bool that indicates convergence of the optimization
  • time: time elpased for optimizing the parameters
  • NLLAlt: negative log likelihood of the alternative model
  • LLR: test statistics
  • LMLgrad: norm of the gradient of the negative log likelihood dividived by the number of parameters

P-values

P values are obtained from a relatively small number of genome-wide permutations, fitting a parametric model to the null distribution. Here we showcase the permutation procedure by considering 10 permutations for the 10 regions analyzed.

In [15]:
n_perms = 10
LLR_null = [] # in this list test statistics from permutations will be stored
for perm_i in range(n_perms):
    
    #1. generate permutation
    print 'permutation %d' % perm_i
    sp.random.seed(perm_i)
    perm_idxs = sp.random.permutation(Y.shape[0])
    
    #2. scan on the 100 regions
    for wnd_i in range(n_wnds):
        wnd_pos = split.wnd_pos[wnd_i]
        nSnps = split.nSnps[wnd_i]
        idx_wnd_start = split.idx_wnd_start[wnd_i]
        Xr = plink_reader.readBED(bfile, useMAFencoding=True, start = idx_wnd_start, nSNPs = nSnps, bim=bim , fam=fam)['snps']
        Xr = Xr[perm_idxs, :] # permute samples in region term
        RV = mtSet.optimize(Xr)
        LLR_null.append(RV['LLR'][0])
LLR_null = sp.array(LLR_null)
permutation 0
permutation 1
permutation 2
permutation 3
permutation 4
permutation 5
permutation 6
permutation 7
permutation 8
permutation 9

The parametric fit to the ditribution of the test statistics under the null and the consequent conversion of the observed test statistics in P values is performed by the module limix.stats.chi2mixture as shown below.

In [16]:
from limix.stats.chi2mixture import Chi2mixture
c2m = Chi2mixture(tol=4e-3)
c2m.estimate_chi2mixture(LLR_null)
pv = c2m.sf(LLR)
In [17]:
#makes a manhattan plot
wnd_chrom = split.wnd_pos[:n_wnds,0]
wnd_start = split.wnd_pos[:n_wnds,1]
wnd_end = split.wnd_pos[:n_wnds,2]
import pylab as pl
pl.plot(wnd_start, -sp.log10(pv), 'o', color='MidnightBlue')
Out[17]:
[<matplotlib.lines.Line2D at 0x115fe4fd0>]

Developers

References

[1] Genomes Project, C. et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56-65 (2012).

In [1]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')