# first set up inline plotting
%pylab inline
The problem occurs when you perform several statistical tests: by chance, some will be significant even if the data were generated under the null hypothesis.
Imagine you are doing $n$ t-tests, each with a type I error risk of $\alpha$, and denote by $t_\alpha$ the t-value such that, for a variable $t$ drawn from a Student distribution, we have $P(t > t_\alpha) = \alpha$. If the tests are independent, the probability of getting at least one significant test among these $n$ tests is:
$$P(\textrm{at least one significant}) = 1 - (1 - \alpha)^n$$
For $n = 100$ and $\alpha = 0.05$ this is $1 - 0.95^{100} \approx 0.994$: a false positive is almost guaranteed.
# code demo: a t-distribution
from scipy.stats import t as tdist
df = 14
n = 100
alph = .05
# Often easier to work with the frozen t distribution with df degrees of freedom: t14 = tdist(df)
# rvs : draw one or several random variables
ts = tdist(df).rvs(size=n)
# The threshold as a t value:
t_alph = tdist.isf(alph,df)
print "Nubmer of test significant at the %f level: %d " %(alph, sum(ts >= t_alph))
Nubmer of test significant at the 0.050000 level: 5
To correct for the number of tests with the Bonferroni correction (strictly speaking, the formula $1 - (1 - p)^n$ used below is the exact Šidák variant for independent tests; plain Bonferroni would use $\alpha/n$):
# if we have p values :
p = tdist(df).sf(ts)
p_corr = (1 - (1 - p)**n)
print "Number of test significant with uncorrected threshold: %d, corrected: %d ", \
sum(p <= alph), sum(p_corr <= alph)
# This can be put in a little function:
def p_corrected(ps, n):
    return 1 - (1 - ps)**n
# alternatively: compare the original p-values with: 1.0 - exp(1./n * log(1 - .05))
def alph_corr(alph, n):
    return 1.0 - exp(1./n * log(1 - alph))
print "Uncorrected threshold: %f, corrected threshold: %f" % (alph, alph_corr(alph,n))
Number of tests significant with uncorrected threshold: 5, corrected: 0
Uncorrected threshold: 0.050000, corrected threshold: 0.000513
On average, we would have to run the previous code about 20 times to get a (spurious) significant effect, since the corrected threshold brings the probability of at least one false positive per run back to about 0.05.
Note that when the tests are NOT INDEPENDENT, this procedure is conservative (on average you will get fewer false positives than the specified risk of error), and therefore not the most sensitive.
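To see this empirically, here is a minimal sketch (the shared-noise model and all the names in it are ours, for illustration only): each run draws strongly correlated z-values, applies the corrected threshold, and counts how often at least one test comes out (falsely) significant. The empirical family-wise error rate falls clearly below the nominal $\alpha$.
import numpy as np
import scipy.stats as st
n_tests, n_runs, a = 100, 1000, 0.05
thr = 1.0 - np.exp(np.log(1 - a) / n_tests)  # corrected threshold, as above
n_fp_runs = 0
for run in xrange(n_runs):
    shared = st.norm.rvs()  # common component shared by all tests
    # each z is still marginally N(0,1), but pairwise correlation is 0.64
    z = 0.8 * shared + 0.6 * st.norm.rvs(size=n_tests)
    p = st.norm.sf(z)
    n_fp_runs += (p < thr).any()
print "Empirical FWER under dependence: %.3f (nominal %.2f)" % (1.0 * n_fp_runs / n_runs, a)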
The false discovery rate was introduced by Benjamini and Hochberg (1995); see http://en.wikipedia.org/wiki/False_discovery_rate for a history of this measure.
Let's denote by $V$ the number of false positives (true null hypotheses declared significant), by $S$ the number of true positives, and by $R = V + S$ the total number of tests declared significant.
We want to control the procedure such that $Q = \frac{V}{S+V} = \frac{V}{R}$ is small (taking $Q = 0$ when $R = 0$).
The false discovery rate is the expectation of the false discovery proportion: $\textrm{FDR} = E(Q)$.
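Why control $Q$? A small Monte Carlo sketch (the mixture of 5 signal tests among 100, and the effect size of 3, are invented for illustration) shows that at the uncorrected threshold a large fraction of the rejected tests are false positives:
import numpy as np
import scipy.stats as st
n_tests, n_signal, n_runs, a = 100, 5, 1000, 0.05
fdp = []
for run in xrange(n_runs):
    z = st.norm.rvs(size=n_tests)
    z[:n_signal] += 3.0            # the first n_signal tests carry true signal
    rejected = st.norm.sf(z) < a   # uncorrected threshold
    V = rejected[n_signal:].sum()  # false positives: null tests rejected
    R = rejected.sum()             # total number of rejections
    fdp.append(1.0 * V / R if R > 0 else 0.0)
print "Estimated FDR at the uncorrected threshold: %.2f" % np.mean(fdp)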
The procedure, step by step:
First, we order the p-values in increasing order. We start by testing the smallest p-value (the highest t or z), and declare the test significant if its p-value is less than $\alpha/n$, for instance $0.05/n$. If this test is significant, we test the next smallest p-value; our new threshold to declare significance is $2\alpha/n$. This is iterated, comparing the $i$-th smallest p-value to $i\alpha/n$, until a test is not significant.
An intuition: $i \frac{\alpha}{n}$ is the expected number of successes if I draw $i$ times from a Bernoulli distribution with probability $\frac{\alpha}{n}$.
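A toy walk-through of these steps (the five p-values are made up for illustration):
import numpy as np
p_sorted = np.array([0.001, 0.008, 0.018, 0.041, 0.200])  # already sorted
n_p, a = len(p_sorted), 0.05
for i in range(1, n_p + 1):
    thr = i * a / n_p
    if p_sorted[i - 1] < thr:
        print "p(%d) = %.3f < %d*alpha/n = %.3f : significant" % (i, p_sorted[i - 1], i, thr)
    else:
        print "p(%d) = %.3f >= %d*alpha/n = %.3f : stop" % (i, p_sorted[i - 1], i, thr)
        break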
Let's implement this and compare to the Bonferroni-corrected and uncorrected thresholds:
import scipy.stats as st
norm01 = st.norm(0,1)
# adapted from nipy/algorithms/statistics/empirical_pvalue
def fdr(p_values=None, verbose=0):
    """Returns the FDR associated with each p-value

    Parameters
    ----------
    p_values : ndarray of shape (n)
        The p-values of the samples

    Returns
    -------
    q : array of shape (n)
        The corresponding fdr values
    """
    #p_values = check_p_values(p_values)
    n_samples = p_values.size
    order = p_values.argsort()
    sp_values = p_values[order]
    # compute q while in ascending order
    q = np.minimum(1, n_samples * sp_values / np.arange(1, n_samples + 1))
    # enforce monotonicity of q along the sorted order
    for i in range(n_samples - 1, 0, -1):
        q[i - 1] = min(q[i], q[i - 1])
    # reorder the results
    inverse_order = np.arange(n_samples)
    inverse_order[order] = np.arange(n_samples)
    q = q[inverse_order]
    if verbose:
        import matplotlib.pylab as mp
        mp.figure()
        mp.xlabel('Input p-value')
        mp.plot(p_values, q, '.')
        mp.ylabel('Associated fdr')
    return q
def fdr_threshold(p_values, alpha=0.05):
    """Return the FDR threshold given p-values

    Parameters
    ----------
    p_values : array of shape (n)
        The p-values of the samples
    alpha : float, optional
        The desired FDR significance

    Returns
    -------
    critical_p_value : float
        The p-value corresponding to the FDR alpha
    """
    n_samples = np.size(p_values)
    sp_values = np.sort(p_values)
    p_corr = alpha / n_samples
    i = np.arange(1, n_samples + 1)  # i from 1 to n_samples
    critical_set = sp_values[sp_values < p_corr * i]
    if len(critical_set) > 0:
        critical_p_value = critical_set.max()
    else:
        critical_p_value = p_corr
    return critical_p_value
n = 1000
unif = st.uniform()
p = unif.rvs(size=n)
q = []
alph = arange(0.01,0.1,0.01)
for a in alph:
    q.append(fdr_threshold(p, a))
#more elegant:
q = [fdr_threshold(p,a) for a in alph]
print size(alph), size(q)
plot(alph, q,'--')
9 9
n = 1000
z = norm01.rvs(size=n)
p = norm01.sf(z)
print fdr_threshold(p)
print sum(p < fdr_threshold(p))
q = fdr(p, verbose=1)
if len(z) <= 20:
    order = z.argsort()
    print "z \n", z
    print "p \n", p
    print "q \n", q
    print "sorted z \n", z[order]
    print "sorted fdr \n", q[order]
5e-05
0
alph = 0.05
df = 14
t14 = tdist(df)
n = 200
ts = t14.rvs(size=n)
# ts = np.sort(ts) # careful of ts = ts.sort() .... you get None !!!
ps = t14.sf(ts)
#uncorrected :
print 'Uncorrected: ',sum(ps < alph)
#Bonferroni :
print 'Bonferroni: ',sum(ps < alph_corr(alph,n))
# FDR :
fdr_alph = fdr_threshold(ps)
print 'FDR: ',sum(ps < fdr_alph)
Uncorrected:  9
Bonferroni:  0
FDR:  0
# Adding signal: first a quick exploratory check with z-values
# (this cell's variables are not reused below)
z = st.norm.rvs(0, 1, size=n)
print sum(z > 1.5)
z = z[z > 1.5] + 1
13
alph, df, n = 0.05, 14, 100
t14 = tdist(df)
ts = t14.rvs(size=n)
# add some signal :
print "number of true positive: ", sum(ts > 1.5)
ts[ts > 1.5] += 1.5
ps = t14.sf(ts)
#uncorrected :
print 'Uncorrected: ',alph, sum(ps < alph)
#Bonferroni :
print 'Bonferroni: ', alph_corr(alph,n), sum(ps < alph_corr(alph,n))
#FDR :
fdr_alph = fdr_threshold(ps)
q = fdr(ps, verbose=1)
print 'FDR: ',fdr_alph, sum(ps < fdr_alph), sum(q <= alph)
number of tests with true signal:  16
Uncorrected:  0.05 16
Bonferroni:  0.000512801416262 0
FDR:  0.00463758728239 15 16
The principle of the permutation test is simple: we construct the distribution of the statistic we are interested in under the assumption that there is no signal. For instance, let's take the one-sample t-test scenario.
#N subjects, z[i] = effects observed for subject i.
N = 15
mu, sigma = 0.75, 1.0
normH1 = st.norm(mu, sigma)
z = normH1.rvs(size=N) # a series of z values under H1
print z
[ 0.44980773 0.3049634 0.23894538 0.95284122 0.85829427 2.60215513 1.47310698 0.4147775 -0.09161498 0.37855162 1.17150395 0.69070465 3.3456306 0.89502638 0.68098054]
The idea here is the following: if the z values are drawn from a normal distribution with mean zero, then a negative value is as likely as a positive one (the distribution is symmetric). We can therefore arbitrarily swap the signs of our values and construct the distribution of their mean under this hypothesis.
psign = np.random.randint(2, size=len(z))
psign[psign==0] = -1
print psign
print z * psign
[ 1 -1 1 -1 -1 1 1 1 -1 1 1 1 1 1 1]
[ 0.44980773 -0.3049634 0.23894538 -0.95284122 -0.85829427 2.60215513 1.47310698 0.4147775 0.09161498 0.37855162 1.17150395 0.69070465 3.3456306 0.89502638 0.68098054]
def distrib_under_H0(x, stat, nperm=1000):
    """Return the sign-flip permutation distribution of stat(x) under H0."""
    dist = []
    for i in xrange(nperm):
        psign = np.random.randint(2, size=len(x))
        psign[psign == 0] = -1
        dist.append(stat(x * psign))
    # return an array so that comparisons like d_H0 > x work elementwise
    return np.array(dist)
def my_stat(x):
    return mean(x)
d_H0 = distrib_under_H0(z, my_stat)
size(d_H0)
hist(d_H0)
mean(z)
empirical_p = 1.0 * sum(d_H0 > mean(z))/size(d_H0)
print my_stat(z), empirical_p
0.957711624634 0.0
In this instance, we have a significant p-value at the 5 percent risk of error, because under the null hypothesis the observed statistic occurs less than 5 percent of the time. This framework is easily extended to the case of multiple tests:
We simply take the distribution of the maximum statistic across tests!
An example:
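Here is a sketch of this maximum-statistic permutation test (the data sizes, the effect size, and putting signal on the first variable only are our choices for illustration). The signs are flipped per subject, i.e. per row, so the correlation structure across tests is preserved; each permutation contributes only its maximum mean, and comparing each observed statistic to this distribution controls the family-wise error rate.
import numpy as np
N, n_tests, nperm = 15, 50, 1000
x = np.random.randn(N, n_tests)  # N subjects x n_tests variables, pure noise
x[:, 0] += 1.5                   # add signal to the first variable only
observed = x.mean(axis=0)        # one statistic (the mean) per test
max_dist = np.zeros(nperm)
for i in xrange(nperm):
    # flip the sign of each subject (whole row at once)
    psign = 2 * np.random.randint(2, size=(N, 1)) - 1
    max_dist[i] = (x * psign).mean(axis=0).max()
# corrected p-value: fraction of permutation maxima above each observed statistic
p_corr = (max_dist[:, None] >= observed).mean(axis=0)
print "tests significant at 0.05 (corrected): ", np.where(p_corr < 0.05)[0]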