Author: Yannick Copin y.copin@ipnl.in2p3.fr $ \renewcommand{\d}{\mathrm{d}} \renewcommand{\v}[1]{\boldsymbol{#1}} % Vector \newcommand{\m}[1]{\mathsf{#1}} % Matrix \DeclareMathOperator{\Var}{Var} \DeclareMathOperator{\Cov}{Cov} \DeclareMathOperator{\med}{med} $
Abstract: A restricted number of statistical tools are presented, with emphasis on details sometimes overlooked in the literature. This may be too simple or too technical in places: comments and suggestions are welcome.
%matplotlib inline
import numpy as N
import scipy.stats as SS
import scipy.linalg as SL
import matplotlib.pyplot as P
We define two datasets:
Case without intrinsic dispersion: $\v{x} = (x_1,\ldots, x_N)$ is a sample of $N$ measurements of the same (null) quantity, with (usually estimated) gaussian measurement uncertainties $\d x_{i}$ whose variances $\d x_{i}^2$ follow a log-normal distribution (here, $\log\d x^2$ follows a $\mathcal{N}(0, 0.3^2)$ distribution).
Case with intrinsic dispersion: $\v{y} = (y_1,\ldots, y_N)$ is an intrinsically dispersed $\mathcal{N}(\mu=0,\sigma^2)$ sample (here with $\sigma = 1$), measured with the same gaussian measurement uncertainties $\d x_{i}$ as above. Unfortunately, the intrinsic dispersion $\sigma$ is usually not known a priori.
n = 100
N.random.seed(12345)
dx = N.sqrt(N.random.lognormal(sigma=0.3, size=n)) # Log-normally distributed uncertainties
x = N.random.normal(loc=0, scale=dx) # Pure gaussian errors
y = x + N.random.randn(n) # Gaussian uncertainties + intrinsic dispersion
P.errorbar(N.arange(n), x, yerr=dx, fmt='bo', label='x: no intrinsic dispersion')
P.errorbar(N.arange(n), y, yerr=dx, fmt='rs', label='y: x + intrinsic dispersion')
#P.hist(x, orientation='horizontal', histtype='stepfilled', color='b', alpha=0.5)
#P.hist(y, orientation='horizontal', histtype='step', color='r')
P.legend(loc='best')
Arithmetic average: $$\begin{align} \hat{\mu} &= \frac{1}{N}\sum_i x_i \\ \Var(\hat{\mu}) &= \frac{\hat{\sigma}^{2}}{N} \end{align}$$
mu = N.mean(x) # Mean
dmu = N.std(x, ddof=1) / N.sqrt(n) # Error on mean
print "Arithmetic mean: µ = %.4f ± %.4f" % (mu, dmu)
Arithmetic mean: µ = -0.0391 ± 0.0970
Weighted ("optimal") average: $$\begin{align} \hat{\mu}_w &= \frac{\sum_i w_i x_i}{\sum_i w_i} \quad\text{with}\quad w_i = \frac{1}{\d x_{i}^{2} + \sigma^2} \\ \Var(\hat{\mu}_w) &= \frac{1}{\sum_i w_i} \end{align}$$
muw, sumw = N.average(x, weights=dx ** -2, returned=True) # Optimally weighted mean
dmuw = N.sqrt(1 / sumw) # Uncertainty on weighted mean
print "x weighted mean: µ = %.4f ± %.4f" % (muw, dmuw)
print "y weighted mean (wrong disp.): µ = %.4f ± %.4f" % (N.average(y, weights=dx ** -2), dmuw)
muyw, sumwy = N.average(y, weights=1 / (1 + dx ** 2), returned=True)
dmuyw = N.sqrt(1 / sumwy)
print "y weighted mean (true disp.): µ = %.4f ± %.4f" % (muyw, dmuyw)
x weighted mean: µ = -0.0258 ± 0.0981
y weighted mean (wrong disp.): µ = -0.2177 ± 0.0981
y weighted mean (true disp.): µ = -0.2168 ± 0.1418
Attention:
In the optimally weighted mean, $\sigma_i^2 = 1/w_{i}$ is supposed to be the total variance of normally-distributed $x_i$, therefore including both measurement error $\d x_i$ and intrinsic distribution dispersion $\sigma$ if any: $\sigma_i^2 = \d x_i^2 + \sigma^2$.
If the weighting is not the true inverse variance $1/\sigma_i^2$, the weighted average remains unbiased but is no longer optimal (i.e. of minimal variance), and the quoted uncertainty $\Var(\hat{\mu}_w) = 1/\sum_i w_i$ is misestimated.
The reduced $\chi^2$ around the estimated mean, defined as $\chi_\nu^2 = 1/(N-1) \times \sum_i (x_i - \mu_w)^2 / \sigma_i^2$, should not be significantly different from one if the $\sigma_i$ are compatible with the observed dispersion. If this is not the case, the $\sigma_i$ have probably been misestimated, either because they ignore the intrinsic dispersion $\sigma$ or because the measurement uncertainties $\d x_i$ are themselves misestimated, or both.
print "DoF k=%d" % (n - 1)
chi2 = N.sum(((x - muw) / dx) ** 2)
p = SS.chisqprob(chi2, n - 1)
print "x (no disp.): χ² = %.2f, p-value=%.3f" % (chi2, p)
chi2 = N.sum(((y - N.average(y, weights=dx ** -2))/dx) ** 2) # Wrong dispersion: sigma_i**2 = dx**2
p = SS.chisqprob(chi2, n - 1)
print "y (wrong disp.): χ² = %.2f, p-value=%.3f" % (chi2, p)
chi2 = N.sum(((y - muyw) / N.hypot(1, dx)) ** 2) # Correct dispersion: sigma_i**2 = sigma**2 + dx**2
p = SS.chisqprob(chi2, n - 1)
print "y (true disp.): χ² = %.2f, p-value=%.3f" % (chi2, p)
DoF k=99
x (no disp.): χ² = 93.38, p-value=0.640
y (wrong disp.): χ² = 185.51, p-value=0.000
y (true disp.): χ² = 87.61, p-value=0.787
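For reference, the corresponding reduced $\chi^2_\nu = \chi^2/(N-1)$ values can be computed directly (a short sketch reusing the quantities defined above); only the mis-weighted case for $\v{y}$ should depart significantly from one.
print "x (no disp.):    χ²/ν = %.3f" % (N.sum(((x - muw) / dx) ** 2) / (n - 1))
print "y (wrong disp.): χ²/ν = %.3f" % (N.sum(((y - N.average(y, weights=dx ** -2)) / dx) ** 2) / (n - 1))
print "y (true disp.):  χ²/ν = %.3f" % (N.sum(((y - muyw) / N.hypot(1, dx)) ** 2) / (n - 1))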
Unweighted: The supposedly unbiased sample variance with respect to the unknown mean $\mu$ (but see discussion in Oliphant, 2006) is: $$\begin{align} \hat{\sigma}^2 &= \frac{1}{N-1} \sum_i (x_i - \hat{\mu})^2 \\ \Var(\hat{\sigma}^2) &= \frac{1}{N}\left(m^4 - \frac{N-3}{N-1}\sigma^4\right) \\ &\simeq \frac{1}{N} \times 2\hat{\sigma}^4 \quad\text{in the normal approximation} \\ \text{therefore}\quad \Var(\hat{\sigma}) &\simeq \frac{1}{2N}\hat{\sigma}^2 \\ \text{Ahn \& Fessler (2003):}\quad \Var(\hat{\sigma}) &= \frac{1}{2(N-1)}\hat{\sigma}^2 \quad\text{for}\quad N > 10, \end{align}$$ where $m^4$ denotes the 4th central moment.
sig = N.std(x, ddof=1)
dsig = sig / N.sqrt(2 * (n-1)) # Normal approximation, n>10
print "Dispersion: σ = %.4f ± %.4f" % (sig, dsig)
Dispersion: σ = 0.9704 ± 0.0690
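The general (non-normal) expression for $\Var(\hat{\sigma}^2)$ quoted above can also be estimated from the sample 4th central moment $m^4$; a minimal sketch using scipy.stats.moment and the delta-method relation $\Var(\hat{\sigma}) \simeq \Var(\hat{\sigma}^2)/(4\hat{\sigma}^2)$:
m4 = SS.moment(x, moment=4)                           # Sample 4th central moment
vars2 = (m4 - (n - 3.) / (n - 1.) * sig ** 4) / n     # Var(σ²), general expression
dsig2 = N.sqrt(vars2) / (2 * sig)                     # δσ ≈ δ(σ²)/(2σ)
print "Dispersion (4th-moment estimate): σ = %.4f ± %.4f" % (sig, dsig2)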
Weighted: $$ \hat{\sigma}_w^2 = \frac{\sum w_i}{\left(\sum w_i\right)^2 - \sum w_i^2} \times \sum_i w_i (x_i - \hat{\mu}_w)^2. $$
w = dx ** -2
sumw = N.sum(w)
sigw = N.sqrt( N.sum(w * (x - muw) ** 2) / sumw )
print "x Weighted dispersion (biased): σ = %.4f" % (sigw)
sigw = N.sqrt( N.sum(w * (x - muw) ** 2) * sumw / (sumw ** 2 - N.dot(w, w)) )
print "x Weighted dispersion (unbiased): σ = %.4f" % (sigw)
sigyw = N.sqrt( N.sum(w * (y - N.average(y, weights=w)) ** 2) * sumw / (sumw ** 2 - N.dot(w, w)) )
print "y Weighted dispersion (wrong disp., unbiased): σ = %.4f" % (sigyw)
wy = 1 / (1 + dx ** 2)
sigyw = N.sqrt( N.sum(wy * (y - muyw) ** 2) * sumwy / (sumwy ** 2 - N.dot(wy, wy)) )
print "y Weighted dispersion (true disp., unbiased): σ = %.4f" % (sigyw)
x Weighted dispersion (biased): σ = 0.9481
x Weighted dispersion (unbiased): σ = 0.9534
y Weighted dispersion (wrong disp., unbiased): σ = 1.3438
y Weighted dispersion (true disp., unbiased): σ = 1.3338
Weighted RMS is usually defined as:
def wrms(x, dx):
w = 1/dx**2
muw,sumw = N.average(x, weights=w, returned=True)
return N.sqrt( N.sum(w*(x-muw)**2)/sumw )
wrmsx = wrms(x,dx)
print "WRMS = %.4f" % wrmsx
WRMS = 0.9481
This strictly corresponds to the biased weighted variance. If using inverse-variance weighting $w_i = 1/\sigma_i^2$, one has $\mathrm{WRMS}^2 = \chi^2 \times \sigma_{\mu_w}^2$. Under the assumption that the intrinsic scatter is null ($\sigma\equiv 0$), the WRMS can be used as a proxy for the rescaled uncertainty on the weighted mean when the errors $\d x_i$ were misestimated by a multiplicative factor: $$ \sigma_{\mu_w}^{*2} = \chi^2_\nu \times \sigma_{\mu_w}^2 = \mathrm{WRMS}^2/(N-1). $$
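As a quick numerical check of the identity $\mathrm{WRMS}^2 = \chi^2 \times \sigma_{\mu_w}^2$ (a sketch reusing the x dataset and its original uncertainties):
w0 = dx ** -2                                         # Inverse-variance weights
muw0, sumw0 = N.average(x, weights=w0, returned=True)
chi20 = N.sum(w0 * (x - muw0) ** 2)                   # χ² around the weighted mean
print "WRMS² = %.4f, χ² × Var(µw) = %.4f" % (wrms(x, dx) ** 2, chi20 / sumw0)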
muw, sumw = N.average(x, weights=(0.1*dx) ** -2, returned=True) # Errors are wrong by a factor 10
dmuw = N.sqrt(1 / sumw)
print "x Weighted mean (wrong errors): µ = %.4f ± %.4f" % (muw, dmuw) # Wrong weighted mean uncertainty
dmur = wrms(x, 0.1 * dx) / N.sqrt(n - 1)
print "x Weighted mean (rescaled errors): µ = %.4f ± %.4f" % (muw, dmur) # Rescaled weighted mean uncertainty
x Weighted mean (wrong errors): µ = -0.0258 ± 0.0098
x Weighted mean (rescaled errors): µ = -0.0258 ± 0.0953
Note: the weighted mean $\hat{\mu}_{w}$ is unaffected by a scaling error on the errors $\d x_{i}$.
Bayesian estimators of the sample mean, variance and standard deviation are studied in Oliphant, 2006, and implemented in scipy.stats.mvsdist (Bayesian estimates of the sample mean, variance and standard deviation PDFs) and scipy.stats.bayes_mvs (Bayesian confidence intervals).
(mu, (lmu, hmu)), (var, (lvar, hvar)), (sig, (lsig, hsig)) = SS.bayes_mvs(x, alpha=0.683)
print "Bayesian estimates: µ = %.4f +%.4f -%.4f" % (mu, hmu-mu, mu-lmu)
print "Bayesian estimates: σ = %.4f +%.4f -%.4f" % (sig, hsig-sig, sig-lsig)
Bayesian estimates: µ = -0.0391 +0.0976 -0.0976
Bayesian estimates: σ = 0.9779 +0.0696 -0.0697
Median-based estimators are robust (up to 50% of the points can be corrupted), but have a low efficiency. More pertinent estimators can be defined.
The variance of the median $m$ for large samples and normal distributions is: $$ \Var(m) = \frac{\pi}{2}\frac{s^{2}}{n}, $$ where $s$ is an (presumably robust) estimate of the scale.
Median Absolute Deviation: $$ \mathrm{MAD} = \med(| x - \med(x) |) $$
Inter-Quartile Range: this is the 25%-trimmed range, i.e. the difference between the 75th- and the 25th-percentiles of a sample.
The MAD has the optimal breakdown point of 0.5: up to half of the sample points can be made arbitrarily large before the estimate itself becomes arbitrarily large. The IQR has a breakdown point of 0.25. (By comparison, the sample standard deviation has a breakdown point of 0.)
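A quick numerical illustration of this breakdown behaviour (a sketch: a copy of the x sample is corrupted with a fraction of large outliers, and the standard deviation is compared to the normalized MAD introduced below):
xc = x.copy()
xc[:30] = 1e3                        # Corrupt 30% of the sample with large outliers
print "Corrupted sample: std = %.4g, 1.4826 × MAD = %.4g" % (
    N.std(xc, ddof=1), 1.4826 * N.median(N.abs(xc - N.median(xc))))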
For a normal distribution, these are related to the standard deviation $\sigma$ by:
$$\begin{align}
\hat{\sigma} &= \frac{1}{\Phi^{-1}(3/4)}\mathrm{MAD}
\simeq 1.4826\times\mathrm{MAD} \\
% 1/SS.norm.ppf(0.75) = 1.4826
&= \frac{1}{2\Phi^{-1}(3/4)} \mathrm{IQR} \simeq 0.7413\times\mathrm{IQR}
% 2*sqrt(2)*scipy.special.erfinv(0.5) = 2*SS.norm.ppf(0.75) = 1.349
\end{align}$$
where $\Phi$ is the standard normal cumulative distribution function (scipy.stats.norm.cdf), and its inverse $\Phi^{-1}$ is known as the normal quantile function, or probit function (scipy.stats.norm.ppf). Note that the MAD is biased for a low number of points (see Croux & Rousseeuw, 1992).
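These normalization constants can be checked directly from the normal quantile function (a one-line sketch):
print "MAD factor = %.4f, IQR factor = %.4f" % (1 / SS.norm.ppf(0.75), 0.5 / SS.norm.ppf(0.75))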
Unfortunately, $\mathrm{MAD}$ is only 37% as efficient as the sample standard deviation.
med = N.median(x)
nmad = N.median(N.abs(x - N.median(x))) * 1.4826 # Normalized Median Absolute Deviation (for normal distributions)
dmed = nmad / N.sqrt(2 * n / N.pi) # For large normal samples
print "Median: %.4f ± %.4f" % (med, dmed)
Median: -0.2036 ± 0.1304
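The corresponding IQR-based scale estimate (a sketch using numpy.percentile and the 0.7413 normalization valid for normal distributions):
q25 = N.percentile(x, 25)
q75 = N.percentile(x, 75)
sig_iqr = 0.7413 * (q75 - q25)       # σ estimate from the inter-quartile range
print "NMAD: σ = %.4f, IQR-based: σ = %.4f" % (nmad, sig_iqr)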
More efficient robust estimators of scale with maximal breakdown point are presented in Croux and Rousseeuw (1992).
Absolute Pairwise Differences: Rousseeuw and Croux (1993) define $$\begin{align} S_n &= 1.1926 \med_i \left( \med_j\left(\left| x_i - x_j \right|\right) \right), \\ Q_n &= c_n \times \text{first quartile of } \left\{ \left| x_i - x_j \right| : i < j \right\}, \end{align}$$ where $c_n \simeq 2.2219$ is the normalization ensuring consistency with $\sigma$ for normal distributions.
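A brute-force $O(N^2)$ sketch of these two estimators (without the small-sample correction factors discussed by Rousseeuw & Croux, 1993; the inner median of $S_n$ and the quartile of $Q_n$ are taken in their naive form):
def Sn(x):
    """Naive S_n estimator (no small-sample correction)."""
    diffs = N.abs(x[:, N.newaxis] - x[N.newaxis, :])   # |x_i - x_j| matrix
    return 1.1926 * N.median(N.median(diffs, axis=1))
def Qn(x):
    """Naive Q_n estimator (no small-sample correction), normalized for normal distributions."""
    i, j = N.triu_indices(len(x), k=1)                 # Pairs with i < j
    return 2.2219 * N.percentile(N.abs(x[i] - x[j]), 25)
print "S_n = %.4f, Q_n = %.4f" % (Sn(x), Qn(x))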
Consider a column-vector $\v{y}$ of $N$ measurements $y_i$, each of which is assumed to be Gaussian distributed with mean $F_i(\v{\theta})$ (hereafter the model $\v{F}(\v{\theta})$, depending on $M$ parameters $\theta_k$). The maximum-likelihood estimate $\hat{\v{\theta}}$ of the set of parameters $\v{\theta}$ is the one minimizing the $\chi^2$ defined by: $$ \chi^2(\v{\theta}) = (\v{y} - \v{F}(\v{\theta}))^T \cdot \m{V}^{-1} \cdot (\v{y} - \v{F}(\v{\theta})), $$ where $\m{V}$ is the (supposedly known) covariance matrix of the measurements: $V_{ij} = \Cov(y_i, y_j)$.
In that case, the minimum $\chi^2(\hat{\v{\theta}})$ follows a $\chi^2$-distribution with $k=N-M$ degrees of freedom. In particular, the covariance matrix of the fitted parameters can be estimated as $\Cov(\hat{\v{\theta}}) \simeq 2\,\m{H}^{-1}$, where $\m{H}$ is the Hessian matrix of the $\chi^2$ evaluated at $\hat{\v{\theta}}$:
\begin{align} H_{ij} &= \left.\frac{\partial^2\chi^2}{\partial\theta_i\partial\theta_j}\right|_{\v{\theta}=\hat{\v{\theta}}} \\ &= 2 \left.\frac{\partial\v{F}}{\partial\theta_i}\right|_{\hat{\v{\theta}}}^T \cdot \m{V}^{-1} \cdot \left.\frac{\partial\v{F}}{\partial\theta_j}\right|_{\hat{\v{\theta}}}. \end{align}
If the measurements are uncorrelated, the covariance matrix $\m{V}$ is purely diagonal, with $V_{ii} = \sigma_i^2$ the (supposedly known) variance of measurement $y_i$, and the previous equations can be rewritten:
\begin{align} \chi^2(\v{\theta}) &= \sum_{i=1}^N \frac{(y_i - F_i(\v{\theta}))^2}{\sigma_i^2}, \\ H_{ij} &= 2 \sum_{k=1}^N \frac{1}{\sigma_k^2} \frac{\partial F_k}{\partial\theta_i}\frac{\partial F_k}{\partial\theta_j}. \end{align}
We define two datasets:
def vec2corr(vec):
"""Define n×n correlation matrix from *vec* of length n, such that
corr[0,:] = corr[:,0] = vec."""
n = len(vec)
tmp = N.concatenate((vec[:0:-1],vec))
corr = N.array([ tmp[n-(i+1):2*n-(i+1)] for i in range(n) ])
return corr
n = 100
tau = 5. # Exponential correlation length
N.random.seed(12)
sig = N.sqrt(N.random.lognormal(sigma=0.3, size=n)) # Log-normally distributed uncertainties
cor = vec2corr(N.exp(-N.arange(n)/tau)) # Correlation matrix
cov = cor * N.outer(sig, sig) # Covariance matrix
x = N.random.normal(loc=0, scale=sig) # Uncorrelated gaussian measurements
y = N.dot(SL.sqrtm(cor), x) # Correlated gaussian measurements
P.errorbar(N.arange(n), x, yerr=sig, fmt='b.', label='x: uncorrelated')
P.errorbar(N.arange(n), y, yerr=sig, fmt='rs', label='y: correlated')
P.legend(loc='best')
The goodness-of-fit with respect to the constant value $\mu = 0$ is given by:
icov = SL.pinvh(cov)
def chi2(mu, x):
"""Estimate chi2 *without* accounting for covariance."""
return N.sum(((x - mu)/sig)**2)
def chi2Cov(mu, y):
"""Estimate chi2 accounting for covariance."""
return N.dot(y - mu, N.dot(icov, y - mu))
print "DoF: k=%d" % n
c = chi2(0, x) # χ²
p = SS.chisqprob(c, n) # p-value
z = SS.distributions.norm.isf(p) # z-value
print "x (uncorrelated): χ² = %.2f, p-value=%.3f (%.1f-sigma)" % (c, p, z)
c = chi2(0, y) # No account for covariance
p = SS.chisqprob(c, n) # p-value
z = SS.distributions.norm.isf(p) # z-value
print "y (correlated, w/o cov.): χ² = %.2f, p-value=%.3f (%.1f-sigma)" % (c, p, z)
c = chi2Cov(0, y) # Account of covariance
p = SS.chisqprob(c, n) # p-value
z = SS.distributions.norm.isf(p) # z-value
print "y (correlated, w/ cov.): χ² = %.2f, p-value=%.3f (%.1f-sigma)" % (c, p, z)
DoF: k=100
x (uncorrelated): χ² = 110.02, p-value=0.232 (0.7-sigma)
y (correlated, w/o cov.): χ² = 136.56, p-value=0.009 (2.4-sigma)
y (correlated, w/ cov.): χ² = 115.88, p-value=0.132 (1.1-sigma)
In this particular case, disregarding the covariance of the correlated dataset leads to a spurious 2.4-sigma tension with the true average $\mu = 0$.
Since the objective is to derive the maximum-likelihood estimate of the average value, the model is simply the 1-parameter function $\v{F} = (\mu, \ldots, \mu)$. In the presence of a covariance matrix $\m{V}$, one then has $\sigma_{\hat{\mu}}^2 = \left(\sum_{ij} (\m{V}^{-1})_{ij}\right)^{-1}$. For uncorrelated measurements, this reduces to $\sigma_{\hat{\mu}}^2 = \left(\sum_{i} 1/\sigma_i^2\right)^{-1}$ (the uncertainty on the variance-weighted mean, see above).
mus = N.linspace(-1, 1, 1000)
print "DoF: k=%d" % (n-1)
def p2z(p):
"""Express the input one-sided *p*-value as a sigma equivalent
significance from a normal distribution (the so-called *z*-value)."""
return N.where(p >= 0.5, N.nan, SS.distributions.norm.isf(p))
xchi2s = [ chi2(mu, x) for mu in mus ]
iminx = N.argmin(xchi2s) # Poor man's minimization!
mux = mus[iminx]
dmux = N.sqrt(1/N.sum(1/sig**2))
print "x ML mean: µ = %+.4f ± %.4f (%.1f-sigma)" % (mux, dmux, N.abs(mux)/dmux)
c = chi2(mux, x) # χ²
p = SS.chisqprob(c, n-1) # p-value
z = p2z(p) # z-value
print " χ² = %.2f, p-value=%.3f (%.1f-sigma)" % (c, p, z)
ychi2s = [ chi2(mu, y) for mu in mus ]
iminy = N.argmin(ychi2s) # Poor man's minimization!
muy = mus[iminy]
dmuy = N.sqrt(1/N.sum(1/sig**2))
print "y ML mean (w/o cov): µ = %+.4f ± %.4f (%.1f-sigma)" % (muy, dmuy, N.abs(muy)/dmuy)
c = chi2(muy, y) # No account for covariance
p = SS.chisqprob(c, n-1) # p-value
z = p2z(p) # z-value
print " χ² = %.2f, p-value=%.3f (%.1f-sigma)" % (c, p, z)
ychi2Covs = [ chi2Cov(mu, y) for mu in mus ]
iminyCov = N.argmin(ychi2Covs) # Poor man's minimization!
muyCov = mus[iminyCov]
dmuyCov = N.sqrt(1/N.sum(icov)) # Simple expression of dµ derived from model
print "y ML mean (w/ cov): µ = %+.4f ± %.4f (%.1f-sigma)" % (muyCov, dmuyCov, N.abs(muyCov)/dmuyCov)
c = chi2Cov(muyCov, y) # Account of covariance
p = SS.chisqprob(c, n-1) # p-value
z = p2z(p) # z-value
print " χ² = %.2f, p-value=%.3f (%.1f-sigma)" % (c, p, z)
DoF: k=99
x ML mean: µ = -0.1872 ± 0.0955 (2.0-sigma)
 χ² = 106.20, p-value=0.292 (0.5-sigma)
y ML mean (w/o cov): µ = -0.6777 ± 0.0955 (7.1-sigma)
 χ² = 86.06, p-value=0.820 (nan-sigma)
y ML mean (w/ cov): µ = -0.2993 ± 0.1979 (1.5-sigma)
 χ² = 113.60, p-value=0.150 (1.0-sigma)
The effect of neglecting the covariance of correlated data is even stronger on the ML estimate of the mean: the estimated value lies more than 7 sigma away from the true value, with an unrealistically good $\chi^2_\min$.
P.plot(mus, xchi2s, 'g:')
P.plot(mus, ychi2s, 'r--')
P.plot(mus, ychi2Covs, 'b-')
P.errorbar([mux],[xchi2s[iminx]], xerr=dmux, fmt='g^', label='UNcorrelated')
P.errorbar([muy],[ychi2s[iminy]], xerr=dmuy, fmt='rs', label='Correlated (w/o cov)')
P.errorbar([muyCov],[ychi2Covs[iminyCov]], xerr=dmuyCov, fmt='bo', label='Correlated (w/ cov)')
P.xlabel(u"µ")
P.ylabel(u"χ²")
P.legend(loc='best', numpoints=1)
P.grid()
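The same machinery applies to models with more than one parameter. As an illustration of $\Cov(\hat{\v{\theta}}) \simeq 2\,\m{H}^{-1} = (\m{J}^T\m{V}^{-1}\m{J})^{-1}$, here is a minimal sketch for a hypothetical straight-line model $F_i = a + b\,t_i$ with uncorrelated, known uncertainties (all quantities below — t, sigs, yobs — are made up for the example):
N.random.seed(1)
t = N.linspace(0, 1, 20)                               # Hypothetical abscissae
sigs = 0.1 * N.ones_like(t)                            # Known measurement uncertainties
yobs = 1 + 2 * t + N.random.normal(scale=sigs)         # Simulated data, true (a, b) = (1, 2)
J = N.vstack((N.ones_like(t), t)).T                    # Jacobian dF/d(a, b)
Vinv = N.diag(sigs ** -2)                              # Inverse covariance (diagonal here)
covpar = SL.inv(N.dot(J.T, N.dot(Vinv, J)))            # Cov(theta) = (J^T V^-1 J)^-1 = 2 H^-1
a, b = N.dot(covpar, N.dot(J.T, N.dot(Vinv, yobs)))    # Normal equations (linear model)
print "a = %.3f ± %.3f, b = %.3f ± %.3f" % (a, N.sqrt(covpar[0, 0]), b, N.sqrt(covpar[1, 1]))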
Consider a set of $N$ quantities (measurements, parameter estimates, etc.) $\v{y} = (y_1, \ldots y_N)$ associated with a covariance matrix $V_{ij} = \Cov(y_i, y_j)$ (estimated from measurement or adjustment procedures), and a set of $M$ functions $\v{f} = (f_1, \ldots, f_M)$. The covariance matrix among the propagated values $\v{f}(\v{y})$ can be estimated from a Taylor expansion around $\v{y}$. To first order, one has: $$ \Cov(\v{f}(\v{y})) \simeq \m{J}\cdot\m{V}\cdot{\m{J}}^T $$ where the matrix of derivatives (the Jacobian) is given by $J_{ij} = \left.\frac{\partial f_i}{\partial y_j}\right|_{\v{y}}$.
This 1st-order expression is exact if $\v{f}$ is linear: $\v{f}(\v{y}) = \m{J}\cdot\v{y}$. If this is not the case, one has to make sure the functions are reasonably linear over the domain $\v{y} \pm \v{\sigma}_{\v{y}}$, or expand the Taylor series to higher orders.
The 1st-order expression can be rewritten in a non-matrix form: $$ \Cov(f_i, f_j) \simeq \sum_{k,l = 1}^{N} \frac{\partial f_i}{\partial y_k}\frac{\partial f_j}{\partial y_l} V_{kl}. $$ In the case of uncorrelated input quantities, $V_{kl} = \sigma_k^2 \delta_{kl}$ and one recovers the traditional uncertainty propagation expression: $$ \sigma_{f_i}^2 = \sum_{k=1}^{N} \left(\frac{\partial f_i}{\partial y_k}\right)^2 \sigma_k^2. $$
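A minimal sketch of generic first-order propagation with a finite-difference Jacobian (the cart2pol transformation and the input covariance Vin below are made up for the example):
def propagate(f, y, V, eps=1e-6):
    """1st-order covariance propagation Cov(f(y)) ~ J V J^T,
    with the Jacobian J estimated by forward finite differences."""
    y = N.asarray(y, dtype=float)
    f0 = N.asarray(f(y), dtype=float)
    J = N.empty((len(f0), len(y)))
    for j in range(len(y)):
        dy = N.zeros_like(y)
        dy[j] = eps * max(abs(y[j]), 1.)
        J[:, j] = (N.asarray(f(y + dy), dtype=float) - f0) / dy[j]
    return N.dot(J, N.dot(V, J.T))
cart2pol = lambda v: [N.hypot(v[0], v[1]), N.arctan2(v[1], v[0])]  # (x, y) -> (r, phi)
Vin = N.array([[0.010, 0.005],
               [0.005, 0.020]])                # Made-up input covariance
print propagate(cart2pol, [1., 2.], Vin)       # Approximate covariance of (r, phi)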
Consider a spectrum made of $N$ flux measurements $\v{y} = (y_1, \ldots y_N)$ associated with a covariance matrix $V_{ij} = \Cov(y_i, y_j)$. One wants to compute the synthetic photometry (i.e. the integrated flux) in bands $A$ and $B$, \begin{align} A &= \sum_i \alpha_i y_i \\ B &= \sum_i \beta_i y_i \end{align} where the fixed coefficients $\alpha_i$ and $\beta_i$ define the filter bandpasses, as well as the resulting variances and covariance.
In this simple linear case, fluctuations $\delta A$ are directly related to flux fluctuations $\delta y_i$: $$ \delta A = \sum_i \alpha_i \delta y_i $$ and therefore \begin{align} \sigma_A^2 &= \langle \delta^2 A \rangle \\ &= \sum_{i,j} \alpha_i \alpha_j \langle \delta y_i \delta y_j \rangle \\ &= \v{\alpha} \cdot \m{V} \cdot \v{\alpha}^T \\ \Cov(A, B) &= \langle \delta A \delta B \rangle \\ &= \sum_{i,j} \alpha_i \beta_j \langle \delta y_i \delta y_j \rangle \\ &= \v{\alpha} \cdot \m{V} \cdot \v{\beta}^T \end{align}
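A minimal sketch of these expressions (the top-hat bandpass coefficients and the flux covariance matrix below are made up, reusing vec2corr defined above to build a correlated V):
nl = 50                                            # Hypothetical number of spectral bins
alpha = (N.arange(nl) < 30).astype(float)          # Made-up band A (bins 0-29)
beta = (N.arange(nl) >= 20).astype(float)          # Made-up band B (bins 20-49), overlapping A
V = 0.04 * vec2corr(N.exp(-N.arange(nl) / 3.))     # Made-up correlated flux covariance
varA = N.dot(alpha, N.dot(V, alpha))               # Var(A) = alpha . V . alpha
varB = N.dot(beta, N.dot(V, beta))                 # Var(B) = beta . V . beta
covAB = N.dot(alpha, N.dot(V, beta))               # Cov(A, B) = alpha . V . beta
print "σ_A = %.3f, σ_B = %.3f, ρ(A, B) = %.3f" % (
    N.sqrt(varA), N.sqrt(varB), covAB / N.sqrt(varA * varB))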
Correlation coefficients measure the relationship between two datasets $\{x_{i}\}$ and $\{y_{j}\}$.
Besides standard statistics handbooks,