Notebook

In a previous post I compared several parameterizations for estimating the covariance parameter of latent exchangeable normal random variables using PyMC. These parameterizations were necesssary because for quite some time before the official release of PyMC version 4.0, the beta lacked support for multivariate random variables. Support for these distributions has since been added, rendering the workaround parameterizations in that post superfluous. Even so, after writing that post I remained curious about the closed-form solution for the Cholesky decomposition of the covariance matrix for exchangeable normal random variables, and eventually was able to derive it. This post presents that closed-form solution along with a numerical verification of its correctness.

Recall that a vector of random variables is exchangeable if every permutation of its entries has the same joint distribution. A set of exchangeable normal random variables, $X_1, \ldots, X_T \sim N(\mu, \sigma^2)$ is parameterized by their marginal mean, $\mu$, marginal scale, $\sigma$, and the Pearson correlation coefficient $$\rho = \frac{\mathbb{Cov}(X_i, X_j)}{\sigma^2}.$$

In this post we will assume $\mu = 0$ and $\sigma = 1$ for simplicity; the results are straightforward to generalize. With this assumption, the covariance matrix is

$$ \Sigma_{\rho} = \begin{pmatrix} 1 & \rho & \cdots & \rho & \rho \\ \rho & 1 & \cdots & \rho & \rho \\ \rho & \rho & \cdots & 1 & \rho \\ \rho & \rho & \cdots & \rho & 1 \end{pmatrix} \in \mathbb{R}^{T \times T}. $$

The previous post used SymPy to calculate the Cholesky decomposition of this matrix for small values of $T$. It was obvious that there was a pattern, but the we did not pursue it at the time, as the SymPy factorization could be lambdified and the result applied to Aesara tensors to facilitate inference in PyMC. This post will prove and numerically verify the following closed-form solution for the Cholesky decomposition of $\Sigma_{\rho}.$

Let $-\frac{1}{T - 1} \leq \rho < 1$. Define $d_1 = 1$ and $\ell_1 = \rho$. For $1 < j \leq T$, let $$d_j = \sqrt{d_{j - 1}^2 - \ell_{j - 1}^2}$$ and $$\ell_j = \frac{\rho - 1}{d_j} + d_j.$$

Finally, let $$ L_T = \begin{pmatrix} d_1 & 0 & 0 & \cdots & 0 \\ \ell_1 & d_2 & 0 & \cdots & 0 \\ \ell_1 & \ell_2 & d_3 & \cdots & 0 \\ & & & \ddots & \\ \ell_1 & \ell_2 & \ell_3 & \cdots & d_T \end{pmatrix} \in \mathbb{R}^{T \times T}. $$

Claim $\Sigma_T = L_T L_T^{\top}$.

Proof (by induction):

When $T = 1$ or $T = 2$, the claim is easily verified manually.

Assume, for all $1 \leq n \leq T$, that $\Sigma_n = L_n L_n^{\top}$.

It will be useful to block partition $L_n$ as $$L_n = \begin{pmatrix} L_{n - 1} & 0 \\ v_n & d_n \end{pmatrix},$$ where $v_n = \begin{pmatrix}\ell_1 & \ell_2 & \cdots & \ell_{n - 1}\end{pmatrix}$ are the vector of lower triangular elements of $L_n$. The inductive hypothesis then becomes $$\begin{align*} \Sigma_n & = L_n L_n^{\top} \\ & = \begin{pmatrix} L_{n - 1} & 0 \\ v_n & d_n \end{pmatrix} \begin{pmatrix} L_{n - 1}^{\top} & v_n^{\top} \\ 0 & d_n \end{pmatrix} \\ & = \begin{pmatrix} \Sigma_{n - 1} & L_{n - 1} v_n^{\top} \\ v_n L_{n - 1}^{\top} & v_n v_n^{\top} + d_n^2 \end{pmatrix}. \end{align*}$$ Equating the bottom right element of these two matrices gives $$1 = (\Sigma_n)_{n, n} = v_n v_n^{\top} + d_n^2.$$ Restated, we have that $v_n v_n^{\top} = 1 - d_n^2$. Equating the off-diagonal elements in the final column gives, for $1 \leq i < n$, $$\rho = (\Sigma_n)_{i, n} = (L_{n - 1} v_n^{\top})_i,$$ so $L_{n - 1} v_n^{\top} = \rho \vec{1}$.

Now, $$L_{T + 1} L_{T + 1}^{\top} = \begin{pmatrix} \Sigma_T & L_T v_{T + 1}^{\top} \\ v_{T + 1} L_T^{\top} & v_{T + 1} v_{T + 1}^{\top} + d_{T + 1}^2 \end{pmatrix}, $$ so it suffices to show that $$L_T v_{T + 1}^{\top} = \rho \vec{1}$$ and $$v_{T + 1} v_{T + 1}^{\top} + d_{T + 1}^2 = 1.$$

First, we have that $$\begin{align*} L_T v_{T + 1}^{\top} & = \begin{pmatrix} L_{T - 1} & 0\\ v_T & d_T \end{pmatrix} \begin{pmatrix} v_T^{\top} \\ \ell_T \end{pmatrix} \\ & = \begin{pmatrix} L_{T - 1} v_T^{\top} \\ v_T v_T^{\top} + d_T \cdot \ell_T \end{pmatrix}. \end{align*}$$ From the inductive hypothesis, $L_{T - 1} v_T^{\top} = \rho \vec{1}$. For the final entry, $$\begin{align*} v_T v_T^{\top} + d_T \cdot \ell_T & = 1 - d_T^2 + d_T \left(\frac{\rho - 1}{d_T} + d_T\right) \\ & = \rho. \end{align*}$$

Second, we have that $$\begin{align*} v_{T + 1} v_{T + 1}^{\top} + d_{T + 1}^2 & = v_T v_T^{\top} + \ell_T^2 + d_{T + 1}^2 \\ & = 1 - d_T^2 + \ell_T^2 + \left(\sqrt{d_T^2 - \ell_T^2}\right)^2 \\ & = 1. \end{align*}$$

QED

To validate these calculations numerically we will use Hypothesis, a Python library for generative testing. First we make the necessary Python imports.

In [1]:

from hypothesis import assume, given
from hypothesis.strategies import integers, floats

In [2]:

import numpy as np

The following function generates the covariance matrix, $\Sigma_{\rho}$, for given values of $\rho$ and $T$.

In [3]:

def get_cov_mat(ρ, T):
    return np.eye(T) + ρ * (1 - np.eye(T))

The next function implements the closed-form solution for the Cholesky decomposition of $\Sigma_{\rho}$ presented above.

In [4]:

def get_cov_mat_chol(ρ, T):
    diag = np.empty(T)
    tril = np.empty(T)
    
    diag[0] = 1
    tril[0] = ρ

    for j in range(1, T):
        diag[j] = np.sqrt(diag[j - 1]**2 - tril[j - 1]**2)
        tril[j] = (ρ - 1) / diag[j] + diag[j] if j < T - 1 else 0
            
    return np.diag(diag) + np.tril(tril, k=-1)

This final function is a Hypothesis test that verifies that we have, in fact, computed the Cholesky decomposition of $\Sigma_{\rho}$.

In [5]:

@given(floats(-0.5, 1), integers(2, 1000))
def test_cov_mat_chol(ρ, T):
    assume(-1 / (T - 1) <= ρ < 1)
    
    Σ = get_cov_mat(ρ, T)
    chol = get_cov_mat_chol(ρ, T)

    # check that chol is lower diagonal
    assert (chol == np.tril(chol)).all()
    
    # check that "chol is a factorization of Σ
    assert np.allclose(Σ, chol @ chol.T)

The given decorator tells Hypothesis to generate floating point values in the interval $\left[-\frac{1}{2}, 1\right]$ for $\rho$ and integers between 2 and 1,000 for $T$. The upper bound on values generated for $T$ is not necessary but ensures the test runs in a reasonable amount of time. The assume call ensures that $\rho$ is in the appropriate interval based on the size of the random vector. Given this guidance, Hypothesis will generate random values of $\rho$ and $T$ to run the test with.

In [6]:

test_cov_mat_chol()

The test passes, numerically validating our closed-form solution for the Cholesky decomposition.

This post is available as a Jupyter notebook here.

In [7]:

%load_ext watermark
%watermark -n -u -v -iv -p hypothesis

Last updated: Sun Sep 25 2022

Python implementation: CPython
Python version       : 3.10.6
IPython version      : 8.4.0

hypothesis: 6.54.6

numpy: 1.23.2