#!/usr/bin/env python
# coding: utf-8

# In a [previous post](https://austinrochford.com/posts/exch-param-pymc.html) I compared several parameterizations for estimating the covariance parameter of latent [exchangeable normal random variables](https://en.wikipedia.org/wiki/Exchangeable_random_variables) using [PyMC](http://pymc.io/). These parameterizations were necessary because, for quite some time before the [official release of PyMC version 4.0](https://www.pymc.io/blog/v4_announcement.html), the beta versions lacked support for multivariate random variables. Support for these distributions has since [been added](https://www.pymc.io/projects/docs/en/stable/api/distributions/multivariate.html), rendering the workaround parameterizations in that post superfluous. Even so, after writing that post I remained curious about the closed-form solution for the [Cholesky decomposition](https://en.wikipedia.org/wiki/Cholesky_decomposition) of the covariance matrix of exchangeable normal random variables, and I was eventually able to derive it. This post presents that closed-form solution along with a numerical verification of its correctness.
#
# Recall that a vector of random variables is exchangeable if every permutation of its entries has the same joint distribution. A set of exchangeable normal random variables, $X_1, \ldots, X_T \sim N(\mu, \sigma^2)$, is parameterized by their marginal mean, $\mu$, their marginal scale, $\sigma$, and the [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient)
# $$\rho = \frac{\mathbb{Cov}(X_i, X_j)}{\sigma^2}.$$
#
# In this post we will assume $\mu = 0$ and $\sigma = 1$ for simplicity; the results are straightforward to generalize. With this assumption, the covariance matrix is
#
# $$
# \Sigma_{\rho} = \begin{pmatrix}
# 1 & \rho & \cdots & \rho & \rho \\
# \rho & 1 & \cdots & \rho & \rho \\
# \vdots & \vdots & \ddots & \vdots & \vdots \\
# \rho & \rho & \cdots & 1 & \rho \\
# \rho & \rho & \cdots & \rho & 1
# \end{pmatrix} \in \mathbb{R}^{T \times T}.
# $$
#
# The previous post used [SymPy](https://www.sympy.org/en/index.html) to calculate the Cholesky decomposition of this matrix for small values of $T$. There was an obvious pattern, but we did not pursue it at the time, since the SymPy factorization could be [lambdified](https://docs.sympy.org/latest/modules/utilities/lambdify.html) and the result applied to [Aesara](https://aesara.readthedocs.io/en/latest/) tensors to facilitate inference in PyMC. This post will prove and numerically verify the following closed-form solution for the Cholesky decomposition of $\Sigma_{\rho}$.
#
# Let $-\frac{1}{T - 1} \leq \rho < 1$. Define $d_1 = 1$ and $\ell_1 = \rho$. For $1 < j \leq T$, let
# $$d_j = \sqrt{d_{j - 1}^2 - \ell_{j - 1}^2}$$
# and
# $$\ell_j = \frac{\rho - 1}{d_j} + d_j.$$
#
# Finally, let
# $$
# L_T = \begin{pmatrix}
# d_1 & 0 & 0 & \cdots & 0 \\
# \ell_1 & d_2 & 0 & \cdots & 0 \\
# \ell_1 & \ell_2 & d_3 & \cdots & 0 \\
# \vdots & \vdots & \vdots & \ddots & \vdots \\
# \ell_1 & \ell_2 & \ell_3 & \cdots & d_T
# \end{pmatrix}
# \in \mathbb{R}^{T \times T}.
# $$
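#
# As a concrete illustration (this small case is not spelled out in the proof below, but it follows directly from the recurrence), take $T = 3$. Then $d_2 = \sqrt{1 - \rho^2}$ and
# $$\ell_2 = \frac{\rho - 1}{\sqrt{1 - \rho^2}} + \sqrt{1 - \rho^2} = \frac{\rho (1 - \rho)}{\sqrt{1 - \rho^2}},$$
# so
# $$
# L_3 = \begin{pmatrix}
# 1 & 0 & 0 \\
# \rho & \sqrt{1 - \rho^2} & 0 \\
# \rho & \frac{\rho (1 - \rho)}{\sqrt{1 - \rho^2}} & d_3
# \end{pmatrix},
# $$
# and one can check directly that $L_3 L_3^{\top} = \Sigma_{\rho}$ when $T = 3$.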
#
# **Claim:** $\Sigma_T = L_T L_T^{\top}$, where $\Sigma_n$ denotes the $n \times n$ version of $\Sigma_{\rho}$.
#
# **Proof (by induction):**
#
# When $T = 1$ or $T = 2$, the claim is easily verified by hand.
#
# Assume, for all $1 \leq n \leq T$, that $\Sigma_n = L_n L_n^{\top}$.
#
# It will be useful to block partition $L_n$ as
# $$L_n = \begin{pmatrix}
# L_{n - 1} & 0 \\
# v_n & d_n
# \end{pmatrix},$$
# where $v_n = \begin{pmatrix}\ell_1 & \ell_2 & \cdots & \ell_{n - 1}\end{pmatrix}$ is the row vector of subdiagonal entries in the final row of $L_n$.
# The inductive hypothesis then becomes
# $$\begin{align*}
# \Sigma_n
# & = L_n L_n^{\top} \\
# & = \begin{pmatrix}
# L_{n - 1} & 0 \\
# v_n & d_n
# \end{pmatrix} \begin{pmatrix}
# L_{n - 1}^{\top} & v_n^{\top} \\
# 0 & d_n
# \end{pmatrix} \\
# & = \begin{pmatrix}
# \Sigma_{n - 1} & L_{n - 1} v_n^{\top} \\
# v_n L_{n - 1}^{\top} & v_n v_n^{\top} + d_n^2
# \end{pmatrix}.
# \end{align*}$$
# Equating the bottom right entries of these two matrices gives
# $$1 = (\Sigma_n)_{n, n} = v_n v_n^{\top} + d_n^2.$$
# Restated, we have that $v_n v_n^{\top} = 1 - d_n^2$.
# Equating the off-diagonal entries in the final column gives, for $1 \leq i < n$,
# $$\rho = (\Sigma_n)_{i, n} = (L_{n - 1} v_n^{\top})_i,$$
# so $L_{n - 1} v_n^{\top} = \rho \vec{1}$.
#
# Now,
# $$L_{T + 1} L_{T + 1}^{\top} = \begin{pmatrix}
# \Sigma_T & L_T v_{T + 1}^{\top} \\
# v_{T + 1} L_T^{\top} & v_{T + 1} v_{T + 1}^{\top} + d_{T + 1}^2
# \end{pmatrix},
# $$
# so it suffices to show that
# $$L_T v_{T + 1}^{\top} = \rho \vec{1}$$
# and
# $$v_{T + 1} v_{T + 1}^{\top} + d_{T + 1}^2 = 1.$$
#
# First, we have that
# $$\begin{align*}
# L_T v_{T + 1}^{\top}
# & = \begin{pmatrix}
# L_{T - 1} & 0\\
# v_T & d_T
# \end{pmatrix} \begin{pmatrix}
# v_T^{\top} \\
# \ell_T
# \end{pmatrix} \\
# & = \begin{pmatrix}
# L_{T - 1} v_T^{\top} \\
# v_T v_T^{\top} + d_T \cdot \ell_T
# \end{pmatrix}.
# \end{align*}$$
# From the inductive hypothesis, $L_{T - 1} v_T^{\top} = \rho \vec{1}$. For the final entry,
# $$\begin{align*}
# v_T v_T^{\top} + d_T \cdot \ell_T
# & = 1 - d_T^2 + d_T \left(\frac{\rho - 1}{d_T} + d_T\right) \\
# & = 1 - d_T^2 + \rho - 1 + d_T^2 \\
# & = \rho.
# \end{align*}$$
#
# Second, since $d_{T + 1} = \sqrt{d_T^2 - \ell_T^2}$ by definition, we have that
# $$\begin{align*}
# v_{T + 1} v_{T + 1}^{\top} + d_{T + 1}^2
# & = v_T v_T^{\top} + \ell_T^2 + d_{T + 1}^2 \\
# & = 1 - d_T^2 + \ell_T^2 + \left(\sqrt{d_T^2 - \ell_T^2}\right)^2 \\
# & = 1.
# \end{align*}$$
#
# **QED**

# To validate these calculations numerically we will use [Hypothesis](https://hypothesis.readthedocs.io/en/latest/), a Python library for property-based testing. First we make the necessary Python imports.

# In[1]:

from hypothesis import assume, given
from hypothesis.strategies import integers, floats

# In[2]:

import numpy as np

# The following function generates the covariance matrix, $\Sigma_{\rho}$, for given values of $\rho$ and $T$.

# In[3]:

def get_cov_mat(ρ, T):
    # ones on the diagonal, ρ everywhere else
    return np.eye(T) + ρ * (1 - np.eye(T))

# The next function implements the closed-form solution for the Cholesky decomposition of $\Sigma_{\rho}$ presented above.

# In[4]:

def get_cov_mat_chol(ρ, T):
    diag = np.empty(T)
    tril = np.empty(T)

    diag[0] = 1
    tril[0] = ρ
    for j in range(1, T):
        diag[j] = np.sqrt(diag[j - 1]**2 - tril[j - 1]**2)
        # ℓ_T never appears in L_T; setting it to zero also avoids a division
        # by zero when ρ = -1 / (T - 1) makes d_T vanish
        tril[j] = (ρ - 1) / diag[j] + diag[j] if j < T - 1 else 0

    # broadcasting tril against the strictly lower triangular mask fills
    # column j below the diagonal with tril[j], matching the structure of L_T
    return np.diag(diag) + np.tril(tril, k=-1)
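# As a quick spot check (not part of the original post), we can compare the closed-form factor against NumPy's built-in Cholesky routine for a single small case before running the full property-based test below.

# In[ ]:

# the closed form should agree with np.linalg.cholesky for, e.g., ρ = 0.3 and T = 4
assert np.allclose(get_cov_mat_chol(0.3, 4),
                   np.linalg.cholesky(get_cov_mat(0.3, 4)))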
# This final function is a Hypothesis test that verifies that we have, in fact, computed the Cholesky decomposition of $\Sigma_{\rho}$.

# In[5]:

@given(floats(-0.5, 1), integers(2, 1000))
def test_cov_mat_chol(ρ, T):
    assume(-1 / (T - 1) <= ρ < 1)

    Σ = get_cov_mat(ρ, T)
    chol = get_cov_mat_chol(ρ, T)

    # check that chol is lower triangular
    assert (chol == np.tril(chol)).all()
    # check that chol is a factorization of Σ
    assert np.allclose(Σ, chol @ chol.T)

# The [`given`](https://hypothesis.readthedocs.io/en/latest/details.html#hypothesis.given) decorator tells Hypothesis to generate floating point values in the interval $\left[-\frac{1}{2}, 1\right]$ for $\rho$ and integers between 2 and 1,000 for $T$. The upper bound on $T$ is not necessary, but it ensures that the test runs in a reasonable amount of time. The [`assume`](https://hypothesis.readthedocs.io/en/latest/details.html#hypothesis.assume) call ensures that $\rho$ lies in the appropriate interval, $-\frac{1}{T - 1} \leq \rho < 1$, given the size of the random vector. With this guidance, Hypothesis generates many random combinations of $\rho$ and $T$ and runs the test against each of them.

# In[6]:

test_cov_mat_chol()

# The test passes, numerically validating our closed-form solution for the Cholesky decomposition.
#
# This post is available as a Jupyter notebook [here](https://nbviewer.org/gist/AustinRochford/3e6fde7eaf9d5b93d27ba898bdac7f0f).

# In[7]:

get_ipython().run_line_magic('load_ext', 'watermark')
get_ipython().run_line_magic('watermark', '-n -u -v -iv -p hypothesis')