#!/usr/bin/env python
# coding: utf-8

# In a [previous post](https://austinrochford.com/posts/exch-param-pymc.html) I compared several parameterizations for estimating the covariance parameter of latent [exchangeable normal random variables](https://en.wikipedia.org/wiki/Exchangeable_random_variables) using [PyMC](http://pymc.io/). These parameterizations were necessary because, for quite some time before the [official release of PyMC version 4.0](https://www.pymc.io/blog/v4_announcement.html), the beta versions lacked support for multivariate random variables. Support for these distributions has since [been added](https://www.pymc.io/projects/docs/en/stable/api/distributions/multivariate.html), rendering the workaround parameterizations in that post superfluous. Even so, after writing that post I remained curious about the closed-form solution for the [Cholesky decomposition](https://en.wikipedia.org/wiki/Cholesky_decomposition) of the covariance matrix of exchangeable normal random variables, and I was eventually able to derive it. This post presents that closed-form solution along with a numerical verification of its correctness.
#
# Recall that a vector of random variables is exchangeable if every permutation of its entries has the same joint distribution. A set of exchangeable normal random variables, $X_1, \ldots, X_T \sim N(\mu, \sigma^2)$, is parameterized by their marginal mean, $\mu$, their marginal scale, $\sigma$, and the [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient)
# $$\rho = \frac{\mathbb{Cov}(X_i, X_j)}{\sigma^2}.$$
#
# In this post we will assume $\mu = 0$ and $\sigma = 1$ for simplicity; the results are straightforward to generalize. With this assumption, the covariance matrix is
#
# $$
# \Sigma_{\rho} = \begin{pmatrix}
# 1 & \rho & \cdots & \rho & \rho \\
# \rho & 1 & \cdots & \rho & \rho \\
# \vdots & \vdots & \ddots & \vdots & \vdots \\
# \rho & \rho & \cdots & 1 & \rho \\
# \rho & \rho & \cdots & \rho & 1
# \end{pmatrix} \in \mathbb{R}^{T \times T}.
# $$
#
# The previous post used [SymPy](https://www.sympy.org/en/index.html) to calculate the Cholesky decomposition of this matrix for small values of $T$. There was an obvious pattern, but we did not pursue it at the time, since the SymPy factorization could be [lambdified](https://docs.sympy.org/latest/modules/utilities/lambdify.html) and the result applied to [Aesara](https://aesara.readthedocs.io/en/latest/) tensors to facilitate inference in PyMC. This post will prove and numerically verify the following closed-form solution for the Cholesky decomposition of $\Sigma_{\rho}$.
#
# Let $-\frac{1}{T - 1} \leq \rho < 1$. Define $d_1 = 1$ and $\ell_1 = \rho$. For $1 < j \leq T$, let
# $$d_j = \sqrt{d_{j - 1}^2 - \ell_{j - 1}^2}$$
# and
# $$\ell_j = \frac{\rho - 1}{d_j} + d_j.$$
#
# Finally, let
# $$
# L_T = \begin{pmatrix}
# d_1 & 0 & 0 & \cdots & 0 \\
# \ell_1 & d_2 & 0 & \cdots & 0 \\
# \ell_1 & \ell_2 & d_3 & \cdots & 0 \\
# \vdots & \vdots & \vdots & \ddots & \vdots \\
# \ell_1 & \ell_2 & \ell_3 & \cdots & d_T
# \end{pmatrix}
# \in \mathbb{R}^{T \times T}.
# $$
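#
# As a concrete illustration (this small case is not spelled out in the proof below, but it follows directly from the recurrence), take $T = 3$. Then $d_2 = \sqrt{1 - \rho^2}$ and
# $$\ell_2 = \frac{\rho - 1}{\sqrt{1 - \rho^2}} + \sqrt{1 - \rho^2} = \frac{\rho (1 - \rho)}{\sqrt{1 - \rho^2}},$$
# so
# $$
# L_3 = \begin{pmatrix}
# 1 & 0 & 0 \\
# \rho & \sqrt{1 - \rho^2} & 0 \\
# \rho & \frac{\rho (1 - \rho)}{\sqrt{1 - \rho^2}} & d_3
# \end{pmatrix},
# $$
# and one can check directly that $L_3 L_3^{\top} = \Sigma_{\rho}$ when $T = 3$.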
#
# **Claim:** $\Sigma_T = L_T L_T^{\top}$, where $\Sigma_n$ denotes the $n \times n$ version of $\Sigma_{\rho}$.
#
# **Proof (by induction):**
#
# When $T = 1$ or $T = 2$, the claim is easily verified by hand.
#
# Assume, for all $1 \leq n \leq T$, that $\Sigma_n = L_n L_n^{\top}$.
#
# It will be useful to block partition $L_n$ as
# $$L_n = \begin{pmatrix}
# L_{n - 1} & 0 \\
# v_n & d_n
# \end{pmatrix},$$
# where $v_n = \begin{pmatrix}\ell_1 & \ell_2 & \cdots & \ell_{n - 1}\end{pmatrix}$ is the row vector of subdiagonal entries in the final row of $L_n$.
# The inductive hypothesis then becomes
# $$\begin{align*}
# \Sigma_n
# & = L_n L_n^{\top} \\
# & = \begin{pmatrix}
# L_{n - 1} & 0 \\
# v_n & d_n
# \end{pmatrix} \begin{pmatrix}
# L_{n - 1}^{\top} & v_n^{\top} \\
# 0 & d_n
# \end{pmatrix} \\
# & = \begin{pmatrix}
# \Sigma_{n - 1} & L_{n - 1} v_n^{\top} \\
# v_n L_{n - 1}^{\top} & v_n v_n^{\top} + d_n^2
# \end{pmatrix}.
# \end{align*}$$
# Equating the bottom right entries of these two matrices gives
# $$1 = (\Sigma_n)_{n, n} = v_n v_n^{\top} + d_n^2.$$
# Restated, we have that $v_n v_n^{\top} = 1 - d_n^2$.
# Equating the off-diagonal entries in the final column gives, for $1 \leq i < n$,
# $$\rho = (\Sigma_n)_{i, n} = (L_{n - 1} v_n^{\top})_i,$$
# so $L_{n - 1} v_n^{\top} = \rho \vec{1}$.
#
# Now,
# $$L_{T + 1} L_{T + 1}^{\top} = \begin{pmatrix}
# \Sigma_T & L_T v_{T + 1}^{\top} \\
# v_{T + 1} L_T^{\top} & v_{T + 1} v_{T + 1}^{\top} + d_{T + 1}^2
# \end{pmatrix},
# $$
# so it suffices to show that
# $$L_T v_{T + 1}^{\top} = \rho \vec{1}$$
# and
# $$v_{T + 1} v_{T + 1}^{\top} + d_{T + 1}^2 = 1.$$
#
# First, we have that
# $$\begin{align*}
# L_T v_{T + 1}^{\top}
# & = \begin{pmatrix}
# L_{T - 1} & 0\\
# v_T & d_T
# \end{pmatrix} \begin{pmatrix}
# v_T^{\top} \\
# \ell_T
# \end{pmatrix} \\
# & = \begin{pmatrix}
# L_{T - 1} v_T^{\top} \\
# v_T v_T^{\top} + d_T \cdot \ell_T
# \end{pmatrix}.
# \end{align*}$$
# From the inductive hypothesis, $L_{T - 1} v_T^{\top} = \rho \vec{1}$. For the final entry,
# $$\begin{align*}
# v_T v_T^{\top} + d_T \cdot \ell_T
# & = 1 - d_T^2 + d_T \left(\frac{\rho - 1}{d_T} + d_T\right) \\
# & = 1 - d_T^2 + \rho - 1 + d_T^2 \\
# & = \rho.
# \end{align*}$$
#
# Second, since $d_{T + 1} = \sqrt{d_T^2 - \ell_T^2}$ by definition, we have that
# $$\begin{align*}
# v_{T + 1} v_{T + 1}^{\top} + d_{T + 1}^2
# & = v_T v_T^{\top} + \ell_T^2 + d_{T + 1}^2 \\
# & = 1 - d_T^2 + \ell_T^2 + \left(\sqrt{d_T^2 - \ell_T^2}\right)^2 \\
# & = 1.
# \end{align*}$$
#
# **QED**

# To validate these calculations numerically we will use [Hypothesis](https://hypothesis.readthedocs.io/en/latest/), a Python library for property-based testing. First we make the necessary Python imports.

# In[1]:

from hypothesis import assume, given
from hypothesis.strategies import integers, floats

# In[2]:

import numpy as np

# The following function generates the covariance matrix, $\Sigma_{\rho}$, for given values of $\rho$ and $T$.

# In[3]:

def get_cov_mat(ρ, T):
    # ones on the diagonal, ρ everywhere else
    return np.eye(T) + ρ * (1 - np.eye(T))

# The next function implements the closed-form solution for the Cholesky decomposition of $\Sigma_{\rho}$ presented above.

# In[4]:

def get_cov_mat_chol(ρ, T):
    diag = np.empty(T)
    tril = np.empty(T)

    diag[0] = 1
    tril[0] = ρ
    for j in range(1, T):
        diag[j] = np.sqrt(diag[j - 1]**2 - tril[j - 1]**2)
        # ℓ_T never appears in L_T; setting it to zero also avoids a division
        # by zero when ρ = -1 / (T - 1) makes d_T vanish
        tril[j] = (ρ - 1) / diag[j] + diag[j] if j < T - 1 else 0

    # broadcasting tril against the strictly lower triangular mask fills
    # column j below the diagonal with tril[j], matching the structure of L_T
    return np.diag(diag) + np.tril(tril, k=-1)
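# As a quick spot check (not part of the original post), we can compare the closed-form factor against NumPy's built-in Cholesky routine for a single small case before running the full property-based test below.

# In[ ]:

# the closed form should agree with np.linalg.cholesky for, e.g., ρ = 0.3 and T = 4
assert np.allclose(get_cov_mat_chol(0.3, 4),
                   np.linalg.cholesky(get_cov_mat(0.3, 4)))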
# This final function is a Hypothesis test that verifies that we have, in fact, computed the Cholesky decomposition of $\Sigma_{\rho}$.

# In[5]:

@given(floats(-0.5, 1), integers(2, 1000))
def test_cov_mat_chol(ρ, T):
    assume(-1 / (T - 1) <= ρ < 1)

    Σ = get_cov_mat(ρ, T)
    chol = get_cov_mat_chol(ρ, T)

    # check that chol is lower triangular
    assert (chol == np.tril(chol)).all()
    # check that chol is a factorization of Σ
    assert np.allclose(Σ, chol @ chol.T)

# The [`given`](https://hypothesis.readthedocs.io/en/latest/details.html#hypothesis.given) decorator tells Hypothesis to generate floating point values in the interval $\left[-\frac{1}{2}, 1\right]$ for $\rho$ and integers between 2 and 1,000 for $T$. The upper bound on $T$ is not necessary, but it ensures that the test runs in a reasonable amount of time. The [`assume`](https://hypothesis.readthedocs.io/en/latest/details.html#hypothesis.assume) call ensures that $\rho$ lies in the appropriate interval, $-\frac{1}{T - 1} \leq \rho < 1$, given the size of the random vector. With this guidance, Hypothesis generates many random combinations of $\rho$ and $T$ and runs the test against each of them.

# In[6]:

test_cov_mat_chol()

# The test passes, numerically validating our closed-form solution for the Cholesky decomposition.
#
# This post is available as a Jupyter notebook [here](https://nbviewer.org/gist/AustinRochford/3e6fde7eaf9d5b93d27ba898bdac7f0f).

# In[7]:

get_ipython().run_line_magic('load_ext', 'watermark')
get_ipython().run_line_magic('watermark', '-n -u -v -iv -p hypothesis')