See text p. 74 bottom.
In class, we proved that $\displaystyle\frac{\partial f}{\partial X} = -X^{-1}$ for $f(X) = -\log\det X$, $X\in\mathbb{S}^{n}_{++}$. To establish convexity via the first-order condition, we need to demonstrate that
$$f(Y) \:\geq\: f(X) \: + \: \Big\langle \displaystyle\frac{\partial f}{\partial X}, Y - X \Big\rangle,$$where the inner product is defined as $\langle A, B \rangle := \text{trace}(A^{\top}B)$. Substituting our $f(\cdot)$ and $\displaystyle\frac{\partial f}{\partial X}$ into the above condition, we are required to prove:
$$\text{trace}\left(X^{-1}Y\right) \:-\: n \:+\: \log\det\left(XY^{-1}\right) \,\geq\, 0.$$We observe that the LHS of the above inequality is twice the Kullback-Leibler divergence $D_{\rm{KL}}\left(\mathcal{N}\left(0,Y\right) || \mathcal{N}\left(0,X\right)\right)$, where the divergence $D_{\rm{KL}}\left(p(x) || q(x)\right)$ between two probability density functions $p(x)$ and $q(x)$ over $x\in\mathbb{R}^{n}$ is defined as $$D_{\rm{KL}}\left(p(x) || q(x)\right) := \displaystyle\int_{\mathbb{R}^{n}} p(x)\log\displaystyle\frac{p(x)}{q(x)}\:{\rm{d}}x.$$ That $D_{\rm{KL}} \geq 0$ always holds follows from a generalization of Jensen's inequality: for $h,g:\mathbb{R}\to\mathbb{R}$ with $h(\cdot)$ convex, $\mathbb{E}\left[h \circ g(x)\right] \geq h\left(\mathbb{E}\left[g(x)\right]\right)$. You can think of it as Jensen's inequality applied to the random variable $g(x(\omega))$. Specifically, taking $h(x) = -\log(x)$ and $g(x) = q(x)/p(x)$ in that generalized Jensen's inequality gives $$\mathbb{E}_{p}\left[-\log\displaystyle\frac{q(x)}{p(x)}\right] \:\geq\: -\log\mathbb{E}_{p}\left[\displaystyle\frac{q(x)}{p(x)}\right] \:=\: -\log\displaystyle\int_{\mathbb{R}^{n}} q(x)\:{\rm{d}}x \:=\: -\log 1 \:=\: 0,$$ which is the desired result.
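As a quick numerical sanity check (a sketch, not part of the proof), the first-order condition for $f(X) = -\log\det X$ can be verified on randomly generated positive definite matrices; the helper `random_spd` below is introduced here purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

def random_spd(n):
    # A @ A.T + I is symmetric positive definite almost surely
    A = rng.standard_normal((n, n))
    return A @ A.T + np.eye(n)

def f(X):
    # f(X) = -log det X, well defined on the positive definite cone
    return -np.linalg.slogdet(X)[1]

X, Y = random_spd(n), random_spd(n)
grad = -np.linalg.inv(X)  # df/dX = -X^{-1}

# Gap in the first-order condition: f(Y) - f(X) - <grad, Y - X>
gap = f(Y) - f(X) - np.trace(grad.T @ (Y - X))

# The gap equals trace(X^{-1} Y) - n + log det(X Y^{-1}), i.e. 2 * KL
lhs = (np.trace(np.linalg.solve(X, Y)) - n
       + np.linalg.slogdet(X)[1] - np.linalg.slogdet(Y)[1])

assert gap >= 0            # first-order convexity condition holds
assert np.isclose(gap, lhs)  # matches the rearranged inequality
```

Running this with many random draws of $X, Y$ gives a strictly positive gap whenever $X \neq Y$, consistent with strict convexity.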
We need to compute the Hessian of $f(X)$, which will be an $n^{2} \times n^{2}$ matrix. The Hessian is the negative Jacobian of the matrix-valued function $F(X):= X^{-1}$, that is, ${\rm{Hess}}(f) = -{\rm{D}}F(X)$, since $F(X) = - \displaystyle\frac{\partial f}{\partial X}$.
Applying the differential ${\rm{d}}(\cdot)$ to both sides of the identity $XX^{-1}=I$ gives $\left({\rm{d}}X\right)X^{-1} + X\,{\rm{d}}(X^{-1}) = 0$, and hence
$${\rm{d}}(X^{-1}) = -X^{-1}\left({\rm{d}}X\right)X^{-1}.$$Applying ${\rm{vec}}(\cdot)$ operator to both sides of the above, using the fact that the operators ${\rm{vec}}(\cdot)$ and ${\rm{d}}(\cdot)$ commute, and employing the identity ${\rm{vec}}(PQR) = \left(R^{\top}\otimes P\right) {\rm{vec}}(Q)$, we get
$${\rm{d}}({\rm{vec}}(X^{-1})) = - \left(X^{-1}\otimes X^{-1}\right) {\rm{d}}\left({\rm{vec}}(X)\right),$$which yields ${\rm{D}}F(X) = - \left(X^{-1}\otimes X^{-1}\right)$, and hence ${\rm{Hess}}(f) = -{\rm{D}}F(X) = X^{-1}\otimes X^{-1}$. Consequently, ${\rm{Hess}}(f)$ is positive definite since its eigenvalues are $\left(\lambda_{i}\lambda_{j}\right)^{-1} > 0$ for $i,j=1,\ldots,n$, where $\lambda_{1}, \ldots, \lambda_{n} > 0$ are the eigenvalues of $X\in\mathbb{S}^{n}_{++}$. Therefore, $f(X)$ is a convex function of $X\in\mathbb{S}^{n}_{++}$.
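Both identities above can be checked numerically; a minimal sketch using `np.kron` for the Kronecker product, comparing the Hessian's eigenvalues against the products $\left(\lambda_{i}\lambda_{j}\right)^{-1}$ and verifying ${\rm{d}}({\rm{vec}}(X^{-1})) \approx -\left(X^{-1}\otimes X^{-1}\right){\rm{d}}({\rm{vec}}(X))$ by a small perturbation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3

A = rng.standard_normal((n, n))
X = A @ A.T + np.eye(n)          # X in S^n_{++}
Xinv = np.linalg.inv(X)

# Hess(f) = X^{-1} kron X^{-1}, an n^2 x n^2 symmetric matrix
H = np.kron(Xinv, Xinv)

# Its eigenvalues are (lambda_i * lambda_j)^{-1} > 0
lam = np.linalg.eigvalsh(X)
expected = np.sort([1.0 / (li * lj) for li in lam for lj in lam])
assert np.allclose(np.sort(np.linalg.eigvalsh(H)), expected)

# Perturbation check of d(X^{-1}) = -X^{-1} (dX) X^{-1} in vec form;
# vec(.) stacks columns, hence order="F" below
dX = 1e-6 * rng.standard_normal((n, n))
dX = (dX + dX.T) / 2             # symmetric perturbation
actual = np.linalg.inv(X + dX) - Xinv
predicted = (-H @ dX.reshape(-1, order="F")).reshape(n, n, order="F")
assert np.allclose(actual, predicted, atol=1e-9)
```

The perturbation check agrees up to second-order terms in $\|{\rm{d}}X\|$, and the positive eigenvalues confirm positive definiteness of the Hessian.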