Consider the maximum eigenvalue $\lambda_{\max}$ and the minimum eigenvalue $\lambda_{\min}$ for any $X\in\mathbb{S}^{n}$.
The purpose of this exercise is to investigate whether $\lambda_{\max}(X)$ and $\lambda_{\min}(X)$ are convex functions of the matrix variable $X\in\mathbb{S}^{n}$.
For any $x\in\mathbb{R}^{n}$ and any $X\in\mathbb{S}^{n}$, derive the inequality:
$$\lambda_{\min}(X) \leq \dfrac{x^{\top}Xx}{x^{\top}x} \leq \lambda_{\max}(X).$$(Hint: start from the spectral theorem applied to a symmetric matrix.)
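Before proving the inequality, it can help to sanity-check it numerically. A minimal sketch using NumPy (the symmetric matrix and the direction here are arbitrary random test data, not part of the exercise):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random symmetric test matrix X in S^n and a random direction x
n = 5
A = rng.standard_normal((n, n))
X = (A + A.T) / 2                  # symmetrize so eigenvalues are real
x = rng.standard_normal(n)

eigvals = np.linalg.eigvalsh(X)    # sorted in ascending order for symmetric X
rayleigh = x @ X @ x / (x @ x)     # the Rayleigh quotient

# lambda_min(X) <= x'Xx / x'x <= lambda_max(X)
assert eigvals[0] - 1e-9 <= rayleigh <= eigvals[-1] + 1e-9
```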
Use part (a) to prove the following two formulas:
$$\lambda_{\max}(X) = \underset{\|x\|_{2}=1}{\sup} x^{\top}X x, \qquad \lambda_{\min}(X) = \underset{\|x\|_{2}=1}{\inf} x^{\top}X x, \qquad x\in\mathbb{R}^{n}, \quad X\in\mathbb{S}^{n}.$$(Hint: think about whether and when the inequalities derived in part (a) are achieved with equality.)
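A quick numerical illustration of where the sup and inf are attained (a sketch with random test data; `eigh` returns an orthonormal set of eigenvectors for a symmetric matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
X = (A + A.T) / 2                  # random symmetric test matrix

w, V = np.linalg.eigh(X)           # w ascending; columns of V are unit eigenvectors
v_min, v_max = V[:, 0], V[:, -1]

# The sup/inf of x'Xx over the unit sphere are attained at unit eigenvectors
assert np.isclose(v_max @ X @ v_max, w[-1])
assert np.isclose(v_min @ X @ v_min, w[0])
```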
Use part (b) to argue which of the two functions $f(X) := \lambda_{\max}(X)$ and $g(X) = \lambda_{\min}(X)$ is/are convex over $\textbf{dom}(f) = \textbf{dom}(g) = \mathbb{S}^{n}$. Explain.
(Hint: think operations preserving function convexity/concavity from Lec. 8.)
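A numerical experiment consistent with what part (b) suggests (this is not a proof, only a sanity check on random symmetric test matrices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5

def sym(rng, n):
    # Random symmetric test matrix in S^n
    A = rng.standard_normal((n, n))
    return (A + A.T) / 2

lam_max = lambda M: np.linalg.eigvalsh(M)[-1]
lam_min = lambda M: np.linalg.eigvalsh(M)[0]

X, Y = sym(rng, n), sym(rng, n)
for theta in np.linspace(0, 1, 11):
    Z = theta * X + (1 - theta) * Y
    # Jensen-type inequality in one direction for lambda_max ...
    assert lam_max(Z) <= theta * lam_max(X) + (1 - theta) * lam_max(Y) + 1e-9
    # ... and in the reversed direction for lambda_min
    assert lam_min(Z) >= theta * lam_min(X) + (1 - theta) * lam_min(Y) - 1e-9
```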
Consider the support function $h(y)$ of any compact convex set introduced in HW3, Prob. 1(b).
Is the support function convex or concave or neither? Give reasons to explain your answer.
(Hint: again think operations preserving function convexity/concavity from Lec. 8.)
(i) What is the function $f(x)$ whose convex conjugate equals the support function $h(y)$? Give reasons to explain your answer.
(ii) What is the geometric interpretation of your result in 2(b)(i)?
(i) Is the epigraph of support function: $\text{epi}(h)$ a cone? Why/why not?
(ii) Is $\text{epi}(h)$ a convex cone? Why/why not?
Given a strictly convex differentiable function $\psi$ over a closed convex set $\mathcal{X}$, the associated Bregman divergence $D_{\psi}:\mathcal{X}\times\mathcal{X}\to\mathbb{R}_{\geq 0}$ is defined as
$$D_{\psi}(x,y) := \psi(x) - \psi(y) - \langle\nabla\psi(y), x-y\rangle, \qquad\forall x,y\in\mathcal{X}.$$The above formula has a clean geometric meaning: the Bregman divergence quantifies the amount by which a strictly convex function lies above its linear approximation (tangent hyperplane).
For example, when $\psi(x) = \|x\|_{2}^{2}$, then for any closed $\mathcal{X}\subset\mathbb{R}^{n}$, we have $D_{\psi}(x,y) = \|x-y\|_{2}^{2}$, the squared Euclidean distance. (It is a good idea to verify this yourself!)
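The suggested verification can also be done numerically. A sketch with NumPy (the helper `bregman` simply evaluates the defining formula; the test points are random):

```python
import numpy as np

rng = np.random.default_rng(3)

def bregman(psi, grad_psi, x, y):
    # D_psi(x, y) = psi(x) - psi(y) - <grad psi(y), x - y>
    return psi(x) - psi(y) - grad_psi(y) @ (x - y)

psi = lambda v: v @ v        # psi(x) = ||x||_2^2
grad_psi = lambda v: 2 * v   # its gradient

x, y = rng.standard_normal(4), rng.standard_normal(4)
# D_psi(x, y) equals the squared Euclidean distance
assert np.isclose(bregman(psi, grad_psi, x, y), np.sum((x - y) ** 2))
```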
As another example, when $\psi(x) = \sum_{i=1}^{n}x_{i}\log x_{i}$, $\mathcal{X}\equiv \{x\in\mathbb{R}^{n}_{\geq 0}\mid \sum_{i=1}^{n}x_{i}=1\}$ (the standard simplex, see Lec. 5, p. 15), then $D_{\psi}(x,y) = \sum_{i=1}^{n}x_{i}\log(x_{i}/y_{i})$, the Kullback-Leibler divergence/relative entropy. (Again, good idea to verify this yourself!)
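The same numerical check for the negative-entropy example (a sketch; the points are random elements of the standard simplex, where $\sum_i x_i = \sum_i y_i = 1$ is what makes the linear terms cancel):

```python
import numpy as np

rng = np.random.default_rng(4)

def bregman(psi, grad_psi, x, y):
    # D_psi(x, y) = psi(x) - psi(y) - <grad psi(y), x - y>
    return psi(x) - psi(y) - grad_psi(y) @ (x - y)

neg_entropy = lambda v: np.sum(v * np.log(v))
grad_neg_entropy = lambda v: np.log(v) + 1.0

# Two random points on the standard simplex
x = rng.random(4); x /= x.sum()
y = rng.random(4); y /= y.sum()

kl = np.sum(x * np.log(x / y))   # Kullback-Leibler divergence
assert np.isclose(bregman(neg_entropy, grad_neg_entropy, x, y), kl)
```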
Even though $D_{\psi}(x,y)\neq D_{\psi}(y,x)$ in general, the Bregman divergence has many nice properties, and appears frequently in machine learning applications.
(i) Is $D_{\psi}(x,y)$ strictly convex in $x$? Why/why not?
(ii) If $\psi$ is $m$-strongly convex, then prove that $D_{\psi}(x,y) \geq \frac{m}{2}\|x-y\|_{2}^{2}$, and that $D_{\psi}(x,y)$ is $m$-strongly convex in $x$.
(iii) Construct a counterexample to demonstrate that $D_{\psi}(x,y)$ need not be convex in its second argument $y$.
The Bregman divergence $D_{\psi}(x,y)$ behaves like the square of a generalized distance between elements in $\mathcal{X}$.
(i) Prove that $$\nabla_{x}D_{\psi}(x,y) = \nabla_{x}\psi(x) - \nabla_{y}\psi(y),$$
where the subscript of the gradient operator $\nabla$ indicates the vector with respect to which the gradient is taken.
(ii) Prove the generalized law of cosines:
$$D_{\psi}(x,y) = D_{\psi}(x,z) + D_{\psi}(z,y) - \langle x-z, \nabla\psi(y) - \nabla\psi(z)\rangle, \qquad \forall x,y,z\in\mathcal{X}.$$Remark: Notice that for $\psi(x) = \|x\|_{2}^2$, this reduces to $\|x-y\|_{2}^{2} = \|x-z\|_{2}^{2} + \|z-y\|_{2}^{2} - 2\underbrace{\langle x-z, y-z\rangle}_{\|x-z\|_{2}\|y-z\|_{2}\cos\angle yzx}$.
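The identity in the generalized law of cosines can be checked numerically before proving it. A sketch using the negative entropy on random positive vectors (no simplex constraint is needed for the identity itself):

```python
import numpy as np

rng = np.random.default_rng(5)

def bregman(psi, grad_psi, a, b):
    # D_psi(a, b) = psi(a) - psi(b) - <grad psi(b), a - b>
    return psi(a) - psi(b) - grad_psi(b) @ (a - b)

psi = lambda v: np.sum(v * np.log(v))   # negative entropy
grad = lambda v: np.log(v) + 1.0

x, y, z = (rng.random(5) + 0.1 for _ in range(3))  # random positive vectors

lhs = bregman(psi, grad, x, y)
rhs = bregman(psi, grad, x, z) + bregman(psi, grad, z, y) \
      - (x - z) @ (grad(y) - grad(z))
assert np.isclose(lhs, rhs)   # generalized law of cosines
```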
(iii) Recall that $\mathcal{X}\equiv\textbf{dom}(\psi)$. Consider a closed convex set $\mathcal{C}$ such that $\mathcal{C}\cap\text{interior}\left(\mathcal{X}\right)\neq\emptyset$. For notational convenience, introduce $\mathcal{A}:=\mathcal{C}\cap\text{interior}\left(\mathcal{X}\right)$, which is a convex set.
Define the Bregman projection of $y$ onto $\mathcal{A}$ with respect to $\psi$ as
$${\rm{proj}}_{\mathcal{A}}\left(y\right) := \underset{a\in\mathcal{A}}{\arg\min} \,D_{\psi}\left(a, y\right).$$Using parts (i)-(ii), prove the generalized Pythagorean theorem: $$D_{\psi}(x,y) \geq D_{\psi}(x, {\rm{proj}}_{\mathcal{A}}\left(y\right)) \: + \: D_{\psi}( {\rm{proj}}_{\mathcal{A}}\left(y\right),y).$$
Hint: If $a^{\text{opt}}$ is a local minimizer of the (possibly nonconvex) function $f(a)$ over $\mathcal{A}$, then the directional derivative satisfies $$\bigg\langle a - a^{\text{opt}}, \nabla_{a} f(a)\big\vert_{a=a^{\text{opt}}}\bigg\rangle \geq 0, \qquad \forall a\in\mathcal{A}.$$ If $f$ is convex over $\mathcal{A}$, then this inequality is also sufficient for $a^{\text{opt}}$ to be a minimizer.
Remark: In the generalized Pythagorean theorem, the equality is achieved when the set $\mathcal{A}$ is affine.
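For the special case $\psi(x) = \|x\|_{2}^{2}$, the Bregman projection reduces to the Euclidean projection, which for a box constraint set is just componentwise clipping. A numerical sketch of the inequality in that case (the box bounds and test points are arbitrary; this illustrates, not proves, the theorem):

```python
import numpy as np

rng = np.random.default_rng(6)

# For psi(x) = ||x||_2^2, D_psi is the squared Euclidean distance and the
# Bregman projection onto a box is componentwise clipping.
lo, hi = 0.0, 1.0
proj = lambda v: np.clip(v, lo, hi)

y = rng.standard_normal(5) * 3.0   # a point, typically outside the box
x = rng.random(5)                  # a point inside the box (i.e., in A)
p = proj(y)                        # Bregman (here: Euclidean) projection of y

d = lambda a, b: np.sum((a - b) ** 2)
# Generalized Pythagorean theorem: D(x, y) >= D(x, p) + D(p, y)
assert d(x, y) >= d(x, p) + d(p, y) - 1e-9
```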