# Problem 1 [30 points] Maximum and minimum eigenvalues¶

Consider the maximum eigenvalue $\lambda_{\max}(X)$ and the minimum eigenvalue $\lambda_{\min}(X)$ of any $X\in\mathbb{S}^{n}$, the set of $n\times n$ real symmetric matrices.

The purpose of this exercise is to investigate whether $\lambda_{\max}(X)$ and $\lambda_{\min}(X)$ are convex functions of the matrix variable $X\in\mathbb{S}^{n}$.

## (a) [10 points] An inequality¶

For any nonzero $x\in\mathbb{R}^{n}$ and any $X\in\mathbb{S}^{n}$, derive the inequality:

$$\lambda_{\min}(X) \leq \dfrac{x^{\top}Xx}{x^{\top}x} \leq \lambda_{\max}(X).$$

(Hint: start from the spectral theorem applied to a symmetric matrix.)
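A numerical spot-check (not a proof) can build intuition here. The sketch below, using NumPy with illustrative variable names, draws a random symmetric matrix and verifies that the Rayleigh quotient of a random nonzero vector falls between the extreme eigenvalues:

```python
# Sanity check: for symmetric X and nonzero x, the Rayleigh quotient
# x^T X x / x^T x lies in [lambda_min(X), lambda_max(X)].
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
X = (A + A.T) / 2                    # symmetrize so that X is in S^n
eigvals = np.linalg.eigvalsh(X)      # ascending order for symmetric input
lam_min, lam_max = eigvals[0], eigvals[-1]

x = rng.standard_normal(5)
rayleigh = x @ X @ x / (x @ x)
ok = lam_min - 1e-12 <= rayleigh <= lam_max + 1e-12
```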

## (b) [10 points] Variational characterization of $\lambda_{\max}$ and $\lambda_{\min}$¶

Use part (a) to prove the following two formulas:

$$\lambda_{\max}(X) = \underset{\|x\|_{2}=1}{\sup} x^{\top}X x, \qquad \lambda_{\min}(X) = \underset{\|x\|_{2}=1}{\inf} x^{\top}X x, \qquad x\in\mathbb{R}^{n}, \quad X\in\mathbb{S}^{n}.$$

(Hint: think about whether and when equality is achieved in the weak inequalities derived in part (a).)
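The hint can also be explored numerically. This illustrative sketch (NumPy, names of my choosing) checks that the extreme eigenvectors achieve the sup and inf, and that random unit vectors never exceed those values:

```python
# Spot-check of the variational characterization: the sup/inf of x^T X x
# over the unit sphere are attained at the extreme eigenvectors.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
X = (A + A.T) / 2
w, V = np.linalg.eigh(X)             # eigenvalues ascending, columns are eigenvectors

v_min, v_max = V[:, 0], V[:, -1]
achieved_min = v_min @ X @ v_min
achieved_max = v_max @ X @ v_max

# Random unit vectors should stay within [achieved_min, achieved_max].
samples = rng.standard_normal((1000, 4))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)
sample_vals = np.einsum('ij,jk,ik->i', samples, X, samples)
```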

## (c) [10 points] Function convexity¶

Use part (b) to argue which of the two functions $f(X) := \lambda_{\max}(X)$ and $g(X) := \lambda_{\min}(X)$ is/are convex over $\textbf{dom}(f) = \textbf{dom}(g) = \mathbb{S}^{n}$. Explain.

(Hint: think operations preserving function convexity/concavity from Lec. 8.)
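Before writing the argument, one can spot-check the candidate convexity/concavity inequalities numerically. The sketch below (NumPy, illustrative only) tests both inequalities on a random pair of symmetric matrices:

```python
# Spot-check: does lambda_max satisfy the convexity inequality, and does
# lambda_min satisfy the concavity inequality, along a random line segment?
import numpy as np

rng = np.random.default_rng(8)

def sym(n):
    A = rng.standard_normal((n, n))
    return (A + A.T) / 2

lam_max = lambda M: np.linalg.eigvalsh(M)[-1]
lam_min = lambda M: np.linalg.eigvalsh(M)[0]

X, Y = sym(4), sym(4)
t = 0.3
Z = t * X + (1 - t) * Y
max_convex_ok = lam_max(Z) <= t * lam_max(X) + (1 - t) * lam_max(Y) + 1e-9
min_concave_ok = lam_min(Z) >= t * lam_min(X) + (1 - t) * lam_min(Y) - 1e-9
```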

# Problem 2 [30 points] Support function revisited¶

Consider the support function $h(y)$ of any compact convex set introduced in HW3, Prob. 1(b).

## (a) [1 + 9 = 10 points] Function convexity¶

Is the support function convex or concave or neither? Give reasons to explain your answer.

(Hint: again think operations preserving function convexity/concavity from Lec. 8.)
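For intuition, one can experiment with the support function of a concrete compact convex set. This sketch (NumPy; the point set and names are my own choices, not from the assignment) takes the convex hull of finitely many points, where $h(y) = \max_i \langle p_i, y\rangle$, and spot-checks the convexity inequality:

```python
# Support function of conv{p_1,...,p_k}: h(y) = max_i <p_i, y>.
# Spot-check h(t*y1 + (1-t)*y2) <= t*h(y1) + (1-t)*h(y2) on random inputs.
import numpy as np

rng = np.random.default_rng(7)
P = rng.standard_normal((6, 3))      # rows are the points p_i
h = lambda y: np.max(P @ y)

ok = True
for _ in range(200):
    y1, y2 = rng.standard_normal(3), rng.standard_normal(3)
    t = rng.random()
    ok &= h(t * y1 + (1 - t) * y2) <= t * h(y1) + (1 - t) * h(y2) + 1e-9
```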

## (b) [(1 + 7) + 2 = 10 points] Convex conjugate¶

(i) What is the function $f(x)$ whose convex conjugate equals the support function $h(y)$? Give reasons to explain your answer.

(ii) What is the geometric interpretation of your result in 2(b)(i)?

## (c) [(1 + 4) + (1 + 4) = 10 points] Epigraph¶

(i) Is the epigraph of support function: $\text{epi}(h)$ a cone? Why/why not?

(ii) Is $\text{epi}(h)$ a convex cone? Why/why not?

# Problem 3 [40 points] Bregman divergence¶

Given a strictly convex differentiable function $\psi$ over a closed convex set $\mathcal{X}$, the associated Bregman divergence $D_{\psi}:\mathcal{X}\times\mathcal{X}\to\mathbb{R}_{\geq 0}$ is defined as

$$D_{\psi}(x,y) := \psi(x) - \psi(y) - \langle\nabla\psi(y), x-y\rangle, \qquad\forall x,y\in\mathcal{X}.$$

The above formula has a clean geometric meaning: the Bregman divergence quantifies the amount by which a strictly convex function lies above its linear approximation (tangent hyperplane).

For example, when $\psi(x) = \|x\|_{2}^{2}$, then for any closed $\mathcal{X}\subset\mathbb{R}^{n}$, we have $D_{\psi}(x,y) = \|x-y\|_{2}^{2}$, the squared Euclidean distance. (It is a good idea to verify this yourself!)

As another example, when $\psi(x) = \sum_{i=1}^{n}x_{i}\log x_{i}$, $\mathcal{X}\equiv \{x\in\mathbb{R}^{n}_{\geq 0}\mid \sum_{i=1}^{n}x_{i}=1\}$ (the standard simplex, see Lec. 5, p. 15), then $D_{\psi}(x,y) = \sum_{i=1}^{n}x_{i}\log(x_{i}/y_{i})$, the Kullback-Leibler divergence/relative entropy. (Again, good idea to verify this yourself!)
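Both examples above can indeed be verified numerically. The sketch below (NumPy, with helper names of my choosing) evaluates the Bregman divergence from its definition and compares it against the squared Euclidean distance and the KL divergence:

```python
# Verify the two worked examples of the Bregman divergence definition.
import numpy as np

def bregman(psi, grad_psi, x, y):
    return psi(x) - psi(y) - grad_psi(y) @ (x - y)

rng = np.random.default_rng(2)
x = rng.random(4) + 0.1
y = rng.random(4) + 0.1

# Example 1: psi = squared Euclidean norm  ->  D = squared distance.
d_sq = bregman(lambda v: v @ v, lambda v: 2 * v, x, y)

# Example 2: negative entropy restricted to the simplex  ->  D = KL divergence.
p, q = x / x.sum(), y / y.sum()      # normalize onto the standard simplex
negent = lambda v: np.sum(v * np.log(v))
grad_negent = lambda v: np.log(v) + 1.0
d_kl = bregman(negent, grad_negent, p, q)
kl = np.sum(p * np.log(p / q))
```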

Even though $D_{\psi}(x,y)\neq D_{\psi}(y,x)$ in general, the Bregman divergence has many nice properties, and appears frequently in machine learning applications.

## (a) [(1 + 4) + (2 + 3) + 5 = 15 points] Function convexity¶

(i) Is $D_{\psi}(x,y)$ strictly convex in $x$? Why/why not?

(ii) If $\psi$ is $m$-strongly convex, then prove that $D_{\psi}(x,y) \geq \frac{m}{2}\|x-y\|_{2}^{2}$, and that $D_{\psi}(x,y)$ is $m$-strongly convex in $x$.

(iii) Construct a counterexample to demonstrate that $D_{\psi}(x,y)$ need not be convex in its second argument $y$.
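A quick numerical sanity check of the lower bound in (ii) is possible with any concrete strongly convex $\psi$. As an illustrative (assumed) choice, $\psi(v) = \|v\|_{2}^{2} + \log\sum_{i}e^{v_{i}}$ is $2$-strongly convex, since $\|v\|_{2}^{2}$ contributes $m=2$ and log-sum-exp is convex:

```python
# Spot-check D_psi(x, y) >= (m/2) * ||x - y||^2 for a 2-strongly convex psi.
import numpy as np

rng = np.random.default_rng(3)

def psi(v):
    return v @ v + np.log(np.sum(np.exp(v)))

def grad_psi(v):
    e = np.exp(v - v.max())          # stable softmax for the log-sum-exp part
    return 2 * v + e / e.sum()

def bregman(x, y):
    return psi(x) - psi(y) - grad_psi(y) @ (x - y)

m = 2.0
checks = []
for _ in range(100):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    checks.append(bregman(x, y) >= m / 2 * np.sum((x - y) ** 2) - 1e-9)
```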

## (b) [5 + 5 + 15 = 25 points] Distance-like function¶

The Bregman divergence $D_{\psi}(x,y)$ behaves like the square of a generalized distance between elements in $\mathcal{X}$.

(i) Prove that $$\nabla_{x}D_{\psi}(x,y) = \nabla_{x}\psi(x) - \nabla_{y}\psi(y),$$

where the subscript on the gradient operator $\nabla$ indicates the vector with respect to which the gradient is taken.
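The claimed identity can be spot-checked by finite differences before proving it. The sketch below (NumPy, using the negative-entropy $\psi$ as an illustrative choice) compares a numerical gradient of $D_{\psi}(\cdot,y)$ against $\nabla\psi(x)-\nabla\psi(y)$:

```python
# Finite-difference check of grad_x D_psi(x, y) = grad psi(x) - grad psi(y).
import numpy as np

psi = lambda v: np.sum(v * np.log(v))
grad_psi = lambda v: np.log(v) + 1.0
D = lambda x, y: psi(x) - psi(y) - grad_psi(y) @ (x - y)

rng = np.random.default_rng(4)
x = rng.random(3) + 0.5              # keep entries positive for log
y = rng.random(3) + 0.5

eps = 1e-6
num_grad = np.array([
    (D(x + eps * e, y) - D(x - eps * e, y)) / (2 * eps)   # central differences
    for e in np.eye(3)
])
analytic = grad_psi(x) - grad_psi(y)
```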

(ii) Prove the generalized law of cosines:

$$D_{\psi}(x,y) = D_{\psi}(x,z) + D_{\psi}(z,y) - \langle x-z, \nabla\psi(y) - \nabla\psi(z)\rangle, \qquad \forall x,y,z\in\mathcal{X}.$$

Remark: Notice that for $\psi(x) = \|x\|_{2}^2$, this reduces to $\|x-y\|_{2}^{2} = \|x-z\|_{2}^{2} + \|z-y\|_{2}^{2} - 2\underbrace{\langle x-z, y-z\rangle}_{\|x-z\|_{2}\|y-z\|_{2}\cos\angle yzx}$.
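Beyond the Euclidean special case, the generalized law of cosines can be spot-checked for other choices of $\psi$. This sketch (NumPy, with the negative-entropy $\psi$ as an assumed example) verifies the identity on random positive vectors:

```python
# Numerical check of the generalized law of cosines for a non-Euclidean psi.
import numpy as np

psi = lambda v: np.sum(v * np.log(v))        # negative entropy (illustrative)
grad_psi = lambda v: np.log(v) + 1.0
D = lambda a, b: psi(a) - psi(b) - grad_psi(b) @ (a - b)

rng = np.random.default_rng(5)
x, y, z = (rng.random(4) + 0.1 for _ in range(3))

lhs = D(x, y)
rhs = D(x, z) + D(z, y) - (x - z) @ (grad_psi(y) - grad_psi(z))
```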

(iii) Recall that $\mathcal{X}\equiv\textbf{dom}(\psi)$. Consider a closed convex set $\mathcal{C}$ such that $\mathcal{C}\cap\text{interior}\left(\mathcal{X}\right)\neq\emptyset$. For notational convenience, introduce $\mathcal{A}:=\mathcal{C}\cap\text{interior}\left(\mathcal{X}\right)$, which is a convex set.

Define the Bregman projection of $y$ onto $\mathcal{A}$ with respect to $\psi$ as

$${\rm{proj}}_{\mathcal{A}}\left(y\right) := \underset{a\in\mathcal{A}}{\arg\min} \,D_{\psi}\left(a, y\right).$$

Using parts (i)-(ii), prove the generalized Pythagorean theorem: $$D_{\psi}(x,y) \geq D_{\psi}(x, {\rm{proj}}_{\mathcal{A}}\left(y\right)) \: + \: D_{\psi}( {\rm{proj}}_{\mathcal{A}}\left(y\right),y), \qquad \forall x\in\mathcal{A}.$$

Hint: If $a^{\text{opt}}$ is a local minimizer of the (possibly nonconvex) function $f(a)$ over $\mathcal{A}$, then the directional derivative satisfies $$\bigg\langle a - a^{\text{opt}}, \nabla_{a} f(a)\big\vert_{a=a^{\text{opt}}}\bigg\rangle \geq 0, \qquad \forall a\in\mathcal{A}.$$ If $f$ is convex over $\mathcal{A}$, then the above inequality is also sufficient to guarantee that $a^{\text{opt}}$ is a (global) minimizer.

Remark: In the generalized Pythagorean theorem, the equality is achieved when the set $\mathcal{A}$ is affine.
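The remark can be illustrated concretely in the KL setting: with $\psi$ the negative entropy and $\mathcal{A}$ the probability simplex (an affine slice of the positive orthant), the KL Bregman projection of a positive vector $y$ is known to be the normalization $y/\sum_i y_i$, and equality should hold in the generalized Pythagorean theorem. A hedged numerical sketch (NumPy, names my own):

```python
# Generalized Pythagorean theorem in the KL setting, where A is affine,
# so the inequality should hold with equality.
import numpy as np

psi = lambda v: np.sum(v * np.log(v))        # negative entropy
grad_psi = lambda v: np.log(v) + 1.0
D = lambda a, b: psi(a) - psi(b) - grad_psi(b) @ (a - b)

rng = np.random.default_rng(6)
y = rng.random(4) + 0.1               # a positive point, not on the simplex
proj = y / y.sum()                    # KL Bregman projection onto the simplex

x = rng.random(4) + 0.1
x /= x.sum()                          # a point in A (the simplex)
lhs = D(x, y)
rhs = D(x, proj) + D(proj, y)
```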