Problem 1 [20 points] A set of matrices

Consider the following set of matrices:

$$\mathcal{X} := \bigg\{X\in\mathbb{R}^{n\times n} \mid X_{ij}\geq 0, \quad \displaystyle\sum_{i=1}^{n}X_{ij} = \displaystyle\sum_{j=1}^{n}X_{ij} = 1, \quad \text{for all}\quad i,j=1, ...,n\bigg\}.$$

(a) [10 points] Convexity

Prove that $\mathcal{X}$ is a convex set.

Solution for 1(a):

Consider two matrices $X,Y\in\mathcal{X}$. Then for all $0\leq \theta \leq 1$, the convex combination $\theta X + (1-\theta)Y$ satisfies

$$\theta X_{ij} + (1-\theta)Y_{ij} \geq 0, \quad \sum_{i=1}^{n} \theta X_{i j}+(1-\theta) Y_{i j}=\sum_{j=1}^{n} \theta X_{i j}+(1-\theta) Y_{i j}=\theta + (1-\theta) = 1, \quad\text{for all}\quad i,j=1,...,n,$$

meaning the matrix $\theta X + (1-\theta)Y \in \mathcal{X}$. Therefore, $\mathcal{X}$ is a convex set.

(b) [(1+4) + (1+4) = 10 points] What kind of convex set

(i) Is $\mathcal{X}$ a cone? Why/why not?

(ii) Is $\mathcal{X}$ a polyhedron? Why/why not?

Solution for 1(b):

(i) No, it is not a cone.

For any $X\in\mathcal{X}$ and any $\alpha\geq0$, the scaled matrix $\alpha X$ satisfies $\alpha X_{ij} \geq 0$ for all $i,j=1,...,n$. But $\sum_{i=1}^{n}\alpha X_{ij} = \sum_{j=1}^{n}\alpha X_{ij} = \alpha$, which is an arbitray nonnegative number (instead of unity). Therefore, $\mathcal{X}$ is not a cone.

(ii) Yes, it is a polyhedron.

This is because $\mathcal{X}$ is defined as the collection of linear inequalities and equalities, that is as an intersection of a finite number of halfspaces and hyperplanes (see Lec. 5, p. 13). Geometrically, this convex polyhedorn $\mathcal{X}$ is a subset of the $(n^2 - 2n + 1)$ dimensional affine supspace in $\mathbb{R}^{n^2}$, defined through the intersection of $n^2$ halfspaces and $2n$ hyperplanes.

The set $\mathcal{X}$ has a name: Birkhoff polytope. A matrix $X\in\mathcal{X}$ is called a doubly stochastic matrix.

Problem 2 [50 points] Hulls

(a) [1+4 = 5 points] Affine hull

What is ${\rm{aff}}\left(\mathbb{S}^{n}_{+}\right)$, the affine hull of the set of $n\times n$ positive semidefinite matrices? Give reasons to support your answer.

Solution for 2(a):

By definition of affine hull (Lec. 4, p. 14-16), for any positive integer $k\geq 2$, we have

$$\text{aff}\left(\mathbb{S}^{n}_{+}\right) = \bigg\{\sum_{i=1}^{k}\theta_{i}X_{i} \mid X_{1}, ..., X_{k}\in\mathbb{S}^{n}_{+}, \quad (\theta_1, ..., \theta_k)\in\mathbb{R}^{k}, \quad \theta_{1} + ... + \theta_{k}=1\bigg\}.$$

Clearly, $\left(\sum_{i=1}^{k}\theta_{i}X_{i}\right)^{\top} = \sum_{i=1}^{k}\theta_{i}X_{i}$, and thus $\sum_{i=1}^{k}\theta_{i}X_{i} \in \mathbb{S}^{n}$. However, allowing $(\theta_1, ..., \theta_k)\in\mathbb{R}^{k}$ can no longer guranatee the sign-definiteness of the matrix $\sum_{i=1}^{k}\theta_{i}X_{i}$. To show that this set is actually the entire $\mathbb{S}^{n}$, and not a proper subset of $\mathbb{S}^{n}$, we need to demonstrate that any symmetric matrix can be written as linear combination of positive semidefinite matrices.

We know from spectral theorem that for any $X\in\mathbb{S}^{n}$, we have $X = V D V^{\top}$ with orthogonal $V$, and diagonal $D$ comprising of real eigenvalues. Now write $D = D^{+} - D^{-}$, where the diagonal matrix $D^{+}$ collects the positive part of the eigenvalues, and the digonal matrix $D^{-}$ collects the negative part of the eigenvalues. Zero eigenvalues, if any, recur in the corresponding diagonal entries in both $D^{+}$ and $D^{-}$. Clearly, $D^{+},D^{-}\in\mathbb{S}_{+}^{n}$.

Then, we can set $\theta_1 = 1$, $\theta_{2} = -1$, $X_{1} = VD^{+}V^{\top}$, $X_{2} = VD^{-}V^{\top}$, and we have $X = \theta_{1}X_{1} + \theta_{2}X_{2} + (1-(\theta_{1}+\theta_2))0_{n\times n}$. One can increase the number of summands in the linear combination by choosing the remaining coefficients to be zero and the corresponding positive semidefinite matrices arbitrary. Other ways to cancel the remaining terms are possible too.

Thus, any $X\in\mathbb{S}^{n}$ can be written as linear combination of positive semidefinite matrices. We conclude: $\text{aff}\left(\mathbb{S}^{n}_{+}\right) = \mathbb{S}^{n}$.

(b) [10 + 10 + 10 = 30 points] Properties of convex hull

Consider nonempty sets $\mathcal{A},\mathcal{B},\mathcal{X},\mathcal{Y} \subseteq \mathbb{R}^{n}$.

(i) Prove that if $\mathcal{X}\subseteq \mathcal{Y}$, then ${\rm{conv}}\left(\mathcal{X}\right) \subseteq {\rm{conv}}\left(\mathcal{Y}\right)$.

(ii) Use (i) to prove that ${\rm{conv}}\left(\mathcal{A}\cap\mathcal{B}\right) \subseteq {\rm{conv}}\left(\mathcal{A}\right) \cap {\rm{conv}}\left(\mathcal{B}\right)$.

(iii) Use (i) to prove that ${\rm{conv}}\left(\mathcal{A}\cup\mathcal{B}\right) = {\rm{conv}}\left({\rm{conv}}\left(\mathcal{A}\right) \cup {\rm{conv}}\left(\mathcal{B}\right)\right)$.

Solution for 2(b):

(i) Since $\mathcal{Y} \subseteq \text{conv}\left(\mathcal{Y}\right)$, we immediately have $\mathcal{X}\subseteq\mathcal{Y}\subseteq\text{conv}\left(\mathcal{Y}\right)$.

But $\text{conv}\left(\mathcal{X}\right)$ is the smallest convex set containing $\mathcal{X}$. Therefore, any convex set that contains $\mathcal{X}$, must also contain $\text{conv}\left(\mathcal{X}\right)$. Thus, $\text{conv}\left(\mathcal{X}\right)\subseteq\text{conv}\left(\mathcal{Y}\right)$.

(ii) Notice that $$\mathcal{A}\cap\mathcal{B} \subseteq \mathcal{A}\:\stackrel{\text{part (i)}}{\Rightarrow}\:\text{conv}\left(\mathcal{A}\cap\mathcal{B}\right)\subseteq\text{conv}\left(\mathcal{A}\right), \quad \mathcal{A}\cap\mathcal{B} \subseteq \mathcal{B}\:\stackrel{\text{part (i)}}{\Rightarrow}\:\text{conv}\left(\mathcal{A}\cap\mathcal{B}\right)\subseteq\text{conv}\left(\mathcal{B}\right).$$

Combining the above two, we deduce $$\text{conv}\left(\mathcal{A}\cap\mathcal{B}\right)\subseteq\text{conv}\left(\mathcal{A}\right)\cap\text{conv}\left(\mathcal{B}\right).$$

(iii) Since $$\mathcal{A} \subseteq \text{conv}(\mathcal{A}),\quad \mathcal{B} \subseteq \text{conv}(\mathcal{B}),$$

therefore, $$\mathcal{A} \cup \mathcal{B} \subseteq \text{conv}(\mathcal{A}) \cup \text{conv}(\mathcal{B})\: \stackrel{\text{part (i)}}{\Rightarrow} \: \boxed{\text{conv}(\mathcal{A} \cup \mathcal{B}) \subseteq \text{conv}\left(\text{conv}(\mathcal{A}) \cup \text{conv}(\mathcal{B})\right)}.$$

On the other hand, $$\mathcal{A} \subseteq \mathcal{A} \cup \mathcal{B} \: \stackrel{\text{part (i)}}{\Rightarrow} \: \text{conv}(\mathcal{A}) \subseteq \text{conv}(\mathcal{A} \cup \mathcal{B}), \qquad \mathcal{B} \subseteq \mathcal{A} \cup \mathcal{B} \: \stackrel{\text{part (i)}}{\Rightarrow} \: \text{conv}(\mathcal{B}) \subseteq \text{conv}(\mathcal{A} \cup \mathcal{B}),$$ which together yield $$ \text{conv}(\mathcal{A}) \cup \text{conv}(\mathcal{B}) \subseteq \text{conv}(\mathcal{A} \cup \mathcal{B}) \: \stackrel{\text{part (i)}}{\Rightarrow} \: \boxed{\text{conv}(\text{conv}(\mathcal{A}) \cup \text{conv}(\mathcal{B})) \subseteq \text{conv}(\text{conv}(\mathcal{A} \cup \mathcal{B})) = \text{conv}(\mathcal{A} \cup \mathcal{B})}.$$

Combining the two boxed equations, we arrive at the statement.

(c) [12 + (1+2) = 15 points] Convex hull of roots

Consider a nonconstant univariate polynomial $p\in\mathbb{R}[x]$ (this notation means that the polynomial $p$ is in variable $x$, and it has real coefficients).

Suppose $\mathcal{R}=\{r_1,r_2,...\}$ is the set of all roots of the polynomial $p$. In general, some roots are real, and some complex conjugates.

(i) Prove that all critical points (roots of the derivative) of $p$ must lie in ${\rm{conv}}(\mathcal{R})$, the convex hull of $\mathcal{R}$.

(To help you visualize the statement above, in the following figure, we plotted the roots (blue dots) of a randomly generated 10th degree polynomial, and its critical points (red dots) in the complex plane. In this plot, ${\rm{conv}}(\mathcal{R})$ is the polygon with sides depicted as the blue solid lines.)

(ii) Will the statement in (c)(i) change if $p\in\mathbb{C}[x]$, that is, if the polynomial has complex coefficients? Why/why not?

Solution for 2(c):

(i) Let us write the $n$th degree polynomial $p\in\mathbb{R}[x]$ in terms of its roots: $p(x)=a(x-r_{1})(x-r_{2})...(x-r_{n})$, where $a$ is the coefficient of $x^{n}$. Taking natural log and then derivative w.r.t. $x$, we get

$$p^{\prime}(x) = p(x) \left(\frac{1}{x-r_1} + ... + \frac{1}{x-r_n}\right).$$

Therefore, any critical point $c$ satisfying $p^{\prime}(c)=0$ must satisfy either $p(c)=0$, or $$q(c):=\frac{1}{c-r_1} + ... + \frac{1}{c-r_n} = 0.$$

If $p(c)=0$, then we're done. If $q(c)=0$, then multiplying the numerator and denominator of each summand of $q(c)$ by the complex conjugate of its denominator, we get $$\frac{\overline{c} - \overline{r_1}}{|c-r_1|^2} + ... + \frac{\overline{c} - \overline{r_n}}{|c-r_n|^2} = 0, \qquad \text{(bar denotes complex conjugates)}.$$

This gives $$\overline{c}\left(\frac{1}{|c-r_1|^2} + ... + \frac{1}{|c-r_n|^2}\right) = \frac{\overline{r_1}}{|c-r_1|^2} + ... + \frac{\overline{r_n}}{|c-r_n|^2}.$$

Taking conjugates to both sides of the above, we obtain

$$c = \left(\frac{1}{|c-r_1|^2} + ... + \frac{1}{|c-r_n|^2}\right)^{-1}\left(\frac{r_1}{|c-r_1|^2} + ... + \frac{r_n}{|c-r_n|^2}\right),$$

that is, $c$ is a convex combination of the roots $r_{i}$, completing the proof.

(ii) No, the statement will not change. The reason: thanks to the fundamental theorem of algebra, all the calculations in part (i) remain valid.

Problem 3 [30 points] Vector unit norm balls revisited

In this exercise, we take a closer look at the vector unit norm balls encountered in HW1, Prob. 2.

For $i=1,...,n$, we use the symbol $e_{i}$ to denote the $i$th standard basis vector in $\mathbb{R}^{n}$. You can simply think of $e_{i}$ as the $i$th column of the $n\times n$ identity matrix.

(a) [10 points] $p$ norm ball is a convex set for any $p\geq 1$

From the plots in HW1, Prob. 2, we intuitively expect the unit $p$ norm balls to be convex sets for $p\geq 1$, and nonconvex sets for $0\leq p<1.$ This is indeed true.

Use the Minkowski inequality (which can be thought of as the triangle inequality for $p$-norms whenever $p\geq 1$):

$$\|u+v\|_{p} \leq \|u\|_{p} + \|v\|_{p}, \quad \forall u,v\in\mathbb{R}^{n}, \quad p\geq 1,$$

to prove that the vector unit norm balls are convex sets for any $p\geq 1$.

Solution for 3(a):

Consider two different $y,z\in\mathcal{X}:=\{x\in\mathbb{R}^{n}\mid \|x\|_{p}\leq 1, \; p\geq 1\}$.

For any $0\leq \theta \leq 1$, by the Minkowski inequality, we have

$$\|\theta y + (1-\theta)z\|_{p} \leq \theta\|y\|_{p} + (1-\theta)\|z\|_{p} \leq \theta + (1-\theta) = 1.$$

Therefore, the vector $\theta y + (1-\theta)z \in \mathcal{X}$ for any $0\leq \theta \leq 1$. Hence $\mathcal{X}$ is a convex set.

(b) [5 + 5 = 10 points] $1$ norm ball is a polyhedron

Let us establish in two different ways that the $1$ norm ball $\{x\in\mathbb{R}^{n}\mid \|x\|_{1} \leq 1\}$ is a polyhedron.

(i) Write the $1$ norm ball as the intersection of $2n+1$ halfspaces in $\mathbb{R}^{2n}$. Hint: introduce $n$ additional variables.

(From Lec. 5, p. 13-14, this allows us to conclude that the $1$ norm ball is a polyhedron.)

(ii) Write the $1$ norm ball as the convex hull of $2n$ points in $\mathbb{R}^{n}$ in terms of the vectors $e_i$.

(From Lec. 6, p. 1, this allows us to conclude that the $1$ norm ball is a polyhedron.)

Solution for 3(b):

(i) To rewrite the 1 norm ball $\{x\in\mathbb{R}^{n}\mid |x_1| + ... + |x_{n}| \leq 1\}$, we introduce $n$ additional variables $y_{i}$, $i=1,...,n$ (or equivalently additional vector variable $y\in\mathbb{R}^{n}$) such that $|x_{i}| \leq y_{i}$.

Then, we have $$-y_{i} \leq x_{i} \leq y_{i}, \quad \sum_{i=1}^{n}y_{i} \leq 1,$$ which is a system of $2n+1$ linear inequalities in $2n$ variables, i.e., intersection of $2n+1$ halfspaces in $\mathbb{R}^{2n}$.

From the definition of polyhedron (Lec. 5, p. 13), this allows us to conclude that the 1 norm ball is a polyhedron.

(ii) We can write the 1 norm ball as $\text{conv}\left(e_{1},-e_{1}, ..., e_{n},-e_{n}\right)$, i.e., as the convex hull of $2n$ points in $\mathbb{R}^{n}$.

Remark: The 1 norm ball is also known as the cross polytope, or the orthoplex.

(c) [5 + 5 = 10 points] $\infty$ norm ball is a polyhedron

Let us establish in two different ways that the $\infty$ norm ball $\{x\in\mathbb{R}^{n}\mid \|x\|_{\infty} \leq 1\}$ is a polyhedron.

(i) Write the $\infty$ norm ball as the intersection of $2n$ halfspaces in $\mathbb{R}^{n}$ in terms of the vectors $e_{i}$.

(From Lec. 5, p. 13-14, this allows us to conclude that the $\infty$ norm ball is a polyhedron.)

(ii) Write the $\infty$ norm ball as the convex hull of $2^{n}$ points in $\mathbb{R}^{n}$.

(From Lec. 6, p. 1, this allows us to conclude that the $\infty$ norm ball is a polyhedron.)

Solution for 3(c):

(i) To rewrite the $\infty$ norm ball $\{x\in\mathbb{R}^{n}\mid \underset{i=1,...,n}{\max} |x_{i}| \leq 1\}$, notice that $\underset{i=1,...,n}{\max} |x_{i}| \leq 1$ if and only if $|x_i|\leq 1$ for all $i=1,...,n$. This is same as saying

$$-1 \leq x_{i} \leq +1, \quad \forall\;i=1,...,n.$$

Thus, the $\infty$ norm ball is the intersection of $2n$ linear inequalities/halfspaces in $\mathbb{R}^{n}$, and by definition (Lec. 5, p. 13), is a polyhedron.

(ii) We can write the $\infty$ norm ball as $\text{conv}\left(\{-1,1\}^{n}\right)$ where $\{-1,1\}^{n}$ denotes the collection of $n\times 1$ vectors whose each element is either $+1$ or $-1$. The plus-minus permutations admit $2^{n}$ such vectors, and thus the $\infty$ norm ball is the convex hull of these $2^{n}$ points in $\mathbb{R}^{n}$.

Remark: The $\infty$ norm ball is also known as the hypercube.