Suppose without loss of generality that $\mathbf{A}$ has an eigenvalue very small in absolute value. Take the shift $\sigma=0$. Suppose $\mathbf{v}$ is a vector with components in the directions of all the eigenvectors $\mathbf{u}_1, \dotsc, \mathbf{u}_n$ of $\mathbf{A}$, associated with eigenvalues $|\lambda_1| \ll |\lambda_2| \le \dotsc \le |\lambda_n|$. Also suppose that the linear system $\mathbf{A}\mathbf{x} = \mathbf{v}$ is solved backward stably, yielding a computed solution $\tilde{\mathbf{x}}$.
Show that although $\tilde{\mathbf{x}}$ may be far from $\mathbf{x} = \mathbf{A}^{-1}\mathbf{v}$, $\tilde{\mathbf{x}}/\Vert \tilde{\mathbf{x}} \Vert$ will not be far from $\mathbf{x} / \Vert \mathbf{x} \Vert$.
(Hint. Use the matrix calculus rule $d(\mathbf{A}^{-1}) = -\mathbf{A}^{-1}(d\mathbf{A})\mathbf{A}^{-1}$, or equivalently
$$
(\mathbf{A} + \delta\mathbf{A})^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1}(\delta\mathbf{A})\mathbf{A}^{-1} + O(\Vert\delta\mathbf{A}\Vert^2),
$$
where $\delta\mathbf{A}$ is the backward error due to roundoff.)
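A quick numerical illustration of the claim (a sketch; the test matrix, the eigenvalue gap, and the package choice below are our own, not part of the problem):

```julia
# Illustrates that a backward-stable solve of Ax = v recovers the
# *direction* of x accurately even when x itself is badly conditioned.
using LinearAlgebra

n = 100
Q = Matrix(qr(randn(n, n)).Q)            # random orthonormal eigenvectors
λ = [1e-12; 10 .+ rand(n - 1)]           # |λ₁| ≪ |λ₂| ≤ … ≤ |λₙ|
A = Q * Diagonal(λ) * Q'                 # symmetric test matrix (shift σ = 0)
v = Q * ones(n) / sqrt(n)                # components in all eigendirections

x̃ = A \ v                                # backward-stable LU solve
u₁ = Q[:, 1]                             # eigenvector of the tiny eigenvalue
@show abs(dot(x̃ / norm(x̃), u₁))          # ≈ 1: the direction is accurate
```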
This problem is about automatic recognition of handwritten digits. The dataset to be used is the famous MNIST dataset available at
https://www.kaggle.com/oddrationale/mnist-in-csv
The dataset mnist_train.csv contains $n = 60{,}000$ images for training. An image is a $28 \times 28$ gray-level array. Each image is stored as a line of the file, in the format
label, pix(1, 1), pix(1, 2), pix(1, 3), ..., pix(28, 28)
where label is the digit represented by the image (0, 1, ..., 9), and pix(r, c) is the pixel intensity at row r, column c (an integer between 0 and 255, inclusive).
The $i$-th image in a dataset is considered a vector $\mathbf{x}_i \in \mathbb{R}^p$, $p = 28\times 28 = 784$. Let $\Sigma \in \mathbb{R}^{p\times p}$ denote the sample covariance matrix of the $\mathbf{x}_i$ after standardization.
Let $V \in \mathbb{R}^{p\times r}$ be the matrix whose columns are given by the eigenvectors of $\Sigma$ associated with the $r = 10$ largest eigenvalues.
Plot the $r$ principal components as $28\times 28$-pixel images. Scale them so as to make them visible.
The dataset mnist_test.csv contains $n=10{,}000$ images for testing. Plot the loadings of each image $\mathbf{x}_i$ in terms of the principal components $\mathbf{v}_1,\dotsc,\mathbf{v}_r$. Discuss the possibility of using the loadings for digit recognition.
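A minimal sketch of this PCA pipeline, assuming CSV.jl, DataFrames.jl, and Plots.jl are installed and the CSV files sit in the working directory (the display choices are placeholders):

```julia
using CSV, DataFrames, LinearAlgebra, Statistics, Plots

train = CSV.read("mnist_train.csv", DataFrame)
X = Matrix{Float64}(train[:, 2:end])           # n × p with p = 784
labels = train[:, 1]

# Standardize pixels; constant pixels (σ = 0) are centered only.
μ = mean(X, dims=1)
σ = std(X, dims=1)
Z = (X .- μ) ./ map(s -> s > 0 ? s : 1.0, σ)

Σ = Symmetric(Z' * Z ./ (size(Z, 1) - 1))      # p × p sample covariance
E = eigen(Σ)                                   # eigenvalues in ascending order
V = E.vectors[:, end:-1:end-9]                 # top r = 10 eigenvectors

# Principal components as 28 × 28 images (pixels are stored row-wise,
# hence the transpose; heatmap's color range rescales each panel).
plot([heatmap(reshape(V[:, j], 28, 28)', yflip=true, framestyle=:none,
              colorbar=false) for j in 1:10]..., layout=(2, 5))

# Loadings of each image on the first two principal components.
L = Z * V
scatter(L[:, 1], L[:, 2], group=labels, markersize=1,
        xlabel="PC 1", ylabel="PC 2")
```

The same standardization (with the training means and variances) would be applied to mnist_test.csv before computing its loadings.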
Perron-Frobenius theory states that if a matrix $\mathbf{A}$ has positive entries, then there exists a positive dominant eigenvalue-eigenvector pair, i.e., $\mathbf{A}\mathbf{v}_0 = \lambda_0\mathbf{v}_0$ for some $\lambda_0 > 0$ and $\mathbf{v}_0 > 0$, and if $\lambda\neq\lambda_0$ is an eigenvalue of $\mathbf{A}$, then $|\lambda| < \lambda_0$. Also, the geometric and algebraic multiplicities of $\lambda_0$ are 1. That is, $\text{dim}(\text{span}\{\mathbf{v}: \mathbf{A}\mathbf{v}=\lambda_0\mathbf{v}\})=1$, and $\lambda_0$ is a simple root of $\text{det}(\mathbf{A}-\lambda\mathbf{I})=0$.
In class, we defined the PageRank problem as an eigenvalue problem associated with the transition matrix $\mathbf{P}$ of a Markov chain. We assume the random jump probability $p > 0$, so that $\mathbf{P} > \mathbf{0}$ (the inequality is elementwise).
Use the SNAP/web-Google data from the MatrixDepot package, and plot your results with UnicodePlots.jl. (See the Lanczos/Arnoldi iterative method for top eigen-pairs section in the lecture notes on EVD.) For iterative methods, use the IterativeSolvers.jl and Arpack.jl packages. Make sure to utilize the special structure of $\mathbf{P}$ to speed up the matrix-vector multiplication. (Hint: LinearMaps.jl.)
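A sketch of a structure-exploiting PageRank matvec; the teleportation formulation below, the choice $p = 0.15$, and the dangling-node handling are assumptions, so adapt them to the formulation from class:

```julia
using MatrixDepot, SparseArrays, LinearAlgebra, LinearMaps, Arpack

md = mdopen("SNAP/web-Google")
A = md.A                                   # sparse adjacency matrix
n = size(A, 1)
p = 0.15                                   # random jump probability

d = vec(sum(A, dims=2))                    # out-degrees
invd = [di > 0 ? 1.0 / di : 0.0 for di in d]
dangling = d .== 0

# y = P * x without forming the dense n × n matrix P: the link part is
# one sparse matvec, and the rank-1 teleportation part (plus the mass
# redistributed from dangling nodes) only needs two sums.
function pagerank_mul!(y, x)
    y .= (1 - p) .* (A' * (invd .* x))
    y .+= (p * sum(x) + (1 - p) * sum(x[dangling])) / n
    return y
end

P = LinearMap(pagerank_mul!, n; ismutating=true)
λ, V = eigs(P; nev=1, which=:LM)           # dominant eigenpair via Arnoldi
x = real.(vec(V)); x ./= sum(x)            # PageRank vector, normalized
```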
For an extended real-valued function $f: \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$, its Fenchel conjugate is defined as $$ f^*(\mathbf{y}) := \sup_{\mathbf{x}\in\mathbb{R}^n}\{\mathbf{x}^T\mathbf{y} - f(\mathbf{x})\} . $$
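To fix ideas, a standard worked example (added for illustration): take $f(\mathbf{x}) = \frac{1}{2}\Vert\mathbf{x}\Vert_2^2$. The supremand $\mathbf{x}^T\mathbf{y} - \frac{1}{2}\Vert\mathbf{x}\Vert_2^2$ is concave in $\mathbf{x}$ and its gradient vanishes at $\mathbf{x} = \mathbf{y}$, so $$ f^*(\mathbf{y}) = \mathbf{y}^T\mathbf{y} - \frac{1}{2}\Vert\mathbf{y}\Vert_2^2 = \frac{1}{2}\Vert\mathbf{y}\Vert_2^2 , $$ i.e., this $f$ equals its own conjugate.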
(Hint. Denote $\mathbf{A}^{-1}\mathbf{b}$ by $\mathbf{x}$. Show that there is a rank-1 matrix $\delta \mathbf{A} \in \mathbb{R}^{n\times n}$ such that $$ \Vert\mathbf{A}^{-1}(\delta \mathbf{A})\mathbf{x} \Vert = \Vert\mathbf{A}^{-1}\Vert\Vert\delta\mathbf{A}\Vert\Vert\mathbf{x}\Vert . $$ Use part 6.)
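One way to see that such a $\delta\mathbf{A}$ exists (a sketch in the 2-norm; the unit vector $\mathbf{y}$ and scale $\varepsilon$ are our own notation): choose $\mathbf{y}$ with $\Vert\mathbf{y}\Vert = 1$ attaining $\Vert\mathbf{A}^{-1}\mathbf{y}\Vert = \Vert\mathbf{A}^{-1}\Vert$ and set $$ \delta\mathbf{A} = \varepsilon\, \frac{\mathbf{y}\mathbf{x}^T}{\Vert\mathbf{x}\Vert^2} , $$ which is rank 1 with $\delta\mathbf{A}\,\mathbf{x} = \varepsilon\mathbf{y}$ and $\Vert\delta\mathbf{A}\Vert = \varepsilon/\Vert\mathbf{x}\Vert$; then both sides of the displayed identity equal $\varepsilon\Vert\mathbf{A}^{-1}\Vert$.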
We want to solve an LP $$ \begin{array}{ll} \text{minimize} & \mathbf{c}^T\mathbf{x} \\ \text{subject to} & \mathbf{A}\mathbf{x}=\mathbf{b} \\ & \mathbf{x} \ge \mathbf{0} \end{array} $$ with $$ \mathbf{A} = \begin{bmatrix} 2 & 0 & 0 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 & 1 & 0 \\ 0 & 0 & 2 & 0 & 0 & 1 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{c} = (-1, -1, -1, 0, 0, 0)^T. $$
Solve this LP using Convex.jl with your favorite solver. You may check your answer against the linprog R package, but your code should run in Julia.
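A sketch with Convex.jl; GLPK is an arbitrary choice of LP solver and could be swapped for any other:

```julia
using Convex, GLPK, LinearAlgebra

A = [2 0 0 1 0 0;
     0 2 0 0 1 0;
     0 0 2 0 0 1]
b = [1, 1, 1]
c = [-1, -1, -1, 0, 0, 0]

x = Variable(6)
problem = minimize(dot(c, x), [A * x == b, x >= 0])
solve!(problem, GLPK.Optimizer)
@show problem.optval vec(evaluate(x))
# By inspection x = (1/2, 1/2, 1/2, 0, 0, 0) is optimal with value -3/2,
# since each constraint row forces xᵢ ≤ 1/2 for i = 1, 2, 3.
```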
The response variable is the log of prostate-specific antigen (lpsa). The predictors are log cancer volume (lcavol), log prostate weight (lweight), age, log of the amount of benign prostatic hyperplasia (lbph), seminal vesicle invasion (svi), log of capsular penetration (lcp), Gleason score (gleason), and percent of Gleason scores 4 or 5 (pgg45).
More information can be found at https://web.stanford.edu/~hastie/ElemStatLearn/datasets/prostate.info.txt; see also Section 3.2.1 of The Elements of Statistical Learning by Hastie et al. (2009).
Standardize the predictors to have mean 0 and variance 1, and add an intercept.
Then split the data into training and test sets in an 8:2 ratio. Fit the Dantzig selector model for $t = 0, 0.05t_{\max}, 0.1 t_{\max}, \dotsc, t_{\max}$ on the training data and plot the solution path. Also plot the prediction error on the test data over $t$.
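A sketch of the path computation with Convex.jl, assuming the constrained formulation $\text{minimize}\ \Vert\mathbf{X}^T(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})\Vert_\infty$ subject to $\Vert\boldsymbol{\beta}\Vert_1 \le t$, with $t_{\max} = \Vert\hat{\boldsymbol{\beta}}_{\text{OLS}}\Vert_1$; both choices are assumptions, so adapt them to the formulation used in class:

```julia
using Convex, GLPK, LinearAlgebra

function dantzig_path(X, y; nt = 21)
    p = size(X, 2)
    tmax = norm(X \ y, 1)                  # ‖OLS fit‖₁, the assumed t_max
    ts = range(0, tmax; length = nt)       # t = 0, 0.05 t_max, …, t_max
    path = zeros(p, nt)
    for (j, t) in enumerate(ts)
        β = Variable(p)
        prob = minimize(norm(X' * (y - X * β), Inf), [norm(β, 1) <= t])
        solve!(prob, GLPK.Optimizer)       # an LP, so GLPK suffices
        path[:, j] = vec(evaluate(β))
    end
    return ts, path
end
```

Plotting the rows of `path` against `ts` gives the solution path, and the test-set prediction error over $t$ follows by applying each column of `path` to the held-out split.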