Important: Please read the installation page for details about how to install the toolboxes. $\newcommand{\dotp}[2]{\langle #1, #2 \rangle}$ $\newcommand{\enscond}[2]{\lbrace #1, #2 \rbrace}$ $\newcommand{\pd}[2]{ \frac{ \partial #1}{\partial #2} }$ $\newcommand{\umin}[1]{\underset{#1}{\min}\;}$ $\newcommand{\umax}[1]{\underset{#1}{\max}\;}$ $\newcommand{\umin}[1]{\underset{#1}{\min}\;}$ $\newcommand{\uargmin}[1]{\underset{#1}{argmin}\;}$ $\newcommand{\norm}[1]{\|#1\|}$ $\newcommand{\abs}[1]{\left|#1\right|}$ $\newcommand{\choice}[1]{ \left\{ \begin{array}{l} #1 \end{array} \right. }$ $\newcommand{\pa}[1]{\left(#1\right)}$ $\newcommand{\diag}[1]{{diag}\left( #1 \right)}$ $\newcommand{\qandq}{\quad\text{and}\quad}$ $\newcommand{\qwhereq}{\quad\text{where}\quad}$ $\newcommand{\qifq}{ \quad \text{if} \quad }$ $\newcommand{\qarrq}{ \quad \Longrightarrow \quad }$ $\newcommand{\ZZ}{\mathbb{Z}}$ $\newcommand{\CC}{\mathbb{C}}$ $\newcommand{\RR}{\mathbb{R}}$ $\newcommand{\EE}{\mathbb{E}}$ $\newcommand{\Zz}{\mathcal{Z}}$ $\newcommand{\Ww}{\mathcal{W}}$ $\newcommand{\Vv}{\mathcal{V}}$ $\newcommand{\Nn}{\mathcal{N}}$ $\newcommand{\NN}{\mathcal{N}}$ $\newcommand{\Hh}{\mathcal{H}}$ $\newcommand{\Bb}{\mathcal{B}}$ $\newcommand{\Ee}{\mathcal{E}}$ $\newcommand{\Cc}{\mathcal{C}}$ $\newcommand{\Gg}{\mathcal{G}}$ $\newcommand{\Ss}{\mathcal{S}}$ $\newcommand{\Pp}{\mathcal{P}}$ $\newcommand{\Ff}{\mathcal{F}}$ $\newcommand{\Xx}{\mathcal{X}}$ $\newcommand{\Mm}{\mathcal{M}}$ $\newcommand{\Ii}{\mathcal{I}}$ $\newcommand{\Dd}{\mathcal{D}}$ $\newcommand{\Ll}{\mathcal{L}}$ $\newcommand{\Tt}{\mathcal{T}}$ $\newcommand{\si}{\sigma}$ $\newcommand{\al}{\alpha}$ $\newcommand{\la}{\lambda}$ $\newcommand{\ga}{\gamma}$ $\newcommand{\Ga}{\Gamma}$ $\newcommand{\La}{\Lambda}$ $\newcommand{\si}{\sigma}$ $\newcommand{\Si}{\Sigma}$ $\newcommand{\be}{\beta}$ $\newcommand{\de}{\delta}$ $\newcommand{\De}{\Delta}$ $\newcommand{\phi}{\varphi}$ $\newcommand{\th}{\theta}$ $\newcommand{\om}{\omega}$ $\newcommand{\Om}{\Omega}$ $\newcommand{\eqdef}{\equiv}$
This tour studies linear regression in conjunction with regularization. It contrasts ridge regression and the Lasso.
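To fix notation for what follows, given features $X \in \RR^{n \times p}$ and responses $y \in \RR^n$, ridge regression and the Lasso solve (up to the normalization of $\la$) the penalized least squares problems
$$ \umin{w \in \RR^p} \frac{1}{2}\norm{X w - y}^2 + \frac{\la}{2}\norm{w}^2
\qandq
\umin{w \in \RR^p} \frac{1}{2}\norm{X w - y}^2 + \la \norm{w}_1, $$
the former using the squared $\ell^2$ norm and the latter the sparsity-promoting $\ell^1$ norm.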
We recommend that after doing this Numerical Tour, you apply it to your own data, for instance using a dataset from <https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ LibSVM>.
Disclaimer: these machine learning tours are intended to be overly-simplistic implementations and applications of baseline machine learning methods. For more advanced uses and implementations, we recommend using a state-of-the-art library, the most well-known being <http://scikit-learn.org/ Scikit-Learn>
library(plot3D)
library(pracma)
library(rmatio)
options(warn=-1) # turns off warnings, to turn on: "options(warn=0)"
# Source the Numerical Tours toolbox functions
for (f in list.files(path="nt_toolbox/toolbox_general/", pattern="*.R")) {
source(paste("nt_toolbox/toolbox_general/", f, sep=""))
}
for (f in list.files(path="nt_toolbox/toolbox_signal/", pattern="*.R")) {
source(paste("nt_toolbox/toolbox_signal/", f, sep=""))
}
Helpers:
# Xm(X): center the columns of X (subtract each column mean).
Xm = function(X){as.matrix(X - rep(colMeans(X), rep.int(nrow(X), ncol(X))))}
# Cov(X): empirical covariance matrix of X (uses the global sample count n, defined below).
Cov = function(X){data.matrix(1. / (n - 1) * t(Xm(X)) %*% Xm(X))}
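As a quick, purely illustrative sanity check of the centering helper (`Xtest` is hypothetical data), every column of Xm(X) should have zero mean; note that Cov relies on the global sample count n, which is only defined once the dataset is loaded below.
Xtest = matrix(rnorm(20), 5, 4)  # hypothetical 5 x 4 data
print(colMeans(Xm(Xtest)))       # should be ~0 in every column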
We test the method on the prostate dataset of $n=97$ samples with features $x_i \in \RR^p$ in dimension $p=8$. The goal is to predict the response value $y_i \in \RR$.
Load the dataset.
mat = read.mat("nt_toolbox/data/ml-prostate.mat")
A = mat$A
class_names = mat$class_names[[1]]
head(A)
| $x_1$ | $x_2$ | $x_3$ | $x_4$ | $x_5$ | $x_6$ | $x_7$ | $x_8$ | $y$ | $c$ |
|---|---|---|---|---|---|---|---|---|---|
| -0.5798185 | 2.769459 | 50 | -1.386294 | 0 | -1.386294 | 6 | 0 | -0.4307829 | 1 |
| -0.9942523 | 3.319626 | 58 | -1.386294 | 0 | -1.386294 | 6 | 0 | -0.1625189 | 1 |
| -0.5108256 | 2.691243 | 74 | -1.386294 | 0 | -1.386294 | 7 | 20 | -0.1625189 | 1 |
| -1.2039728 | 3.282789 | 58 | -1.386294 | 0 | -1.386294 | 6 | 0 | -0.1625189 | 1 |
| 0.7514161 | 3.432373 | 62 | -1.386294 | 0 | -1.386294 | 6 | 0 | 0.3715636 | 1 |
| -1.0498221 | 3.228826 | 50 | -1.386294 | 0 | -1.386294 | 6 | 0 | 0.7654678 | 1 |
Randomly permute it.
A = A[sample(dim(A)[1]),]
Separate the features $X$ from the response $y$ to predict, and extract the train/test indicator $c$.
X = A[,1:(dim(A)[2] - 2)]  # features
y = A[,dim(A)[2] - 1]      # response to predict
c = A[,dim(A)[2]]          # train (1) / test (0) indicator
$n$ is the number of samples, $p$ is the dimensionality of the features.
n = dim(X)[1]
p = dim(X)[2]
print(n)
print(p)
[1] 97
[1] 8
Split into training and testing.
I0 = (c==1) # train
I1 = (c==0) # test
n0 = sum(I0)
n1 = n - n0
X0 = X[I0,]
y0 = y[I0]
X1 = X[I1,]
y1 = y[I1]
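As a small optional sanity check, $c$ should only take the values 0 and 1, so the two groups partition the samples:
stopifnot(sum(I0) + sum(I1) == n)  # the train/test indicator partitions the samples
cat(n0, "training samples /", n1, "testing samples\n")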
Normalize the features by the mean and std of the training set. This is optional.
mX0 = apply(X0, 2, mean)
sX0 = apply(X0, 2, std)
X0 = sweep(X0, 2, mX0,"-")
X0 = sweep(X0, 2, sX0,"/")
X1 = sweep(X1, 2, mX0,"-")
X1 = sweep(X1, 2, sX0,"/")
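As an optional sanity check, the training features should now have (approximately) zero mean and unit standard deviation in every column:
print(max(abs(apply(X0, 2, mean))))     # should be ~0
print(max(abs(apply(X0, 2, std) - 1)))  # should be ~0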
Remove the mean (computed from the training set) to avoid introducing a bias term and a constant regressor. This is optional.
m0 = mean(y0)
y0 = y0 - m0
y1 = y1 - m0
In order to display the data in 2-D or 3-D, dimensionality reduction is needed. The simplest method is principal component analysis (PCA), which performs an orthogonal linear projection on the principal axes (the eigenvectors of the covariance matrix).
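The tour below computes this PCA basis through an SVD of $X_0$; equivalently, one can diagonalize the empirical covariance directly. A minimal sketch (the names C0, evd, Z are only for illustration):
C0 = t(X0) %*% X0 / (nrow(X0) - 1)  # empirical covariance of the standardized training features
evd = eigen(C0)                     # eigenvectors = principal axes, eigenvalues = variances along them
Z = X0 %*% evd$vectors              # coordinates of the samples in this PCA basis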
Display the covariance matrix of the training set.
options(repr.plot.width=4, repr.plot.height=4)
C = t(X0) %*% X0
image(1:p, 1:p, C, xlab="", ylab="", col=topo.colors(5), ylim=c(p + 0.5, 0.5))
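Since the features were standardized on the training set, $C/(n_0-1)$ is the empirical correlation matrix between the features; a small variant of the plot above displays it on a fixed $[-1,1]$ color scale (if the optional normalization step was skipped, this is simply the empirical covariance):
image(1:p, 1:p, C/(n0 - 1), zlim=c(-1, 1), xlab="", ylab="", col=topo.colors(5), ylim=c(p + 0.5, 0.5))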
Display the correlation (up to a constant factor) between each feature and the response $y_0$.
options(repr.plot.width=4, repr.plot.height=4)
barplot(t(t(X0) %*% y0), xlab='', ylab='', col="darkblue")
Compute the PCA ortho-basis and the features in the PCA basis.
svd_decomp = svd(X0)
U = svd_decomp$u
s = svd_decomp$d
V = svd_decomp$v
X0r = X0 %*% V
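The two points of view coincide: writing the SVD as $X_0 = U \diag{s} V^\top$, one has $X_0^\top X_0 = V \diag{s^2} V^\top$, so the columns of $V$ are the principal axes, the singular values $s_i$ are the square roots of the eigenvalues of $X_0^\top X_0$, and $X_0 V = U \diag{s}$ gives the coordinates of the samples in the PCA basis.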
Plot the square roots of the eigenvalues, i.e. the singular values $s$.
options(repr.plot.width=4, repr.plot.height=4)
plot(s, type="o", col=4, ylab="", xlab="", pch=16)
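To quantify how much of the variance the leading directions capture, one can also plot the cumulative fraction of variance explained (a small sketch; the $s_i^2$ are proportional to the variances along the principal axes):
options(repr.plot.width=4, repr.plot.height=4)
plot(cumsum(s^2) / sum(s^2), type="o", col=4, ylab="", xlab="", pch=16, ylim=c(0, 1))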
Display the features.
options(repr.plot.width=6, repr.plot.height=6)
panel.hist = function(x, ...)
{
usr = par("usr"); on.exit(par(usr))
par(usr = c(usr[1:2], 0, 1.5) )
h = hist(x, plot = FALSE)
breaks = h$breaks; nB = length(breaks)
y = h$counts
y = y/max(y)
rect(breaks[-nB], 0, breaks[-1], y, col = "darkblue")
}
pairs(X0, col="blue", diag.panel=panel.hist)
Display the point cloud of feature vectors in the 2-D PCA space.
options(repr.plot.width=4, repr.plot.height=4)
plot(X0r[,1], X0r[,2], col="blue", pch=16, xlab="", ylab="")
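Since the plot3D package is loaded, one can also inspect the cloud along the first three principal directions (an optional sketch using plot3D::scatter3D):
options(repr.plot.width=4, repr.plot.height=4)
scatter3D(X0r[,1], X0r[,2], X0r[,3], colvar=NULL, col="blue", pch=16, xlab="", ylab="", zlab="")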
1-D plots of the value to regress $y_0$ along each of the principal axes.
options(repr.plot.width=5, repr.plot.height=3)
for (i in 1:p)
{
plot(X0r[,i], y0, col=i+1, xlab="", ylab="", pch=16)
}
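To complement these visual checks, one can print the correlation of each PCA coordinate with the response (a quick sketch):
print(cor(X0r, y0))  # correlation of each principal coordinate with y0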