EECS 545: Machine Learning

Lecture 04: Linear Regression I

  • Instructor: Jacob Abernethy
  • Date: January 20, 2015

Lecture Exposition Credit: Benjamin Bray

Outline for this Lecture

  • Introduction to Regression
  • Solving Least Squares
    • Gradient Descent Method
    • Closed Form Solution

Reading List

  • Required:
    • [PRML], §1.1: Polynomial Curve Fitting Example
    • [PRML], §3.1: Linear Basis Function Models
  • Optional:
    • [MLAPP], Chapter 7: Linear Regression

Supervised Learning

  • Goal
    • Given data $X$ in feature space and the labels $Y$
    • Learn to predict $Y$ from $X$
  • Labels could be discrete or continuous
    • Discrete-valued labels: Classification
    • Continuous-valued labels: Regression


  • In this lecture, we will use
    • Let vector $\vec{x}_n \in \R^D$ denote the $n\text{th}$ data point. $D$ denotes the number of attributes in the dataset.
    • Let vector $\phi(\vec{x}_n) \in \R^M$ denote the features for data point $\vec{x}_n$. $\phi_j(\vec{x}_n)$ denotes the $j\text{th}$ feature for $\vec{x}_n$.
    • The feature vector $\phi(\vec{x}_n)$ represents a preprocessing step and is usually some combination of transformations of $\vec{x}_n$. For example, $\phi(\vec{x}_n)$ could be the vector $[\vec{x}_n^T, \cos(\vec{x}_n)^T, \exp(\vec{x}_n)^T]^T$. If we do nothing to $\vec{x}_n$, then $\phi(\vec{x}_n)=\vec{x}_n$.
    • Continuous-valued label vector $\vec{t} \in \R^N$ (target values). $t_n \in \R$ denotes the target value for the $n\text{th}$ data point.

Notation: Example

  • The table below is a dataset describing the acceleration of an aircraft along a runway. Based on our notation above, we have $D=7$. Ignoring the header row, the target value $t$ is the first column, and $\vec{x}_n$ denotes the data on the $n\text{th}$ row, columns $2$ through $8$.
  • We could manipulate the data to have our own features. For example,
    • If we only choose the first three attributes as features, i.e. $\phi(\vec{x}_n)=\vec{x}_n[1:3]$, then $M=3$
    • If we let $\phi(\vec{x}_n)=[\vec{x}_n^T, \cos(\vec{x}_n)^T, \exp(\vec{x}_n)^T]^T$, then $M=3D=21$
    • We could also let $\phi(\vec{x}_n)=\vec{x}_n$, then $M=D=7$. This will occur frequently in later lectures.
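As a sketch of such a preprocessing step, the concatenated feature map $[\vec{x}_n^T, \cos(\vec{x}_n)^T, \exp(\vec{x}_n)^T]^T$ might be computed as follows (the function name `phi` and the toy sample are illustrative, not taken from the dataset above):

```python
import numpy as np

def phi(x):
    """Concatenated feature map [x, cos(x), exp(x)]^T for one sample x in R^D."""
    return np.concatenate([x, np.cos(x), np.exp(x)])

x = np.array([0.0, 1.0, 2.0])   # a toy sample with D = 3 attributes
features = phi(x)               # feature vector in R^(3D) = R^9
```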

Linear Regression

Linear Regression (1D Inputs)

  • Consider 1D case (i.e. D=1)
    • Given a set of scalar observations $x_1, \dots, x_N \in \R$
    • and corresponding target values $t_1, \dots, t_N$
  • We want to learn a function $y(x_n, \vec{w}) \approx t_n$ to predict future values. $$ y(x_n, \vec{w}) = w_0 + w_1 x_n + w_2 x_n^2 + \dots + w_{M-1} x_n^{M-1} = \sum_{k=0}^{M-1} w_k x_n^k = \vec{w}^T\phi(x_n) $$ where the weight vector is $\vec{w}=[w_0, w_1, w_2, \dots, w_{M-1}]^T$ and the feature vector is $\phi(x_n)=[1, x_n, x_n^2, \dots, x_n^{M-1}]^T$ (here we add a bias term $\phi_0(x_n)=1$ to the features).
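A minimal sketch of this polynomial model (the function name and the weight values are illustrative):

```python
import numpy as np

def poly_features(x, M):
    """phi(x) = [1, x, x^2, ..., x^(M-1)]^T for a scalar input x."""
    return np.array([x ** k for k in range(M)])

w = np.array([1.0, 2.0, 3.0])        # illustrative weights [w0, w1, w2], so M = 3
y = w @ poly_features(2.0, M=3)      # w^T phi(x) = 1 + 2*2 + 3*4 = 17
```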

Regression: Noisy Data

```python
regression_example_draw(degree1=0, degree2=1, degree3=3, ifprint=True)
```
```
The expression for the first polynomial is y=-0.129
The expression for the second polynomial is y=-0.231+0.598x^1
The expression for the third polynomial is y=0.085-0.781x^1+1.584x^2-0.097x^3
```

Basis Functions

  • In the example above, we used polynomial basis functions as features.
  • In fact, we have many choices for the basis functions $\phi_j(\vec{x})$.
  • Different basis functions produce different features, and thus may give different predictive performance.

Linear Regression (General Case)

  • The function $y(\vec{x}_n, \vec{w})$ is linear in parameters $\vec{w}$.
    • Goal: Find the best value for the weights $\vec{w}$.
    • For simplicity, add a bias term $\phi_0(\vec{x}_n) = 1$. $$ \begin{align} y(\vec{x}_n, \vec{w}) &= w_0 \phi_0(\vec{x}_n)+w_1 \phi_1(\vec{x}_n)+ w_2 \phi_2(\vec{x}_n)+\dots +w_{M-1} \phi_{M-1}(\vec{x}_n) \\ &= \sum_{j=0}^{M-1} w_j \phi_j(\vec{x}_n) \\ &= \vec{w}^T \phi(\vec{x}_n) \end{align} $$ of which $\phi(\vec{x}_n) = [\phi_0(\vec{x}_n),\phi_1(\vec{x}_n),\phi_2(\vec{x}_n), \dots, \phi_{M-1}(\vec{x}_n)]^T$

Least Squares

Least Squares: Objective Function

  • We will find the solution $\vec{w}$ to linear regression by minimizing a cost/objective function.
  • When the objective function is the sum of squared errors (the sum of squared differences between target $t_n$ and prediction $y(\vec{x}_n, \vec{w})$ over the entire training set), this approach is also called least squares.
  • The objective function is $$ E(\vec{w}) = \frac12 \sum_{n=1}^N (y(\vec{x}_n, \vec{w}) - t_n)^2 = \frac12 \sum_{n=1}^N \left( \sum_{j=0}^{M-1} w_j\phi_j(\vec{x}_n) - t_n \right)^2 = \frac12 \sum_{n=1}^N \left( \vec{w}^T \phi(\vec{x}_n) - t_n \right)^2 $$
    (Minimizing the objective function is equivalent to minimizing the sum of squared lengths of the green segments between the data points and the fitted curve.)
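Stacking the feature vectors $\phi(\vec{x}_n)^T$ as rows of a matrix, the objective can be computed directly; a minimal sketch with illustrative toy data:

```python
import numpy as np

def sse(w, Phi, t):
    """E(w) = 1/2 * sum_n (w^T phi(x_n) - t_n)^2, with phi(x_n)^T stacked as rows of Phi."""
    residuals = Phi @ w - t
    return 0.5 * residuals @ residuals

Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])                # bias feature plus one raw feature
t = np.array([0.0, 1.0, 2.0])
error = sse(np.array([0.0, 1.0]), Phi, t)   # this w fits the data exactly, so E = 0
```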

How to Minimize the Objective Function?

  • We will solve the least squares problem with two approaches:
    • Gradient Descent Method: approach the solution step by step. We will show two ways to iterate:
      • Batch Gradient Descent
      • Stochastic Gradient Descent
    • Closed Form Solution

Method I: Gradient Descent—Gradient Calculation

  • To minimize the objective function, take the derivative w.r.t. the coefficient vector $\vec{w}$: $$ \begin{align} \nabla_\vec{w} E(\vec{w}) &= \frac{\partial}{\partial \vec{w}} \left[ \frac12 \sum_{n=1}^N \left( \vec{w}^T \phi(\vec{x}_n) - t_n \right)^2 \right] \\ &= \sum_{n=1}^N \frac{\partial}{\partial \vec{w}} \left[ \frac12 \left( \vec{w}^T\phi(\vec{x}_n) - t_n \right)^2 \right] \\ \text{Applying chain rule:} &= \sum_{n=1}^N \left[ \frac12 \cdot 2 \cdot \left( \vec{w}^T\phi(\vec{x}_n) - t_n \right) \cdot \frac{\partial}{\partial \vec{w}} \vec{w}^T\phi(\vec{x}_n) \right] \\ &= \sum_{n=1}^N \left( \vec{w}^T\phi(\vec{x}_n) - t_n \right)\phi(\vec{x}_n) \end{align} $$
  • Since we are taking derivative of a scalar $E(\vec{w})$ w.r.t a vector $\vec{w}$, the derivative $\nabla_\vec{w} E(\vec{w})$ will be a vector.
  • For details about matrix/vector derivative, please refer to appendix attached in the end of the slide.

Method I-1: Gradient Descent—Batch Gradient Descent

  • Input: Given dataset $\{(\vec{x}_n, t_n)\}_{n=1}^N$
  • Initialize: $\vec{w}_0$, learning rate $\eta$
  • Repeat until convergence:
    • $\nabla_\vec{w} E(\vec{w}_\text{old}) = \sum_{n=1}^N \left( \vec{w}_\text{old}^T\phi(\vec{x}_n) - t_n \right)\phi(\vec{x}_n)$
    • $\vec{w}_\text{new} = \vec{w}_\text{old}-\eta \nabla_\vec{w} E(\vec{w}_\text{old})$
  • End
  • Output: $\vec{w}_\text{final}$
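The loop above might be sketched as follows (the stopping rule, step size, and toy data are assumptions for illustration):

```python
import numpy as np

def batch_gd(Phi, t, eta=0.01, tol=1e-8, max_iters=100_000):
    """Batch gradient descent on E(w) = 1/2 * ||Phi w - t||^2."""
    w = np.zeros(Phi.shape[1])
    for _ in range(max_iters):
        grad = Phi.T @ (Phi @ w - t)         # sum_n (w^T phi(x_n) - t_n) phi(x_n)
        w_new = w - eta * grad
        if np.linalg.norm(w_new - w) < tol:  # simple convergence check
            return w_new
        w = w_new
    return w

Phi = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
t = np.array([0.0, 1.0, 2.0])
w_hat = batch_gd(Phi, t)   # approaches [0, 1], which fits t = x exactly
```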

Method I-2: Gradient Descent—Stochastic Gradient Descent

Main Idea: Instead of computing the batch gradient (over the entire training set), compute the gradient for a single training sample and update immediately.

  • Input: Given dataset $\{(\vec{x}_n, t_n)\}_{n=1}^N$
  • Initialize: $\vec{w}_0$, learning rate $\eta$
  • Repeat until convergence:
    • Random shuffle $\{(\vec{x}_n, t_n)\}_{n=1}^N$
    • For $n=1,\dots,N$ do:
      • $\nabla_{\vec{w}}E(\vec{w}_\text{old} | \vec{x}_n) = \left( \vec{w}_\text{old}^T\phi(\vec{x}_n) - t_n \right)\phi(\vec{x}_n)$
      • $\vec{w}_\text{new} = \vec{w}_\text{old}-\eta \nabla_{\vec{w}}E(\vec{w}_\text{old} | \vec{x}_n)$
    • End
  • End
  • Output: $\vec{w}_\text{final}$
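A sketch of the procedure above (the step size, epoch count, and toy data are illustrative assumptions):

```python
import numpy as np

def sgd(Phi, t, eta=0.05, epochs=500, seed=0):
    """Stochastic gradient descent: update w one training sample at a time."""
    rng = np.random.default_rng(seed)
    N, M = Phi.shape
    w = np.zeros(M)
    for _ in range(epochs):
        for n in rng.permutation(N):                 # random shuffle each pass
            grad_n = (w @ Phi[n] - t[n]) * Phi[n]    # per-sample gradient
            w = w - eta * grad_n
    return w

Phi = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
t = np.array([0.0, 1.0, 2.0])
w_hat = sgd(Phi, t)   # converges toward [0, 1] on this noiseless toy problem
```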

Method II: Closed Form Solution

Main Idea: Compute the gradient, set it to zero, and solve for $\vec{w}$ in closed form.

  • Objective Function $E(\vec{w}) = \frac12 \sum_{n=1}^N \left( \vec{w}^T \phi(\vec{x}_n) - t_n \right)^2 = \frac12 \sum_{n=1}^N \left( \phi(\vec{x}_n)^T \vec{w} - t_n \right)^2$
  • Let $\vec{e} = [\phi(\vec{x}_1)^T \vec{w} - t_1,\quad \phi(\vec{x}_2)^T \vec{w} - t_2, \quad \dots ,\quad \phi(\vec{x}_N)^T \vec{w} - t_N]^T$; then $$E(\vec{w}) = \frac12 \vec{e}^T \vec{e} = \frac12 \left\| \vec{e} \right\|^2$$
  • Look at $\vec{e}$: $$ \vec{e} = \begin{bmatrix} \phi(\vec{x}_1)^T\\ \vdots\\ \phi(\vec{x}_N)^T \end{bmatrix} \vec{w}- \begin{bmatrix} t_1\\ \vdots\\ t_N \end{bmatrix} \triangleq \Phi \vec{w}-\vec{t} $$ Here $\Phi \in \R^{N \times M}$ is called the design matrix. Each row represents one sample; each column represents one feature: $$\Phi = \begin{bmatrix} \phi(\vec{x}_1)^T\\ \phi(\vec{x}_2)^T\\ \vdots\\ \phi(\vec{x}_N)^T \end{bmatrix} = \begin{bmatrix} \phi_0(\vec{x}_1) & \phi_1(\vec{x}_1) & \cdots & \phi_{M-1}(\vec{x}_1) \\ \phi_0(\vec{x}_2) & \phi_1(\vec{x}_2) & \cdots & \phi_{M-1}(\vec{x}_2) \\ \vdots & \vdots & \ddots & \vdots \\ \phi_0(\vec{x}_N) & \phi_1(\vec{x}_N) & \cdots & \phi_{M-1}(\vec{x}_N) \\ \end{bmatrix} $$
  • From $E(\vec{w}) = \frac12 \left \| \vec{e} \right \|^2 = \frac12 \vec{e}^T \vec{e}$ and $\vec{e}=\Phi \vec{w}-\vec{t}$, we have $$ \begin{align} E(\vec{w}) & = \frac12 \left \| \Phi \vec{w}-\vec{t} \right \|^2=\frac12 (\Phi \vec{w}-\vec{t})^T (\Phi \vec{w}-\vec{t}) \\ & = \frac12 \vec{w}^T \Phi^T \Phi \vec{w} - \vec{t}^T \Phi \vec{w} + \frac12 \vec{t}^T \vec{t} \end{align} $$
  • So the derivative is $$\nabla_\vec{w} E(\vec{w}) = \Phi^T\Phi\vec{w} - \Phi^T \vec{t}$$
  • To minimize $E(\vec{w})$, we need to let $\nabla_\vec{w} E(\vec{w}) = \Phi^T\Phi \vec{w} - \Phi^T \vec{t} = 0$, which is also $\Phi^T\Phi \vec{w} = \Phi^T \vec{t}$
  • When $\Phi^T \Phi$ is invertible ($\Phi$ has linearly independent columns), we simply have $$ \begin{align} \hat{\vec{w}} &=(\Phi^T \Phi)^{-1} \Phi^T \vec{t} \\ &\triangleq \Phi^\dagger \vec{t} \end{align} $$ of which $\Phi^\dagger$ is called the Moore-Penrose Pseudoinverse of $\Phi$.
  • We will discuss the case where $\Phi^T \Phi$ is non-invertible later.

Digression: Moore-Penrose Pseudoinverse

  • When we have a matrix $A$ that is non-invertible, or not even square, we may still want something that behaves like an inverse
  • For these situations we use $A^\dagger$, the Moore-Penrose Pseudoinverse of $A$
  • In general, we can get $A^\dagger$ via the SVD: if $A \in \R^{m \times n}$ has SVD $A = U_{m \times m} \Sigma_{m \times n} V_{n \times n}^T$, then $A^\dagger = V \Sigma^\dagger U^T \in \R^{n \times m}$, where $\Sigma^\dagger \in \R^{n \times m}$ is obtained by taking the reciprocals of the non-zero entries of $\Sigma^T$.
  • Particularly, when $A$ has linearly independent columns then $A^\dagger = (A^T A)^{-1} A^T$. When $A$ is invertible, then $A^\dagger = A^{-1}$.
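These facts are easy to check numerically; the matrix below is an arbitrary full-column-rank example:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 1.0]])    # 3x2 with linearly independent columns

# Pseudoinverse via the (reduced) SVD: A = U Sigma V^T  =>  A+ = V Sigma+ U^T
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T   # reciprocals of the non-zero singular values
```

For a full-column-rank $A$, this agrees with $(A^T A)^{-1} A^T$ and with NumPy's built-in `np.linalg.pinv`.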

Back to Closed Form Solution

  • From the previous derivation, we have $\hat{\vec{w}}=(\Phi^T \Phi)^{-1} \Phi^T \vec{t} \triangleq \Phi^\dagger \vec{t}$.
  • What if $\Phi^T \Phi$ is non-invertible? This corresponds to the case where $\Phi$ does not have linearly independent columns: some feature (a column of $\Phi$) is a linear combination of the other features. This happens, for example, whenever $M > N$.
  • We could still resolve this using pseudoinverse.
  • To make $\nabla_\vec{w} E(\vec{w}) = \Phi^T\Phi \vec{w} - \Phi^T \vec{t} = 0$, we have $$ \hat{\vec{w}} = (\Phi^T\Phi)^\dagger \Phi^T \vec{t} = \Phi^\dagger \vec{t}$$ of which $(\Phi^T\Phi)^\dagger\Phi^T = \Phi^\dagger$. This is left as an exercise. (Hint: use SVD)
  • Now we could conclude the optimal $\vec{w}$ in the sense that minimizes sum of squared errors is $$\boxed{\hat{\vec{w}} = \Phi^\dagger \vec{t}}$$
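In code, the boxed solution is one line with NumPy's `pinv`; the toy data below (three points on the line $t = x$) is illustrative:

```python
import numpy as np

# Design matrix with a bias column and one feature column
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])
t = np.array([0.0, 1.0, 2.0])

w_hat = np.linalg.pinv(Phi) @ t                    # w_hat = Phi+ t
w_ne = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)     # normal equations (Phi^T Phi invertible here)
```

In practice, `pinv` (or `np.linalg.lstsq`) is preferred over explicitly forming and inverting $\Phi^T\Phi$, since it also handles the rank-deficient case discussed above.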