%matplotlib inline
%config InlineBackend.figure_format='retina'
# import libraries
from importlib import reload

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime

import laUtilities as ut
import slideUtilities as sl
import demoUtilities as dm

from IPython.display import Image, display_html, display, Math, Latex, HTML

reload(dm)
reload(ut)
reload(sl)
print('')
%%html
<style>
.container.slides .celltoolbar, .container.slides .hide-in-slideshow {
display: none !important;
}
</style>
% Set up useful MathJax (LaTeX) macros.
% See http://docs.mathjax.org/en/latest/tex.html#defining-tex-macros
% These are for use in the slideshow
$\newcommand{\mat}[1]{\left[\begin{array}#1\end{array}\right]}$
$\newcommand{\vx}{{\mathbf x}}$
$\newcommand{\vy}{{\mathbf y}}$
$\newcommand{\vz}{{\mathbf z}}$
$\newcommand{\R}{{\mathbb{R}}}$
$\newcommand{\vu}{{\mathbf u}}$
$\newcommand{\vv}{{\mathbf v}}$
$\newcommand{\vw}{{\mathbf w}}$
$\newcommand{\col}{{\operatorname{Col}}}$
$\newcommand{\nul}{{\operatorname{Nul}}}$
$\newcommand{\vb}{{\mathbf b}}$
$\newcommand{\va}{{\mathbf a}}$
$\newcommand{\ve}{{\mathbf e}}$
$\newcommand{\setb}{{\mathcal{B}}}$
$\newcommand{\rank}{{\operatorname{rank}}}$
$\newcommand{\vp}{{\mathbf p}}$
Today we'll study the properties of sets of orthogonal vectors. These can be very useful.
A set of vectors $\{\vu_1,\dots,\vu_p\}$ in $\R^n$ is said to be an orthogonal set if each pair of distinct vectors from the set is orthogonal, i.e.,
$$\vu_i^T\vu_j = 0\;\;\mbox{whenever}\;i\neq j.$$Example. Show that $\{\vu_1,\vu_2,\vu_3\}$ is an orthogonal set, where
$$ \vu_1 = \mat{{c}3\\1\\1},\;\;\vu_2=\mat{{r}-1\\2\\1},\;\;\vu_3=\mat{{c}-1/2\\-2\\7/2}.$$Solution. Consider the three possible pairs of distinct vectors, namely, $\{\vu_1,\vu_2\}, \{\vu_1,\vu_3\},$ and $\{\vu_2,\vu_3\}$:
$$\vu_1^T\vu_2 = 3(-1) + 1(2) + 1(1) = 0,$$$$\vu_1^T\vu_3 = 3(-1/2) + 1(-2) + 1(7/2) = 0,$$$$\vu_2^T\vu_3 = -1(-1/2) + 2(-2) + 1(7/2) = 0.$$Each pair of distinct vectors is orthogonal, and so $\{\vu_1,\vu_2, \vu_3\}$ is an orthogonal set.
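We can confirm this numerically. A quick sanity check using plain NumPy (not the course utilities), with the vectors from the example:

```python
import numpy as np

# the three vectors from the example
u1 = np.array([3., 1., 1.])
u2 = np.array([-1., 2., 1.])
u3 = np.array([-0.5, -2., 3.5])

# every pair of distinct vectors should have a zero inner product
print(u1 @ u2, u1 @ u3, u2 @ u3)  # 0.0 0.0 0.0
```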
In $\R^3$ they determine three mutually perpendicular lines through the origin.
# image credit: Lay 4th edition figure 3 in Ch 6.1
sl.hide_code_in_slideshow()
display(Image("images/Lay-fig-6-2-1.jpg", width=400))
Theorem. If $S = \{\vu_1,\dots,\vu_p\}$ is an orthogonal set of nonzero vectors in $\R^n,$ then $S$ is linearly independent and hence is a basis for the subspace spanned by $S$.
Proof. We will prove that there is no linear combination of the vectors in $S$ with nonzero coefficients that yields the zero vector.
Assume ${\bf 0} = c_1\vu_1 + \dots + c_p\vu_p$ for some scalars $c_1,\dots,c_p$. Taking the inner product of both sides with $\vu_1$:
$$0 = {\bf 0}^T\vu_1 = (c_1\vu_1 + c_2\vu_2 + \dots + c_p\vu_p)^T\vu_1 = c_1(\vu_1^T\vu_1) + c_2(\vu_2^T\vu_1) + \dots + c_p(\vu_p^T\vu_1)$$Because $\vu_1$ is orthogonal to $\vu_2,\dots,\vu_p,$ every term after the first vanishes:
$$0 = c_1(\vu_1^T\vu_1)$$Since $\vu_1$ is nonzero, $\vu_1^T\vu_1$ is not zero and so $c_1 = 0$.
We can use the same kind of reasoning to show that $c_2,\dots,c_p$ must all be zero.
In other words, no nonzero combination of the $\vu_i$'s yields the zero vector, so $S$ is linearly independent.
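As a quick numerical illustration (a sketch using NumPy, with the orthogonal set from the earlier example): a matrix whose columns form an orthogonal set of nonzero vectors must have full column rank.

```python
import numpy as np

# stack the orthogonal vectors from the earlier example as columns
U = np.column_stack([[3., 1., 1.],
                     [-1., 2., 1.],
                     [-0.5, -2., 3.5]])

# an orthogonal set of nonzero vectors is linearly independent,
# so this matrix has rank 3
print(np.linalg.matrix_rank(U))  # 3
```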
Definition. An orthogonal basis for a subspace $W$ of $\R^n$ is a basis for $W$ that is also an orthogonal set.
We have seen that for any subspace $W$, there may be many different sets of vectors that can serve as a basis for $W$.
However, an orthogonal basis is a particularly nice basis, because the weights (coordinates) of any point can be computed easily.
Theorem. Let $\{\vu_1,\dots,\vu_p\}$ be an orthogonal basis for a subspace $W$ of $\R^n$. For each $\vy$ in $W,$ the weights of the linear combination
$$\vy = c_1\vu_1 + \dots + c_p\vu_p$$are given by
$$c_j = \frac{\vy^T\vu_j}{\vu_j^T\vu_j}\;\;\;j = 1,\dots,p$$Proof. As we saw in the last proof, the orthogonality of $\{\vu_1,\dots,\vu_p\}$ means that
$$\vy^T\vu_1 = (c_1\vu_1 + c_2\vu_2 + \dots + c_p\vu_p)^T\vu_1$$$$=c_1(\vu_1^T\vu_1)$$Since $\vu_1^T\vu_1$ is not zero, the equation above can be solved for $c_1.$ To find any other $c_j,$ compute $\vy^T\vu_j$ and solve for $c_j$.
Example. The set $S$ which we saw earlier, i.e.,
$$ \vu_1 = \mat{{c}3\\1\\1},\;\;\vu_2=\mat{{r}-1\\2\\1},\;\;\vu_3=\mat{{c}-1/2\\-2\\7/2},$$is an orthogonal basis for $\R^3.$
Express the vector $\vy = \mat{{r}6\\1\\-8}$ as a linear combination of the vectors in $S$ (that is, find the coordinates of $\vy$ in the basis $S$).
Solution. Compute
$$\vy^T\vu_1 = 11,\;\;\;\vy^T\vu_2 = -12,\;\;\;\vy^T\vu_3 = -33,$$$$\vu_1^T\vu_1 = 11,\;\;\;\vu_2^T\vu_2 = 6,\;\;\;\vu_3^T\vu_3 = 33/2$$So
$$\vy = \frac{\vy^T\vu_1}{\vu_1^T\vu_1}\vu_1 + \frac{\vy^T\vu_2}{\vu_2^T\vu_2}\vu_2 + \frac{\vy^T\vu_3}{\vu_3^T\vu_3}\vu_3 = \frac{11}{11}\vu_1 - \frac{12}{6}\vu_2 - \frac{33}{33/2}\vu_3 = \vu_1 - 2\vu_2 - 2\vu_3.$$Let's stop for a moment and think about how we would have done this if we had not known that the vectors $\vu_1, \vu_2,$ and $\vu_3$ form an orthogonal set.
We would have been looking for
$$c_1 \vu_1 + c_2\vu_2 + c_3\vu_3 = \vy$$The way we would find $c_1, c_2, c_3$ in that case would be to solve the linear system
$$[\vu_1 \;\vu_2 \;\vu_3]\mat{{c}c_1\\c_2\\c_3} = \vy$$which would have been much more trouble than what we did.
Instead, because the basis is an orthogonal basis, each coefficient $c_j$ can be found separately, and simply.
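We can compare the two approaches in NumPy (a quick check, using the vectors from the example): the per-coefficient formula and the full linear solve give the same weights.

```python
import numpy as np

# the orthogonal basis and target vector from the example
u1 = np.array([3., 1., 1.])
u2 = np.array([-1., 2., 1.])
u3 = np.array([-0.5, -2., 3.5])
y  = np.array([6., 1., -8.])

# weights via the orthogonal-basis formula: c_j = (y . u_j) / (u_j . u_j)
c = np.array([y @ u / (u @ u) for u in (u1, u2, u3)])
print(c)  # [ 1. -2. -2.]

# the same weights obtained by solving the linear system [u1 u2 u3] c = y
U = np.column_stack([u1, u2, u3])
print(np.linalg.solve(U, y))
```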
Given a nonzero vector $\vu$ in $\R^n,$ consider the problem of decomposing a vector $\vy$ in $\R^n$ into the sum of two vectors:
In other words, we wish to write:
$$\vy = \hat{\vy} + \vz$$where $\hat{\vy} = \alpha\vu$ for some scalar $\alpha$ and $\vz$ is some vector orthogonal to $\vu.$
sl.hide_code_in_slideshow()
ax = ut.plotSetup(-1,12,-1,5,(1.2*6,1.2*4))
ut.centerAxes(ax)
plt.tick_params(
    axis='x',          # changes apply to the x-axis
    which='both',      # both major and minor ticks are affected
    bottom=False,      # ticks along the bottom edge are off
    top=False,         # ticks along the top edge are off
    labelbottom=False) # labels along the bottom edge are off
plt.tick_params(
    axis='y',          # changes apply to the y-axis
    which='both',      # both major and minor ticks are affected
    left=False,        # ticks along the left edge are off
    right=False,       # ticks along the right edge are off
    labelleft=False)   # labels along the left edge are off
pt = [4., 3.]
plt.plot([0,pt[0]],[0,pt[1]],'b-',lw=2)
plt.plot([pt[0],pt[0]],[0,pt[1]],'b--',lw=2)
plt.plot([0,pt[0]],[0,0],'r-',lw=3)
plt.plot([0,0],[0,pt[1]],'r-',lw=3)
ut.plotVec(ax,pt)
u = np.array([pt[0],0])
v = [0,pt[1]]
ut.plotVec(ax,u)
ut.plotVec(ax,2*u)
ut.plotVec(ax,v)
ax.text(pt[0],-0.75,r'${\bf \hat{y}}=\alpha{\bf u}$',size=20)
ax.text(2*pt[0],-0.75,r'$\mathbf{u}$',size=20)
ax.text(pt[0]+0.1,pt[1]+0.2,r'$\mathbf{y}$',size=20)
ax.text(0+0.1,pt[1]+0.2,r'$\mathbf{z = y -\hat{y}}$',size=20)
print('')
That is, we are given $\vy$ and $\vu$, and asked to compute $\vz$ and $\hat{\vy}.$
To solve this, assume that we have some $\alpha$, and with it we compute $\vy - \alpha\vu = \vy-\hat{\vy} = \vz.$
We want $\vz$ to be orthogonal to $\vu.$
Now $\vz = \vy - \alpha{\vu}$ is orthogonal to $\vu$ if and only if
$$0 = (\vy - \alpha\vu)^T\vu = \vy^T\vu - \alpha\,\vu^T\vu$$That is, the solution in which $\vz$ is orthogonal to $\vu$ happens if and only if
$$\alpha = \frac{\vy^T\vu}{\vu^T\vu}$$and since $\hat{\vy} = \alpha\vu$,
$$\hat{\vy} = \frac{\vy^T\vu}{\vu^T\vu}\vu.$$The vector $\hat{\vy}$ is called the orthogonal projection of $\vy$ onto $\vu$, and the vector $\vz$ is called the component of $\vy$ orthogonal to $\vu.$
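The formula translates directly into code. Here is a minimal sketch (the helper name `project_onto` is our own, not part of the course utilities), with example values chosen so the arithmetic stays exact:

```python
import numpy as np

def project_onto(y, u):
    """Orthogonal projection of y onto the line spanned by u (u nonzero)."""
    alpha = (y @ u) / (u @ u)
    return alpha * u

y = np.array([5., 5.])
u = np.array([1., 2.])

yhat = project_onto(y, u)   # the orthogonal projection of y onto u
z = y - yhat                # the component of y orthogonal to u

print(yhat)   # [3. 6.]
print(z)      # [ 2. -1.]
print(z @ u)  # 0.0 -- z really is orthogonal to u
```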
sl.hide_code_in_slideshow()
ax = ut.plotSetup(-1,12,-1,5,(1.2*6,1.2*4))
ut.centerAxes(ax)
plt.tick_params(
    axis='x',          # changes apply to the x-axis
    which='both',      # both major and minor ticks are affected
    bottom=False,      # ticks along the bottom edge are off
    top=False,         # ticks along the top edge are off
    labelbottom=False) # labels along the bottom edge are off
plt.tick_params(
    axis='y',          # changes apply to the y-axis
    which='both',      # both major and minor ticks are affected
    left=False,        # ticks along the left edge are off
    right=False,       # ticks along the right edge are off
    labelleft=False)   # labels along the left edge are off
pt = [4., 3.]
plt.plot([0,pt[0]],[0,pt[1]],'b-',lw=2)
plt.plot([pt[0],pt[0]],[0,pt[1]],'b--',lw=2)
plt.plot([0,pt[0]],[0,0],'r-',lw=3)
plt.plot([0,0],[0,pt[1]],'r-',lw=3)
ut.plotVec(ax,pt)
u = np.array([pt[0],0])
v = [0,pt[1]]
ut.plotVec(ax,u)
ut.plotVec(ax,2*u)
ut.plotVec(ax,v)
ax.text(pt[0],-0.75,r'${\bf \hat{y}}=\alpha{\bf u}$',size=20)
ax.text(2*pt[0],-0.75,r'$\mathbf{u}$',size=20)
ax.text(pt[0]+0.1,pt[1]+0.2,r'$\mathbf{y}$',size=20)
ax.text(0+0.1,pt[1]+0.2,r'$\mathbf{z = y -\hat{y}}$',size=20)
ax.text(0+0.1,pt[1]+0.8,r'Component of $\mathbf{y}$ orthogonal to $\mathbf{u}$',size=16)
ax.text(pt[0],-1.25,r'Orthogonal projection of $\mathbf{y}$ onto $\mathbf{u}$',size=16)
print('')
Now, note that if we had scaled $\vu$ by any nonzero amount (i.e., moved it along its line), we would not have changed the location of $\hat{\vy}.$
This can be seen as well by replacing $\vu$ with $c\vu$ and recomputing $\hat{\vy}$:
$$\hat{\vy} = \frac{\vy^Tc\vu}{c\vu^Tc\vu}c\vu = \frac{\vy^T\vu}{\vu^T\vu}\vu.$$Thus, the projection of $\vy$ is determined by the subspace $L$ that is spanned by $\vu$ -- in other words, the line through $\vu$ and the origin.
Hence sometimes $\hat{\vy}$ is denoted by $\mbox{proj}_L \vy$ and is called the orthogonal projection of $\vy$ onto $L$.
Specifically:
$$\hat{\vy} = \mbox{proj}_L \vy = \frac{\vy^T\vu}{\vu^T\vu}\vu$$Example. Let $\vy = \mat{{c}7\\6}$ and $\vu = \mat{{c}4\\2}.$
Find the orthogonal projection of $\vy$ onto $\vu.$ Then write $\vy$ as the sum of two orthogonal vectors, one in Span$\{\vu\}$, and one orthogonal to $\vu.$
Solution. Compute
$$\vy^T\vu = \mat{{cc}7&6}\mat{{c}4\\2} = 40$$$$\vu^T\vu = \mat{{cc}4&2}\mat{{c}4\\2} = 20$$The orthogonal projection of $\vy$ onto $\vu$ is
$$\hat{\vy} = \frac{\vy^T\vu}{\vu^T\vu} \vu$$$$=\frac{40}{20}\vu = 2\mat{{c}4\\2} = \mat{{c}8\\4}$$And the component of $\vy$ orthogonal to $\vu$ is
$$\vy-\hat{\vy} = \mat{{c}7\\6} - \mat{{c}8\\4} = \mat{{c}-1\\2}.$$So
$$\vy = \hat{\vy} + \vz$$$$\mat{{c}7\\6} = \mat{{c}8\\4} + \mat{{c}-1\\2}.$$
sl.hide_code_in_slideshow()
ax = ut.plotSetup(-3,11,-1,7,(1.2*6,1.2*4))
ut.centerAxes(ax)
plt.axis('equal')
u = np.array([4.,2])
y = np.array([7.,6])
yhat = (y.T.dot(u)/u.T.dot(u))*u
z = y-yhat
ut.plotLinEqn(1.,-2.,0.)
ut.plotVec(ax,u)
ut.plotVec(ax,z)
ut.plotVec(ax,y)
ut.plotVec(ax,yhat)
ax.text(u[0]+0.3,u[1]-0.5,r'$\mathbf{u}$',size=20)
ax.text(yhat[0]+0.3,yhat[1]-0.5,r'$\mathbf{\hat{y}}$',size=20)
ax.text(y[0],y[1]+0.8,r'$\mathbf{y}$',size=20)
ax.text(z[0]-1.6,z[1],r'$\mathbf{y - \hat{y}}$',size=20)
ax.text(11,4.5,r'$L = $Span$\{\mathbf{u}\}$',size=20)
ax.plot([y[0],yhat[0]],[y[1],yhat[1]],'b--')
ax.plot([0,y[0]],[0,y[1]],'b-')
ax.plot([0,z[0]],[0,z[1]],'b-')
print('')
The closest point.
Recall from geometry that given a line and a point $P$, the closest point on the line to $P$ is given by the perpendicular from $P$ to the line.
So this gives an important interpretation of $\hat{\vy}$: it is the closest point to $\vy$ in the subspace $L$.
The distance from $\vy$ to $L$
The distance from $\vy$ to $L$ is the length of the perpendicular from $\vy$ to its orthogonal projection on $L$, namely $\hat{\vy}$.
This distance equals the length of $\vy - \hat{\vy}$.
In this example, the distance is
$$\Vert\vy-\hat{\vy}\Vert = \sqrt{(-1)^2 + 2^2} = \sqrt{5}.$$Earlier today, we saw that when we decompose a vector $\vy$ into a linear combination of vectors $\{\vu_1,\dots,\vu_p\}$ in an orthogonal set, we have
$$\vy = c_1\vu_1 + \dots + c_p\vu_p$$where
$$c_j = \frac{\vy^T\vu_j}{\vu_j^T\vu_j}$$And just now we have seen that the projection of $\vy$ onto the subspace spanned by $\vu$ is
$$\mbox{proj}_L \vy = \frac{\vy^T\vu}{\vu^T\vu}\vu.$$So a decomposition like $\vy = c_1\vu_1 + \dots + c_p\vu_p$ is really decomposing $\vy$ into a sum of orthogonal projections onto one-dimensional subspaces.
For example, let's take the case where $\vy \in \R^2.$ Say we are given nonzero vectors $\vu_1, \vu_2$ such that $\vu_1$ is orthogonal to $\vu_2$, so together they span $\R^2.$
Then $\vy$ can be written in the form
$$\vy = \frac{\vy^T\vu_1}{\vu_1^T\vu_1}\vu_1 + \frac{\vy^T\vu_2}{\vu_2^T\vu_2}\vu_2.$$The first term is the projection of $\vy$ onto the subspace spanned by $\vu_1$ and the second term is the projection of $\vy$ onto the subspace spanned by $\vu_2.$
So this equation expresses $\vy$ as the sum of its projections onto the (orthogonal) axes determined by $\vu_1$ and $\vu_2$.
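We can see this decomposition numerically. A small sketch in NumPy (the particular vectors below are our own example values, chosen so that $\vu_1 \perp \vu_2$):

```python
import numpy as np

# an orthogonal pair of nonzero vectors spanning R^2 (example values)
u1 = np.array([2., 1.])
u2 = np.array([-1., 2.])
y  = np.array([7., 6.])

# project y onto each of the two orthogonal axes
proj1 = (y @ u1) / (u1 @ u1) * u1
proj2 = (y @ u2) / (u2 @ u2) * u2

# the sum of the two projections recovers y
print(proj1 + proj2)  # [7. 6.]
```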
# image credit: Lay 4th edition figure 4 in Ch 6.2
sl.hide_code_in_slideshow()
display(Image("images/Lay-fig-6-2-4.jpg", width=600))
A set $\{\vu_1,\dots,\vu_p\}$ is an orthonormal set if it is an orthogonal set of unit vectors.
If $W$ is the subspace spanned by such a set, then $\{\vu_1,\dots,\vu_p\}$ is an orthonormal basis for $W$ since the set is automatically linearly independent.
The simplest example of an orthonormal set is the standard basis $\{\ve_1, \dots,\ve_n\}$ for $\R^n$. Any nonempty subset of $\{\ve_1,\dots,\ve_n\}$ is orthonormal as well.
Pro tip: keep the terms clear in your head: an orthogonal set consists of mutually perpendicular vectors; an orthonormal set is an orthogonal set whose vectors have also been normalized to unit length.
(You can see the word "normalized" inside "orthonormal").
Matrices with orthonormal columns are particularly important.
Theorem. An $m\times n$ matrix $U$ has orthonormal columns if and only if $U^TU = I$.
Proof. Let us suppose that $U$ has only three columns, each a vector in $\R^m$ (but the proof will generalize to $n$ columns).
Let $U = [\vu_1\;\vu_2\;\vu_3].$ Then:
$$U^TU = \mat{{c}\vu_1^T\\\vu_2^T\\\vu_3^T}\mat{{ccc}\vu_1&\vu_2&\vu_3} = \mat{{ccc}\vu_1^T\vu_1&\vu_1^T\vu_2&\vu_1^T\vu_3\\\vu_2^T\vu_1&\vu_2^T\vu_2&\vu_2^T\vu_3\\\vu_3^T\vu_1&\vu_3^T\vu_2&\vu_3^T\vu_3}$$The columns of $U$ are orthogonal if and only if
$$\vu_1^T\vu_2 = \vu_2^T\vu_1 = 0,\;\; \vu_1^T\vu_3 = \vu_3^T\vu_1 = 0,\;\; \vu_2^T\vu_3 = \vu_3^T\vu_2 = 0$$The columns of $U$ all have unit length if and only if
$$\vu_1^T\vu_1 = 1,\;\;\vu_2^T\vu_2 = 1,\;\;\vu_3^T\vu_3 = 1.$$Together, these conditions hold if and only if $U^TU = I.$
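We can check this property numerically. A quick sketch: QR factorization is one standard way to produce a matrix with orthonormal columns, and $U^TU$ should then be the identity.

```python
import numpy as np

# build a 5x3 matrix with orthonormal columns via QR factorization
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
U, _ = np.linalg.qr(A)

# U^T U should be the 3x3 identity (up to floating-point error)
print(np.allclose(U.T @ U, np.eye(3)))  # True
```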
Theorem. Let $U$ be an $m\times n$ matrix with orthonormal columns, and let $\vx$ and $\vy$ be in $\R^n.$ Then:
1. $\Vert U\vx\Vert = \Vert\vx\Vert.$
2. $(U\vx)^T(U\vy) = \vx^T\vy.$
3. $(U\vx)^T(U\vy) = 0$ if and only if $\vx^T\vy = 0.$
Properties 1. and 3. say that the linear mapping $\vx\mapsto U\vx$ preserves lengths and orthogonality.
So, viewed as a linear operator, a matrix with orthonormal columns is very special: the lengths of vectors, and therefore the distances between points, are not changed by the action of $U$.
Example. Let $U = \mat{{cc}1/\sqrt{2}&2/3\\1/\sqrt{2}&-2/3\\0&1/3}$ and $\vx = \mat{{c}\sqrt{2}\\3}.$ Notice that $U$ has orthonormal columns, and
$$U^TU = \mat{{ccc}1/\sqrt{2}&1/\sqrt{2}&0\\2/3&-2/3&1/3}\mat{{cc}1/\sqrt{2}&2/3\\1/\sqrt{2}&-2/3\\0&1/3} = \mat{{cc}1&0\\0&1}.$$Let's verify that $\Vert Ux\Vert = \Vert x\Vert.$
$$U\vx = \mat{{cc}1/\sqrt{2}&2/3\\1/\sqrt{2}&-2/3\\0&1/3}\mat{{c}\sqrt{2}\\3} = \mat{{c}3\\-1\\1}$$So $\Vert U\vx\Vert = \sqrt{3^2 + (-1)^2 + 1^2} = \sqrt{11}$ and $\Vert\vx\Vert = \sqrt{(\sqrt{2})^2 + 3^2} = \sqrt{11}.$
Orthonormal Square Matrices. Consider the case when $U$ is square, and has orthonormal columns.
Then the fact that $U^TU = I$ implies that $U^{-1} = U^T.$
Then $U$ is called an orthogonal matrix.
(Note that this terminology could be confusing; the columns of $U$ are not just orthogonal but actually orthonormal.)
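A familiar concrete instance is a rotation matrix. A quick numerical check of both properties, inverse-equals-transpose and length preservation (the angle and test vector below are arbitrary example values):

```python
import numpy as np

# a 2x2 rotation matrix is orthogonal: square, with orthonormal columns
theta = 0.7
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# for an orthogonal matrix, the inverse is just the transpose
print(np.allclose(U.T, np.linalg.inv(U)))  # True

# and the mapping x -> Ux preserves lengths
x = np.array([3., 4.])
print(np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x)))  # True
```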