#!/usr/bin/env python
# coding: utf-8
# In[2]:
from IPython.display import Image
Image('../../../python_for_probability_statistics_and_machine_learning.jpg')
# [Python for Probability, Statistics, and Machine Learning](https://www.springer.com/fr/book/9783319307152)
# # Projection Methods
#
# The concept of projection is key to developing an intuition about conditional
# probability. We already have a natural intuition of projection from looking at
# the shadows of objects on a sunny day. As we will see, this simple idea
# consolidates many abstract ideas in optimization and mathematics. Consider
# [Figure](#fig:probability_001) where we want to find a point along the blue
# line (namely, $\mathbf{x}$) that is closest to the black square (namely,
# $\mathbf{y}$). In other words, we want to inflate the gray circle until it just
# touches the black line. Recall that the circle boundary is the set of points for
# which
# $$
# \sqrt{(\mathbf{y}-\mathbf{x})^T(\mathbf{y}-\mathbf{x})} =\|\mathbf{y}-\mathbf{x} \| = \epsilon
# $$
# for some value of $\epsilon$. So we want a point $\mathbf{x}$ along
# the line that satisfies this for the smallest $\epsilon$. Then, that point
# will be the closest point on the black line to the black square.
# It may be obvious from the diagram, but the closest point on the line
# occurs where the line segment from the black square to the black line is
# perpendicular to the line. At this point, the gray circle just touches the black
# line. This is illustrated below in [Figure](#fig:probability_002).
#
#
#
#
#
# Given the point $\mathbf{y}$ (black square), we want to find the point $\mathbf{x}$ along the line that is closest to it. The gray circle is the locus of points within a fixed distance from $\mathbf{y}$.
#
#
#
#
#
# **Programming Tip.**
#
# [Figure](#fig:probability_001) uses the `matplotlib.patches` module. This
# module contains primitive shapes like circles, ellipses, and rectangles that
# can be assembled into complex graphics. As shown in the code in the IPython
# Notebook corresponding to this chapter, after importing a particular shape, you
# can apply that shape to an existing axis using the `add_patch` method. The
# patches themselves can be styled using the usual formatting keywords like
# `color` and `alpha`.
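#
# As a minimal sketch (not the notebook's figure code), the following shows how
# a `Circle` patch can be added to an existing axis with `add_patch` and styled
# with keywords like `color` and `alpha`; the center, radius, and line below are
# chosen arbitrarily for illustration.

# In[ ]:

import matplotlib.pyplot as plt
from matplotlib.patches import Circle

fig, ax = plt.subplots()
ax.add_patch(Circle((1, 2), radius=0.5, color='gray', alpha=0.3))  # translucent circle
ax.plot([0, 3], [0, 3], 'b-')   # the line we will project onto
ax.plot(1, 2, 'ks')             # the point y drawn as a black square
ax.set_aspect('equal')
plt.show()
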
#
#
#
#
#
#
#
# The closest point on the line occurs where the circle is tangent to the line. When this happens, the black line and the minimum-distance segment are perpendicular.
#
#
#
#
#
# Now that we can see what's going on, we can construct the solution
# analytically. We can represent an arbitrary point along the black line as:
# $$
# \mathbf{x}=\alpha\mathbf{v}
# $$
# where $\alpha\in\mathbb{R}$ slides the point up and down the line with
# $$
# \mathbf{v} = \left[ 1,1 \right]^T
# $$
# Formally, $\mathbf{v}$ is the *subspace* onto which we want to
# *project* $\mathbf{y}$. At the closest point, the vector between
# $\mathbf{y}$ and $\mathbf{x}$ (the *error* vector above) is
# perpendicular to the line. This means that
# $$
# (\mathbf{y}-\mathbf{x} )^T \mathbf{v} = 0
# $$
# and by substituting $\mathbf{x}=\alpha\mathbf{v}$ and working out the terms, we obtain
# $$
# \alpha = \frac{\mathbf{y}^T\mathbf{v}}{ \|\mathbf{v} \|^2}
# $$
# The *error* is the distance between $\alpha\mathbf{v}$ and $\mathbf{y}$.
# Because this error vector is perpendicular to the subspace, the vectors
# $\mathbf{y}$, $\alpha\mathbf{v}$, and the error form a right triangle, and we
# can use the Pythagorean theorem to compute the squared length of this error as
# $$
# \epsilon^2 = \|\mathbf{y}-\mathbf{x} \|^2 = \|\mathbf{y}\|^2 - \alpha^2 \|\mathbf{v}\|^2 = \|\mathbf{y}\|^2 - \frac{(\mathbf{y}^T\mathbf{v})^2}{\|\mathbf{v}\|^2}
# $$
# where $ \|\mathbf{v}\|^2 = \mathbf{v}^T \mathbf{v} $. Note that since $\epsilon^2 \ge 0 $, this also shows that
# $$
# \lvert \mathbf{y}^T\mathbf{v} \rvert \le \|\mathbf{y}\| \|\mathbf{v}\|
# $$
# which is the famous and useful Cauchy-Schwarz inequality, which we
# will exploit later. Finally, we can assemble all of this into the *projection*
# operator
# $$
# \mathbf{P}_v = \frac{1}{\|\mathbf{v}\|^2 } \mathbf{v v}^T
# $$
# With this operator, we can take any $\mathbf{y}$ and find the closest
# point on the line defined by $\mathbf{v}$ by computing
# $$
# \mathbf{P}_v \mathbf{y} = \mathbf{v} \left( \frac{ \mathbf{v}^T \mathbf{y} }{\|\mathbf{v}\|^2} \right)
# $$
# where we recognize the term in parentheses as the $\alpha$ we
# computed earlier. It's called an *operator* because it takes a vector
# ($\mathbf{y}$) and produces another vector ($\alpha\mathbf{v}$). Thus,
# projection unifies geometry and optimization.
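#
# As a quick numerical check (a sketch using NumPy, with `y` and `v` chosen
# arbitrarily), we can build $\mathbf{P}_v$, confirm that the projected point
# equals $\alpha\mathbf{v}$, that the error is perpendicular to $\mathbf{v}$,
# and that the Cauchy-Schwarz inequality holds.

# In[ ]:

import numpy as np

v = np.array([1.0, 1.0])           # direction of the line (subspace)
y = np.array([1.0, 3.0])           # arbitrary point to project

alpha = (y @ v) / (v @ v)          # alpha = y^T v / ||v||^2
Pv = np.outer(v, v) / (v @ v)      # projection operator P_v = v v^T / ||v||^2
x = Pv @ y                         # closest point on the line

print(np.allclose(x, alpha * v))                              # True
print(np.isclose((y - x) @ v, 0.0))                           # error is perpendicular to v
print(abs(y @ v) <= np.linalg.norm(y) * np.linalg.norm(v))    # Cauchy-Schwarz holds
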
#
# ## Weighted distance
#
# We can easily extend this projection operator to cases where the measure of
# distance between $\mathbf{y}$ and the subspace $\mathbf{v}$ is weighted. We can
# accommodate these weighted distances by re-writing the projection operator as
#
#
#
# $$
# \begin{equation}
# \mathbf{P}_v=\mathbf{v}\frac{\mathbf{v}^T\mathbf{Q}^T}{\mathbf{v}^T\mathbf{Q v}}
# \end{equation}
# \label{eq:weightedProj} \tag{1}
# $$
# where $\mathbf{Q}$ is a positive definite matrix. In the previous
# case, we started with a point $\mathbf{y}$ and inflated a circle centered at
# $\mathbf{y}$ until it just touched the line defined by $\mathbf{v}$, and this
# point was the closest point on the line to $\mathbf{y}$. The same thing happens
# in the general case with a weighted distance except now we inflate an
# ellipse, not a circle, until the ellipse touches the line.
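#
# To see where Equation (1) comes from (assuming, as is customary for positive
# definite matrices, that $\mathbf{Q}$ is symmetric), we minimize the weighted
# squared distance
# $$
# \epsilon_{\mathbf{Q}}^2 = (\mathbf{y}-\alpha\mathbf{v})^T \mathbf{Q} (\mathbf{y}-\alpha\mathbf{v})
# $$
# Setting the derivative with respect to $\alpha$ to zero gives
# $$
# \alpha = \frac{\mathbf{v}^T \mathbf{Q} \mathbf{y}}{\mathbf{v}^T \mathbf{Q} \mathbf{v}}
# $$
# so that $\mathbf{P}_v \mathbf{y} = \alpha \mathbf{v}$, which is exactly
# Equation (1) applied to $\mathbf{y}$.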
#
#
#
#
#
#
#
# In the weighted case, the closest point on the line occurs where the ellipse is tangent to the line, and the error is still perpendicular in the sense of the weighted distance.
#
#
#
#
#
# Note that the error vector ($\mathbf{y}-\alpha\mathbf{v}$) in [Figure](#fig:probability_003) is still perpendicular to the line (subspace
# $\mathbf{v}$), but in the space of the weighted distance. The difference
# between the first projection (with the uniform circular distance) and the
# general case (with the elliptical weighted distance) is the inner product
# used in each case. For example, in the first case we have $\mathbf{y}^T
# \mathbf{v}$ and in the weighted case we have $\mathbf{y}^T \mathbf{Q}^T
# \mathbf{v}$. To move from the uniform circular case to the weighted ellipsoidal
# case, all we had to do was change all of the vector inner products. Before we
# finish, we need a formal property of projections:
# $$
# \mathbf{P}_v \mathbf{P}_v = \mathbf{P}_v
# $$
# known as the *idempotent* property, which says that once we
# have projected onto a subspace, subsequent projections leave us in the
# same subspace. You can verify this by direct computation using Equation (1).
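#
# Here is a quick numerical verification (a sketch, with `v`, `y`, and a
# symmetric positive definite `Q` chosen arbitrarily) that the weighted
# projection operator of Equation (1) is idempotent and that the resulting
# error is perpendicular to $\mathbf{v}$ in the weighted inner product.

# In[ ]:

import numpy as np

v = np.array([1.0, 1.0])
y = np.array([1.0, 3.0])
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])                 # symmetric positive definite weights

Pv = np.outer(v, v @ Q.T) / (v @ Q @ v)    # P_v = v v^T Q^T / (v^T Q v)
x = Pv @ y                                 # weighted projection of y onto v

print(np.allclose(Pv @ Pv, Pv))            # idempotent: P_v P_v = P_v
print(np.isclose((y - x) @ Q @ v, 0.0))    # error is Q-perpendicular to v
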
#
# Thus, projection ties a minimization problem (closest point to a line) to an
# algebraic concept (inner product). It turns out that these same geometric ideas
# from linear algebra [[strang2006linear]](#strang2006linear) can be translated to the conditional
# expectation. How this works is the subject of our next section.