PAC stands for "probably approximately correct"; it is a framework and set of assumptions under which numerous results in learning theory have been proven.
Softness | Color | $Pr(x)$ (probability) | $h(x)$ | $f(x)$ |
---|---|---|---|---|
Soft | Green | 0.1 | Tasty | Not-Tasty |
Hard | Green | 0.1 | Not-Tasty | Not-Tasty |
Soft | Orange | 0.7 | Tasty | Tasty |
Hard | Orange | 0.1 | Tasty | Not-Tasty |
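Under the standard PAC definition, the true error of $h$ with respect to the target $f$ is $\epsilon(h) = \Pr_{x \sim D}[h(x) \neq f(x)]$, i.e. the total probability mass of the rows where $h$ and $f$ disagree; here that is $0.1 + 0.1 = 0.2$. A minimal sketch of this computation (the rows are hard-coded from the table above):

```python
# Each row: (probability, h(x), f(x)), copied from the table above.
rows = [
    (0.1, "Tasty",     "Not-Tasty"),  # Soft, Green
    (0.1, "Not-Tasty", "Not-Tasty"),  # Hard, Green
    (0.7, "Tasty",     "Tasty"),      # Soft, Orange
    (0.1, "Tasty",     "Not-Tasty"),  # Hard, Orange
]

# True (generalization) error: probability mass where h disagrees with f.
error = sum(p for p, h_x, f_x in rows if h_x != f_x)
print(error)  # 0.2
```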
Thus, with probability at least $1-\delta$ we also have that:

$$ \epsilon(\hat{h}) \leq \epsilon(h^{*}) + O\left(\sqrt{\frac{d}{m}\log\frac{m}{d} + \frac{1}{m}\log\frac{1}{\delta}}\right) $$

* $\epsilon(h)$ is the real (test) error and $\hat{\epsilon}(h)$ is the training error (empirical risk).
* In other words, if a hypothesis class has finite VC dimension, then uniform convergence occurs as $m$ becomes large.
* This is a very strong result because we can make a statement on data we have not seen!
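To get a feel for how the bound shrinks with the sample size $m$, here is a small sketch that plugs numbers into the gap term. Note that the $O(\cdot)$ hides a constant, so the constant `c = 1.0` below (and the helper name `vc_bound`) are only illustrative assumptions:

```python
import math

def vc_bound(m, d, delta, c=1.0):
    """Gap term of the VC generalization bound, up to the hidden constant c:
    c * sqrt((d/m) * log(m/d) + (1/m) * log(1/delta))."""
    return c * math.sqrt((d / m) * math.log(m / d) + (1 / m) * math.log(1 / delta))

# With VC dimension d = 10 and confidence 1 - delta = 0.95,
# the gap between test error and best-in-class error tightens as m grows.
for m in [100, 1_000, 10_000, 100_000]:
    print(m, round(vc_bound(m, d=10, delta=0.05), 4))
```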
Consider 9 samples and 8 hypotheses, as follows:
Hypothesis | $x_1$ | $x_2$ | $x_3$ | $x_4$ | $x_5$ | $x_6$ | $x_7$ | $x_8$ | $x_9$ |
---|---|---|---|---|---|---|---|---|---|
$h_1$ | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
$h_2$ | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
$h_3$ | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
$h_4$ | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
$h_5$ | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
$h_6$ | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
$h_7$ | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
$h_8$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
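One quick way to explore this table is to count the distinct labelings (dichotomies) the eight hypotheses induce on a subset of the nine points: a subset of size $k$ is shattered only if all $2^k$ labelings appear. A minimal sketch along those lines (the 0/1 matrix is copied from the table above; the names `H` and `dichotomies` are just for this illustration):

```python
from itertools import combinations

# Rows are h_1 ... h_8, columns are x_1 ... x_9 (copied from the table).
H = [
    [0, 0, 1, 0, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]

def dichotomies(cols):
    """Distinct labelings the hypotheses induce on the chosen columns."""
    return {tuple(h[c] for c in cols) for h in H}

# A pair of points is shattered iff all 2^2 = 4 labelings are realized on it.
for cols in combinations(range(9), 2):
    if len(dichotomies(cols)) == 4:
        print("shattered pair:", [c + 1 for c in cols])
```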
Image Source: Caltech's free Machine Learning online course by Yaser Abu-Mostafa
1. Four ($n=4$) points can be shattered, as seen in the following arrangement:
Image from Princeton's COS 511: Theoretical Machine Learning, Lecture on VC-Dimension
2. No set of five ($n+1=5$) points can be shattered: for any 5-point set, construct a labeling as follows. Pick the topmost, bottommost, leftmost and rightmost points and give them the label “+”. Because there are 5 points, at least one point is left over; assign it “−”. Its coordinates lie between the horizontal and vertical extremes, so any axis-aligned rectangle that contains all the “+” points must also contain the “−” point, and this labeling cannot be realized. (A brute-force check of this argument is sketched below.)
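The key observation behind the check is that the only candidate rectangle for a given labeling is the bounding box of the “+” points, so a labeling is realizable iff that box contains no “−” point. A small sketch under that observation (the helper name and the example point sets below are mine, chosen only for illustration):

```python
from itertools import product

def shattered_by_rectangles(points):
    """True if every +/- labeling of `points` is realizable by an
    axis-aligned rectangle (equivalently, by the bounding box of the + points)."""
    for labels in product([True, False], repeat=len(points)):
        plus = [p for p, lab in zip(points, labels) if lab]
        minus = [p for p, lab in zip(points, labels) if not lab]
        if not plus:
            continue  # the all-minus labeling is realized by an empty rectangle
        xmin, xmax = min(x for x, _ in plus), max(x for x, _ in plus)
        ymin, ymax = min(y for _, y in plus), max(y for _, y in plus)
        if any(xmin <= x <= xmax and ymin <= y <= ymax for x, y in minus):
            return False  # this labeling cannot be realized
    return True

# Four points in a "diamond" arrangement can be shattered ...
print(shattered_by_rectangles([(0, 1), (1, 0), (0, -1), (-1, 0)]))           # True
# ... but adding a fifth point (here the center) breaks shattering.
print(shattered_by_rectangles([(0, 1), (1, 0), (0, -1), (-1, 0), (0, 0)]))   # False
```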
1. Three ($n=3$) points can be shattered, as seen in the following arrangement:
2. No set of four ($n+1=4$) points can be shattered; we consider two cases:
Let's see why $VCdim(\mathcal{H}) = \infty$: for any positive integer $n$, take $n$ points from $\mathcal{X}$ and place them uniformly spaced on the unit circle. For each of the $2^n$ subsets of these points, the convex polygon whose vertices are the points of that subset contains exactly the subset and excludes its complement (since all $n$ points lie on the circle, no point outside the subset can fall inside the polygon). Hence any number of points can be shattered, so $VCdim(\mathcal{H}) = \infty$.
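To make this concrete, here is a small sketch that places $n$ points on the unit circle and verifies that, for every subset of at least three points, the convex polygon spanned by the subset strictly excludes every point of the complement (smaller subsets are handled by degenerate "polygons", a single point or a segment, which trivially exclude the rest). The helper name and the choice $n=6$ are only for illustration:

```python
import itertools
import math

def strictly_inside_convex_polygon(p, poly):
    """True if p lies strictly inside the convex polygon `poly`
    (vertices listed in counter-clockwise order)."""
    for (ax, ay), (bx, by) in zip(poly, poly[1:] + poly[:1]):
        # p must be strictly to the left of every directed edge (a -> b).
        if (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax) <= 0:
            return False
    return True

n = 6  # the argument works for any positive integer; 6 keeps the loop small
points = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
          for k in range(n)]

ok = True
for r in range(3, n + 1):  # proper polygons need at least 3 vertices
    for subset in itertools.combinations(range(n), r):
        poly = [points[k] for k in subset]  # already in counter-clockwise order
        rest = [points[k] for k in range(n) if k not in subset]
        if any(strictly_inside_convex_polygon(q, poly) for q in rest):
            ok = False
print("every subset's polygon excludes its complement:", ok)  # expected: True
```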