#!/usr/bin/env python
# coding: utf-8
# In[2]:
from IPython.display import Image
Image('../../../python_for_probability_statistics_and_machine_learning.jpg')
# [Python for Probability, Statistics, and Machine Learning](https://www.springer.com/fr/book/9783319307152)
# In[3]:
from pprint import pprint
import textwrap
import sys, re
# # Useful Inequalities
#
# In practice, few quantities can be analytically calculated. Some knowledge
# of bounding inequalities helps find the ballpark for potential solutions. This
# section discusses three key inequalities that are important for
# probability, statistics, and machine learning.
#
# ## Markov's Inequality
#
# Let $X$ be a non-negative random variable
# and suppose that $\mathbb{E}(X) < \infty$. Then,
# for any $t>0$,
# $$
# \mathbb{P}(X>t)\leq \frac{\mathbb{E}(X)}{t}
# $$
# This is a foundational inequality that is
# used as a stepping stone to other inequalities. It is easy
# to prove. Because $X\ge 0$, we have the following,
# $$
# \begin{align*}
# \mathbb{E}(X)&=\int_0^\infty x f_x(x)dx =\underbrace{\int_0^t x f_x(x)dx}_{\text{omit this}}+\int_t^\infty x f_x(x)dx \\
# &\ge\int_t^\infty x f_x(x)dx \ge t\int_t^\infty f_x(x)dx = t \mathbb{P}(X>t)
# \end{align*}
# $$
# The step that establishes the inequality is the part where the
# $\int_0^t x f_x(x)dx$ is omitted. For a particular $f_x(x)$ that may be
# concentrated around the $[0,t]$ interval, this could be a lot to throw out.
# For that reason, the Markov Inequality is considered a *loose* inequality,
# meaning that there is a substantial gap between both sides of the inequality.
# For example, as shown in [Figure](#fig:ProbabilityInequalities_001), the
# $\chi^2$ distribution has a lot of its mass on the left, which would be omitted
# in the Markov Inequality. [Figure](#fig:ProbabilityInequalities_002) shows
# the two curves established by the Markov Inequality. The gray shaded region is
# the gap between the two terms and indicates the looseness of the bound
# (a fatter shaded region means a looser bound) for this case.
#
# The $\chi_1^2$ density has much of its weight on the left, which is excluded in the establishment of the Markov Inequality.
#
# The shaded area shows the region between the curves on either side of the Markov Inequality.
#
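# As a quick numerical check (a minimal sketch using `scipy.stats`, not the
# code that generated the figures), we can compare the exact tail probability
# $\mathbb{P}(X>t)$ for $X\sim\chi_1^2$ against the Markov bound
# $\mathbb{E}(X)/t = 1/t$,
# In[ ]:
from scipy import stats

chi2_rv = stats.chi2(1)            # chi-squared with one degree of freedom, so E(X)=1
for t_val in [1, 2, 3, 4, 5]:
    exact = chi2_rv.sf(t_val)      # P(X > t) from the survival function
    bound = 1/t_val                # Markov bound E(X)/t with E(X)=1
    print(t_val, exact, bound)
# The bound holds at every point but is far from tight, which is the
# looseness illustrated by the shaded region in the figure.
#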
# ## Chebyshev's Inequality
#
# Chebyshev's Inequality drops out directly from the Markov Inequality. Let
# $\mu=\mathbb{E}(X)$ and $\sigma^2=\mathbb{V}(X)$. Then, we have
# $$
# \mathbb{P}(\vert X-\mu\vert \ge t) \le \frac{\sigma^2}{t^2}
# $$
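# This follows by applying the Markov Inequality to the non-negative random
# variable $(X-\mu)^2$,
# $$
# \mathbb{P}(\vert X-\mu\vert \ge t) = \mathbb{P}\left((X-\mu)^2 \ge t^2\right) \le \frac{\mathbb{E}\left((X-\mu)^2\right)}{t^2} = \frac{\sigma^2}{t^2}
# $$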
# Note that if we normalize so that $Z=(X-\mu)/\sigma$, we
# have $\mathbb{P}(\vert Z\vert \ge k) \le 1/k^2$. In particular,
# $\mathbb{P}(\vert Z\vert \ge 2) \le 1/4$. We can illustrate this
# inequality using the Sympy statistics module,
# In[4]:
import sympy
import sympy.stats as ss
t=sympy.symbols('t',real=True)
x=ss.ChiSquared('x',1)
# To get the left side of the Chebyshev inequality, we
# have to write this out as the following conditional probability,
# In[5]:
r = ss.P((x-1) > t,x>1)+ss.P(-(x-1) > t,x<1)
# This is because of certain limitations in the statistics module at
# this point in its development regarding the absolute value function. We could
# take the above expression, which is a function of $t$, and attempt to compute
# the integral, but that would take a very long time (the expression is very long
# and complicated, which is why we did not print it out above). This is because
# Sympy is a pure-python module that does not utilize any C-level optimizations
# under the hood. In this situation, it's better to use the built-in cumulative
# distribution function as in the following (after some rearrangement of the terms),
# In[6]:
w=(1-ss.cdf(x)(t+1))+ss.cdf(x)(1-t)
# To plot this, we can evaluate it at a variety of `t` values by using
# the `.subs` substitution method, but it is more convenient to use the
# `lambdify` function to convert the expression into a function.
# In[7]:
fw=sympy.lambdify(t,w)
# Then, we can evaluate this function using something like
# In[8]:
list(map(fw,[0,1,2,3,4]))
# to produce the following [Figure](#fig:ProbabilityInequalities_003).
#
# The shaded area shows the region between the curves on either side of the Chebyshev Inequality.
#
# **Programming Tip.**
#
# Note that we cannot use vectorized inputs for the function returned by
# `lambdify` here because the expression contains embedded functions that are
# only available in Sympy. Otherwise, we could have used `lambdify(t,w,'numpy')`
# to specify the corresponding functions in Numpy to use for the expression.
#
#
#
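# As a quick numerical cross-check of the figure above (a minimal sketch using
# `scipy.stats` in place of the Sympy expression), we can compare
# $\mathbb{P}(\vert X-1\vert \ge t)$ for $X\sim\chi_1^2$ against the Chebyshev
# bound $\sigma^2/t^2 = 2/t^2$, since the $\chi_1^2$ distribution has $\sigma^2=2$,
# In[ ]:
from scipy import stats

chi2_rv = stats.chi2(1)                                       # chi-squared, one degree of freedom
for t_val in [1, 2, 3, 4]:
    exact = chi2_rv.sf(1 + t_val) + chi2_rv.cdf(1 - t_val)    # P(|X-1| >= t)
    bound = 2/t_val**2                                        # Chebyshev bound with sigma^2 = 2
    print(t_val, exact, bound)
#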
# ## Hoeffding's Inequality
#
#
# Hoeffding's Inequality is similar to, but less loose than, Markov's Inequality.
# Let $X_1,\ldots,X_n$ be iid observations such that $\mathbb{E}(X_i)=\mu$ and
# $a\le X_i \le b$. Then, for any $\epsilon>0$, we have
# $$
# \mathbb{P}(\vert \overline{X}_n -\mu\vert \ge \epsilon) \le 2 \exp(-2 n\epsilon^2/(b-a)^2)
# $$
# where $\overline{X}_n = \tfrac{1}{n}\sum_{i=1}^n X_i$. Note that we
# further assume that the individual random variables are bounded.
#
# **Corollary.** If $X_1,\ldots,X_n$ are independent with $\mathbb{P}(a\le X_i\le b)=1$
# and all with $\mathbb{E}(X_i)=\mu$, then, with probability at least $1-\delta$,
# $$
# \vert\overline{X}_n-\mu\vert \le \sqrt{\frac{c}{2 n}\log \frac{2}{\delta}}
# $$
# where $c=(b-a)^2$. This follows by setting the right side of Hoeffding's
# Inequality equal to $\delta$ and solving for $\epsilon$. We will see this
# inequality again in the machine
# learning chapter. [Figure](#fig:ProbabilityInequalities_004) shows the Markov
# and Hoeffding bounds for the case of ten identically and uniformly distributed
# random variables, $X_i \sim \mathcal{U}[0,1]$. The solid line shows
# $\mathbb{P}(\vert \overline{X}_n - 1/2 \vert > \epsilon)$. Note that the
# Hoeffding Inequality is tighter than the Markov Inequality and that both of
# them merge when $\epsilon$ gets big enough.
#
# This shows the Markov and Hoeffding bounds for the case of ten identically and uniformly distributed random variables.
#
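# As a numerical sketch of this comparison (a Monte Carlo estimate using Numpy,
# not the code behind the figure), we can estimate
# $\mathbb{P}(\vert\overline{X}_n - 1/2\vert > \epsilon)$ for $n=10$ uniform
# samples and compare it against the Hoeffding bound $2\exp(-2 n \epsilon^2)$,
# where $(b-a)^2=1$ here,
# In[ ]:
import numpy as np

rng = np.random.default_rng(1234)      # seed chosen arbitrarily for repeatability
n, trials = 10, 100_000
xbar = rng.uniform(0, 1, size=(trials, n)).mean(axis=1)    # sample means of n uniform variates
for eps in [0.05, 0.1, 0.2, 0.3]:
    estimate = np.mean(np.abs(xbar - 0.5) > eps)           # Monte Carlo estimate of the tail
    hoeffding = 2*np.exp(-2*n*eps**2)                      # Hoeffding bound with (b-a)^2 = 1
    print(eps, estimate, hoeffding)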