```python
# Set up sympy with unicode pretty-printing.
import sympy
from sympy import symbols
sympy.init_printing(use_unicode=True)
```
Let $\{ a_j \}_{j=1}^n$ and $\{q_j\}_{j=1}^n$ be sets of nonnegative numbers with $\sum_j q_j = 1$. Then \begin{equation*} \prod_{j=1}^n a_j^{q_j} \le \sum_{j=1}^n q_j a_j, \end{equation*} with equality iff all the $a_j$ with $q_j > 0$ are equal. This is the weighted AM-GM inequality.
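A quick numeric sanity check (a minimal sketch; it assumes `numpy`, which the notebook has not imported above):

```python
import numpy as np

rng = np.random.default_rng(12345)  # arbitrary seed, for reproducibility

a = rng.uniform(0.1, 10, size=5)   # nonnegative a_j
q = rng.dirichlet(np.ones(5))      # nonnegative q_j with sum 1
geo = np.prod(a ** q)              # weighted geometric mean
ari = np.sum(q * a)                # weighted arithmetic mean
print(geo, ari, geo <= ari)        # the geometric mean never exceeds the arithmetic mean
```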
Let $\mathcal{X}$ be a real inner product space, and let $x, y \in \mathcal{X}$. Then \begin{equation*} | \langle x, y \rangle|^2 \le \langle x, x \rangle \langle y, y \rangle. \end{equation*} This is the Cauchy-Schwarz inequality.
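A numeric check in $\mathbb{R}^8$ with the Euclidean inner product (a sketch, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=8), rng.normal(size=8)
# |<x, y>|^2 <= <x, x> <y, y> for the Euclidean inner product
print(np.dot(x, y) ** 2 <= np.dot(x, x) * np.dot(y, y))  # True
```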
Let $x \equiv (x_j)_{j=1}^n \in {\mathbb R}^n$. The decreasing rearrangement of $x$, denoted $x^*$, has the same multiset of component values as $x$, but ordered so that \begin{equation*} x_1^* \ge x_2^* \ge \cdots \ge x_n^*. \end{equation*}
The increasing rearrangement of $x$, denoted $x_*$, has the same multiset of component values as $x$, but ordered so that \begin{equation*} x_{*1} \le x_{*2} \le \cdots \le x_{*n}. \end{equation*}
Suppose $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^n$. Let $x \cdot y \equiv \sum_{j=1}^n x_j y_j$. Then \begin{equation*} x^* \cdot y_* \le x \cdot y \le x^* \cdot y^*. \end{equation*}
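A numeric check of this rearrangement inequality (a sketch, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.normal(size=10), rng.normal(size=10)
x_dec, y_dec = np.sort(x)[::-1], np.sort(y)[::-1]  # decreasing rearrangements x*, y*
y_inc = np.sort(y)                                  # increasing rearrangement y_*
print(np.dot(x_dec, y_inc) <= np.dot(x, y) <= np.dot(x_dec, y_dec))  # True
```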
Suppose $f: \mathbb{R}^n \rightarrow \mathbb{R}^+$ has a finite integral. The symmetric decreasing rearrangement $f^*$ of $f$ is the function whose level sets $\{x : f^*(x) > t\}$ are open balls centered at the origin with the same Lebesgue measure as the corresponding level sets $\{x : f(x) > t\}$ of $f$. Thus $f^*$ is radially symmetric, nonincreasing in $|x|$, and equimeasurable with $f$.
Suppose $f: \mathbb{R}^{n} \to \mathbb{R}^{+}$ and $g: \mathbb{R}^{n} \to \mathbb{R}^{+}$. Then \begin{equation*} \int_{\mathbb{R}^n} f(x) g(x) dx \le \int_{\mathbb{R}^n} f^*(x) g^*(x) dx. \end{equation*}
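A one-dimensional discrete analogue (a sketch, assuming numpy): sort each vector's values so the largest sit closest to the center of the grid, mimicking symmetric decreasing rearrangement, and compare the two sums.

```python
import numpy as np

def sym_dec(v):
    """Place the sorted values of v so the largest are nearest the center."""
    center = (len(v) - 1) / 2
    idx = np.argsort(np.abs(np.arange(len(v)) - center))  # positions by distance from center
    out = np.empty_like(v)
    out[idx] = np.sort(v)[::-1]
    return out

rng = np.random.default_rng(10)
f, g = rng.uniform(size=101), rng.uniform(size=101)
print(np.sum(f * g), np.sum(sym_dec(f) * sym_dec(g)))  # first sum <= second sum
```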
Suppose $f: \mathbb{R}^{n} \to \mathbb{R}^{+}$, $g: \mathbb{R}^{n} \to \mathbb{R}^{+}$, and $h: \mathbb{R}^{n} \to \mathbb{R}^{+}$.
Then \begin{equation*} \int f(x)g(x-y)h(y)\,dx\,dy \le \int f^{*}(x)g^{*}(x-y)h^{*}(y)\,dx\,dy. \end{equation*}
For $\ell \ge 1$ and $m \ge 2$,
\begin{equation*} { {\ell m } \choose { \ell }} \ge \frac{m^{m(\ell-1)+1}}{\sqrt{\ell}\, (m-1)^{(m-1)(\ell-1)}} = \frac{m}{\sqrt{\ell}}\, 2^{m(\ell-1) H(1/m)}, \end{equation*} where $H(q) \equiv -q \log_2(q) - (1-q) \log_2 (1-q)$ is the binary entropy function.
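A spot check of the bound over a small grid of $(\ell, m)$ values (a sketch using only the standard library):

```python
from math import comb, sqrt

for ell in (1, 2, 5, 10):
    for m in (2, 3, 7):
        lhs = comb(ell * m, ell)
        rhs = m ** (m * (ell - 1) + 1) / (sqrt(ell) * (m - 1) ** ((m - 1) * (ell - 1)))
        assert lhs >= rhs, (ell, m)
print("bound holds on all test cases")
```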
If $X$ and $Y$ are random variables with a joint distribution, \begin{equation*} \mathbb{E} X = \mathbb{E}_Y \mathbb{E}(X|Y). \end{equation*}

If $X$ and $Y$ are random variables with a joint distribution, \begin{equation*} {\mbox{Var}}(X) = \mathbb{E}_Y {\mbox{Var}}(X|Y) + {\mbox{Var}}_Y\, \mathbb{E} (X|Y). \end{equation*}

This section follows Lugosi (2006) closely.
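As a Monte Carlo check of both decompositions above (a sketch, assuming numpy; the conditional model is a hypothetical example):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10**6
Y = rng.integers(0, 3, size=n)       # Y uniform on {0, 1, 2}
X = rng.normal(loc=Y, scale=1 + Y)   # X | Y ~ N(Y, (1+Y)^2)

# E X = E_Y E(X|Y) = E Y = 1
print(X.mean())
# Var X = E_Y Var(X|Y) + Var_Y E(X|Y) = E(1+Y)^2 + Var Y = 14/3 + 2/3 = 16/3
print(X.var(), 16 / 3)
```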
If $X$ is a nonnegative real-valued random variable, \begin{equation*} {\mathbb E} X = \int_0^\infty {\mathbb{P}} \{X \ge x\} dx. \end{equation*} This is the tail-sum formula.
If $\phi$ is a convex function from ${\mathcal X}$ to $\mathbb{R}$, then $\phi({\mathbb E} X) \le {\mathbb E} \phi(X)$. This is Jensen's Inequality.
From the tail-sum formula, \begin{equation*} {\mathbb E} X \ge \int_0^t {\mathbb{P}} \{X \ge x\} dx \ge t {\mathbb{P}} \{X \ge t \} \end{equation*} so \begin{equation*} {\mathbb{P}} \{ X \ge t \} \le \frac{{\mathbb E} X}{t}. \end{equation*} This is Markov's Inequality.
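For instance (a simulation sketch, assuming numpy), for exponential variables with mean 1, the empirical tail stays below ${\mathbb E} X / t$:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.exponential(scale=1.0, size=10**6)   # nonnegative, E X = 1
for t in (1, 2, 5):
    print(t, (X >= t).mean(), X.mean() / t)  # empirical tail vs. Markov bound
```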
Moreover, for any nonnegative, strictly increasing function $f$ and nonnegative $X$, \begin{equation*} {\mathbb{P}} \{ X \ge t \} = {\mathbb{P}} \{ f(X) \ge f(t) \} \le \frac{{\mathbb E} f(X)}{f(t)}. \end{equation*} In particular, for any real-valued $X$ and real $q > 0$, $| X - {\mathbb E} X |$ is a nonnegative random variable and $f(x) = x^q$ is strictly increasing on $[0, \infty)$, so \begin{equation*} {\mathbb{P}} \{| X - {\mathbb E} X | \ge t \} = {\mathbb{P}} \{ | X - {\mathbb E} X |^q \ge t^q \} \le \frac{{\mathbb E} | X - {\mathbb E} X |^q}{t^q}. \end{equation*} This is the Generalized Markov Inequality.
Chebychev's Inequality is a special case: \begin{equation*} {\mathbb{P}} \{ | X - {\mathbb E} X |^2 \ge t^2 \} \le \frac{{\mathbb E} | X - {\mathbb E} X |^2}{t^2} = \frac{{\mbox{Var}} X}{t^2}. \end{equation*}
If $X$ has a moment generating function (i.e., if ${\mathbb{E}} e^{sX}$ exists for $s \in [-a, a]$ for some $a > 0$), we can apply the generalized Markov Inequality with $f(x) = e^{sx}$ for positive $s$: \begin{equation*} {\mathbb{P}} \{ X \ge t \} = {\mathbb{P}} \{ e^{sX} \ge e^{st} \} \le \frac{{\mathbb E} e^{sX}}{e^{st}} \end{equation*} for all $s \in (0, a]$. For any particular $X$, one can optimize this over $s$: \begin{equation*} {\mathbb{P}} \{ X \ge t \} \le \inf_{s \in (0, a]} \frac{{\mathbb E} e^{sX}}{e^{st}}. \end{equation*} This is the Chernoff bound.
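For a standard normal, ${\mathbb E} e^{sX} = e^{s^2/2}$, so the optimized bound is $e^{-t^2/2}$, attained at $s = t$. A numeric check of the optimization (a sketch, assuming numpy):

```python
import numpy as np

t = 2.0
s = np.linspace(0.01, 10, 1000)
bound = np.exp(s**2 / 2 - s * t)       # e^{-st} E e^{sX} for X ~ N(0, 1)
print(bound.min(), np.exp(-t**2 / 2))  # the grid minimum matches e^{-t^2/2}
```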
Let $\{ X_j \}_{j=1}^n$ be independent, and define $S_n \equiv \sum_{j=1}^n X_j$. Then, applying the Chernoff bound gives \begin{equation*} {\mathbb{P}} \{ S_n - {\mathbb E} S_n \ge t \} \le e^{-st} {\mathbb E} e^{s(S_n - {\mathbb E} S_n)} = e^{-st} \prod_{j=1}^n {\mathbb E}\, e^{s(X_j - {\mathbb E} X_j)}. \end{equation*} Hoeffding bounds the moment generating function of a bounded random variable $X$: if $\mathbb{P} (X \in [a, b]) = 1$, then \begin{equation*} {\mathbb E} e^{s(X - {\mathbb E} X)} \le e^{s^2 (b-a)^2/8}, \end{equation*} from which follows Hoeffding's tail bound:
If $\{X_j\}_{j=1}^n$ are independent and ${\mathbb{P}} \{a_j \le X_j \le b_j\} = 1$, then \begin{equation*} {\mathbb{P}} \{ S_n - {\mathbb E} S_n \ge t \} \le e^{-2t^2/\sum_{j=1}^n (b_j - a_j)^2} \end{equation*} and \begin{equation*} {\mathbb{P}} \{ S_n - {\mathbb E} S_n \le -t \} \le e^{-2t^2/\sum_{j=1}^n (b_j - a_j)^2}. \end{equation*}
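A simulation comparing the empirical upper tail to the bound for uniform summands (a sketch, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps, t = 100, 10**5, 10.0
S = rng.uniform(0, 1, size=(reps, n)).sum(axis=1)  # X_j ~ U[0, 1]: a_j = 0, b_j = 1
print((S - n / 2 >= t).mean())   # empirical tail probability
print(np.exp(-2 * t**2 / n))     # Hoeffding bound: e^{-2 t^2 / n}
```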
Suppose $f$ is a convex, continuous, real function and ${\mathcal X}$ is a finite set. Let $\{X_j \}_{j=1}^n$ be a simple random sample (drawn without replacement) from ${\mathcal X}$ and let $\{Y_j \}_{j=1}^n$ be an iid uniform random sample (with replacement) from ${\mathcal X}$. Then \begin{equation*} {\mathbb E} f \left ( \sum_{j=1}^n X_j \right ) \le {\mathbb E} f \left ( \sum_{j=1}^n Y_j \right ). \end{equation*}
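A Monte Carlo illustration with a small population and the convex function $f(s) = (s - \mathbb{E} S_n)^2$ (a sketch, assuming numpy; the population and sample size are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
pop = np.arange(10.0)          # the finite set: {0, 1, ..., 9}
n, reps = 4, 20000
mu = n * pop.mean()            # expected sample sum = 18
f = lambda s: (s - mu) ** 2    # a convex function of the sum
srs = [f(rng.choice(pop, size=n, replace=False).sum()) for _ in range(reps)]
iid = [f(rng.choice(pop, size=n, replace=True).sum()) for _ in range(reps)]
print(np.mean(srs), np.mean(iid))  # sampling without replacement gives the smaller mean
```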
There are many Bernstein Inequalities, which are derived by applying Markov's inequality to random variables of the form $\exp (\lambda \sum _{j=1}^n X_j )$ for suitable $\lambda$. Chernoff bounds, Hoeffding's inequality, and Azuma's inequality are special cases.
Here are two of Bernstein's inequalities.
Suppose $\{X_j\}_{j=1}^n$ are independent with $\mathbb{E} X_j = 0$, ${\mathbb{P}} \{ | X_j | \le c\} = 1$, $\sigma_j^2 = {\mathbb E} X_j^2$, and $\sigma^2 = \frac{1}{n} \sum_{j=1}^n \sigma_j^2$. Then for any $\epsilon > 0$, \begin{equation*} {\mathbb{P}} \{ S_n/n > \epsilon \} \le e^{-n \epsilon^2 / (2(\sigma^2 + c\epsilon/3))}. \end{equation*}
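When the variance is much smaller than the range, Bernstein's bound is far tighter than Hoeffding's. A numeric comparison (a sketch, with hypothetical parameter values):

```python
import numpy as np

n, c, sigma2, eps = 1000, 1.0, 0.01, 0.05   # |X_j| <= c = 1, sigma^2 = 0.01
bernstein = np.exp(-n * eps**2 / (2 * (sigma2 + c * eps / 3)))
hoeffding = np.exp(-2 * n * eps**2 / (2 * c) ** 2)   # range b_j - a_j = 2c
print(bernstein, hoeffding)   # Bernstein exploits the small variance
```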
Suppose $\{X_j\}_{j=1}^n$ are independent with $\mathbb{E} X_j = 0$, and that for some $C>0$ and every integer $k \ge 2$, \begin{equation*} \mathbb{E} \left| X_j \right|^k \le \frac{C^{k-2} k!}{2} \mathbb{E} X_j^{2}. \end{equation*} Then for all $t \in \left [ 0, (2C)^{-1} \sqrt{\sum_j \mathbb{E} X_j^2} \right ]$, \begin{equation*} \mathbb{P} \left( \sum_{j=1}^n X_j \ge 2t \sqrt{\sum_j \mathbb {E} X_j^2} \right) < e^{-t^2}. \end{equation*}
There are also empirical Bernstein bounds, which bound probabilities in terms of observed (sample) moments rather than theoretical moments. The following example is from Maurer and Pontil (2009), Theorem 11:
Let $X = (X_j)_{j=1}^n$ be a vector of independent random variables that take values in $[0, 1]$. Let $\bar{X} \equiv \frac{1}{n} \sum_j X_j$ and $V_n(X) \equiv \frac{1}{n(n-1)} \sum_{i, j = 1}^n (X_i - X_j)^2/2$ (the sample variance). Let $\delta > 0$. Then with probability at least $1 - \delta$ in $X$, \begin{equation*} \mathbb{E} \bar{X} \le \bar{X} + \sqrt{ \frac{2V_n (X) \ln (2/\delta)}{n}} + \frac{7 \ln (2/\delta)}{3(n - 1)}. \end{equation*}
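Computing the bound from a sample (a sketch, assuming numpy; the Beta population is a hypothetical example). Note that $V_n(X)$ equals the usual unbiased sample variance:

```python
import numpy as np

rng = np.random.default_rng(6)
n, delta = 1000, 0.05
X = rng.beta(2, 5, size=n)   # values in [0, 1]; true mean is 2/7
Xbar = X.mean()
Vn = X.var(ddof=1)           # V_n(X): the unbiased sample variance
ub = Xbar + np.sqrt(2 * Vn * np.log(2 / delta) / n) + 7 * np.log(2 / delta) / (3 * (n - 1))
print(2 / 7, ub)             # the bound exceeds the true mean w.p. >= 1 - delta
```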
Suppose the distributions $\mathbb{P}$, $\mathbb{Q}$, and $\mathbb{R}$ on a common outcome space have densities $f_{\mathbb{P}}$, $f_{\mathbb{Q}}$, and $f_{\mathbb{R}}$ with respect to the same dominating measure $\mu$. Then \begin{equation*} \mathbb{E}_\mathbb{Q} \log(f_{\mathbb{Q}}/f_{\mathbb{P}}) \ge \mathbb{E}_\mathbb{Q} \log(f_{\mathbb{R}}/f_{\mathbb{P}}), \end{equation*} since the difference of the two sides is $\mathbb{E}_\mathbb{Q} \log(f_{\mathbb{Q}}/f_{\mathbb{R}})$, the Kullback-Leibler divergence of $\mathbb{R}$ from $\mathbb{Q}$, which is nonnegative.
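A discrete check (a sketch, assuming numpy), with three random probability vectors playing the roles of $f_{\mathbb{P}}$, $f_{\mathbb{Q}}$, and $f_{\mathbb{R}}$:

```python
import numpy as np

rng = np.random.default_rng(7)
p, q, r = (rng.dirichlet(np.ones(6)) for _ in range(3))
lhs = np.sum(q * np.log(q / p))   # E_Q log(f_Q / f_P)
rhs = np.sum(q * np.log(r / p))   # E_Q log(f_R / f_P)
print(lhs, rhs, lhs >= rhs)       # the difference is KL(Q || R) >= 0
```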
See martingales for more.
A sequence of random variables $\{ X_j \}$ is a martingale if
$\mathbb{E} |X_j| < \infty$ $\forall j$, and
$\mathbb{E} (X_{j+1} \mid X_1, \ldots, X_j) = X_j$.
A sequence of random variables $\{ X_j \}$ is a martingale with respect to the sequence $\{Y_j \}$ if
$\mathbb{E} |X_j| < \infty$ $\forall j$, and
$\mathbb{E} (X_{j+1} \mid Y_1, \ldots, Y_j) = X_j$, where $X_j$ is a function of $Y_1, \ldots, Y_j$.
A nonnegative martingale is a martingale for which $\Pr \{X_j \ge 0 \} = 1$ $\forall j$.
A gambler's fortune in a series of fair bets with the same stakes is a martingale: the expected fortune after the $(j+1)$st bet, given the history of play, is the fortune after the $j$th bet.
Likelihood ratios: Suppose $\{Y_j \}$ are IID with density $f$, and let $g$ be some other density.
Define $X_j \equiv \prod_{i=1}^j g(Y_i)/f(Y_i)$. Then $(X_j)$ is a martingale with respect to $\{Y_j\}$. (More generally, likelihood ratios are nonnegative martingales when the data are generated from the distribution in the denominator.)
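A simulation of the likelihood ratio martingale (a sketch, assuming numpy), with $f = N(0,1)$ the true density and $g = N(1/2, 1)$: the log likelihood ratio of one observation $y$ is $y/2 - 1/8$, and each $X_j$ has expectation 1.

```python
import numpy as np

rng = np.random.default_rng(8)
reps, n = 10**5, 50
Y = rng.normal(0, 1, size=(reps, n))          # data generated from f = N(0, 1)
X = np.exp((0.5 * Y - 0.125).cumsum(axis=1))  # X_j = prod_{i<=j} g(Y_i)/f(Y_i)
print(X[:, [0, 9, 49]].mean(axis=0))          # each column mean is ~ 1
```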
If $(X_j)$ is a nonnegative supermartingale, then \begin{equation*} \Pr \{ \sup_j X_j \ge a \} \le \mathbb{E} X_1/a. \end{equation*} This is Ville's Inequality.
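A simulation using the likelihood ratio martingale above, which is nonnegative with $\mathbb{E} X_1 = 1$ (a sketch, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(9)
reps, n, a = 20000, 200, 5.0
Y = rng.normal(0, 1, size=(reps, n))          # truth is f, so (X_j) is a martingale
X = np.exp((0.5 * Y - 0.125).cumsum(axis=1))  # nonnegative, E X_1 = 1
print((X.max(axis=1) >= a).mean(), 1 / a)     # crossing frequency vs. Ville's bound 1/a
```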
Ville's inequality is foundational for sequential testing. See the SPRT, the SPRT without replacement, and nonparametric confidence sets.