A set is a collection of objects, called members or elements of the set, without regard for their order. $a \in A$, pronounced "$a$ is an element of $A$," "$a$ is in $A$," or "$a$ is a member of $A$" means that $a$ is an element of the set $A$. This is the same as writing $A \ni a$, which is pronounced "$A$ contains $a$." If $a$ is not an element of $A$, we write $a \not \in A$. Sets may be described explicitly by listing their contents, or implicitly by specifying a property that all elements of the set share, or a condition that they satisfy. The contents of sets are enclosed in curly braces: $\{ \}$. Examples:
$B$ is a subset of $A$, written $B \subset A$ or $A \supset B$, if every element of the set $B$ is also an element of the set $A$. Thus ${\mathbb N} \subset {\mathbb Z} \subset {\mathbb Q} \subset {\mathbb R} \subset {\mathbb C}$. The empty set $\emptyset$ is a subset of every set. If $A \subset B$ and $B \subset A$, $A$ and $B$ are the same set, and we write $A = B$. If $B$ is not a subset of $A$, we write $B \not \subset A$ or $A \not \supset B$. $B$ is a proper subset of $A$ if $B \subset A$ but $A \not \subset B$.
The complement of $A$ (with respect to the universe $\mathcal{X}$), written $A^c$ or $A'$, is the set of all objects under consideration ($\mathcal{X}$) that are not elements of $A$. That is, $A^c \equiv \{ a \in \mathcal{X} : a \not \in A\}$.
The intersection of $A$ and $B$, written $A \cap B$ or $AB$, is the set of all objects that are elements of both $A$ and $B$: \begin{equation*} A \cap B \equiv \{a: a \in A \mbox{ and } a \in B \}. \end{equation*} If $A \cap B = \emptyset$, we say $A$ and $B$ are disjoint or mutually exclusive.
The union of $A$ and $B$, written $A \cup B$, is the set of all objects that are elements of $A$ or of $B$ (or both): \begin{equation*} A \cup B \equiv \{a: a \in A \mbox{ or } a \in B \mbox{ or both} \}. \end{equation*}
The difference of $A$ and $B$, $A \setminus B$, pronounced "$A$ minus $B$," is the set of all elements of $A$ that are not elements of $B$: \begin{equation*} A \setminus B \equiv \{a \in A : a \not \in B \} = A \cap B^c. \end{equation*}
Intervals are special subsets of ${\mathbb R}$: \begin{eqnarray*} [a, b] &\equiv& \{x \in {\mathbb R} : a \le x \le b\}\cr (a, b] &\equiv& \{x \in {\mathbb R} : a < x \le b\}\cr [a, b) &\equiv& \{x \in {\mathbb R} : a \le x < b\}\cr (a, b) &\equiv& \{x \in {\mathbb R} : a < x < b\}. \end{eqnarray*}
Sometimes we have a collection of sets, indexed by elements of another set: $\{A_\beta : \beta \in B \}$. Then $B$ is called an index set. If $B$ is a subset of the integers ${\mathbb Z}$, usually we write $A_i$ or $A_j$, etc., rather than $A_\beta$. If $B = {\mathbb N}$, we usually write $\{A_j\}_{j=1}^\infty$ rather than $\{A_\beta : \beta \in {\mathbb N} \}$. \begin{equation*} \bigcap_{\beta \in B} A_\beta \equiv \{a: a \in A_\beta \;\;\forall \beta \in B \}. \end{equation*} ($\forall$ means "for all.") If $B = \{1, 2, \ldots, n\}$, we usually write $\bigcap_{j=1}^n A_j$ rather than $\bigcap_{j \in \{1, 2, \ldots, n\}} A_j$. The notations $\bigcup_{\beta \in B} A_\beta$ and $\bigcup_{j=1}^n A_j$ are defined analogously.
A collection of sets $\{A_\beta : \beta \in B \}$ is pairwise disjoint if $A_\beta \cap A_{\beta'} = \emptyset$ whenever $\beta \ne \beta'$. The collection $\{A_\beta : \beta \in B\}$ exhausts or covers the set $A$ if $A \subset \bigcup_{\beta \in B} A_\beta$. The collection $\{A_\beta : \beta \in B \}$ is a partition of the set $A$ if $A = \bigcup_{\beta \in B} A_\beta$ and the sets $\{A_\beta : \beta \in B \}$ are pairwise disjoint. If $\{A_\beta : \beta \in B \}$ are pairwise disjoint and exhaust $A$, then $\{A_\beta \cap A : \beta \in B \}$ is a partition of $A$.
A set is countable if its elements can be put in one-to-one correspondence with a subset of ${\mathbb N}$. A set is finite if its elements can be put in one-to-one correspondence with $\{1, 2, \ldots, n\}$ for some $n \in {\mathbb N}$. If a set is not finite, it is infinite. ${\mathbb N}$, ${\mathbb Z}$, and ${\mathbb Q}$ are infinite but countable; ${\mathbb R}$ is infinite and uncountable.
The notation $\# A$, pronounced "the cardinality of $A$," is the size of the set $A$. If $A$ is finite, $\# A$ is the number of elements in $A$. If $A$ is not finite but $A$ is countable (if its elements can be put in one-to-one correspondence with the elements of ${\mathbb N}$), then $\# A = \aleph_0$ (aleph-null).
The power set of a set $A$ is the set of all subsets of the set $A$. For example, the power set of $\{a, b, c\}$ is \begin{equation*} \{ \emptyset, \{a\}, \{b\}, \{c\}, \{a, b\}, \{a, c\}, \{b, c\}, \{a, b, c\} \}. \end{equation*} If $A$ is a finite set, the cardinality of the power set of $A$ is $2^{\# A}$. This can be seen as follows: suppose $\# A = n$ is finite. Consider the elements of $A$ to be written in some canonical order. We can specify an element of the power set by an $n$-digit binary number. The first digit is 1 if the first element of $A$ is in the subset, and 0 otherwise. The second digit is 1 if the second element of $A$ is in the subset, and 0 otherwise, etc. There are $2^n$ $n$-digit binary numbers, so there are $2^n$ subsets. The cardinality of the power set of ${\mathbb N}$ is not $\aleph_0$: the power set of ${\mathbb N}$ is uncountable (Cantor's theorem).
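To make the binary-digit argument concrete, here is a minimal Python sketch (the set and variable names are illustrative, not from the text) that enumerates the power set of a 3-element set by treating each number from 0 to $2^n - 1$ as an $n$-digit binary selector:

```python
A = ['a', 'b', 'c']   # the elements of A, in a canonical order
n = len(A)

power_set = []
for bits in range(2 ** n):            # the 2^n n-digit binary numbers
    # digit j of `bits` is 1 exactly when A[j] belongs to the subset
    subset = {A[j] for j in range(n) if bits & (1 << j)}
    power_set.append(subset)

print(len(power_set))   # 8 = 2^3 subsets, from set() to {'a', 'b', 'c'}
```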
If $A$ is a finite set, $B$ is a countable set and $\{A_\beta : \beta \in B \}$ is a partition of $A$, then \begin{equation*} \# A = \sum_{\beta \in B} \# A_\beta. \end{equation*}
A multiset or bag is like a set, except it can contain more than one "copy" of a given element.
For instance, the sets $\{0, 1\}$ and $\{0, 1, 1\}$ are the same, because they contain the same two elements, namely, 0 and 1. But viewed as multisets, they are different, because the second contains the element 1 twice. One can think of a multiset as a set of ordered pairs, the first element of which is one of the distinct elements of the multiset, and the second of which is the multiplicity of that element. For instance, the multiset $\{0, 1, 1\}$ could be represented as $\{ (0, 1), (1, 2) \}$. We will sometimes use $\llcorner \cdot \lrcorner$ to denote a bag.
Two bags are equal if they contain the same elements with the same multiplicities; that is, the set of distinct elements of the first multiset is equal to the set of distinct elements of the second, and the multiplicity function of the first is equal to the multiplicity function of the second (i.e., the multiplicity of each item is the same in both multisets). For instance, the multisets $\llcorner 0, 1, 1\lrcorner$ and $\llcorner 1, 0, 1 \lrcorner$ are equal.
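As an aside, Python's `collections.Counter` implements exactly this (element, multiplicity) representation of a bag; a small sketch (our own example, not from the text):

```python
from collections import Counter

bag1 = Counter([0, 1, 1])   # the multiset containing 0 once and 1 twice
bag2 = Counter([1, 0, 1])   # the same elements, listed in a different order

print(bag1 == bag2)          # True: same elements with the same multiplicities
print(sorted(bag1.items()))  # [(0, 1), (1, 2)]: (element, multiplicity) pairs
print({0, 1, 1} == {0, 1})   # True: as *sets*, duplicates collapse
```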
A set amounts to a multiset in which the multiplicity of every element is 1.
Generally I will not make a notational distinction between sets and multisets (I use curly braces to denote both sets and multisets).
If $(x_j)_{j=1}^n$ is an ordered $n$-tuple, $\{x_j\}_{j=1}^n$ and $\llcorner x_j \lrcorner_{j=1}^n$ will denote the multiset containing all $n$ of its components.
The notation $(s_1, s_2, \ldots, s_n) \equiv (s_j)_{j=1}^n$ denotes an ordered $n$-tuple consisting of $s_1$ in the first position, $s_2$ in the second position, etc. The parentheses are used instead of curly braces to distinguish $n$-tuples from sets: $(s_j)_{j=1}^n \ne \{s_j\}_{j=1}^n$. The $k$th component of the $n$-tuple $s = (s_j)_{j=1}^n$, is $s_k$, $k = 1, 2, \ldots, n$. Two $n$-tuples are equal if their components are equal. That is, $(s_j)_{j=1}^n = (t_j)_{j=1}^n$ means that $s_j = t_j$ for $j = 1, \ldots, n$. In particular, $(s, t) \ne (t, s)$ unless $s=t$. In contrast, $\{s, t \} = \{ t, s \}$ always.
The Cartesian product of $S$ and $T$ is $S \times T \equiv \{(s, t): s \in S \mbox{ and } t \in T\}$. Unless $S = T$, $S \times T \ne T \times S$. ${\mathbb R}^n$ is the Cartesian product of ${\mathbb R}$ with itself, $n$ times; its elements are $n$-tuples of real numbers $(s_1, s_2, \ldots, s_n) = (s_j)_{j=1}^n$.
Let $A$ be a finite set with $\# A = n$. A permutation of (the elements of) $A$ is an element $s$ of $\times_{j=1}^n A = A^n$ whose components are distinct elements of $A$. That is, $s = (s_j)_{j=1}^n \in A^n$ is a permutation of $A$ if $\# \{s_j\}_{j=1}^n = n$. There are $n! = n(n-1)\cdots 1$ permutations of a set with $n$ elements: there are $n$ choices for the first component of the permutation, $n-1$ choices for the second (whatever the first might be), $n-2$ for the third (whatever the first two might be), etc. This is an illustration of the fundamental rule of counting: in a sequence of $n$ choices, if there are $m_1$ possibilities for the first choice, $m_2$ possibilities for the second choice (no matter which was chosen in the first place), $m_3$ possibilities for the third choice (no matter which were chosen in the first two places), and so on, then there are $m_1 m_2 \cdots m_n = \prod_{j=1}^n m_j$ possible sequences of choices in all.
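A quick numerical check of the $n!$ count, using Python's standard library (an illustrative sketch, not part of the text):

```python
from itertools import permutations
from math import factorial

A = ('a', 'b', 'c', 'd')               # a set with n = 4 distinct elements
perms = list(permutations(A))          # every ordering of the elements of A

print(len(perms), factorial(len(A)))   # 24 24: there are n! permutations
```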
The number of permutations of $n$ things taken $k$ at a time, ${}_nP_k$, is the number of ways there are of selecting $k$ of $n$ things, then permuting those $k$ things. There are $n$ choices for the object that will be in the first place in the permutation, $n-1$ for the second place (regardless of which is first), etc., and $n-k+1$ choices for the item that will be in the $k$th place. By the fundamental rule of counting, it follows that ${}_nP_k = n(n-1)\cdots(n-k+1) = n!/(n-k)!$.
The number of subsets of size $k$ that can be formed from $n$ objects is \begin{equation*} {}_nC_k = {{n}\choose{k}} = {}_nP_k/k! = n(n-1)\cdots(n-k+1)/k! = \frac{n!}{k!(n-k)!}. \end{equation*} Because the power set of a set with $n$ elements can be partitioned as \begin{equation*} \bigcup_{k=0}^n \left \{ \mbox{all subsets of size } k \right \}, \end{equation*} it follows that \begin{equation*} \sum_{k=0}^n {}_nC_k = 2^n. \end{equation*}
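These identities are easy to verify numerically; a brief sketch using Python's `math.perm` and `math.comb` (the values of $n$ and $k$ are arbitrary illustrations):

```python
from math import comb, factorial, perm

n, k = 5, 3
print(perm(n, k), n * (n - 1) * (n - 2))        # 60 60: nPk = n(n-1)...(n-k+1)
print(comb(n, k), perm(n, k) // factorial(k))   # 10 10: nCk = nPk / k!
print(sum(comb(n, j) for j in range(n + 1)))    # 32 = 2^5: subsets of all sizes
```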
Functions are subsets of Cartesian products. We write $f: \mathcal{X} \rightarrow \mathcal{Y}$, pronounced "$f$ maps $\mathcal{X}$ into $\mathcal{Y}$" or "$f$ is a function with domain $\mathcal{X}$ and co-domain $\mathcal{Y}$," if $f \subset \mathcal{X} \times \mathcal{Y}$ is such that for each $x \in \mathcal{X}$, $\exists 1 y \in \mathcal{Y}$ such that $(x, y) \in f$. (The notation $\exists 1 y$ means that there exists exactly one value of $y$.) The set $\mathcal{X}$ is called the domain of $f$ and $\mathcal{Y}$ is called the co-domain of $f$. If the $\mathcal{X}$-component of an element of $f$ is $x$, we denote the $\mathcal{Y}$-component of that element of $f$ by $fx$ or $f(x)$, so that $(x, fx) \in f$; we write $f: x \mapsto y=f(x)$. The functions $f$ and $g$ are equal if they are the same subset of $\mathcal{X} \times \mathcal{Y}$, which means that they have the same domain $\mathcal{X}$, and $fx = gx$ $\forall x \in \mathcal{X}$.
Let $A \subset \mathcal{X}$. The image of $A$ under $f$ is \begin{equation*} fA = f(A) \equiv \{ y \in \mathcal{Y} : (x, y) \in f \mbox{ for some } x \in A\}. \end{equation*} More colloquially, we would write this as \begin{equation*} fA = \{ y \in \mathcal{Y} : f(x) = y \mbox{ for some } x \in A \}. \end{equation*} If $f\mathcal{X}$ is a proper subset of $\mathcal{Y}$, $f$ is into. If $f\mathcal{X} = \mathcal{Y}$, $f$ is onto. For $B \subset \mathcal{Y}$, the inverse image of $B$ under $f$ or pre-image of $B$ under $f$ is \begin{equation*} f^{-1}B \equiv \{ x \in \mathcal{X} : fx \in B \}. \end{equation*} Similarly, $f^{-1}y \equiv \{ x \in \mathcal{X} : fx = y \}$. If $\forall y \in \mathcal{Y}$, $\# \{ f^{-1} y \} \le 1$, $f$ is one-to-one (1:1). If $f$ is one-to-one and onto, i.e., if $\forall y \in \mathcal{Y}$, $\# \{ f^{-1}y \} = 1$, $f$ is a bijection.
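Since a function is a set of pairs with exactly one pair per element of the domain, a Python `dict` is a natural finite model; here is a small sketch (the function `f` and helper names are our own) of images and pre-images:

```python
f = {1: 'a', 2: 'b', 3: 'a'}   # f as a set of (x, f(x)) pairs, one per x

def image(f, A):
    """The image fA = { f(x) : x in A }."""
    return {f[x] for x in A}

def preimage(f, B):
    """The pre-image f^{-1}(B) = { x : f(x) in B }."""
    return {x for x in f if f[x] in B}

print(image(f, {1, 2}))     # {'a', 'b'}
print(preimage(f, {'a'}))   # {1, 3}: two pre-images, so f is not one-to-one
```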
A relation $R$ on the set $A$ is a collection of ordered pairs of elements of $A$, that is, $R \subset A \times A$. Notationally, if $(a,b) \in R $, we write $a R b$.
A relation can also be thought of as a function from $A \times A$ to $\{0, 1\}$ such that the function takes the value 1 if and only if (iff) $(a,b) \in R $.
For instance, consider the "divides" relation on the positive integers $\mathbb N$. We say that $j \in \mathbb N$ "divides" $k \in \mathbb N$ if $k$ is an integer multiple of $j$. As an example, 3 divides 102. If we let $R$ denote the relation "divides," then $(3, 102) \in R $ and we would write $3 R 102$.
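Viewed as a 0/1-valued function on pairs, as in the remark above, the relation might be sketched like this (the function name is ours):

```python
def divides(j, k):
    """1 if (j, k) is in the 'divides' relation on the positive integers, else 0."""
    return 1 if k % j == 0 else 0

print(divides(3, 102))   # 1: 3 R 102, since 102 = 34 * 3
print(divides(5, 102))   # 0: 102 is not an integer multiple of 5
```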
A partial order $\le $ on the set $A$ is a relation on $A$ such that for all $a, b, c \in A$: $a \le a$ (reflexivity); if $a \le b$ and $b \le a$, then $a = b$ (antisymmetry); and if $a \le b$ and $b \le c$, then $a \le c$ (transitivity).
If, in addition, for every $ (a, b) \in A \times A $ either $ a \le b $ or $ b \le a $ (comparability), then $ \le $ is a total order.
An equivalence relation $ \sim $ on the set $A$ is a relation on $A$ such that for all $a, b, c \in A$: $a \sim a$ (reflexivity); if $a \sim b$, then $b \sim a$ (symmetry); and if $a \sim b$ and $b \sim c$, then $a \sim c$ (transitivity).
The equivalence class of $a$ under $\sim$ is $[a] \equiv \{b \in A : b \sim a \}$.
A group is an ordered pair $(\mathcal{G}, \times)$, where $\mathcal{G}$ is a collection of objects (the elements of the group) and $\times$ is a mapping from $\mathcal{G} \times \mathcal{G}$ onto $\mathcal{G}$, \begin{eqnarray*} \times : & \mathcal{G} \times \mathcal{G} & \rightarrow \mathcal{G} \\ & (g, h) & \mapsto g \times h, \end{eqnarray*} satisfying the following axioms: the operation is associative, $(g \times h) \times k = g \times (h \times k)$ for all $g, h, k \in \mathcal{G}$; there is an identity element $e \in \mathcal{G}$ such that $e \times g = g \times e = g$ for all $g \in \mathcal{G}$; and every $g \in \mathcal{G}$ has an inverse $g^{-1} \in \mathcal{G}$ such that $g \times g^{-1} = g^{-1} \times g = e$.
If, in addition, for every $g, h \in \mathcal{G}$, $g \times h = h \times g$ (if the group operation commutes), we say that $(\mathcal{G}, \times)$ is an Abelian group or commutative group.
In an abuse of terminology, we will often call $\mathcal{G}$ a group, with the group operation $\times$ understood from context.
Examples of groups include the real numbers together with ordinary addition, $({\mathbb R}, +)$; the real numbers other than zero together with ordinary multiplication, $({\mathbb R} \setminus \{0\}, *)$; the rational numbers together with ordinary addition, $({\mathbb Q}, +)$; the integers 0 to $p-1$, $p$ prime, together with addition modulo $p$, $( \{0, 1, \ldots, p-1\}, + \bmod p)$; and the set of all permutations of $n$ distinct objects together with composition (called the symmetric group of degree $n$, denoted $\mathcal{S}_n$).
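For a finite example such as addition modulo $p$, the axioms can be checked exhaustively; a brute-force sketch (illustrative only):

```python
p = 5
G = range(p)
op = lambda a, b: (a + b) % p   # the group operation: addition mod p

# associativity, identity, inverses, and commutativity, checked by enumeration
assert all(op(op(a, b), c) == op(a, op(b, c)) for a in G for b in G for c in G)
assert all(op(0, a) == a == op(a, 0) for a in G)
assert all(any(op(a, b) == 0 for b in G) for a in G)
assert all(op(a, b) == op(b, a) for a in G for b in G)
print(f"({{0, ..., {p - 1}}}, + mod {p}) is an Abelian group")
```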
It is common to omit the symbol $\times$ from the group notation, and write $g \times h$ as $gh$, when $\mathcal{G}$ is an abstract group.
Suppose $\mathcal{H} \subset \mathcal{G}$, where $(\mathcal{G}, \times)$ is a group.
If $(\mathcal{H}, \times)$ is also a group, it is called a subgroup of $(\mathcal{G}, \times)$.
Consider a subset $A$ of elements of a group $\mathcal{G}$. The subgroup generated by $A$ is the smallest subgroup of $\mathcal{G}$ that contains $A$ (using the same multiplication operator as $\mathcal{G}$); it consists of every element of $\mathcal{G}$ that can be written as a product of elements of $A$ and inverses of elements of $A$. If the subgroup generated by $A$ is equal to $\mathcal{G}$, then $A$ is a generating set of $\mathcal{G}$.
For example, consider $\mathcal{S}_n$, the symmetric group of permutations of $n$ objects. This group can be generated from just two of its elements, a transposition $\pi = (2, 1, 3, \ldots, n)$ and a circular shift $\eta = (n, 1, 2, 3, \ldots, n-1)$.
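One way to see this computationally: start from $\pi$ and $\eta$ and close under composition until no new permutations appear. A sketch (our own code, with $n = 4$ for speed):

```python
from math import factorial

n = 4
pi = (2, 1) + tuple(range(3, n + 1))   # the transposition (2, 1, 3, ..., n)
eta = (n,) + tuple(range(1, n))        # the circular shift (n, 1, 2, ..., n-1)

def compose(a, b):
    """(ab)_j = b_{a_j}, matching the composition convention of the
    group-action example below, so that (ab).x = a.(b.x)."""
    return tuple(b[a[j] - 1] for j in range(n))

group = {pi, eta}
while True:                            # close under composition
    new = {compose(a, b) for a in group for b in group} - group
    if not new:
        break
    group |= new

print(len(group), factorial(n))        # 24 24: pi and eta generate all of S_4
```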
An ordered triple $(\mathcal{F}, \times, +)$ is a field if $\mathcal{F}$ is a collection of objects and $\times$ and $+$ are operations from $\mathcal{F} \times \mathcal{F}$ to $\mathcal{F}$ such that $(\mathcal{F}, +)$ is an Abelian group with identity element $0$; $(\mathcal{F} \setminus \{0\}, \times)$ is an Abelian group with identity element $1$; and $\times$ distributes over $+$: for all $f, g, h \in \mathcal{F}$, $f \times (g+h) = f \times g + f \times h$ and $(f+g) \times h = f\times h + g \times h$.
The additive inverse of $g$ is denoted $-g$; the multiplicative inverse of $g$ is $g^{-1} = 1/g$.
Examples: $({\mathbb R}, \times, +)$, where $\times$ is ordinary (real) multiplication and $+$ is ordinary (real) addition. The complex numbers ${\mathbb C}$, with complex multiplication and addition.
These (and the extended reals) are the only fields we will use.
We shall use the following conventions:
With these conventions, $([-\infty, \infty], \times, +)$ is a field.
Let $ \mathcal{G} $ be a group (with the group operation implicit) with identity element $e \in \mathcal{G} $ and let $\mathcal X$ be a set. Then a group action of $\mathcal G$ on $\mathcal X$ is a function from $ {\mathcal G} \times {\mathcal X}$ into $\mathcal X$ such that (if we denote the image of $(g, x)$ under this transformation as $g.x$),
$e.x = x$ for every $x \in \mathcal{X}$ (the identity element acts as the identity transformation), and $g.(h.x) = (gh).x$ for every $g, h \in \mathcal{G}$ and every $x \in \mathcal{X}$ (the group operation is compatible with the composition of transformations on the set).
Then we call $\mathcal{G}$ a group of transformations of $\mathcal{X}$.
The orbit of the point $ x \in \mathcal{X} $ under $\mathcal{G}$ is the set $\mathcal{G}.x \equiv \{ g.x : g \in \mathcal{G} \}$.
If we write $ x \sim y $ when $ y \in \mathcal{G}.x $, then $\sim$ is an equivalence relation on $\mathcal{X}$; that is, orbits define equivalence classes. Moreover, the distinct orbits partition $\mathcal{X}$ into equivalence classes.
For example, let $\mathcal{X} $ be 2-dimensional Euclidean space, and let $\mathcal{G} $ be rotations around the origin. Such rotations comprise a group, if the group operation is defined as composition: if $g$ is rotation by the angle $\theta$ and $h$ is rotation by the angle $\omega$, then $gh$ is defined to be rotation by the angle $\theta + \omega$, equivalently, rotation by $\theta$ followed by rotation by $\omega$. The identity element is rotation by 0. The orbit of the point $ x = (x_1, x_2)$ is a circle centered at the origin, with radius $r = \sqrt{x_1^2 + x_2^2}$. The entire Euclidean plane can be partitioned into an uncountable union of circles of different radii.
As another example, let $\mathcal{X}$ be the set of ordered $n$-tuples of real numbers, elements of $\mathbb{R}^n$. Let $\pi$ denote a permutation of $\{1, \ldots, n\}$, an $n$-tuple of integers between 1 and $n$ with no duplicates: $\pi_j$ is a number between 1 and $n$ and $\{\pi_1, \pi_2, \ldots, \pi_n \} = \{1, 2, \ldots, n\}$. If $\pi$ and $\phi$ are two permutations, define $\pi \phi$ to be the permutation that results from composing the two permutations. For instance, if $\pi = (1,3,2,4)$ and $\phi = (2,4,3,1)$, then $\pi \phi = (2,3,4,1)$. Such permutations comprise a group with identity element $e = (1, 2, \ldots, n)$. If we define the action of a permutation on an element $ x \in \mathcal{X}$ to be the $n$-tuple that results from re-ordering the components of $x$ according to $\pi$, then this group is a group of transformations of $\mathcal{X}$. For instance, if $x = (x_1, x_2, \ldots, x_n) $ and $ \pi = (\pi_1, \ldots, \pi_n)$, then $ \pi x = (x_{\pi_1}, \ldots, x_{\pi_n})$. Even more concretely, let $x = (x_1, x_2, x_3, x_4) $ and let $ \pi = (1, 4, 3, 2)$. Then $ \pi x = (x_1, x_4, x_3, x_2)$. The orbit of a point $x$ is the collection of all elements of $\mathcal{X}$ with the same components as $x$, but in all $n!$ possible orders. (The orbit will contain fewer than $n!$ distinct elements of $\mathcal{X}$ if the values of the components of $x$ are not all distinct.)
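A short check (illustrative code) of the compatibility condition $\pi.(\phi.x) = (\pi\phi).x$ for this action, using the 4-element examples above:

```python
def act(p, x):
    """p.x = (x_{p_1}, ..., x_{p_n}), with p in the 1-indexed notation above."""
    return tuple(x[j - 1] for j in p)

def compose(p, q):
    """(pq)_j = q_{p_j}, so that p.(q.x) = (pq).x."""
    return tuple(q[j - 1] for j in p)

pi, phi = (1, 3, 2, 4), (2, 4, 3, 1)
x = ('x1', 'x2', 'x3', 'x4')

print(compose(pi, phi))                                   # (2, 3, 4, 1), as above
print(act(pi, act(phi, x)) == act(compose(pi, phi), x))   # True: compatible
```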
Let $\Omega$ be a set and $\tau$ be a collection of subsets of $\Omega$ (the open sets) such that: $\emptyset \in \tau$ and $\Omega \in \tau$; the union of any collection of elements of $\tau$ is an element of $\tau$; and the intersection of any finite collection of elements of $\tau$ is an element of $\tau$.
Then $\tau$ is a topology on $\Omega$, and $(\Omega, \tau)$ is a topological space.
Let $\Omega$ be a set and $\mathcal{F}$ be a collection of subsets of $\Omega$. $(\Omega, \mathcal{F})$ is an algebra of sets over $\Omega$ ($\mathcal{F}$ is an algebra) if $\Omega \in \mathcal{F}$; whenever $A \in \mathcal{F}$, $A^c \in \mathcal{F}$; and whenever $A, B \in \mathcal{F}$, $A \cup B \in \mathcal{F}$ (equivalently, $A \cap B \in \mathcal{F}$), i.e., $\mathcal{F}$ is closed under complements and finite unions or intersections.
$(\Omega, \mathcal{F})$ is a sigma-algebra of sets over $\Omega$ ($\mathcal{F}$ is a sigma-algebra) if, in addition, whenever $A_j \in \mathcal{F}$ for all $j \in \mathbb{N}$, $\cup_{j\in \mathbb{N}} A_j \in \mathcal{F}$ (equivalently, $\cap_{j\in \mathbb{N}} A_j \in \mathcal{F}$), i.e., $\mathcal{F}$ is closed under countable unions or countable intersections.
Example. The Borel sigma-algebra on $\mathbb{R}$ is the smallest sigma-algebra that contains all intervals in $\mathbb{R}$.
Let $\mathcal{B}$ be a collection of subsets of $\Omega$. The sigma-algebra generated by $\mathcal{B}$, $\sigma(\mathcal{B})$, is the smallest sigma-algebra that contains every element of $\mathcal{B}$.
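For a finite $\Omega$, $\sigma(\mathcal{B})$ can be computed by brute force, repeatedly adding complements and unions until nothing new appears; a sketch (the sets chosen here are arbitrary illustrations):

```python
from itertools import combinations

Omega = frozenset({1, 2, 3, 4})
B = {frozenset({1}), frozenset({1, 2})}   # the generating collection

F = set(B) | {Omega, frozenset()}
while True:
    new = {Omega - A for A in F}                     # close under complements
    new |= {A | C for A, C in combinations(F, 2)}    # close under (finite) unions
    if new <= F:
        break
    F |= new

print(len(F))   # 8: sigma(B) has atoms {1}, {2}, and {3, 4}
```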
Suppose $f$ is a real-valued function on the set $\Omega$. The sigma-algebra generated by $f$, $\sigma(f)$, is the smallest sigma-algebra on $\Omega$ that contains the pre-image $f^{-1}(B)$ of every Borel subset $B$ of $\mathbb{R}$.
Let $f$ be a function from a measurable space $(\Omega, \mathcal{F})$ to a topological space $\mathcal{X}$ with Borel sigma-algebra $\mathcal{B}$. If $\{ \omega \in \Omega : f(\omega) \in B \} \in \mathcal{F}$ for every $B \in \mathcal{B}$, then $f$ is a measurable function. That is, $f$ is measurable if the pre-image of every Borel set is a measurable set.
A (real-valued) random variable $X$ is a measurable function from a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ to $\mathbb{R}$.
TBD
Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. Let $\mathcal{I}$ be a totally ordered set with order relation $\le$. Suppose that for all $i \in \mathcal{I}$, $\mathcal{F}_i$ is a sub-sigma-algebra of $\mathcal{F}$, and that if $i < j$, $\mathcal{F}_i \subset \mathcal{F}_j$. Then $\mathbb{F} := \{\mathcal{F}_i\}_{i \in \mathcal{I}}$ is a filtration and $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ is a filtered probability space.
Filtrations arise naturally in studying stochastic processes. Let $\sigma(X)$ be the sigma-algebra generated by the random variable $X$ (the smallest sigma-algebra for which $X$ is measurable, i.e., the smallest sigma-algebra that contains the pre-image $X^{-1}(B)$ of every Borel set $B \subset \mathbb{R}$), and let $\sigma(X_j : j \le i) := \sigma(\cup_{j \le i} \sigma(X_j))$. Let $(X_i)_{i \in \mathbb{N}}$ be a stochastic process on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$, and define $\mathcal{F}_i := \sigma(X_j : j \le i)$. Then $\mathbb{F} := \{\mathcal{F}_i\}_{i \in \mathbb{N}}$ is a filtration: as the process evolves, a richer and richer set of events becomes measurable.
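A concrete finite illustration (our own example): for three coin tosses, take $\Omega$ to be the set of 3-tuples of H's and T's and let $\mathcal{F}_i$ be generated by the first $i$ tosses. The atoms of $\mathcal{F}_i$ refine as $i$ grows, so $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \mathcal{F}_2 \subset \mathcal{F}_3$:

```python
from itertools import product

Omega = set(product('HT', repeat=3))   # the 8 possible outcomes of 3 tosses

def atoms(i):
    """Partition Omega by the first i coordinates; F_i is all unions of these."""
    blocks = {}
    for w in Omega:
        blocks.setdefault(w[:i], set()).add(w)
    return list(blocks.values())

for i in range(4):
    print(i, len(atoms(i)))   # 1, 2, 4, then 8 atoms: the partitions refine
```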
Let $\mathbb{P}$ be a probability distribution on a (measurable) set $\mathcal{X}$ and let $\mathcal{G}$ be a group of transformations of $\mathcal{X}$ with the property that for every $ A \subset \mathcal{X}$ for which $\mathbb{P}(A)$ is defined and every $ g \in \mathcal{G} $, $\mathbb{P}(g.A)$ is also defined (i.e., $\mathcal{G}$ preserves the measurability of sets).
A probability distribution $\mathbb{P}$ on a (measurable) set $\mathcal{X}$ is invariant under the group of transformations $\mathcal{G}$ if for every $ A \subset \mathcal{X}$ for which $\mathbb{P}(A)$ is defined and every $ g \in \mathcal{G}$, $\mathbb{P}(g.A)$ is also defined, and $\mathbb{P}(A) = \mathbb{P}(g.A)$.
For example, the multivariate normal distribution with IID components is invariant with respect to rotations about its mean.
A collection of random variables is exchangeable if every permutation of the variables has the same joint probability distribution, that is, if the joint distribution is invariant under the permutation group. Any collection of random variables that are independent and identically distributed (IID) is exchangeable, but IID is a stronger property than exchangeability.
Let $\mathcal{X}$ be a set (the "vectors") with two operations, $+$ (vector addition) and $\cdot$ (scalar multiplication), such that for all $x, y, z \in \mathcal{X}$ and all $\alpha, \beta \in \mathbb{R}$: $x + y = y + x \in \mathcal{X}$ and $(x + y) + z = x + (y + z)$; there is a zero vector $0 \in \mathcal{X}$ such that $x + 0 = x$, and every $x$ has an additive inverse $-x$ such that $x + (-x) = 0$; $\alpha \cdot x \in \mathcal{X}$, $1 \cdot x = x$, and $\alpha \cdot (\beta \cdot x) = (\alpha \beta) \cdot x$; and $\alpha \cdot (x + y) = \alpha \cdot x + \alpha \cdot y$ and $(\alpha + \beta) \cdot x = \alpha \cdot x + \beta \cdot x$.
Then $(\mathcal{X}, +, \cdot)$ is a real linear vector space.
One generally omits the $\cdot$ notation and just writes $\alpha x$ rather than $\alpha \cdot x$.
A set of vectors $S \subset \mathcal{X}$ in a linear vector space $\mathcal{X}$ is linearly independent if, for every finite subset $\{s_1, \ldots, s_n \} \subset S$ and every set of scalars $\{\alpha_1, \ldots, \alpha_n\}$, $\sum_{j=1}^n \alpha_js_j = 0$ implies $\alpha_1 = \cdots = \alpha_n = 0$.
An algebraic basis $B \subset \mathcal{X}$ for a linear vector space $\mathcal{X}$ is a linearly independent collection of vectors such that for any $x \in \mathcal{X}$, there exists a finite subset $\{b_j\}_{j=1}^n \subset B$ and a set of scalars $\{\alpha_j \}_{j=1}^n$ such that $ x = \sum_{j=1}^n \alpha_j b_j$.
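Numerically, linear independence of finitely many vectors in $\mathbb{R}^n$ can be checked via the rank of the matrix whose rows are the vectors; a sketch using NumPy (the vectors are arbitrary illustrations):

```python
import numpy as np

S = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [2.0, 1.0, 0.0]])   # rows are the vectors s_1, s_2, s_3

print(np.linalg.matrix_rank(S))          # 2 < 3: dependent, since s_3 = s_1 + s_2
print(np.linalg.matrix_rank(np.eye(3)))  # 3: the standard basis is independent
```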
Let $\mathcal{X}$ be a real linear vector space and let $\langle \cdot, \cdot \rangle : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ be such that for all $x, y, z \in \mathcal{X}$ and all $\alpha, \beta \in \mathbb{R}$: $\langle x, y \rangle = \langle y, x \rangle$ (symmetry); $\langle \alpha x + \beta y, z \rangle = \alpha \langle x, z \rangle + \beta \langle y, z \rangle$ (linearity); and $\langle x, x \rangle \ge 0$, with $\langle x, x \rangle = 0$ if and only if $x = 0$ (positive-definiteness).
Then $(\mathcal{X}, \langle, \rangle)$ is a real inner product space.
Let $\mathcal{X}$ be a real linear vector space and let $\| \cdot \|: \mathcal{X} \to \mathbb{R}^+$ be such that for all $x, y \in \mathcal{X}$ and all $\alpha \in \mathbb{R}$: $\| x \| \ge 0$, with $\| x \| = 0$ if and only if $x = 0$; $\| \alpha x \| = |\alpha| \, \| x \|$; and $\| x + y \| \le \| x \| + \| y \|$ (the triangle inequality).
Then $\| \cdot \|$ is a norm on $\mathcal{X}$, and we call $\mathcal{X}$ a normed vector space.
Suppose $\mathcal{X}$ is a normed vector space with norm $\| \cdot \|$, and let $x_1, x_2, \ldots $ be a sequence of elements of $\mathcal{X}$. If for every $\epsilon > 0$ there is an $N \in \mathbb{N}$ such that \begin{equation*} \| x_j - x_k \| < \epsilon \end{equation*} whenever $j, k \ge N$, then $x_1, x_2, \ldots $ is a Cauchy sequence.
A normed vector space is complete if every Cauchy sequence in $\mathcal{X}$ has a limit in $\mathcal{X}$. That is, if $x_1, x_2, \ldots $ is a Cauchy sequence, then there exists $x \in \mathcal{X}$ such that \begin{equation*} \lim_{j \rightarrow \infty} \| x_j - x \| = 0. \end{equation*}
In a normed vector space $\mathcal{X}$, the closed ball of radius $\epsilon > 0$ centered at $x_0 \in \mathcal{X}$ is
\begin{equation*} B_\epsilon [x_0] \equiv \{ x \in \mathcal{X}: \|x - x_0\| \le \epsilon \}.\end{equation*}The open ball of radius $\epsilon > 0$ centered at $x_0 \in \mathcal{X}$ is
\begin{equation*} B_\epsilon(x_0) \equiv \{ x \in \mathcal{X}: \|x - x_0\| < \epsilon \}.\end{equation*}The closed (open) unit ball in $\mathcal{X}$ is $B_1[0]$ ($B_1(0)$).
A Hilbert space is an inner product space $(\mathcal{X}, \langle , \rangle)$, endowed with the norm $\|x \| \equiv \sqrt{\langle x, x \rangle}$, that is complete with respect to that norm.
In a Hilbert space, every bounded linear functional can be represented as the inner product with some element of the space (the Riesz representation theorem).
That is, suppose $L$ is a bounded linear functional on a Hilbert space $\mathcal{H}$: $L(\alpha x + \beta y) = \alpha L(x) + \beta L(y)$ for all $x, y \in \mathcal{H}$ and all scalars $\alpha$ and $\beta$, and there exists a real number $C \ge 0$ such that for all $x \in \mathcal{H}$, $|L(x)| \le C \|x\|$.
Then there exists $\ell \in \mathcal{H}$ such that for all $x \in \mathcal{H}$, $L(x) = \langle \ell, x \rangle$.
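In the Hilbert space $\mathbb{R}^n$ with the usual inner product, the representer is easy to find: $\ell_j = L(e_j)$, where $e_j$ is the $j$th standard basis vector. A sketch (the functional $L$ is an arbitrary illustration):

```python
import numpy as np

L = lambda x: 2 * x[0] - x[1] + 3 * x[2]   # a bounded linear functional on R^3

l = np.array([L(e) for e in np.eye(3)])    # l = (2, -1, 3): L on the basis vectors
x = np.array([1.0, 4.0, -2.0])

print(L(x), np.dot(l, x))   # -8.0 -8.0: L(x) = <l, x> for every x
```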