Notebook

Sets, Combinatorics, & Probability¶

Probability versus Statistics¶

In probability, we assume we know what the world is like, and predict what we would then see.

In statistics, we make observations and use them to learn something about the world. That generally requires a mathematical model for how the obervations were generated, to relate the observations to properties of the world. Sometimes, that model arises naturally from knowledge of the process that generated the data, for instance, a randomized experiment or a physical measurement where the underlying physics is understood. In other cases, the model is basically an assumption. When the model does not have a firm basis in actual knowledge, inferences are suspect. As Robert L. Parker says, "the more you assume, the less you know."

Probability is like a forward problem in science or engineering: the the physics, including any parameters, is known. The issue is to predict what will be observed, up to instrumental error and other noise.

Statistics, at least inferential statistics, is like an inverse problem. The physics is known but there are unknown parameters. The issue is to use the (noisy) data to learn about parameters of the system.

Probability is a combination of mathematics and philosophy. The philosophy connects the mathematics to the world.

We will start with the mathematics.

Sets¶

The mathematics of probability is expressed most naturally using set theory. We will review the basic terminology and reviews naive set theory: how to define and manipulate sets, operations on sets that yield other sets, special relationships among sets, and so on.

A set is a collection of objects, called members or elements of the set, without regard for their order. $a \in A$, pronounced "$a$ is an element of $A$," "$a$ is in $A$," or "$a$ is a member of $A$" means that $a$ is an element of the set $A$. This is the same as writing $A \ni a$, which is pronounced "$A$ contains $a$." If $a$ is not an element of $A$, we write $a \not \in A$. Sets may be described explicitly by listing their contents, or implicitly by specifying a property that all elements of the set share, or a condition that they satisfy. The contents of sets are enclosed in curly braces: $\{ \}$.

Here are some examples:

$A = \{a, b, c, d \}$: the set containing the four elements $a$, $b$, $c$, and $d$.
$\emptyset = \{ \}$: the empty set, the set that contains no elements.
${\mathbf Z} \equiv \{ \ldots, -2, -1, 0, 1, 2, \ldots \}$: the integers.
${\mathbf N} \equiv \{1, 2, 3, \ldots \}$: the natural (counting) numbers.
$\Re \equiv (-\infty, \infty)$: the real numbers.
$\Re^+ \equiv [-\infty, \infty]$: the extended real numbers.
${\mathbf C} \equiv \{ a + bi: a, b \in \Re \}$, where $i = \sqrt{-1}$: the complex numbers.
${\mathbf Q} \equiv \{ a/b: a, b \in {\mathbf Z}, b \ne 0\}$: the rational numbers.

$B$ is a subset of $A$, written $B \subset A$ or $A \supset B$, if every element of the set $B$ is also an element of the set $A$. Thus ${\mathbf N} \subset {\mathbf Z} \subset {\mathbf Q} \subset \Re \subset {\mathbf C}$. The empty set $\emptyset$ is a subset of every set. If $A \subset B$ and $B \subset A$, $A$ and $B$ are the same set, and we write $A = B$. If $B$ is not a subset of $A$, we write $B \not \subset A$ or $A \not \supset B$. $B$ is a proper subset of $A$ if $B \subset A$ but $A \not \subset B$.

The complement of $A$ (with respect to the universe ${\mathcal X}$), written $A^c$ or $A'$, is the set of all objects under consideration (${\mathcal X}$) that are not elements of $A$. That is, $A^c \equiv \{ a \in {\mathcal X} : a \not \in A\}$.

The intersection of $A$ and $B$, written $A \cap B$ or $AB$, is the set of all objects that are elements of both $A$ and $B$: $$ A \cap B \equiv \{a: a \in A \mbox{ and } a \in B \}. $$ If $A \cap B = \emptyset$, we say $A$ and $B$ are disjoint or mutually exclusive.

The union of $A$ and $B$, written $A \cup B$, is the set of all objects that are elements of $A$ or of $B$ (or both): $$ A \cup B \equiv \{a: a \in A \mbox{ or } a \in B \mbox{ or both} \}. $$

The difference or set difference of $A$ and $B$, $A \setminus B$, pronounced "$A$ minus $B$," is the set of all elements of $A$ that are not elements of $B$: $$ A \setminus B \equiv \{a \in A : a \not \in B \} = A \cap B^c. $$

Intervals are special subsets of ${\mathbf R}$: $$ [a, b] \equiv \{x \in \Re : a \le x \le b\} $$ $$ (a, b] \equiv \{x \in \Re : a < x \le b\} $$ $$ [a, b) \equiv \{x \in \Re : a \le x < b\} $$ $$ (a, b) \equiv \{x \in \Re : a < x < b\}. $$ Sometimes we have a collection of sets, indexed by elements of another set: $\{A_\beta : \beta \in B \}$. Then $B$ is called an index set. If $B$ is a subset of the integers ${\mathbf Z}$, usually we write $A_i$ or $A_j$, etc., rather than $A_\beta$. If $B = {\mathbf N}$, we usually write $\{A_j\}_{j=1}^\infty$ rather than $\{A_\beta : \beta \in {\mathbf N} \}$. $$ \bigcap_{\beta \in B} A_\beta \equiv \{a: a \in A_\beta \;\;\forall \beta \in B \}. $$ ($\forall$ means "for all.") If $B = \{1, 2, \ldots, n\}$, we usually write $\bigcap_{j=1}^n A_j$ rather than $\bigcap_{j \in \{1, 2, \ldots, n\}} A_j$. The notation $\bigcup_{\beta \in B} A_\beta$ and $\bigcup_{j=1}^n A_j$ are defined analogously.

A collection of sets $\{A_\beta : \beta \in B \}$ is pairwise disjoint if $A_\beta \cap A_{\beta'} = \emptyset$ whenever $\beta \ne \beta'$. The collection $\{A_\beta : \beta \in B\}$ exhausts or covers the set $A$ if $A \subset \bigcup_{\beta \in B} A_\beta$. The collection $\{A_\beta : \beta \in B \}$ is a partition of the set $A$ if $A = \cup_{\beta \in B} A_\beta$ and the sets $\{A_\beta : \beta \in B \}$ are pairwise disjoint. If $\{A_\beta : \beta \in B \}$ are pairwise disjoint and exhaust $A$, then $\{A_\beta \cap A : \beta \in B \}$ is a partition of $A$.

A set is countable if its elements can be put in one-to-one correspondence with a subset of ${\mathbf N}$. A set is finite if its elements can be put in one-to-one correspondence with $\{1, 2, \ldots, n\}$ for some $n \in {\mathbf N}$. If a set is not finite, it is infinite. ${\mathbf N}$, ${\mathbf Z}$, and ${\mathbf Q}$ are infinite and countable (countably infinite); ${\mathbf R}$ is infinite and uncountable.

The notation $\# A$, pronounced "the cardinality of $A$" is the size of the set $A$, suitably defined. If $A$ is finite, $\# A$ is the number of elements in $A$. If $A$ is not finite but $A$ is countable (if its elements can be put in one-to-one correspondence with the elements of ${\mathbf N}$), then $\# A = \aleph_0$ (aleph-null). If the elements of $A$ can be put in one-to-one correspondence with the elements of $\mathbf{R}$, then $\#A = c$, the cardinality of the continuum.

The power set of a set $A$, denoted $2^A$, is the set of all subsets of the set $A$. For example, the power set of $\{a, b, c\}$ is $$ \{ \emptyset, \{a\}, \{b\}, \{c\}, \{a, b\}, \{a, c\}, \{b, c\}, \{a, b, c\} \}. $$ If $A$ is a finite set, the cardinality of the power set of $A$ is $2^{\# A}$. This can be seen as follows: suppose $\# A = n$ is finite. Consider the elements of $A$ to be written in some canonical order. We can specify an element of the power set by an $n$-digit binary number. The first digit is 1 if the first element of $A$ is in the subset, and 0 otherwise. The second digit is 1 if the second element of $A$ is in the subset, and 0 otherwise, etc. There are $2^n$ $n$-digit binary numbers, so there are $2^n$ subsets. The cardinality of the power set of ${\mathbf N}$ is not $\aleph_0$.

If $A$ is a finite set, $B$ is a countable set and $\{A_j : \beta \in B \}$ is a partition of $A$, then $$ \# A = \sum_{\beta \in B} \# A_\beta. $$

Suppose $A = A_1 \cup A_2$, but that possibly $A_1 A_2 \ne \emptyset$, so $\{A_1, A_2\}$ might not be a partition of $A$, because $A_1$ and $A_2$ are not necessarily disjoint. Then still $$\#A = \#A_1 + \#A_2 - \#A_1A_2.$$

This is seen most easily using a Venn diagram, and can be proved by constructing a partition of $A$, $A = A_1A_2^c \cup A_1^cA_2 \cup A_1A_2$, and noting that $\#A_1 + \#A_2 = \#A_1A_2^c + \#A_1^cA_2 + 2\#A_1A_2$.

If $A = A_1 \cup A_2 \cup A_3$ but $\{A_1, A_2, A_3\}$ are not necessarily disjoint, then still

$$\#A = \#A_1 + \#A_2 + \#A_3 - \#A_1A_2 - \#A_1A_3 - \#A_2A_3 + \#A_1A_2A_3.$$

More generally, if $A = \cup_{i=1}^n A_i$, then the Inclusion-Exclusion Principle holds:

$$ \#A = \sum_{i=1}^n \#A_i - \sum_{1 \le i_1 < i_2 \le n} \#(A_{i_1}A_{i_2}) + \sum_{1 \le i_1 < i_2 < i_3 \le n} \#(A_{i_1}A_{i_2}A_{i_3}) - \cdots +(-1)^{k-1} \sum_{1 \le i_1 < i_2 < \cdots < i_k \le n} \# (A_{i_1}A_{i_2} \cdots A_{i_k}) + \cdots $$

Some useful set relations¶

If $A\subset B$ and $B\subset A$ then $A = B$. (If two sets are subsets of each other, they are the same set.)
$\emptyset \subset A$. (The empty set is a subset of every set.)
$\emptyset \cup A = A$.

(The union of the empty set and any other set is that set.)

$\emptyset \cap A = \emptyset$.

(The intersection of the empty set and any other set is empty.)

If $A \subset B$ and $B \subset C$ then $A \subset C$. (The subset relation is transitive.)
If $A \subset B$ then $B^c \subset A^c$.

(Complementation reverses the subset relation.)

$A \cap B \subset A$. Moreover, $A\cap B = A$ if and only if $A \subset B$.
$A \subset A\cup B$. Moreover, $A = A\cup B$ if and only if $B \subset A$.
$(A\cap B)^c = A^c \cup B^c$. (de Morgan)
$(A\cup B)^c = A^c\cap B^c$. (de Morgan)
$A \cap B = B \cap A$. (Intersection is commutative.)
$A\cup B = B\cup A$. (Union is commutative.)
$A\cap (B\cap C) = (A\cap B)\cap C$. (Intersection is associative.)
$A\cup (B\cup C) = (A\cup B)\cup C$. (Union is associative.)
$A\cap (B\cup C) = (A\cap B)\cup (A\cap C)$. (Distribution of intersection over union.)
$A\cup (B\cap C) = (A\cup B)\cap (A\cup C)$. (Distribution of union over intersection.)

Exercises¶

Prove that if $\{A_\beta : \beta \in B \}$ are pairwise disjoint and exhaust $A$, then $\{A_\beta \cap A : \beta \in B \}$ is a partition of $A$.
Prove de Morgan's identity $(A\cup B)^c = A^c\cap B^c$.
Prove that $\{A_1A_2^c, A_1^cA_2, A_1A_2\}$ is a partition of $A_1 \cup A_2$.

In [1]:

# boilerplate
%matplotlib inline
from __future__ import division
import math
import numpy as np
import scipy as sp
from scipy import special
import matplotlib.pyplot as plt

In [33]:

# Sets in Python
# Make three sets, A = {1, 2, 3}, B = {"a", "b", "green", 3}, C = {1, 2}
A = [1, 2, 3, 3] # this is a list but not a set, because 3 is listed twice
print 'A: ', A
A = set(A)  # this is a set containing the unique elements of A
print 'A as a set (duplicates removed):',A
B = set(["a", "b", "green", 3])
print 'B:', B
C = set(range(1,3)) # a different construction
print 'C:', C
#
# Set membership
print '1 is in A:', 1 in A  # is 1 an element of A?
print '"green" is in A:',"green" in A
print '"green" is in B:',"green" in B
print 'A is a subset of C:', A.issubset(C)
print 'C is a subset of A:', C.issubset(A)
print 'A intersect B:', A.intersection(B)

A:  [1, 2, 3, 3]
A as a set (duplicates removed): set([1, 2, 3])
B: set(['a', 3, 'b', 'green'])
C: set([1, 2])
1 is in A: True
"green" is in A: False
"green" is in B: True
A is a subset of C: False
C is a subset of A: True
A intersect B: set([3])

Cartesian Products¶

The notation $(s_1, s_2, \ldots, s_n) \equiv (s_j)_{j=1}^n$ denotes an ordered $n$-tuple consisting of $s_1$ in the first position, $s_2$ in the second position, etc. The parentheses are used instead of curly braces to distinguish $n$-tuples from sets: $(s_j)_{j=1}^n \ne \{s_j\}_{j=1}^n$. The $k$th component of the $n$-tuple $s = (s_j)_{j=1}^n$, is $s_k$, $k = 1, 2, \ldots, n$. Two $n$-tuples are equal if their components are equal. That is, $(s_j)_{j=1}^n = (t_j)_{j=1}^n$ means that $s_j = t_j$ for $j = 1, \ldots, n$. In particular, $(s, t) \ne (t, s)$ unless $s=t$. In contrast, $\{s, t \} = \{ t, s \}$ always.

The Cartesian product of $S$ and $T$ is $S \bigotimes T \equiv \{(s, t): s \in S \mbox{ and } t \in T\}$. Unless $S = T$, $S \bigotimes T \ne T \bigotimes S$. ${\mathbf R}^n$ is the Cartesian product of ${\mathbf R}$ with itself, $n$ times; its elements are $n$-tuples of real numbers. If $s$ is the $n$-tuple $(s_1, s_2, \ldots, s_n) = (s_j)_{j=1}^n$ and $t$ is the $n$-tuple $(t_1, t_2, \ldots, t_n) = (t_j)_{j=1}^n$, then $s = t$ iff $s_j = t_j$ for all $j=1, \ldots, n$.

The Python library itertools has utilities for generating Cartesian products, power sets, permutations, and the like.

In [3]:

# Cartesian products using itertools
import itertools
A = ['a','b','c','d']
B = range(3)
print [i for i in itertools.product(A,B)]

[('a', 0), ('a', 1), ('a', 2), ('b', 0), ('b', 1), ('b', 2), ('c', 0), ('c', 1), ('c', 2), ('d', 0), ('d', 1), ('d', 2)]

Combinatorics¶

Let $A$ be a finite set with $\# A = n$. A permutation of (the elements of) $A$ is an element $s$ of $\bigotimes_{j=1}^n A = A^n$ whose components are distinct elements of $A$. That is, $s = (s_j)_{j=1}^n \in A^n$ is a permutation of $A$ if $\# \{s_j\}_{j=1}^n = n$. There are $n! = n(n-1)\cdots 1$ permutations of a set with $n$ elements: there are $n$ choices for the first component of the permutation, $n-1$ choices for the second (whatever the first might be), $n-2$ for the third (whatever the first two might be), etc.

This is an illustration of the Fundamental Rule of Counting: If specifying each (unique) element of a set can be expressed as a series $n$ of choices, and the number of options at step $i$ of the series is $n_i$ regardless of the choices made before step $i$, then the total number of elements of the set is $\prod_{i=1}^n n_i$.

The number of permutations of $n$ things taken $k$ at a time, ${}_nP_k$, is the number of ways there are of selecting $k$ of $n$ things, then permuting those $k$ things. There are $n$ choices for the object that will be in the first place in the permutation, $n-1$ for the second place (regardless of which is first), etc., and $n-k+1$ choices for the item that will be in the $k$th place. By the fundamental rule of counting, it follows that ${}_nP_k = n(n-1)\cdots(n-k+1) = n!/(n-k)!$.

The number of subsets of size $k$ that can be formed from $n$ objects is $$ {}_nC_k = {{n}\choose{k}} = {}_nP_k/k! = n(n-1)\cdots(n-k+1)/k! = \frac{n!}{k!(n-k)!}, $$ since each subset of size $k$ can be ordered into $k!$ permutations.

Here are some useful identities:

$ {{n}\choose{k}} = {{n}\choose{n-k}} $.
$ {{n}\choose{0}} = {{n}\choose{n}} = 1$.
$ {{n}\choose{1}} = {{n}\choose{n-1}} = n $.
$ {{n} \choose {k}} = \prod_{m=0}^{k-1} \frac{n-m}{m+1} $.
$ {{n} \choose {k}} = {{n-1}\choose{k-1}} + {{n-1} \choose {k}}$.

Because the power set of a set with $n$ elements can be partitioned as

$$ \cup_{k=0}^n \left \{ \mbox{all distinct subsets of size $k$} \right \}, $$

and since the power set has $2^n$ elements, it follows that $$ \sum_{j=0}^n {{n} \choose {j}} = 2^n. $$

### Exercise Prove the five identities about binomial coefficients given above.

In [34]:

# Combinatorics in Python

n = 5
k = 3
print 'n!:',math.factorial(n)

# number of combinations of n objects taken k at a time
print 'nCk:',sp.special.binom(n,k)

# number of permutations of n objects taken k at a time

def permu(n, k):
    permu = 1
    s = 0
    while (s < k):
        permu = permu*(n-s)
        s = s+1
    return(permu)

print 'nPk:',permu(n, k)

n!: 120
nCk: 10.0
nPk: 60

In [5]:

# enumerate combinations and permutations using itertools
A = ['a','b','c','d','e']
print 'A:', A
k = 2
print 'all combinations of', k, 'elements of A:', [i for i in itertools.combinations(A,k)]
print 'all permutations of', k, 'elements of A:', [i for i in itertools.permutations(A,k)]

A: ['a', 'b', 'c', 'd', 'e']
combinations of  2 elements of A: [('a', 'b'), ('a', 'c'), ('a', 'd'), ('a', 'e'), ('b', 'c'), ('b', 'd'), ('b', 'e'), ('c', 'd'), ('c', 'e'), ('d', 'e')]
permutations of  2 elements of A: [('a', 'b'), ('a', 'c'), ('a', 'd'), ('a', 'e'), ('b', 'a'), ('b', 'c'), ('b', 'd'), ('b', 'e'), ('c', 'a'), ('c', 'b'), ('c', 'd'), ('c', 'e'), ('d', 'a'), ('d', 'b'), ('d', 'c'), ('d', 'e'), ('e', 'a'), ('e', 'b'), ('e', 'c'), ('e', 'd')]

Strategies for counting¶

There are several approaches to counting the elements of a set that can be helpful in practice, aside from simply enumerating the elements.

Divide and conquer: partition the set.
Use the Fundamental Rule of Counting. (This is how we found the number of permutations of $k$ of $n$ things.)
Overcount: count each item $k$ times, then divide by $k$. (This is how we derived the number of combinations of $k$ of $n$ things from the number of permutations of $k$ of $n$ things: each combination of $k$ things occurs $k!$ times among the set of permutations.)
Overinclude, then adjust: count a set that is larger than the set in question, then subtract the extras.
Use the Inclusion-Exclusion Principle.

Computational exercise¶

Write a Python function that constructs all permutations of a finite list.

The function should take as input a list s, and produce as output all permutations of the elements of the list. Items should be considered unique based on their position in the list.

For instance,

s = ['a','b','c']
permute(s)

should return the 6 permutations of a, b, and c:

a b c
a c b
b a c
b c a
c a b
c b a

Write your function from scratch; i.e., do not use any libraries that are not part of the core of Python (e.g., do not use the itertools library).

Numerical stability and order of operations¶

As a matter of mathematics, ${{n}\choose{k}} = \frac{n!}{k!(n-k)!}$, but this is a bad way to compute ${{n}\choose{k}}$. When $n$ is large, the numerator $n!$ is enormous, and can overflow finite-precision calculations even for values of $n$ and $k$ for which ${{n}\choose{k}}$ is small. Moreover, relying on cancellation in finite-precision arithmetic can lead to large errors, whether the cancellation is through division or subtraction.

So, rather than compute ${{n}\choose{k}}$ by dividing factorials, it's better to build it by multiplying ratios.

In [35]:

# illustrating issues calculating nCk as a ratio of factorials for large n
n = 1000
k = 2
print '1000!:', math.factorial(n) # Really Big Number: 
                                  # would overflow in many languages, 
                                  # including R, MATLAB, Fortran
# however, 1000 choose 2 is only (1000*999)/2 = 499500
print '\n998!:', math.factorial(n-k)
print '\n1000C2:', sp.special.binom(n, k)

# let's code this from scratch in a stable way
def myChoose(n, k):
    ''' 
    no error checking. If this were for production, we'd check that the arguments are integers
    and that n >= k >= 0.
    '''
    m = 0
    myc = 1
    while (m < k):
        myc = myc * (n-m)/(m+1)
        m = m+1
    return(myc)

print '1000C2 (2nd way):', myChoose(n, k)

1000!: 402387260077093773543702433923003985719374864210714632543799910429938512398629020592044208486969404800479988610197196058631666872994808558901323829669944590997424504087073759918823627727188732519779505950995276120874975462497043601418278094646496291056393887437886487337119181045825783647849977012476632889835955735432513185323958463075557409114262417474349347553428646576611667797396668820291207379143853719588249808126867838374559731746136085379534524221586593201928090878297308431392844403281231558611036976801357304216168747609675871348312025478589320767169132448426236131412508780208000261683151027341827977704784635868170164365024153691398281264810213092761244896359928705114964975419909342221566832572080821333186116811553615836546984046708975602900950537616475847728421889679646244945160765353408198901385442487984959953319101723355556602139450399736280750137837615307127761926849034352625200015888535147331611702103968175921510907788019393178114194545257223865541461062892187960223838971476088506276862967146674697562911234082439208160153780889893964518263243671616762179168909779911903754031274622289988005195444414282012187361745992642956581746628302955570299024324153181617210465832036786906117260158783520751516284225540265170483304226143974286933061690897968482590125458327168226458066526769958652682272807075781391858178889652208164348344825993266043367660176999612831860788386150279465955131156552036093988180612138558600301435694527224206344631797460594682573103790084024432438465657245014402821885252470935190620929023136493273497565513958720559654228749774011413346962715422845862377387538230483865688976461927383814900140767310446640259899490222221765904339901886018566526485061799702356193897017860040811889729918311021171229845901641921068884387121855646124960798722908519296819372388642614839657382291123125024186649353143970137428531926649875337218940694281434118520158014123344828015051399694290153483077644569099073152433278288269864602789864321139083506217095002597389863554277196742822248757586765752344220207573630569498825087968928162753848863396909959826280956121450994871701244516461260379029309120889086942028510640182154399457156805941872748998094254742173582401063677404595741785160829230135358081840096996372524230560855903700624271243416909004153690105933983835777939410970027753472000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

998!: 402790050127220994538240674597601587306681545756471103647447357787726238637266286878923131618587992793273261872069265323955622495490298857759082912582527118115540044131204964883707335062250983503282788739735011132006982444941985587005283378024520811868262149587473961298417598644470253901751728741217850740576532267700213398722681144219777186300562980454804151705133780356968636433830499319610818197341194914502752560687555393768328059805942027406941465687273867068997087966263572003396240643925156715326363340141498803019187935545221092440752778256846166934103235684110346477890399179387387649332483510852680658363147783651821986351375529220618900164975188281042287183543472177292257232652561904125692525097177999332518635447000616452999984030739715318219169707323799647375797687367013258203364129482891089991376819307292252205524626349705261864003453853589870620758596211518646408335184218571196396412300835983314926628732700876798309217005024417595709904449706930796337798861753941902125964936412501007284147114260935633196107341423863071231385166055949914432695939611227990169338248027939843597628903525815803809004448863145157344706452445088044626373001304259830129153477630812429640105937974761667785045203987508259776060285826091261745049275419393680613675366264232715305430889216384611069135662432391043725998805881663054913091981633842006354699525518784828195856033032645477338126512662942408363494651203239333321502114252811411713148843370594801145777575035630312885989779863888320759224882127141544366251503974910100721650673810303577074640154112833393047276025799811224571534249672518380758145683914398263952929391318702517417558325636082722982882372594816582486826728614633199726211273072775131325222240100140952842572490801822994224069971613534603487874996852498623584383106014533830650022411053668508165547838962087111297947300444414551980512439088964301520461155436870989509667681805149977993044444138428582065142787356455528681114392680950815418208072393532616122339434437034424287842119316058881129887474239992336556764337968538036861949918847009763612475872782742568849805927378373244946190707168428807837146267156243185213724364546701100557714520462335084082176431173346929330394071476071813598759588818954312394234331327700224455015871775476100371615031940945098788894828812648426365776746774528000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

1000C2: 499500.0
1000C2 (2nd way): 499500.0

A related same issue comes up in subtracting large numbers.

For instance, algebraically, $x^2 - (x-1)^2 = 2x-1$. But if the mantissa of $x$ is large, squaring $x$ and $x-1$ can overflow the precision of the calculation, and their difference of the computed squares will be numerically wrong.

Connecting Probability to Set Theory¶

A random experiment or random trial is basically any situation whose outcome is not perfectly predictable, but for which we can specify all possible outcomes, and that shows long-term regularities. For example, when we toss a coin, we do not know how it will land, but it certainly must land heads, tails, on its edge, or not land at all. There is no other possibility. The set of all possible outcomes of a random experiment is called the outcome space. The letter $\mathcal{S}$ will denote outcome space. We are free to choose the outcome space to correspond to what we deem relevant for the experiment, as long as it is essentially inevitable that the random experiment will result in some outcome in the outcome space. For example, the outcome space we just described was {heads, tails, edge, doesn't land}. It might be adequate for our purposes for the outcome space to be {heads, not heads}.

Often we shall tailor outcome spaces for specific problems. Here is an example: Imagine a box containing tickets that are indistinguishable except that each has written upon it a unique number between 1 and the number of tickets, $n$. We shake the box, draw a ticket from the box without looking, and record the number written on the ticket we happened to draw. The natural outcome space of this experiment is the set of numbers $\{1, 2, \ldots, n\}$. However, suppose we are interested only in whether the number on the ticket we draw is even. The outcome space then could be reduced to {even number on ticket, odd number on ticket}, or coded even more abstractly as $\{0, 1\}$, where the outcome is the number of even-numbered tickets drawn.

An event is a subset of outcome space: a collection of outcomes in the outcome space. $A$ is an event if $A \subset \mathcal{S}$. For example, in the experiment of drawing a numbered ticket from the box, suppose there are 10 tickets in all, and that we choose the outcome space $\mathcal{S}$ to be the numbers $\{1, 2, 3, \ldots, 9, 10\}$. Then "we draw the number 1" is the event $\{1\}$, and "we draw an even number" is the event $\{2, 4, 6, 8, 10\}$, both of which are subsets of the set of possible outcomes, the outcome space $\mathcal{S}$.

The treatment is quite informal, omitting important technical concepts such as sigma-algebras, measurability, and the like.

Modern probability theory starts with Kolmogorov's Axioms; here is an informal startement of the axioms. For more (but still a very informal treatment), see: Probability: Philosophy and Mathematical Background, Set theory, and Probability: Axioms and Fundaments.

Let $\mathcal{S}$ denote the outcome space, the set of all possible outcomes of a random experiment, and let $\{A_i\}_{i=1}^\infty$ be a countable collection of subsets of $\mathcal{S}$. Then any probability function ${\mathbb P}$ must satisfy these axioms:

For every $A \subset \mathcal{S}$, ${\mathbb P}(A) \ge 0$ (probabilities are nonnegative)
${\mathbb P}(\mathcal{S}) = 1$ (the chance that something happens is 100%)
If $A_i \cap A_j = \emptyset$ for $i \ne j$, then ${\mathbb P} \cup_{i=1}^\infty A_i = \sum_{i=1}^\infty {\mathbb P}(A_i)$

(If a countable collection of events is pairwise disjoint, then the chance that any of the events occurs is the sum of the chances that they occur individually.)

These axioms have many useful consequences, among them that ${\mathbb P}(\emptyset) = 0$, ${\mathbb P}(A^c) = 1 - {\mathbb P}(A)$, and ${\mathbb P}(A \cup B) = {\mathbb P}(A) + {\mathbb P}(B) - {\mathbb P}(AB)$. More generally, the Inclusion-Exclusion Principle gives us

$$ \mathbb{P} \cup_{i=1}^n A_i = \sum_{i=1}^n \mathbb{P} A_i - \sum_{1 \le i_1 < i_2 \le n} \mathbb{P}(A_{i_1}A_{i_2}) + \sum_{1 \le i_1 < i_2 < i_3 \le n} \mathbb{P}(A_{i_1}A_{i_2}A_{i_3}) - \cdots +(-1)^{k-1} \sum_{1 \le i_1 < i_2 < \cdots < i_k \le n} \mathbb{P} (A_{i_1}A_{i_2} \cdots A_{i_k}) + \cdots $$$$ = \sum_{i=1}^n \left((-1)^{i-1}\sum_{\mathcal{I} \subset\{1,\ldots,n\}: \#\mathcal I =i} \mathbb{P}(\cap_{j \in \mathcal{I}} A_j \right). $$

Definitions¶

Let $A$ and $B$ be subsets of outcome space $\mathcal{S}$.

If $AB = \emptyset$, then $A$ and $B$ are mutually exclusive.
If $A \subset B$ then $A$ implies $B$: if $A$ occurs, $B$ must

also occur (if the outcome that occurs is in $A$, the outcome that occurs is also in $B$, because every element of $A$ is an element of $B$).

If ${\mathbb P}(AB) = {\mathbb P}(A){\mathbb P}(B)$, then $A$ and $B$ are independent.
If ${\mathbb P}(B) > 0$, then the conditional probability of $A$ given $B$ is

${\mathbb P}(A | B) \equiv {\mathbb P}(AB)/{\mathbb P}(B)$.

Independence is an extremely specific relationship. At one extreme, $AB = \emptyset$; then ${\mathbb P}(AB) = 0 \le {\mathbb P}(A){\mathbb P}(B)$. At another extreme, either $A$ is a subset of $B$ or vice versa; then ${\mathbb P}(AB) = \min({\mathbb P}(A),{\mathbb P}(B)) \ge {\mathbb P}(A){\mathbb P}(B)$. Independence lies at a precise point in between.

Exercises.¶

Show that if $A$ and $B$ are independent and $\mathbb{P}(B) >0$, then

$\mathbb{P}(A|B) = \mathbb{P}(A)$.

Show that conditional probability behaves like "ordinary" probability: suppose $\mathbb{P}(B) > 0$ and $\{A_i\}$ are such that $\{A_i \cap B\}_{i=1}^\infty$ are pairwise disjoint. Then
- $\mathbb{P}(A | B) \ge 0$
- $\mathbb{P}(B | B) = 1$
- $\mathbb{P}(\cup_{i=1}^\infty A_i | B) = \sum_{i=1}^\infty \mathbb{P} (A_i | B)$

Bayes' Rule¶

Suppose $\mathbb{P}(A)$, $\mathbb{P}(B) \ne 0$. Then

$$ \mathbb{P}(A|B) = \frac{\mathbb{P}(B|A)\mathbb{P}(A)}{\mathbb{P}(B)}. $$

Bayes' rule lets you "invert" conditional probabilities to find $\mathbb{P}(A|B)$ in terms of $\mathbb{P}(B|A)$.

The Law of Total Probability¶

Let $A$ be an event and suppose $\{ A_i \}$ (a finite or countable collection of sets) is a partition of outcome space $\mathcal{S}$ such that $\mathbb{P}(A_i) > 0$ $\forall i$. Then

$$ \mathbb{P}(A) = \sum_i \mathbb{P}(A|A_i) \mathbb{P}(A_i). $$

The Law of Total Probability is basically just an application of the definition of conditional probability and Kolmogorov's third axiom, but it's extremely useful.

Examples.¶

Here is a classic application of Bayes' Rule. Suppose that a screening test for a medical condition is 95% accurate, in the sense that if one has the condition, there is a 95% chance that the test will be positive, and if one does not have the condition, there is a 95% chance that the test will be negative. Suppose that 1% of the population actually has the condition. We select someone at random from the population and test him or her. The test result is positive. What's the chance that the person actually has the condition?

$$ \mathbb{P}(\mbox{has condition}|\mbox{tests positive}) = \frac{\mathbb{P}(\mbox{tests positive }|\mbox{ has condition}) \mathbb{P}(\mbox{has condition })}{\mathbb{P}(\mbox{tests positive})} = \frac{\mathbb{P}(\mbox{tests positive }|\mbox{ has condition}) \mathbb{P}(\mbox{has condition})} {\mathbb{P}(\mbox{tests positive } | \mbox{ has condition})\mathbb{P}(\mbox{has condition}) + \mathbb{P}(\mbox{tests positive } | \mbox{ doesn't have condition})\mathbb{P}(\mbox{doesn't have condition})} = \frac{0.95}{0.95 \times 0.01 + 0.05 \times 0.99} = 16.1\%. $$

(The denominator involves applying the Law of Total Probability.)

Exercises.¶

[To do]