In probability, we assume we know what the world is like, and predict what we would then see.
In statistics, we make observations and use them to learn something about the world. That generally requires a mathematical model for how the obervations were generated, to relate the observations to properties of the world. Sometimes, that model arises naturally from knowledge of the process that generated the data, for instance, a randomized experiment or a physical measurement where the underlying physics is understood. In other cases, the model is basically an assumption. When the model does not have a firm basis in actual knowledge, inferences are suspect. As Robert L. Parker says, "the more you assume, the less you know."
Probability is like a forward problem in science or engineering: the the physics, including any parameters, is known. The issue is to predict what will be observed, up to instrumental error and other noise.
Statistics, at least inferential statistics, is like an inverse problem. The physics is known but there are unknown parameters. The issue is to use the (noisy) data to learn about parameters of the system.
Probability is a combination of mathematics and philosophy. The philosophy connects the mathematics to the world.
We will start with the mathematics.
The mathematics of probability is expressed most naturally using set theory. We will review the basic terminology and reviews naive set theory: how to define and manipulate sets, operations on sets that yield other sets, special relationships among sets, and so on.
A set is a collection of objects, called members or elements of the set, without regard for their order. a∈A, pronounced "a is an element of A," "a is in A," or "a is a member of A" means that a is an element of the set A. This is the same as writing A∋a, which is pronounced "A contains a." If a is not an element of A, we write a∉A. Sets may be described explicitly by listing their contents, or implicitly by specifying a property that all elements of the set share, or a condition that they satisfy. The contents of sets are enclosed in curly braces: {}.
Here are some examples:
B is a subset of A, written B⊂A or A⊃B, if every element of the set B is also an element of the set A. Thus N⊂Z⊂Q⊂ℜ⊂C. The empty set ∅ is a subset of every set. If A⊂B and B⊂A, A and B are the same set, and we write A=B. If B is not a subset of A, we write B⊄A or A⊅B. B is a proper subset of A if B⊂A but A⊄B.
The complement of A (with respect to the universe X), written Ac or A′, is the set of all objects under consideration (X) that are not elements of A. That is, Ac≡{a∈X:a∉A}.
The intersection of A and B, written A∩B or AB, is the set of all objects that are elements of both A and B: A∩B≡{a:a∈A and a∈B}.
The union of A and B, written A∪B, is the set of all objects that are elements of A or of B (or both): A∪B≡{a:a∈A or a∈B or both}.
The difference or set difference of A and B, A∖B, pronounced "A minus B," is the set of all elements of A that are not elements of B: A∖B≡{a∈A:a∉B}=A∩Bc.
Intervals are special subsets of R: [a,b]≡{x∈ℜ:a≤x≤b}
A collection of sets {Aβ:β∈B} is pairwise disjoint if Aβ∩Aβ′=∅ whenever β≠β′. The collection {Aβ:β∈B} exhausts or covers the set A if A⊂⋃β∈BAβ. The collection {Aβ:β∈B} is a partition of the set A if A=∪β∈BAβ and the sets {Aβ:β∈B} are pairwise disjoint. If {Aβ:β∈B} are pairwise disjoint and exhaust A, then {Aβ∩A:β∈B} is a partition of A.
A set is countable if its elements can be put in one-to-one correspondence with a subset of N. A set is finite if its elements can be put in one-to-one correspondence with {1,2,…,n} for some n∈N. If a set is not finite, it is infinite. N, Z, and Q are infinite and countable (countably infinite); R is infinite and uncountable.
The notation #A, pronounced "the cardinality of A" is the size of the set A, suitably defined. If A is finite, #A is the number of elements in A. If A is not finite but A is countable (if its elements can be put in one-to-one correspondence with the elements of N), then #A=ℵ0 (aleph-null). If the elements of A can be put in one-to-one correspondence with the elements of R, then #A=c, the cardinality of the continuum.
The power set of a set A, denoted 2A, is the set of all subsets of the set A. For example, the power set of {a,b,c} is {∅,{a},{b},{c},{a,b},{a,c},{b,c},{a,b,c}}.
If A is a finite set, B is a countable set and {Aj:β∈B} is a partition of A, then #A=∑β∈B#Aβ.
Suppose A=A1∪A2, but that possibly A1A2≠∅, so {A1,A2} might not be a partition of A, because A1 and A2 are not necessarily disjoint. Then still #A=#A1+#A2−#A1A2.
This is seen most easily using a Venn diagram, and can be proved by constructing a partition of A, A=A1Ac2∪Ac1A2∪A1A2, and noting that #A1+#A2=#A1Ac2+#Ac1A2+2#A1A2.
If A=A1∪A2∪A3 but {A1,A2,A3} are not necessarily disjoint, then still
#A=#A1+#A2+#A3−#A1A2−#A1A3−#A2A3+#A1A2A3.More generally, if A=∪ni=1Ai, then the Inclusion-Exclusion Principle holds:
#A=n∑i=1#Ai−∑1≤i1<i2≤n#(Ai1Ai2)+∑1≤i1<i2<i3≤n#(Ai1Ai2Ai3)−⋯+(−1)k−1∑1≤i1<i2<⋯<ik≤n#(Ai1Ai2⋯Aik)+⋯If A⊂B and B⊂A then A=B. (If two sets are subsets of each other, they are the same set.)
∅⊂A. (The empty set is a subset of every set.)
∅∪A=A.
(The union of the empty set and any other set is that set.)
(The intersection of the empty set and any other set is empty.)
If A⊂B and B⊂C then A⊂C. (The subset relation is transitive.)
If A⊂B then Bc⊂Ac.
(Complementation reverses the subset relation.)
A∩B⊂A. Moreover, A∩B=A if and only if A⊂B.
A⊂A∪B. Moreover, A=A∪B if and only if B⊂A.
(A∩B)c=Ac∪Bc. (de Morgan)
(A∪B)c=Ac∩Bc. (de Morgan)
A∩B=B∩A. (Intersection is commutative.)
A∪B=B∪A. (Union is commutative.)
A∩(B∩C)=(A∩B)∩C. (Intersection is associative.)
A∪(B∪C)=(A∪B)∪C. (Union is associative.)
A∩(B∪C)=(A∩B)∪(A∩C). (Distribution of intersection over union.)
A∪(B∩C)=(A∪B)∩(A∪C). (Distribution of union over intersection.)
Prove that if {Aβ:β∈B} are pairwise disjoint and exhaust A, then {Aβ∩A:β∈B} is a partition of A.
Prove de Morgan's identity (A∪B)c=Ac∩Bc.
Prove that {A1Ac2,Ac1A2,A1A2} is a partition of A1∪A2.
# boilerplate
%matplotlib inline
from __future__ import division
import math
import numpy as np
import scipy as sp
from scipy import special
import matplotlib.pyplot as plt
# Sets in Python
# Make three sets, A = {1, 2, 3}, B = {"a", "b", "green", 3}, C = {1, 2}
A = [1, 2, 3, 3] # this is a list but not a set, because 3 is listed twice
print 'A: ', A
A = set(A) # this is a set containing the unique elements of A
print 'A as a set (duplicates removed):',A
B = set(["a", "b", "green", 3])
print 'B:', B
C = set(range(1,3)) # a different construction
print 'C:', C
#
# Set membership
print '1 is in A:', 1 in A # is 1 an element of A?
print '"green" is in A:',"green" in A
print '"green" is in B:',"green" in B
print 'A is a subset of C:', A.issubset(C)
print 'C is a subset of A:', C.issubset(A)
print 'A intersect B:', A.intersection(B)
A: [1, 2, 3, 3] A as a set (duplicates removed): set([1, 2, 3]) B: set(['a', 3, 'b', 'green']) C: set([1, 2]) 1 is in A: True "green" is in A: False "green" is in B: True A is a subset of C: False C is a subset of A: True A intersect B: set([3])
The notation (s1,s2,…,sn)≡(sj)nj=1 denotes an ordered n-tuple consisting of s1 in the first position, s2 in the second position, etc. The parentheses are used instead of curly braces to distinguish n-tuples from sets: (sj)nj=1≠{sj}nj=1. The kth component of the n-tuple s=(sj)nj=1, is sk, k=1,2,…,n. Two n-tuples are equal if their components are equal. That is, (sj)nj=1=(tj)nj=1 means that sj=tj for j=1,…,n. In particular, (s,t)≠(t,s) unless s=t. In contrast, {s,t}={t,s} always.
The Cartesian product of S and T is S⨂T≡{(s,t):s∈S and t∈T}. Unless S=T, S⨂T≠T⨂S. Rn is the Cartesian product of R with itself, n times; its elements are n-tuples of real numbers. If s is the n-tuple (s1,s2,…,sn)=(sj)nj=1 and t is the n-tuple (t1,t2,…,tn)=(tj)nj=1, then s=t iff sj=tj for all j=1,…,n.
The Python library itertools has utilities for generating Cartesian products, power sets, permutations, and the like.
# Cartesian products using itertools
import itertools
A = ['a','b','c','d']
B = range(3)
print [i for i in itertools.product(A,B)]
[('a', 0), ('a', 1), ('a', 2), ('b', 0), ('b', 1), ('b', 2), ('c', 0), ('c', 1), ('c', 2), ('d', 0), ('d', 1), ('d', 2)]
Let A be a finite set with #A=n. A permutation of (the elements of) A is an element s of ⨂nj=1A=An whose components are distinct elements of A. That is, s=(sj)nj=1∈An is a permutation of A if #{sj}nj=1=n. There are n!=n(n−1)⋯1 permutations of a set with n elements: there are n choices for the first component of the permutation, n−1 choices for the second (whatever the first might be), n−2 for the third (whatever the first two might be), etc.
This is an illustration of the Fundamental Rule of Counting: If specifying each (unique) element of a set can be expressed as a series n of choices, and the number of options at step i of the series is ni regardless of the choices made before step i, then the total number of elements of the set is ∏ni=1ni.
The number of permutations of n things taken k at a time, nPk, is the number of ways there are of selecting k of n things, then permuting those k things. There are n choices for the object that will be in the first place in the permutation, n−1 for the second place (regardless of which is first), etc., and n−k+1 choices for the item that will be in the kth place. By the fundamental rule of counting, it follows that nPk=n(n−1)⋯(n−k+1)=n!/(n−k)!.
The number of subsets of size k that can be formed from n objects is nCk=(nk)=nPk/k!=n(n−1)⋯(n−k+1)/k!=n!k!(n−k)!,
Here are some useful identities:
Because the power set of a set with n elements can be partitioned as
∪nk=0{all distinct subsets of size k},and since the power set has 2n elements, it follows that n∑j=0(nj)=2n.
# Combinatorics in Python
n = 5
k = 3
print 'n!:',math.factorial(n)
# number of combinations of n objects taken k at a time
print 'nCk:',sp.special.binom(n,k)
# number of permutations of n objects taken k at a time
def permu(n, k):
permu = 1
s = 0
while (s < k):
permu = permu*(n-s)
s = s+1
return(permu)
print 'nPk:',permu(n, k)
n!: 120 nCk: 10.0 nPk: 60
# enumerate combinations and permutations using itertools
A = ['a','b','c','d','e']
print 'A:', A
k = 2
print 'all combinations of', k, 'elements of A:', [i for i in itertools.combinations(A,k)]
print 'all permutations of', k, 'elements of A:', [i for i in itertools.permutations(A,k)]
A: ['a', 'b', 'c', 'd', 'e'] combinations of 2 elements of A: [('a', 'b'), ('a', 'c'), ('a', 'd'), ('a', 'e'), ('b', 'c'), ('b', 'd'), ('b', 'e'), ('c', 'd'), ('c', 'e'), ('d', 'e')] permutations of 2 elements of A: [('a', 'b'), ('a', 'c'), ('a', 'd'), ('a', 'e'), ('b', 'a'), ('b', 'c'), ('b', 'd'), ('b', 'e'), ('c', 'a'), ('c', 'b'), ('c', 'd'), ('c', 'e'), ('d', 'a'), ('d', 'b'), ('d', 'c'), ('d', 'e'), ('e', 'a'), ('e', 'b'), ('e', 'c'), ('e', 'd')]
There are several approaches to counting the elements of a set that can be helpful in practice, aside from simply enumerating the elements.
Write a Python function that constructs all permutations of a finite list.
The function should take as input a list s, and produce as output all permutations of the elements of the list. Items should be considered unique based on their position in the list.
For instance,
s = ['a','b','c']
permute(s)
should return the 6 permutations of a, b, and c:
a b c
a c b
b a c
b c a
c a b
c b a
Write your function from scratch; i.e., do not use any libraries that are not part of the core of Python (e.g., do not use the itertools library).
As a matter of mathematics, (nk)=n!k!(n−k)!, but this is a bad way to compute (nk). When n is large, the numerator n! is enormous, and can overflow finite-precision calculations even for values of n and k for which (nk) is small. Moreover, relying on cancellation in finite-precision arithmetic can lead to large errors, whether the cancellation is through division or subtraction.
So, rather than compute (nk) by dividing factorials, it's better to build it by multiplying ratios.
# illustrating issues calculating nCk as a ratio of factorials for large n
n = 1000
k = 2
print '1000!:', math.factorial(n) # Really Big Number:
# would overflow in many languages,
# including R, MATLAB, Fortran
# however, 1000 choose 2 is only (1000*999)/2 = 499500
print '\n998!:', math.factorial(n-k)
print '\n1000C2:', sp.special.binom(n, k)
# let's code this from scratch in a stable way
def myChoose(n, k):
'''
no error checking. If this were for production, we'd check that the arguments are integers
and that n >= k >= 0.
'''
m = 0
myc = 1
while (m < k):
myc = myc * (n-m)/(m+1)
m = m+1
return(myc)
print '1000C2 (2nd way):', myChoose(n, k)
1000!: 402387260077093773543702433923003985719374864210714632543799910429938512398629020592044208486969404800479988610197196058631666872994808558901323829669944590997424504087073759918823627727188732519779505950995276120874975462497043601418278094646496291056393887437886487337119181045825783647849977012476632889835955735432513185323958463075557409114262417474349347553428646576611667797396668820291207379143853719588249808126867838374559731746136085379534524221586593201928090878297308431392844403281231558611036976801357304216168747609675871348312025478589320767169132448426236131412508780208000261683151027341827977704784635868170164365024153691398281264810213092761244896359928705114964975419909342221566832572080821333186116811553615836546984046708975602900950537616475847728421889679646244945160765353408198901385442487984959953319101723355556602139450399736280750137837615307127761926849034352625200015888535147331611702103968175921510907788019393178114194545257223865541461062892187960223838971476088506276862967146674697562911234082439208160153780889893964518263243671616762179168909779911903754031274622289988005195444414282012187361745992642956581746628302955570299024324153181617210465832036786906117260158783520751516284225540265170483304226143974286933061690897968482590125458327168226458066526769958652682272807075781391858178889652208164348344825993266043367660176999612831860788386150279465955131156552036093988180612138558600301435694527224206344631797460594682573103790084024432438465657245014402821885252470935190620929023136493273497565513958720559654228749774011413346962715422845862377387538230483865688976461927383814900140767310446640259899490222221765904339901886018566526485061799702356193897017860040811889729918311021171229845901641921068884387121855646124960798722908519296819372388642614839657382291123125024186649353143970137428531926649875337218940694281434118520158014123344828015051399694290153483077644569099073152433278288269864602789864321139083506217095002597389863554277196742822248757586765752344220207573630569498825087968928162753848863396909959826280956121450994871701244516461260379029309120889086942028510640182154399457156805941872748998094254742173582401063677404595741785160829230135358081840096996372524230560855903700624271243416909004153690105933983835777939410970027753472000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 998!: 402790050127220994538240674597601587306681545756471103647447357787726238637266286878923131618587992793273261872069265323955622495490298857759082912582527118115540044131204964883707335062250983503282788739735011132006982444941985587005283378024520811868262149587473961298417598644470253901751728741217850740576532267700213398722681144219777186300562980454804151705133780356968636433830499319610818197341194914502752560687555393768328059805942027406941465687273867068997087966263572003396240643925156715326363340141498803019187935545221092440752778256846166934103235684110346477890399179387387649332483510852680658363147783651821986351375529220618900164975188281042287183543472177292257232652561904125692525097177999332518635447000616452999984030739715318219169707323799647375797687367013258203364129482891089991376819307292252205524626349705261864003453853589870620758596211518646408335184218571196396412300835983314926628732700876798309217005024417595709904449706930796337798861753941902125964936412501007284147114260935633196107341423863071231385166055949914432695939611227990169338248027939843597628903525815803809004448863145157344706452445088044626373001304259830129153477630812429640105937974761667785045203987508259776060285826091261745049275419393680613675366264232715305430889216384611069135662432391043725998805881663054913091981633842006354699525518784828195856033032645477338126512662942408363494651203239333321502114252811411713148843370594801145777575035630312885989779863888320759224882127141544366251503974910100721650673810303577074640154112833393047276025799811224571534249672518380758145683914398263952929391318702517417558325636082722982882372594816582486826728614633199726211273072775131325222240100140952842572490801822994224069971613534603487874996852498623584383106014533830650022411053668508165547838962087111297947300444414551980512439088964301520461155436870989509667681805149977993044444138428582065142787356455528681114392680950815418208072393532616122339434437034424287842119316058881129887474239992336556764337968538036861949918847009763612475872782742568849805927378373244946190707168428807837146267156243185213724364546701100557714520462335084082176431173346929330394071476071813598759588818954312394234331327700224455015871775476100371615031940945098788894828812648426365776746774528000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 1000C2: 499500.0 1000C2 (2nd way): 499500.0
A related same issue comes up in subtracting large numbers.
For instance, algebraically, x2−(x−1)2=2x−1. But if the mantissa of x is large, squaring x and x−1 can overflow the precision of the calculation, and their difference of the computed squares will be numerically wrong.
A random experiment or random trial is basically any situation whose outcome is not perfectly predictable, but for which we can specify all possible outcomes, and that shows long-term regularities. For example, when we toss a coin, we do not know how it will land, but it certainly must land heads, tails, on its edge, or not land at all. There is no other possibility. The set of all possible outcomes of a random experiment is called the outcome space. The letter S will denote outcome space. We are free to choose the outcome space to correspond to what we deem relevant for the experiment, as long as it is essentially inevitable that the random experiment will result in some outcome in the outcome space. For example, the outcome space we just described was {heads, tails, edge, doesn't land}. It might be adequate for our purposes for the outcome space to be {heads, not heads}.
Often we shall tailor outcome spaces for specific problems. Here is an example: Imagine a box containing tickets that are indistinguishable except that each has written upon it a unique number between 1 and the number of tickets, n. We shake the box, draw a ticket from the box without looking, and record the number written on the ticket we happened to draw. The natural outcome space of this experiment is the set of numbers {1,2,…,n}. However, suppose we are interested only in whether the number on the ticket we draw is even. The outcome space then could be reduced to {even number on ticket, odd number on ticket}, or coded even more abstractly as {0,1}, where the outcome is the number of even-numbered tickets drawn.
An event is a subset of outcome space: a collection of outcomes in the outcome space. A is an event if A⊂S. For example, in the experiment of drawing a numbered ticket from the box, suppose there are 10 tickets in all, and that we choose the outcome space S to be the numbers {1,2,3,…,9,10}. Then "we draw the number 1" is the event {1}, and "we draw an even number" is the event {2,4,6,8,10}, both of which are subsets of the set of possible outcomes, the outcome space S.
The treatment is quite informal, omitting important technical concepts such as sigma-algebras, measurability, and the like.
Modern probability theory starts with Kolmogorov's Axioms; here is an informal startement of the axioms. For more (but still a very informal treatment), see: Probability: Philosophy and Mathematical Background, Set theory, and Probability: Axioms and Fundaments.
Let S denote the outcome space, the set of all possible outcomes of a random experiment, and let {Ai}∞i=1 be a countable collection of subsets of S. Then any probability function P must satisfy these axioms:
(If a countable collection of events is pairwise disjoint, then the chance that any of the events occurs is the sum of the chances that they occur individually.)
These axioms have many useful consequences, among them that P(∅)=0, P(Ac)=1−P(A), and P(A∪B)=P(A)+P(B)−P(AB). More generally, the Inclusion-Exclusion Principle gives us
P∪ni=1Ai=n∑i=1PAi−∑1≤i1<i2≤nP(Ai1Ai2)+∑1≤i1<i2<i3≤nP(Ai1Ai2Ai3)−⋯+(−1)k−1∑1≤i1<i2<⋯<ik≤nP(Ai1Ai2⋯Aik)+⋯Let A and B be subsets of outcome space S.
also occur (if the outcome that occurs is in A, the outcome that occurs is also in B, because every element of A is an element of B).
P(A|B)≡P(AB)/P(B).
Independence is an extremely specific relationship. At one extreme, AB=∅; then P(AB)=0≤P(A)P(B). At another extreme, either A is a subset of B or vice versa; then P(AB)=min(P(A),P(B))≥P(A)P(B). Independence lies at a precise point in between.
P(A|B)=P(A).
Suppose P(A), P(B)≠0. Then
P(A|B)=P(B|A)P(A)P(B).Bayes' rule lets you "invert" conditional probabilities to find P(A|B) in terms of P(B|A).
Let A be an event and suppose {Ai} (a finite or countable collection of sets) is a partition of outcome space S such that P(Ai)>0 ∀i. Then
P(A)=∑iP(A|Ai)P(Ai).The Law of Total Probability is basically just an application of the definition of conditional probability and Kolmogorov's third axiom, but it's extremely useful.
Here is a classic application of Bayes' Rule. Suppose that a screening test for a medical condition is 95% accurate, in the sense that if one has the condition, there is a 95% chance that the test will be positive, and if one does not have the condition, there is a 95% chance that the test will be negative. Suppose that 1% of the population actually has the condition. We select someone at random from the population and test him or her. The test result is positive. What's the chance that the person actually has the condition?
P(has condition|tests positive)=P(tests positive | has condition)P(has condition )P(tests positive)=P(tests positive | has condition)P(has condition)P(tests positive | has condition)P(has condition)+P(tests positive | doesn't have condition)P(doesn't have condition)=0.950.95×0.01+0.05×0.99=16.1%.(The denominator involves applying the Law of Total Probability.)
[To do]