#!/usr/bin/env python
# coding: utf-8

# ![urhere](https://omega0.xyz/omega8008/.icons/urhere.gif)

# # Bin Boxes

# A Bin box, or *Binary* box, contains any number of tickets,
# but there are only two different labels on them.
# The picture below shows a sample of $n$ draws at random
# from a box with $r$ reds and $b$ blues. The draws could
# be made with or without replacement.
#
# ![rbbox](mybox.png)
#
# We are often interested in counting the number of tickets of one kind
# observed in the sample of $n$. For example, if we want to count
# the number of reds, just assume that reds are ones (1) and blues are zeros (0),
# so that $S_{n}= X_{1}+\ldots +X_{n}$, i.e. the sum of the draws, becomes
# the total number of reds in the sample.

# ## The Average and the SD in the Bin box

# More generally, if we label reds with the number $R$ and blues with $B$,
# then the mean and the SD in the box are:
# $$
# \begin{align*}
# \mbox{(ave. in box)} &= \frac{r \times R + b \times B}{r+b}\\
# \mbox{(sd in box)} &= |R-B|\cdot \sqrt{\frac{r}{r+b}\cdot \frac{b}{r+b}}
# \end{align*}
# $$
# The formula for the SD follows easily from the definition of the SD.
# Here is how it can be shown with SageMath with the help of [Sim](https://omega0.xyz/sim.sage),
# the Simplificator.

# In[13]:


get_ipython().run_line_magic('display', 'latex')
load("https://omega0.xyz/sim.sage")
_ = var('R,B,r,b,mu,sigma')
assume(r>0, b>0)
assume(R,'real', B,'real')
mu = (r*R + b*B)/(r+b)                    # (ave. in box)
Var = (r*(R-mu)^2 + b*(B-mu)^2)/(r+b)     # variance in the box
Var


# In[8]:


Var = Sim(Var)
Var


# In[9]:


sigma = sqrt(Var)
sigma


# Let $p = r/(r+b)$ be the proportion of red tickets in the box.
# Thus, $p$ is the probability that a ticket drawn at random from the
# Bin box is observed to be a RED ticket. The mean $\mu$ and the SD $\sigma$,
# i.e., the expected value and SD of a draw made at random from the box, are:
#
# $$\begin{align*}
# \mu &= p R + (1-p) B \\
# \sigma &= |B-R| \sqrt{p(1-p)}
# \end{align*}
# $$

# ## Observed frequency $f$ vs. Unobserved probability $p$

# Let $f=S_{n}/n$ when $R=1$ and $B=0$. In other words, $f$ is the observed
# proportion of REDs in the sample of $n$ draws. Let us assume that the draws
# are made with replacement. It does not matter if the draws are made without replacement,
# provided the numbers of tickets in the box are much larger than the number of draws,
# i.e., $n \ll \min(r,b)$. In fact, if $n$ is less than 1% or so of $\min(r,b)$,
# it does not matter.
#
# We have the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT) providing
# probabilistic relations between $f$ and $p$. The LLN says that as the sample size increases,
# the observed $f$ approaches the unobserved $p$ IN PROBABILITY. On the other hand,
# the CLT refines this by providing the asymptotic distribution of $f^{*}$
# (i.e., $f$ in standard units). The distribution of $f^{*}$ approaches the bell curve.
#
# The LLN and the CLT are obtained under the A PRIORI assumption that the draws are from
# the Bin box Bin(p). Clearly the direct probability,
# $$
# P( f | p,n) = \binom{n}{nf} p^{nf}(1-p)^{n(1-f)},
# $$
# is given by the Binomial probability formula. It is, however, the INVERSE probability
# $P(p|f,n)$ that we are most interested in, but this requires the A PRIORI probability
# of Bin(p), since by Bayes' Theorem,
# $$
# P(p|f,n) = \frac{P(f|p,n)\cdot P(\mbox{Bin}(p))}{P(f|n)}
# $$
# $P(\mbox{Bin}(p))$ is the A PRIORI probability that we are drawing tickets from
# the Bin(p) box, i.e., a Binary box with a proportion $p$ of REDs.
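# As a quick numeric illustration of the box formulas and the LLN above, here is a minimal
# plain-Python simulation (it also runs in Sage). The box composition, labels, and sample size
# below are arbitrary choices for illustration only.

# In[ ]:


# Simulate n0 draws with replacement from a Bin box and compare the observed
# proportion of REDs and the sample SD with p and sigma from the formulas above.
# Fresh names (r0, b0, ...) are used so the symbolic variables r, b, R, B, p, f, n
# used elsewhere in this notebook are left untouched.
import random
from math import sqrt as _sqrt

r0, b0 = 30, 70                               # tickets in the box (arbitrary)
R0, B0 = 1, 0                                 # labels: reds are ones, blues are zeros
p0 = r0 / (r0 + b0)                           # proportion of REDs in the box

mu0 = p0 * R0 + (1 - p0) * B0                 # (ave. in box)
sigma0 = abs(B0 - R0) * _sqrt(p0 * (1 - p0))  # (sd in box)

n0 = 10000                                    # number of draws (arbitrary)
draws = [R0 if random.random() < p0 else B0 for _ in range(n0)]
f0 = sum(draws) / n0                          # observed proportion of REDs

obs_sd = _sqrt(sum((x - f0)**2 for x in draws) / n0)
print("p =", p0, "  observed f =", f0)
print("sigma =", round(sigma0, 4), "  observed sd =", round(obs_sd, 4))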
# The space of Bin boxes is the set $\{\mbox{Bin}(p) : 0<p<1\}$. Each Bin(p) can be pictured
# as a point on an arc of a circle of radius 2 through the map
# $p \mapsto (x(p),y(p)) = (2\sqrt{1-p},\ 2\sqrt{p})$, and the element of arc length along
# this arc is precisely the (information) volume element
# $\mbox{vol}(dp) = \frac{dp}{\sqrt{p(1-p)}}$, as the following computation verifies.

# In[ ]:


_ = var('p')
assume(p>0, p<1)
def t(p): return pi/4 + arcsin(2*p-1)/2
def x(p): return 2*cos(t(p))
def y(p): return 2*sin(t(p))
dx = diff(x(p), p)
dy = diff(y(p), p)
dl_square = Sim(dx^2 + dy^2)
dl = sqrt(dl_square)
vol = 1/sqrt(p*(1-p))
# Thus, this is the volume element:
Sim(dl - vol)


# In[11]:


vol


# Since $0<f<1$ for typical samples, it is convenient to approximate the Binomial probability
# $P(f|p,n)$ above by a density in $f$. Two such approximations are compared below: $P_{1}$,
# obtained from Stirling's formula, which involves the relative entropy
# $I(f,p) = f\log\frac{f}{p}+(1-f)\log\frac{1-f}{1-p}$, and $P_{2}$, the normal (CLT)
# approximation. The plots compare the two for $(p,n)=(1/2,8)$ and $(p,n)=(0.75,30)$.

# In[ ]:


_ = var('f,p,n')
assume(f>0, f<1, p>0, p<1, n, 'integer', n>0)
def I(f,p): return f*log(f/p) + (1-f)*log((1-f)/(1-p))
def P1(f,p,n): return sqrt(n/(2*pi*f*(1-f))) * exp(-n*I(f,p))
def P2(f,p,n): return sqrt(n/(2*pi*p*(1-p))) * exp(-n*(f-p)^2/(2*p*(1-p)))
q = 1/2
m = 8
plt1 = plot([P1(f,q,m), P2(f,q,m)], f, 0, 1)
q = 0.75
m = 30
plt2 = plot([P1(f,q,m), P2(f,q,m)], f, 0, 1)
show(plt1 + plt2)


# ## Information Volume from Cross Entropy

# Here is an easy way to get the volume element in $\{\mbox{Bin}(p): 0<p<1\}$ that works, in fact,
# for any smooth family of boxes $\{\mbox{Box}(\theta) : \theta=(\theta^{1},\ldots,\theta^{m})\}$
# with densities $p(x|\theta)$. Define the information deviation (cross entropy) of
# $\mbox{Box}(\theta)$ from $\mbox{Box}(\mu)$ as
# $$
# I(\mu:\theta) = \int p(x|\mu)\,\log\frac{p(x|\mu)}{p(x|\theta)}\, dx
# $$
# and recall the elementary inequality $\log y \ge 1 - 1/y$, valid for all $y > 0$.
#
# Hence:
#
# $$
# \begin{align*}
# \log \frac{p(x|\mu)}{p(x|\theta)} &\ge 1 - \frac{p(x|\theta)}{p(x|\mu)} \\
# \ & \\
# \mbox{ and averaging over } x\in \mbox{Box}(\mu) & \\
# \ & \\
# I(\mu : \theta) &\ge 1 - 1 = 0
# \end{align*}
# $$
#
# with equality when, and only when, $\mbox{Box}(\mu)=\mbox{Box}(\theta)$ as Boxes.
#
# Now, the same game as before with $\{\mbox{Bin}(p)\}$: just compute the information
# separation of $\mbox{Box}(\theta + t\cdot v)$ from $\mbox{Box}(\theta)$ for small values
# $t>0$ and a (velocity) vector $v=(v^{1},\ldots,v^{m})$ indicating the direction
# of change from $\mbox{Box}(\theta)$. The function
# $h(t) = I(\theta + t v : \theta)$, as a function of $t$, is never negative and at
# $t=0$ has a global minimum of zero. In other words, $h(0)=h^{'}(0)=0$ and $h^{''}(0) > 0.$
#
# The Taylor expansion of $h(t)$ about $0$ up to second order is
# $$
# \begin{align*}
# h(t) &= h(0) + h^{'}(0) t + h^{''}(0) \frac{t^{2}}{2!} + o(t^2) \\
# &= \frac{a^{2}}{2} t^{2} + o(t^{2}).
# \end{align*}
# $$
#
# The positive number $a^{2}$ is called the Fisher Information evaluated at $v$, and
# it defines the Information metric $a^{2} = \sum_{i,j} v^{i} g_{ij}(\theta) v^{j}$,
# where, from the definition of $I(\mu:\theta)$ and the chain rule of Calc3, it follows that
# $$
# \begin{align*}
# g_{ij}(\theta) &= \int \frac{1}{p(x|\theta)}\frac{\partial{p(x|\theta)}}{\partial\theta^{i}}
# \frac{\partial{p(x|\theta)}}{\partial\theta^{j}}\, dx \\
# \\
# &= \int dx\ {p(x|\theta)}\ \frac{\partial{\log p(x|\theta)}}{\partial\theta^{i}}
# \frac{\partial{\log p(x|\theta)}}{\partial\theta^{j}} \\
# \\
# &= - \int dx\ p(x|\theta)\ \frac{\partial^{2}}{\partial\theta^{i}\partial\theta^{j}} \log p(x|\theta) \\
# \\
# &= \int dx\ \frac{\partial\left( 2\sqrt{p(x|\theta)}\right)}{\partial\theta^{i}}\
# \frac{\partial\left(2 \sqrt{p(x|\theta)}\right)}{\partial\theta^{j}}
# \end{align*}
# $$
#
# giving 4 different (but equivalent) expressions for the entries of the Fisher information
# matrix at $\mbox{Box}(\theta)$. Notice the vector embedding $p \hookrightarrow 2\sqrt{p}$ in the
# last expression for the metric $g_{ij}(\theta)$ above.
#
# Finally, the volume element in the space is the square root of the determinant
# of the Fisher Information, i.e., $\mbox{vol}(d\theta) = \sqrt{\det (g_{ij}(\theta))}\ d\theta$.
# When the total information volume $V = \int \mbox{vol}(d\theta) < \infty$,
# there is a notion of an equally likely, uniform prior on the space, provided by its volume element.
# When $V=\infty$, the family $\{ e^{-\alpha I(\theta:\theta_{0})}: \alpha > 0\}$ of prior distributions,
# given by their density functions w.r.t. the volume element, provides some of the most ignorant
# [(honest)](https://omega0.xyz/omega8008/dataprior/dp4arxiv.pdf)
# objective priors in any number of dimensions!
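# As a minimal numeric check of the Taylor-expansion argument above (plain Python; the value of
# $p$ and the step size are arbitrary choices): for the one-parameter family $\{\mbox{Bin}(p)\}$,
# the second derivative $h''(0)$ of $h(t)=I(p+t:p)$ should equal the Fisher information
# $1/(p(1-p))$, whose square root is the volume element found earlier.

# In[ ]:


# Check h''(0) = 1/(p(1-p)) for the Bin(p) family by a central second difference.
# Fresh names are used so the symbolic p, f, n, I, vol defined above are untouched.
from math import log as _log

def info_dev(mu, theta):
    # information deviation of Bin(theta) from Bin(mu)
    return mu*_log(mu/theta) + (1 - mu)*_log((1 - mu)/(1 - theta))

p0 = 0.3        # an arbitrary box Bin(p0)
eps = 1e-4      # small step t in the p direction

h = lambda s: info_dev(p0 + s, p0)
second_deriv = (h(eps) - 2*h(0.0) + h(-eps)) / eps**2   # ~ h''(0)
fisher = 1 / (p0*(1 - p0))

print("h''(0) ~", round(second_deriv, 4))
print("1/(p(1-p)) =", round(fisher, 4))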
# ## Rules of Succession

# For any $0<f<1$ observed after $n$ draws, a prior on $\{\mbox{Bin}(p)\}$ produces, via Bayes' Theorem,
# a rule of succession: the probability that the next draw is a RED. Laplace's classical rule,
# obtained from the flat (uniform) prior on $p$, is
# $$
# P(X_{n+1}=1 | f,n) = \frac{nf+1}{n+2},
# $$
# while the prior given by the normalized volume element, $\frac{dp}{\pi\sqrt{p(1-p)}}$, gives
# $$
# P(X_{n+1}=1 | f,n) = \frac{nf+1/2}{n+1}.
# $$
# Both computations reduce to Beta integrals,
# $$
# B(\alpha,\beta) = \int_{0}^{1} p^{\alpha-1}(1-p)^{\beta-1}\, dp = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)},
# $$
# defined for $\alpha>0$ and $\beta > 0$ in terms of the Gamma function,
#
# $$
# \Gamma(\alpha) = \int_{0}^{\infty} x^{\alpha-1} e^{-x}\, dx
# $$
#
# that interpolates the factorials since $\alpha \Gamma(\alpha) = \Gamma(\alpha+1)$. When $f=1$,
#
# $$
# P(X_{n+1}=1 | f=1,n) = \frac{n+1/2}{n+1}
# $$
#
# The table compares the two rules when all $n$ draws so far have been RED ($f=1$):
#
# | n | Laplace | vol |
# | ---- | ------- | ----- |
# | 0 | 0.5 | 0.5 |
# | 1 | 0.67 | 0.75 |
# | 2 | 0.75 | 0.83 |


# In[ ]:
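# A quick plain-Python check of the table above: the two rules of succession evaluated at
# f = 1 (all n draws RED so far), (n*f+1)/(n+2) for Laplace's flat prior and (n*f+1/2)/(n+1)
# for the volume element prior. Fresh names are used so nothing defined above is clobbered.

def laplace_rule(k, f=1.0):
    # Laplace's rule of succession (uniform prior on p)
    return (k*f + 1) / (k + 2)

def vol_rule(k, f=1.0):
    # rule of succession from the normalized volume element prior dp/(pi*sqrt(p(1-p)))
    return (k*f + 0.5) / (k + 1)

print(" n   Laplace   vol")
for k in range(5):
    print(f"{k:2d}    {laplace_rule(k):.2f}     {vol_rule(k):.2f}")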