#!/usr/bin/env python
# coding: utf-8

# # Lecture 24: Gamma Distribution, Poisson Processes
# 
# 
# ## Stat 110, Prof. Joe Blitzstein, Harvard University
# 
# ----

# ## Stirling's Approximation to the Factorial
# 
# \begin{align}
#   n! &= n \, (n-1) \, (n-2) \, \dots \, 1 \\
#      &\sim \sqrt{2 \pi n} \, \left( \frac{n}{e} \right)^{n}
# \end{align}
# 
# where $\sim$ means that the ratio of the two sides converges to 1 as $n$ approaches $\infty$.
# 
# This is fine when we are talking about $n \in \{1,2,3,\dots\}$, but have you ever wondered what $\pi!$ is?
# 
# For that, we need the Gamma function.
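# 
# As a quick numerical aside (not part of the lecture), we can check Stirling's approximation for a moderately large $n$, and preview $\pi!$ using `scipy.special.gamma`, which implements the Gamma function defined in the next section.

# In[ ]:


import math
from scipy.special import gamma as gamma_fn

# Stirling's approximation vs. the exact factorial: the ratio approaches 1 as n grows
n = 50
stirling = math.sqrt(2 * math.pi * n) * (n / math.e) ** n
print(stirling / math.factorial(n))

# "pi factorial" via the Gamma function: pi! = Gamma(pi + 1)
print(gamma_fn(math.pi + 1))
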

# ----

# ## Gamma Function
# 
# Just as an aside, the Beta and Gamma functions are closely related. The Gamma function should be amongst the Top 10 Functions of All Time (if there were ever such a list).
# 
# \begin{align}
#   \Gamma(a) &= \int_{0}^{\infty} x^a \, e^{-x} \, \frac{dx}{x} \quad \text{for any real }a \gt 0 \\
#   &= \int_{0}^{\infty} x^{a-1} \, e^{-x} \, dx \quad \text{(alternately)}
# \end{align}
# 
# Note that if $a$ were 0, then near $x = 0$ the integrand behaves like $\frac{1}{x}$, since $x^0 = 1$ and $e^{-x} \approx 1$ for small $x$.
# 
# But $\int \frac{dx}{x} = \log(x)$, which blows up to $-\infty$ as $x \to 0^{+}$, so the integral would diverge; this is why we restrict $a \gt 0$.
# 
# ### Properties of the Gamma Function
# 
# 1. $\Gamma(n) = (n-1)!$ where $n \in \mathbb{Z}^{+}$ <p/>
# 1. $\Gamma(x+1) = x \, \Gamma(x)$ <p/>
# 1. $\Gamma(\frac{1}{2}) = \sqrt{\pi}$
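# 
# These three properties are easy to verify numerically with `scipy.special.gamma` (a quick sketch, not part of the lecture notes):

# In[ ]:


import math

import numpy as np
from scipy.special import gamma as gamma_fn

# Property 1: Gamma(n) = (n-1)! for positive integers n
print(all(np.isclose(gamma_fn(n), math.factorial(n - 1)) for n in range(1, 10)))

# Property 2: Gamma(x+1) = x * Gamma(x)
x = 3.7
print(np.isclose(gamma_fn(x + 1), x * gamma_fn(x)))

# Property 3: Gamma(1/2) = sqrt(pi)
print(np.isclose(gamma_fn(0.5), math.sqrt(math.pi)))
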

# ----

# ## Gamma Distribution
# 
# How would we derive a PDF based on the Gamma function?
# 
# _By normalizing the Gamma function._
# 
# \begin{align}
#   1 &= \int_{0}^{\infty} c \, x^a \, e^{-x} \, \frac{dx}{x} \\
#   &= \int_{0}^{\infty} \frac{1}{\Gamma(a)} \, x^a \, e^{-x} \, \frac{dx}{x}
# \end{align}
# 
# And so the PDF for a Gamma distribution would be
# 
# \begin{align}
#  X \sim \operatorname{Gamma}(a,1)\text{: } \quad f_X(x) &= \frac{1}{\Gamma(a)} \, x^a \, e^{-x} \, \frac{1}{x} \quad \text{for } x \gt 0 \\\\
#  \\
#  \text{More generally } Y &\sim \operatorname{Gamma}(a, \lambda) \\
#  \text{Let } Y &= \frac{X}{\lambda} \text{, where } X \sim \operatorname{Gamma}(a,1) \\
#  \rightarrow y &= \frac{x}{\lambda} \\
#  x &= \lambda \, y \\
#  \frac{dx}{dy} &= \lambda \\
#  \\
#  \Rightarrow f_Y(y) &= f_X(x) \, \frac{dx}{dy} \quad \text{ transforming } X \text{ to } Y \\
#  &= \frac{1}{\Gamma(a)} \, (\lambda y)^a \, e^{-\lambda y} \, \frac{1}{\lambda y} \, \lambda \\
#  &= \frac{1}{\Gamma(a)} \, (\lambda y)^a \, e^{-\lambda y} \, \frac{1}{y} \quad \text{for } y \gt 0
# \end{align}

# In[1]:


import matplotlib
import numpy as np
import matplotlib.pyplot as plt

from matplotlib.ticker import (MultipleLocator, FormatStrFormatter,
                               AutoMinorLocator)
from scipy.stats import gamma

get_ipython().run_line_magic('matplotlib', 'inline')

plt.xkcd()

_, ax = plt.subplots(figsize=(12,8))

alpha_values  = [1., 2., 3., 4., 5., 7.5, 9., 0.5]
lambda_values = [0.5, 0.5, 0.5, 1., 2., 1., 1., 2.]
x = np.linspace(0, 20, 1000)

# plot the distributions
# note: scipy.stats.gamma is parameterized by shape and scale, where scale = 1/lambda
for a, l in zip(alpha_values, lambda_values):
    ax.plot(x, gamma.pdf(x, a, scale=1/l), lw=3.2, alpha=0.6,
            label=r'$\alpha$={}, $\lambda$={}'.format(a, l))
 
# legend styling
legend = ax.legend()
for label in legend.get_texts():
    label.set_fontsize('large')
for label in legend.get_lines():
    label.set_linewidth(1.5)

# y-axis
ax.set_ylim([0.0, 0.5])
ax.set_ylabel(r'$f(x)$')

# x-axis
ax.set_xlim([0, 20.0])
ax.set_xlabel(r'$x$')

# x-axis tick formatting
majorLocator = MultipleLocator(5.0)
majorFormatter = FormatStrFormatter('%0.1f')
minorLocator = MultipleLocator(1.0)
ax.xaxis.set_major_locator(majorLocator)
ax.xaxis.set_major_formatter(majorFormatter)
ax.xaxis.set_minor_locator(minorLocator)

ax.grid(color='grey', linestyle='-', linewidth=0.3)

plt.suptitle(r'Gamma Distribution $f(x) = \frac{1}{\Gamma(\alpha)} \, (\lambda x)^{\alpha} \, e^{-\lambda x} \, \frac{1}{x}$')

plt.show()
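
# As a quick check of the change-of-variables derivation above, the handwritten density for $\operatorname{Gamma}(\alpha, \lambda)$ should match `scipy.stats.gamma` when we pass a scale of $1/\lambda$ (a minimal sketch):

# In[ ]:


import numpy as np
from scipy.special import gamma as gamma_fn
from scipy.stats import gamma

a, lam = 3.0, 2.0
y = np.linspace(0.1, 10, 50)

# density derived above: f(y) = (1/Gamma(a)) * (lambda*y)^a * exp(-lambda*y) / y
f_derived = (lam * y) ** a * np.exp(-lam * y) / (gamma_fn(a) * y)

# scipy parameterizes by shape and scale, so rate lambda corresponds to scale = 1/lambda
f_scipy = gamma.pdf(y, a, scale=1/lam)

print(np.allclose(f_derived, f_scipy))   # expect True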


# ----

# ## Connecting the Gamma and Exponential, Poisson Distributions
# 
# ### Poisson Processes
# 
# Recall Poisson processes, where the number of events happening in a certain time period $t$ is such that
# 
# \begin{align}
#   N_t &= \text{number of events occurring up to time }t \\
#       &\sim \operatorname{Pois}(\lambda t)
# \end{align}
# 
# where the numbers of events occurring in disjoint time intervals are independent.
# 
# Now, earlier we discussed how the waiting times in a Poisson process are exponential. Consider $T_1$, the time until we observe the very first event.
# 
# \begin{align}
#   P(T_1 \gt t) &= P(N_t = 0) &\quad \text{by definition} \\
#   &= \frac{(\lambda t)^0 \, e^{-\lambda t}}{0!} \\
#   &= e^{-\lambda t} \\
#   &= 1 - (1-e^{-\lambda t}) &\quad \text{1 - CDF of } \operatorname{Expo}(\lambda) \\\\
#   \\ 
#   \Rightarrow &\text{time until first event } T_1 \sim \operatorname{Expo}(\lambda)
# \end{align}
# 
# And so, using this argument along with the memorylessness property, all of the other interarrival times between subsequent events are also i.i.d. $\operatorname{Expo}(\lambda)$.
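# 
# A small simulation (not from the lecture, just a sanity check) of the identity $P(T_1 \gt t) = P(N_t = 0) = e^{-\lambda t}$, drawing $N_t \sim \operatorname{Pois}(\lambda t)$ directly:

# In[ ]:


import numpy as np

rng = np.random.default_rng(110)
lam, t = 1.5, 2.0

# N_t ~ Pois(lambda * t); the fraction of samples with N_t = 0 estimates P(T_1 > t)
n_t = rng.poisson(lam * t, size=10**6)
print((n_t == 0).mean())   # simulated P(N_t = 0)
print(np.exp(-lam * t))    # e^{-lambda t}, the Expo(lambda) survival function at t
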
# 
# ### Analogies: Geometric $\rightarrow$ Negative binomial (discrete); Exponential $\rightarrow$ Gamma (continuous)
# 
# So the interarrival times for events in a Poisson process are i.i.d. $\operatorname{Expo}(\lambda)$.
# 
# But what if we want to know $T_n$, the time of the $n^{th}$ arrival?
# 
# \begin{align}
#   T_n &= \sum_{i=1}^{n} X_i \text{, where the } X_i \text{ are i.i.d. } \operatorname{Expo}(\lambda) \\\\
#   &\sim \operatorname{Gamma}(n, \lambda) &\quad \text{ assuming } n \text{ is a positive integer} 
# \end{align}
# 
# Recall the relation between the geometric and negative binomial distributions:
# * In the geometric $\operatorname{Geom}(p)$, we count the discrete time until the $1^{st}$ success
# * In the negative binomial $\operatorname{NB}(n, p)$, we count the discrete time until the $n^{th}$ success ($\sum_{i=1}^{n} X_i$, where the $X_i$ are i.i.d. $\operatorname{Geom}(p)$)
# 
# Well, there is something analogous between the exponential and Gamma distributions:
# * In the exponential $\operatorname{Expo}(\lambda)$, we count the continuous time until the $1^{st}$ success
# * In the Gamma $\operatorname{Gamma}(n, \lambda)$, we count the continuous time until the $n^{th}$ success ($\sum_{i=1}^{n} X_i$, where the $X_i$ are i.i.d. $\operatorname{Expo}(\lambda)$); see the simulation sketch below
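# 
# Below is a simulation sketch (not from the lecture) of this sum-of-exponentials fact: adding $n$ i.i.d. $\operatorname{Expo}(\lambda)$ waiting times should give something distributed as $\operatorname{Gamma}(n, \lambda)$.

# In[ ]:


import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(110)
n, lam = 5, 2.0

# T_n = X_1 + ... + X_n with X_i i.i.d. Expo(lambda); numpy's exponential takes scale = 1/lambda
t_n = rng.exponential(scale=1/lam, size=(10**5, n)).sum(axis=1)

# compare the simulation against Gamma(n, lambda), i.e. shape n and scale 1/lambda in scipy
print(t_n.mean(), n / lam)                                      # means
print(t_n.var(), n / lam**2)                                    # variances
print(np.quantile(t_n, 0.9), gamma.ppf(0.9, n, scale=1/lam))    # 90th percentiles
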
# 
# ----

# ## Moments of Gamma
# 
# #### $X \sim \operatorname{Gamma}(\alpha, 1)$
# 
# Let $X \sim \operatorname{Gamma}(\alpha,1)$
# 
# Directly using LOTUS:
# 
# \begin{align}
#   \mathbb{E}(X^c) &= \int_{0}^{\infty} \frac{1}{\Gamma(\alpha)} \,x^c \, x^{\alpha} \, e^{-x} \, \frac{dx}{x} \\
#   &= \int_{0}^{\infty} \frac{1}{\Gamma(\alpha)} \, x^{\alpha+c} \, e^{-x} \, \frac{dx}{x} \\
#   &= \frac{\Gamma(\alpha+c)}{\Gamma(\alpha)} &\quad \text{, where } \alpha+c \gt 0 \\\\
#   \\
#   1^{st} \text{ moment (mean) of } \operatorname{Gamma}(\alpha,1) &= \mathbb{E}(X) \\
#   &= \frac{\Gamma(\alpha+1)}{\Gamma(\alpha)} \\
#   &= \frac{\alpha \, \Gamma(\alpha)}{\Gamma(\alpha)} \\
#   &= \boxed{ \alpha } \\\\
#   \\
#   2^{nd} \text{ moment of } \operatorname{Gamma}(\alpha,1) &= \mathbb{E}(X^2) \\
#   &= \frac{\Gamma(\alpha+2)}{\Gamma(\alpha)} \\
#   &= \frac{(\alpha+1) \, \Gamma(\alpha+1)}{\Gamma(\alpha)} \\
#   &= \frac{(\alpha+1)\alpha \, \Gamma(\alpha)}{\Gamma(\alpha)} \\
#   &= \alpha^2 + \alpha \\
#   \\
#   \Rightarrow \operatorname{Var} \text{ of } \operatorname{Gamma}(\alpha,1) &=  \mathbb{E}(X^2) - \left( \mathbb{E}(X) \right)^2 \\
#   &= \alpha^2 + \alpha - \alpha^2 \\
#   &= \boxed{ \alpha  }
# \end{align}
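# 
# A quick Monte Carlo check (not part of the lecture) of $\mathbb{E}(X^c) = \frac{\Gamma(\alpha+c)}{\Gamma(\alpha)}$ and of the mean and variance for $\operatorname{Gamma}(\alpha, 1)$:

# In[ ]:


import numpy as np
from scipy.special import gamma as gamma_fn

rng = np.random.default_rng(110)
alpha, c = 2.5, 1.7

# draw from Gamma(alpha, 1): shape alpha, scale 1
x = rng.gamma(shape=alpha, scale=1.0, size=10**6)

print((x**c).mean(), gamma_fn(alpha + c) / gamma_fn(alpha))   # E(X^c) vs Gamma(alpha+c)/Gamma(alpha)
print(x.mean(), alpha)                                        # mean should be near alpha
print(x.var(), alpha)                                         # variance should be near alpha
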

# #### $X \sim \operatorname{Gamma}(\alpha, \lambda)$
# 
# Now let $X \sim \operatorname{Gamma}(\alpha, \lambda)$
# 
# Applying what we know from the case above and again using LOTUS:
# 
# \begin{align}
#   \mathbb{E}(X^c) &= \int_{0}^{\infty} \frac{1}{\Gamma(\alpha)} \,x^c \, (\lambda x)^{\alpha} \, e^{-\lambda x} \, \frac{dx}{x} \\
#   &= \int_{0}^{\infty} \frac{1}{\Gamma(\alpha)} \, \lambda^{\alpha} \, x^{\alpha+c} \, e^{-\lambda x} \, \frac{dx}{x} \\
#   &= \frac{1}{\lambda^c} \int_{0}^{\infty} \frac{1}{\Gamma(\alpha)} \, (\lambda \, x)^{\alpha+c} \, e^{-\lambda x} \, \frac{d(\lambda x)}{\lambda \, x} \\
#   &= \frac{\Gamma(\alpha+c)}{\Gamma(\alpha) \, \lambda^c} &\quad \text{, where } \alpha+c \gt 0 \\\\
#   \\
#   1^{st} \text{ moment (mean) of } \operatorname{Gamma}(\alpha,\lambda) &= \mathbb{E}(X) \\
#   &= \frac{\Gamma(\alpha+1)}{\Gamma(\alpha) \, \lambda} \\
#   &= \frac{\alpha \, \Gamma(\alpha)}{\Gamma(\alpha) \, \lambda} \\
#   &= \boxed{ \frac{\alpha}{\lambda} } \\\\
#   \\
#   2^{nd} \text{ moment of } \operatorname{Gamma}(\alpha,\lambda) &= \mathbb{E}(X^2) \\
#   &= \frac{\Gamma(\alpha+2)}{\Gamma(\alpha) \, \lambda^2} \\
#   &= \frac{(\alpha+1) \, \Gamma(\alpha+1)}{\Gamma(\alpha) \, \lambda^2} \\
#   &= \frac{(\alpha+1)\alpha \, \Gamma(\alpha)}{\Gamma(\alpha) \, \lambda^2} \\
#   &= \frac{(\alpha+1)\alpha}{\lambda^2} \\
#   \\
#   \Rightarrow \operatorname{Var} \text{ of } \operatorname{Gamma}(\alpha,\lambda) &=  \mathbb{E}(X^2) - \left( \mathbb{E}(X) \right)^2 \\
#   &= \frac{(\alpha+1)\alpha}{\lambda^2} - \left(  \frac{\alpha}{\lambda} \right)^2 \\
#   &= \frac{\alpha + \alpha^2 - \alpha^2}{\lambda^2} \\
#   &= \boxed{  \frac{\alpha}{\lambda^2} }
# \end{align}
# 
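# A matching check (again just a sketch) for $\operatorname{Gamma}(\alpha, \lambda)$: the mean should be $\frac{\alpha}{\lambda}$ and the variance $\frac{\alpha}{\lambda^2}$.

# In[ ]:


import numpy as np

rng = np.random.default_rng(110)
alpha, lam = 3.0, 0.5

# Gamma(alpha, lambda) with rate lambda corresponds to numpy's shape alpha, scale 1/lambda
y = rng.gamma(shape=alpha, scale=1/lam, size=10**6)

print(y.mean(), alpha / lam)        # mean, should be near alpha / lambda
print(y.var(), alpha / lam**2)      # variance, should be near alpha / lambda^2
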
# ----

# View [Lecture 24: Gamma distribution and Poisson process | Statistics 110](http://bit.ly/2CyadAf) on YouTube.