This Jupyter notebook is the Python equivalent of the R code in section 6.9 R, pp. 267 - 272, Introduction to Probability, 1st Edition, Blitzstein & Hwang.
import numpy as np
The MGF of an r.v. is a function. As an example of defining and working with functions in Python, let's use the $N(0, 1)$ MGF, which is given by $M(t) = e^{\frac{t^{2}}{2}}$. The code
def M(t):
    """ Return the N(0, 1) MGF evaluated at t. """
    return np.exp(t**2/2)

print('calling function M with a single value: \nM(0) = {}\n'.format(M(0)))
print('calling M with a vector: \nM(np.arange(1,11)) = {}'.format(M(np.arange(1,11))))
calling function M with a single value: 
M(0) = 1.0

calling M with a vector: 
M(np.arange(1,11)) = [1.64872127e+00 7.38905610e+00 9.00171313e+01 2.98095799e+03
 2.68337287e+05 6.56599691e+07 4.36731791e+10 7.89629602e+13
 3.88084696e+17 5.18470553e+21]
defines M to be this function. The def Python keyword says that we're defining a function. The function is named M, and it takes one variable t (called the argument or parameter of the function). The line declaring the function name and parameter(s) is terminated with a colon :, with the body of the function following on the next line after an indent. Note that a plain Python function cannot automatically handle both scalar and vector inputs the way an R function can. However, since our function M relies on numpy.exp, M can accept either a scalar or a numpy array as input, and returns a scalar or numpy array as output.
Writing
def M(x):
    return np.exp(x**2/2)
would define the same function M, except that now the parameter is named x. Giving the parameters names is helpful for functions of more than one parameter, since Python then saves us from having to remember the order in which to write the parameters, and allows us to assign default values. For example, the $N(\mu, \sigma^2)$ MGF is given by $g(t) = \exp\left(\mu \, t + \frac{\sigma^2 \, t^2}{2} \right)$, which depends on $t$, $\mu$, and $\sigma$. We can define this in Python by
def g(t, mean=0, sd=1):
    """ Return the N(mean, sd**2) MGF evaluated at t.

    default mean (mu)  = 0
    default sd (sigma) = 1
    """
    return np.exp(mean*t + (sd**2 * t**2)/2)

ans = g(1, 2, 3)
print('N(2, 3) MGF evaluated at 1 = {}'.format(ans))
ans = g(1, 2, 3)
print('N(2, 3) MGF evaluated at 1 = {}'.format(ans))
N(2, 3) MGF evaluated at 1 = 665.1416330443618
What is g(1, 2, 3)? It's the $N(2, 3^{2})$ MGF evaluated at 1, but it may be hard to remember which parameter is which, especially when working with many functions with many parameters over the course of many months. So we can also write g(1, mean=2, sd=3) or g(1, sd=3, mean=2). Since the mean and sd parameters are declared in the form parameter=expression, the function is said to have "default parameter values," and arguments passed by name in this way are called keyword arguments.
Also, when defining g we specified default values of 0 for the mean and 1 for the standard deviation sd, so if we want the $N(0, 5^2)$ MGF evaluated at 3, we can use g(3, sd=5) as shorthand. It would be bad here to write g(3, 5), since that is ambiguous about which argument is omitted; in fact, Python interprets this as g(3, mean=5), with sd left at its default value of 1.
ans1 = g(1, mean=2, sd=3)
print('g(1, mean=2, sd=3) = {}\t... explicitly using parameter names'.format(ans1))

ans2 = g(3, 5)
print('g(3, 5) = {}\t\t... but which parameter was omitted?'.format(ans2))

ans3 = g(3, mean=5)
print('g(3, mean=5) = {}\t... \'sd\' parameter was omitted'.format(ans3))
g(1, mean=2, sd=3) = 665.1416330443618	... explicitly using parameter names
g(3, 5) = 294267566.0415088		... but which parameter was omitted?
g(3, mean=5) = 294267566.0415088	... 'sd' parameter was omitted
LOTUS makes it easy to write down any moment of a continuous r.v. as an integral, and the scipy.integrate module in SciPy can help us do the integral numerically, using the scipy.integrate.quad function.
from scipy.integrate import quad
# to learn more about scipy.integrate.quad, un-comment out the following line
#print(quad.__doc__)
For example, let's approximate the 6th moment of a $N(0, 1)$ r.v. The code
from scipy.stats import norm

def g(x):
    """ Integrand for the 6th moment of N(0, 1): x**6 times the N(0, 1) PDF. """
    return x**6 * norm.pdf(x)

y, abserr = quad(g, -np.inf, np.inf)
print('6th moment of N(0,1) = {}, with error = {}'.format(y, abserr))
6th moment of N(0,1) = 15.000000000000004, with error = 4.4229319318339374e-09
asks quad to compute $\int_{-\infty}^{\infty} g(x) \, dx$, where $g(x) = x^6 \, \phi(x)$ with $\phi$ the $N(0, 1)$ PDF. When we ran this, quad reported 15 (the correct answer, as we know from this chapter!) and that the absolute error was less than $4.423 \times 10^{-9}$.
☣ 6.9.1. Numerical integration runs into difficulties for some functions; as usual, checking answers in multiple ways is a good idea. Using numpy.inf for parameter b (the upper limit of integration) is preferred to using a large number as the upper limit when integrating up to $\infty$ (and likewise a lower limit of a = -numpy.inf for $-\infty$). For example, on many systems quad(norm.pdf, 0, 10**6) reports 0.0 while quad(norm.pdf, 0, numpy.inf) reports the correct answer, 0.5.
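We can check this contrast directly. The following is a minimal sketch; the value reported for the finite upper limit may vary by system, so we simply print it rather than assert what it is:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# integrating the N(0, 1) PDF from 0 to infinity gives exactly 1/2
val_inf, _ = quad(norm.pdf, 0, np.inf)
print('quad(norm.pdf, 0, np.inf) = {}'.format(val_inf))

# a huge finite upper limit can fool the quadrature routine, which may
# sample the integrand only in regions where it is essentially 0
val_big, _ = quad(norm.pdf, 0, 10**6)
print('quad(norm.pdf, 0, 10**6) = {}'.format(val_big))
```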
However, all of the continuous distributions supported in scipy.stats have a moment(n, loc=0, scale=1) function that allows you to quickly and easily obtain the $n$th non-central moment of the continuous distribution in question. For example:
print('norm.moment(n=6) = {}'.format(norm.moment(n=6)))
norm.moment(n=6) = 15.00000000089533
Similarly, to check that the 2nd moment (and variance) of a $Unif(-1, 1)$ r.v. is 1/3, we can use quad again.
from scipy.stats import uniform

def h(x):
    """ Integrand for the 2nd moment of Unif(-1, 1).

    scipy.stats.uniform is constant on [loc, loc + scale], so
        a = loc
        b = loc + scale
    """
    a = -1
    b = 1
    loc = a
    scale = b - loc
    return x**2 * uniform.pdf(x, loc=loc, scale=scale)

y, abserr = quad(h, -1, 1)
print('2nd moment of Unif(-1,1) = {}, with error = {}'.format(y, abserr))
2nd moment of Unif(-1,1) = 0.3333333333333333, with error = 3.700743415417188e-15
Alternatively, we can use either uniform.moment(n=2, loc=-1, scale=2) or just uniform.var(loc=-1, scale=2), keeping in mind that loc=-1 and scale = interval length = 2.
ans1 = uniform.moment(n=2, loc=-1, scale=2)
print('uniform.moment(n=2, loc=-1, scale=2) = {}'.format(ans1))
ans2 = uniform.var(loc=-1, scale=2)
print('uniform.var(loc=-1, scale=2) = {}'.format(ans2))
uniform.moment(n=2, loc=-1, scale=2) = 0.33333333333333326
uniform.var(loc=-1, scale=2) = 0.3333333333333333
For moments of a discrete r.v., we can use LOTUS and the numpy.sum function. For example, to find the 2nd moment of $X \sim Pois(7)$, we can use
from scipy.stats import poisson

def g(k):
    """ Return the kth term of the LOTUS sum for the 2nd moment of Pois(7). """
    return k**2 * poisson.pmf(k, 7)

# we want to sum up to and including 100, so the upper limit is 100+1
ans = np.sum(g(np.arange(0, 100+1)))
print('2nd moment of Pois(7) = {}'.format(ans))
2nd moment of Pois(7) = 55.99999999999994
Here we summed up to 100 since it's clear, after getting a sense of the terms, that the total contribution of all the terms after $k = 100$ is negligible (choosing an upper limit in this way is in contrast to the continuous case, where quad lets us integrate all the way to numpy.inf). The result is extremely close to 56, which is comforting since $E(X^{2}) = Var(X) + (EX)^{2} = 7 + 49 = 56$.
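As a cross-check on truncating the sum at $k = 100$, the discrete distributions in scipy.stats also provide an expect method that performs the LOTUS sum for us:

```python
from scipy.stats import poisson

# expect performs the LOTUS sum E(g(X)) over the support of X ~ Pois(7)
ans = poisson.expect(lambda k: k**2, args=(7,))
print('E(X^2) for X ~ Pois(7) = {}'.format(ans))
```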
Similar to the continuous distributions, the discrete distributions in scipy.stats have a moment function as well.
ans = poisson.moment(n=2, mu=7)
print('poisson.moment(n=2, mu=7) = {}'.format(ans))
poisson.moment(n=2, mu=7) = 56
A sample moment can be found easily using NumPy. If x is a vector of data, then numpy.mean(x) gives its sample mean and, more generally, numpy.mean(x**n) gives the $n$th sample moment for any positive integer n. For example,
# seed the random number generator
np.random.seed(6765)
x = norm.rvs(size=100)
print('sample moment: numpy.mean(x**6) = {}'.format(np.mean(x**6)))
sample moment: numpy.mean(x**6) = 9.336344243424868
gives the 6th sample moment of 100 i.i.d. $N(0, 1)$ r.v.s. How close is it to the true 6th moment? How close are other sample moments to the corresponding true moments?
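One way to explore this question is to regenerate the same data (using the same seed as above) and compare several even sample moments against the true moments returned by norm.moment; a quick sketch:

```python
import numpy as np
from scipy.stats import norm

# regenerate the same data as above
np.random.seed(6765)
x = norm.rvs(size=100)

# compare a few even sample moments with the true N(0, 1) moments
for n in [2, 4, 6]:
    sample_moment = np.mean(x**n)
    true_moment = norm.moment(n)
    print('n = {}: sample moment = {:.4f}, true moment = {:.4f}'.format(
        n, sample_moment, true_moment))
```

Higher moments are dominated by the tails of the distribution, so their sample estimates tend to be further from the true values for a fixed sample size.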
The sample variance can also be found easily with NumPy. If x is a vector of data, then using the ddof parameter (delta degrees of freedom), as in numpy.var(x, ddof=1), gives its sample variance (the divisor used in the calculation is len(x) - ddof). If x has length 1, this returns nan (not a number) and issues RuntimeWarning: Degrees of freedom <= 0 for slice, since the $n - 1$ in the denominator is 0 in this case. It makes sense not to return a numerical value in this case, not only because of the definition but also because it would be insane to try to estimate the variability of a population if we only have one observation!
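A one-observation vector illustrates this (the RuntimeWarning is printed, not raised, so execution continues):

```python
import numpy as np

# with a single observation, n - 1 = 0 in the denominator, so NumPy
# issues a RuntimeWarning and returns nan instead of a number
single = np.array([3.14])
v = np.var(single, ddof=1)
print('np.var([3.14], ddof=1) = {}'.format(v))
```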
For a simple demonstration of using the sample mean and sample variance to estimate the true mean and true variance of a distribution, we generate 1000 values from a $N(0, 1)$ distribution and store them in z. We then compute the sample mean and sample variance with numpy.mean(z) and numpy.var(z, ddof=1).
np.random.seed(10946)
z = norm.rvs(size=1000)
mu_z = np.mean(z)
print('sample mean of z: {}'.format(mu_z))
var_z = np.var(z, ddof=1)
print('sample variance of z: {}'.format(var_z))
sample mean of z: 0.00915384129813555
sample variance of z: 1.073340109262066
We find that numpy.mean(z) is close to 0 and numpy.var(z, ddof=1) is close to 1. You can try this out for a $N(\mu, \sigma^2)$ distribution (or other distribution) of your choosing; just remember that scipy.stats.norm.rvs takes $\sigma$ as the scale parameter, not $\sigma^2$!
The sample standard deviation of x can be found using numpy.std(x, ddof=1). This gives the same result as numpy.sqrt(numpy.var(x, ddof=1)).
sd1 = np.std(z, ddof=1)
print('np.std(z, ddof=1) = {}'.format(sd1))
sd2 = np.sqrt(np.var(z, ddof=1))
print('np.sqrt(np.var(z, ddof=1)) = {}'.format(sd2))
np.std(z, ddof=1) = 1.0360212880351765
np.sqrt(np.var(z, ddof=1)) = 1.0360212880351765
While the scipy.stats module does have skew and kurtosis functions, both rely on the population standard deviation (with $n$ rather than $n-1$ in the denominator). However, we can easily define our own functions for sample skewness and sample kurtosis by using scipy.stats.moment and numpy.std(x, ddof=1).
from scipy.stats import moment
# to learn more about scipy.stats.moment, un-comment out the following line
#print(moment.__doc__)
def skew(x, use_sample_sd=True):
    """ Return the skew of x.

    Default is to use the sample standard deviation in the denominator,
    yielding sample skew. Specifying use_sample_sd=False is the same as
    using scipy.stats.skew(x).
    """
    ddof = 1 if use_sample_sd else 0
    return moment(x, 3) / np.std(x, ddof=ddof)**3

print('sample skew of z = {}'.format(skew(z)))
sample skew of z = 0.02333931072753422
def kurt(x, use_sample_sd=True):
    """ Return the excess kurtosis of x.

    Default is to use the sample standard deviation in the denominator,
    yielding sample kurtosis. Specifying use_sample_sd=False is the same
    as using scipy.stats.kurtosis(x).
    """
    ddof = 1 if use_sample_sd else 0
    return moment(x, 4) / np.std(x, ddof=ddof)**4 - 3.0

print('sample kurtosis of z = {}'.format(kurt(z)))
sample kurtosis of z = 0.16247912720035718
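As a sanity check, the population-standard-deviation versions of these ratios (ddof=0) should reproduce the scipy.stats implementations; a quick sketch, regenerating z with the same seed as above:

```python
import numpy as np
from scipy import stats
from scipy.stats import moment, norm

# regenerate the same data as above
np.random.seed(10946)
z = norm.rvs(size=1000)

# with ddof=0 (population standard deviation), the ratios reduce to the
# scipy.stats definitions of skew and excess kurtosis
pop_skew = moment(z, 3) / np.std(z, ddof=0)**3
pop_kurt = moment(z, 4) / np.std(z, ddof=0)**4 - 3.0

print('pop_skew = {}, stats.skew(z) = {}'.format(pop_skew, stats.skew(z)))
print('pop_kurt = {}, stats.kurtosis(z) = {}'.format(pop_kurt, stats.kurtosis(z)))
```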
To find the median of a continuous r.v. with CDF $F$, we need to solve the equation $F(x) = 1/2$ for $x$, which is equivalent to finding the root (zero) of the function $g$ given by $g(x) = F(x) - 1/2$. This can be done using scipy.optimize.root in SciPy. For example, let's find the median of the $Expo(1)$ distribution. The code
from scipy.stats import expon
from scipy.optimize import root

# to learn more about scipy.optimize.root, un-comment out the following line
#print(root.__doc__)

def g(x):
    """ For F the Expo(1) CDF, return g(x) = F(x) - 1/2. """
    return expon.cdf(x) - 1/2

# set our initial guess for a root to be 0
root(g, 0)
    fjac: array([[-1.]])
     fun: array([-5.55111512e-17])
 message: 'The solution converged.'
    nfev: 9
     qtf: array([-1.56348268e-11])
       r: array([0.5000001])
  status: 1
 success: True
       x: array([0.69314718])
asks root to find a root of the desired function, with an initial guess of 0. This returns an answer very close to the true answer of $\log(2) \approx 0.693$. Of course, in this case we can solve $1 - e^{-x} = \frac{1}{2}$ directly without having to use numerical methods.
☣ 6.9.2. scipy.optimize.root is useful, but there is no guarantee that it will find a root (success may be False). When using scipy.optimize.root, please pay attention to the status, success and message fields of the result returned.
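When the root can be bracketed, a bracketing method such as scipy.optimize.brentq is guaranteed to converge. A minimal sketch for the same median problem:

```python
from scipy.optimize import brentq
from scipy.stats import expon

def g(x):
    """ For F the Expo(1) CDF, return g(x) = F(x) - 1/2. """
    return expon.cdf(x) - 1/2

# brentq requires a bracketing interval [a, b] with g(a) and g(b) of
# opposite sign; g(0) = -1/2 < 0 and g(10) > 0, so [0, 10] works
median = brentq(g, 0, 10)
print('median of Expo(1) = {}'.format(median))
```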
An easier way to find the median of the $Expo(1)$ is to use scipy.stats.expon.median. The median function returns the median of the distribution, for expon as well as all other continuous distributions in scipy.stats.
ans = expon.median()
print('expon.median() = {}'.format(ans))
expon.median() = 0.6931471805599453
For finding the mode of a continuous distribution, we can use the fminbound function in SciPy. For example, let's find the mode of the $Gamma(6, 1)$ distribution, which is an important distribution that we will introduce in the next chapter. Its PDF is proportional to $x^{5} \, e^{-x}$. Using calculus, we can find that the mode is at $x = 5$. Using scipy.optimize.fminbound, we can find that the mode is very close to $x = 5$ as follows.
from scipy.optimize import fminbound

def h(x):
    """ Since fminbound MINIMIZES the argument function,
    we instead pass in the negative of (a function proportional to)
    the Gamma(6, 1) PDF in order to obtain the MAXIMUM.
    """
    def f(x):
        """ The Gamma(6, 1) PDF is proportional to this function. """
        return x**5 * np.e**-x
    return -f(x)

lower_bound = 0.0
upper_bound = 20.0
ans = fminbound(h, lower_bound, upper_bound)
print('mode of Gamma(6, 1) is approximately {}'.format(ans))
mode of Gamma(6, 1) is approximately 5.000000043355882
If we had wanted to minimize instead of maximize, we could simply have passed f to fminbound without wrapping it in the negating function h.
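Rather than hand-coding the proportional form, we could also minimize the negative of the exact Gamma PDF provided by scipy.stats.gamma; a sketch:

```python
from scipy.optimize import fminbound
from scipy.stats import gamma

# minimize the negative Gamma(6, 1) PDF; the minimizer is the mode
mode = fminbound(lambda x: -gamma.pdf(x, 6), 0.0, 20.0)
print('mode of Gamma(6, 1) is approximately {}'.format(mode))
```

Since the PDF and any positive multiple of it have the same maximizer, both approaches locate the same mode.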
Next, let's do a discrete example of median and mode. An interesting fact about the $Bin(n, p)$ distribution is that if the mean $np$ is an integer, then the median and mode are also $np$ (even if the distribution is very skewed). To check this fact about the median for the $Bin(50, 0.2)$ distribution, we can use the following code.
from scipy.stats import binom
bin_cdf_vector = binom.cdf(np.arange(0,50), 50, 0.2)
median = np.argmax(bin_cdf_vector>=0.5)
print('median of Bin(50, 0.2) is {}'.format(median))
median of Bin(50, 0.2) is 10
The numpy.argmax function finds the location of the maximum of a vector, giving the index of the first occurrence of a maximum. Since True is encoded as 1 and False is encoded as 0, the first maximum in binom.cdf(np.arange(0,50), 50, 0.2) >= 0.5 is at the first value for which the CDF is at least 0.5. Here numpy.argmax returns 10, showing that the median is 10.
Similarly, numpy.argmax applied to binom.pmf(np.arange(0,50), 50, 0.2) returns 10, showing that the mode is at 10.
bin_pmf_vector = binom.pmf(np.arange(0,50), 50, 0.2)
mode = np.argmax(bin_pmf_vector)
print('mode of Bin(50, 0.2) is {}'.format(mode))
mode of Bin(50, 0.2) is 10
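We can also confirm the median directly, since every distribution in scipy.stats has a median method:

```python
from scipy.stats import binom

# scipy.stats computes the median of Bin(50, 0.2) for us
print('binom.median(50, 0.2) = {}'.format(binom.median(50, 0.2)))
```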
The sample median of a vector x of data can be found using numpy.median(x). scipy.stats.mode can be used to find the sample mode, but it only returns the smallest mode when there are ties. Instead, we can compose our own mode function using numpy.where and numpy.bincount.
np.random.seed(17711)
bin_rv_vector = binom.rvs(50, 0.2, size=50)

def mode(x):
    """ Return all modes of x, including ties. """
    counts = np.bincount(x)
    return np.where(counts == counts.max())[0]

ans = mode(bin_rv_vector)
print('mode(s) in r.v. vector x: {}'.format(ans))
mode(s) in r.v. vector x: [8]
In the starred Section 6.7, we showed that in rolling 6 fair dice, the probability of a total of 18 is $3431/6^{6}$. But the proof was complicated. If we only need an approximate answer, simulation is a much easier approach. And we already know how to do it! Here is the code for a million repetitions:
np.random.seed(28657)

repetitions = 10**6

# counter to track number of 18s seen
c = 0
for _ in range(repetitions):
    if np.random.choice(np.arange(1,7), 6).sum() == 18:
        # increment the counter
        c += 1

ans = c / repetitions
print('probability of total 18 with 6 fair dice (simulated) is {}'.format(ans))
probability of total 18 with 6 fair dice (simulated) is 0.073304
In our simulation this yielded 0.073304, which is very close to $3431/6^{6} \approx 0.07354$.
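The loop above can also be vectorized, drawing all of the rolls at once, which is typically much faster in NumPy:

```python
import numpy as np

np.random.seed(28657)
repetitions = 10**6

# draw all 6 * 10**6 rolls at once; each row holds one repetition of 6 dice
rolls = np.random.choice(np.arange(1, 7), size=(repetitions, 6))
totals = rolls.sum(axis=1)

# fraction of repetitions whose total is 18
ans = np.mean(totals == 18)
print('probability of total 18 with 6 fair dice (simulated) is {}'.format(ans))
```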
© Blitzstein, Joseph K.; Hwang, Jessica. Introduction to Probability (Chapman & Hall/CRC Texts in Statistical Science).