This Jupyter notebook is the Python equivalent of the R code in section 6.9 R, pp. 267 - 272, Introduction to Probability, 2nd Edition, Blitzstein & Hwang.

In [1]:

```
import numpy as np
```

The MGF of an r.v. is a *function*. As an example of defining and working with functions in Python, let's use the $N(0, 1)$ MGF, which is given by $M(t) = e^{\frac{t^{2}}{2}}$. The code

In [2]:

```
def M(t):
    """ Return the N(0, 1) MGF evaluated at t. """
    return np.exp(t**2/2)

print('calling function M with a single value: \nM(0) = {}\n'.format(M(0)))
print('calling M with a vector: \nM(np.arange(1,11)) = {}'.format(M(np.arange(1,11))))
```

defines `M` to be this function. The `def` Python keyword says that we're defining a function. The function is named `M`, and it takes one variable `t` (called the argument or parameter of the function). The line declaring the function name and parameter(s) is terminated with a colon `:`, with the body of the function following on the next line after an indent. Note that a plain Python function cannot always deal flexibly with both single and vector inputs, as a function in R can. However, since our function `M` relies on `numpy.exp`, `M` can accept either a scalar or a sequence/`numpy` array as input, and returns a scalar or `numpy` array as output.

Writing

```
def M(x):
    return np.exp(x**2/2)
```

would define the same function `M`, except that now the parameter is named `x`. Giving the parameters names is helpful for functions of more than one parameter, since Python then saves us from having to remember the order in which to write the parameters, and allows us to assign default values. For example, the $N(\mu, \sigma^2)$ MGF is given by $g(t) = \exp\left(\mu \, t + \frac{\sigma^2 \, t^2}{2} \right)$, which depends on $t$, $\mu$, and $\sigma$. We can define this in Python by

In [3]:

```
def g(t, mean=0, sd=1):
    """ Return the N(mean, sd) MGF evaluated at t.

    default mean (mu) = 0
    default sd (sigma) = 1
    """
    return np.exp(mean*t + (sd**2 * t**2)/2)

ans = g(1, 2, 3)
print('N(2, 3) MGF evaluated at 1 = {}'.format(ans))
```

What is `g(1, 2, 3)`? It's the $N(2, 3^{2})$ MGF evaluated at 1, but it may be hard to remember which parameter is which, especially when working with many functions with many parameters over the course of many months. So we can also write `g(1, mean=2, sd=3)` or `g(1, sd=3, mean=2)`. Since the `mean` and `sd` function parameters have the form `parameter = expression` in the definition, the function is said to have "default parameter values."

Also, when defining `g` we specified default values of 0 for the mean and 1 for the standard deviation, so if we want the $N(0, 5^2)$ MGF evaluated at 3, we can use `g(3, sd=5)` as shorthand. It would be bad here to write `g(3, 5)`, since that is ambiguous about which argument is omitted; in fact, Python interprets this as `g(3, mean=5)`.

In [4]:

```
ans1 = g(1, mean=2, sd=3)
print('g(1, mean=2, sd=3) = {}\t... explicitly using parameter names'.format(ans1))
ans2 = g(3, 5)
print('g(3, 5) = {}\t\t... but which parameter was omitted?'.format(ans2))
ans3 = g(3, mean=5)
print('g(3, mean=5) = {}\t... \'mean\' parameter was omitted'.format(ans3))
```

LOTUS makes it easy to write down any moment of a continuous r.v. as an integral. The `scipy.integrate`

module in SciPy can help us do the integral numerically, using the `scipy.integrate.quad`

function.

In [5]:

```
from scipy.integrate import quad
# to learn more about scipy.integrate.quad, un-comment out the following line
#print(quad.__doc__)
```

For example, let's approximate the 6^{th} moment of a $N(0, 1)$ r.v. The code

In [6]:

```
from scipy.stats import norm
def g(x):
    """ Return the LOTUS integrand x**6 * phi(x) for the 6th moment of N(0, 1). """
    return x**6 * norm.pdf(x)

y, abserr = quad(g, -np.inf, np.inf)
print('6th moment of N(0,1) = {}, with error = {}'.format(y, abserr))
```

asks `quad` to compute $\int_{-\infty}^{\infty} g(x) \, dx$, where $g(x) = x^6 \, \phi(x)$ with $\phi$ the $N(0, 1)$ PDF. When we ran this, `quad` reported 15 (the correct answer, as we know from this chapter!) and that the absolute error was less than $4.423 \times 10^{-9}$.

☣ 6.9.1. Numerical integration runs into difficulties for some functions; as usual, checking answers in multiple ways is a good idea. Using `numpy.inf` for parameter `b` (the upper limit of integration) is preferred to using a large number as the upper limit when integrating up to $\infty$ (and likewise a lower limit of `a = -numpy.inf` for $-\infty$). For example, on many systems `quad(norm.pdf, 0, 10**6)` reports 0.0 while `quad(norm.pdf, 0, numpy.inf)` reports the correct answer, 0.5.
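To see this pitfall in action, we can run both calls side by side (a minimal sketch; the exact value reported for the finite-limit call may vary by system, but it is typically far from 0.5):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Integrating the N(0, 1) PDF from 0 to infinity: the true answer is 0.5.
good, _ = quad(norm.pdf, 0, np.inf)
print('quad(norm.pdf, 0, np.inf) = {}'.format(good))

# Using a huge finite upper limit instead: the adaptive quadrature can
# miss the narrow bump near 0 relative to the enormous interval.
bad, _ = quad(norm.pdf, 0, 10**6)
print('quad(norm.pdf, 0, 10**6)  = {}'.format(bad))
```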

However, all of the continuous distributions supported in `scipy.stats` have a `moment(n, loc=0, scale=1)` function that allows you to quickly and easily obtain the n^{th} non-central moment of the continuous distribution in question. For example:

In [7]:

```
print('norm.moment(n=6) = {}'.format(norm.moment(n=6)))
```

Similarly, to check that the 2^{nd} moment (and variance) of a $Unif(-1, 1)$ r.v. is 1/3, we can use `quad` again:

In [8]:

```
from scipy.stats import uniform
def h(x):
    """ Return the LOTUS integrand x**2 * f(x) for the 2nd moment of Unif(-1, 1).

    scipy.stats.uniform is constant between
        a = loc
        b = loc + scale
    """
    a = -1
    b = 1
    loc = a
    scale = b - loc
    return x**2 * uniform.pdf(x, loc=loc, scale=scale)

y, abserr = quad(h, -1, 1)
print('2nd moment of Unif(-1,1) = {}, with error = {}'.format(y, abserr))
```

Alternately, we can use either `uniform.moment(n=2, loc=-1, scale=2)` or just `uniform.var(loc=-1, scale=2)`, keeping in mind that `loc=-1` and `scale = interval length = 2`.

In [9]:

```
ans1 = uniform.moment(n=2, loc=-1, scale=2)
print('uniform.moment(n=2, loc=-1, scale=2) = {}'.format(ans1))
ans2 = uniform.var(loc=-1, scale=2)
print('uniform.var(loc=-1, scale=2) = {}'.format(ans2))
```

For moments of a discrete r.v., we can use LOTUS and the `numpy.sum` function. For example, to find the 2^{nd} moment of $X \sim Pois(7)$, we can use

In [10]:

```
from scipy.stats import poisson
def g(k):
    """ Return the LOTUS summand k**2 * P(X = k) for X ~ Pois(7). """
    return k**2 * poisson.pmf(k, 7)

# we want to sum up to and including 100, so the upper limit is 100+1
ans = np.sum(g(np.arange(0, 100+1)))
print('2nd moment of Pois(7) = {}'.format(ans))
```

Here we summed up to 100 since it's clear after getting a sense of the terms that the total contribution of all the terms after k = 100 is negligible (choosing an upper limit in this way is in contrast to how we used `quad` in the continuous case). The result is extremely close to 56, which is comforting since $E(X^{2}) = Var(X) + (EX)^{2} = 7 + 49 = 56$.
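One way to get a sense of the terms is to compare partial sums for a few upper limits; the sum stabilizes well before k = 100 (a quick check, with the summand defined locally here):

```python
import numpy as np
from scipy.stats import poisson

# LOTUS summand for the 2nd moment of X ~ Pois(7)
def summand(k):
    return k**2 * poisson.pmf(k, 7)

# partial sums with different (inclusive) upper limits
for upper in [20, 50, 100]:
    s = np.sum(summand(np.arange(0, upper + 1)))
    print('sum up to k = {:>3}: {}'.format(upper, s))
```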

But similar to the continuous r.v.s, the discrete r.v.s in `scipy.stats` have a `moment` function as well.

In [11]:

```
ans = poisson.moment(n=2, mu=7)
print('poisson.moment(n=2, mu=7) = {}'.format(ans))
```

A sample moment can be found easily using NumPy. If `x` is a vector of data, then `numpy.mean(x)` gives its sample mean and, more generally, `numpy.mean(x**n)` gives the n^{th} sample moment for any positive integer `n`. For example,

In [12]:

```
# seed the random number generator
np.random.seed(6765)
x = norm.rvs(size=100)
print('sample moment: numpy.mean(x**6) = {}'.format(np.mean(x**6)))
```

gives the 6^{th} sample moment of 100 i.i.d. $N(0, 1)$ r.v.s. How close is it to the true 6^{th} moment? How close are other sample moments to the corresponding true moments?
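We can get a feel for the answer by repeating the experiment with larger samples; the sample 6^{th} moment should approach the true value of 15 as the sample size grows (a quick experiment; the seed here is an arbitrary choice):

```python
import numpy as np
from scipy.stats import norm

np.random.seed(2584)  # arbitrary seed for reproducibility
for size in [100, 10**4, 10**6]:
    x = norm.rvs(size=size)
    print('n = {:>7}: sample 6th moment = {}'.format(size, np.mean(x**6)))
```

The true 6^{th} moment is 15, so the values should settle near 15 for the larger samples (slowly, since $X^6$ is heavy-tailed).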

The sample variance can also be found easily with NumPy. If `x` is a vector of data, then using the `ddof` parameter (delta degrees of freedom), as in `numpy.var(x, ddof=1)`, gives its sample variance. This returns `nan` (not a number) and issues `RuntimeWarning: Degrees of freedom <= 0 for slice` if `x` has length 1 (the divisor used in the calculation is `len(x) - ddof`), since the $n - 1$ in the denominator is 0 in this case. It makes sense not to return a numerical value in this case, not only because of the definition but also because it would be insane to try to estimate the variability of a population if we only have one observation!
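A one-observation vector illustrates this directly:

```python
import numpy as np

single = np.array([3.0])
# n - 1 = 0 in the denominator, so NumPy issues a RuntimeWarning
# and returns nan rather than a numerical value
print('np.var(single, ddof=1) = {}'.format(np.var(single, ddof=1)))
```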

For a simple demonstration of using the sample mean and sample variance to estimate the true mean and true variance of a distribution, we generate 1000 values from a $N(0, 1)$ distribution and store them in `z`. We then compute the sample mean and sample variance with `numpy.mean(z)` and `numpy.var(z, ddof=1)`.

In [13]:

```
np.random.seed(10946)
z = norm.rvs(size=1000)
mu_z = np.mean(z)
print('sample mean of z: {}'.format(mu_z))
var_z = np.var(z, ddof=1)
print('sample variance of z: {}'.format(var_z))
```

We find that `numpy.mean(z)` is close to 0 and `numpy.var(z, ddof=1)` is close to 1. You can try this out for a $N(\mu, \sigma^2)$ distribution (or other distribution) of your choosing; just remember that `scipy.stats.norm.rvs` takes $\sigma$ as the `scale` parameter, and not $\sigma^2$!
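For example, with $\mu = 2$ and $\sigma = 3$ (so variance 9), we pass `scale=3`, not `scale=9` (a quick demonstration; the seed is an arbitrary choice):

```python
import numpy as np
from scipy.stats import norm

np.random.seed(4181)  # arbitrary seed for reproducibility
mu, sigma = 2, 3
# scale is sigma (the standard deviation), NOT sigma**2
w = norm.rvs(loc=mu, scale=sigma, size=1000)
print('sample mean of w (should be near 2):     {}'.format(np.mean(w)))
print('sample variance of w (should be near 9): {}'.format(np.var(w, ddof=1)))
```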

The sample standard deviation of `x` can be found using `numpy.std(x, ddof=1)`. This gives the same result as `numpy.sqrt(numpy.var(x, ddof=1))`.

In [14]:

```
sd1 = np.std(z, ddof=1)
print('np.std(z, ddof=1) = {}'.format(sd1))
sd2 = np.sqrt(np.var(z, ddof=1))
print('np.sqrt(np.var(z, ddof=1)) = {}'.format(sd2))
```

While the `scipy.stats` module does have functions for `skew` and `kurtosis`, both functions rely by default on the *population standard deviation* (with $n$ rather than $n-1$ in the denominator).

However, we can easily define our own functions for *sample skewness* and *sample kurtosis* by using `scipy.stats.moment` and `numpy.std(x, ddof=1)`.

In [15]:

```
from scipy.stats import moment
# to learn more about scipy.stats.moment, un-comment out the following line
#print(moment.__doc__)
```

In [16]:

```
def skew(x, use_sample_sd=True):
    """ Return the skew of x.

    Default is to use the sample standard deviation in the denominator,
    yielding sample skew. Specifying use_sample_sd=False is the same as
    using scipy.stats.skew(x).
    """
    ddof = 1 if use_sample_sd else 0
    return moment(x, 3) / np.std(x, ddof=ddof)**3

print('sample skew of z = {}'.format(skew(z)))
```

In [17]:

```
def kurt(x, use_sample_sd=True):
    """ Return the excess kurtosis of x.

    Default is to use the sample standard deviation in the denominator,
    yielding sample kurtosis. Specifying use_sample_sd=False is the same
    as using scipy.stats.kurtosis(x).
    """
    ddof = 1 if use_sample_sd else 0
    return moment(x, 4) / np.std(x, ddof=ddof)**4 - 3.0

print('sample kurtosis of z = {}'.format(kurt(z)))
```

To find the median of a continuous r.v. with CDF $F$, we need to solve the equation $F(x) = 1/2$ for $x$, which is equivalent to finding the root (zero) of the function $g$ given by $g(x) = F(x) - 1/2$. This can be done using `scipy.optimize.root` in SciPy. For example, let's find the median of the $Expo(1)$ distribution. The code

In [18]:

```
from scipy.stats import expon
from scipy.optimize import root
# to learn more about scipy.optimize.root, un-comment out the following line
#print(root.__doc__)
def g(x):
    """ With F the Expo(1) CDF,
    define the function g(x) = F(x) - 1/2.
    """
    return expon.cdf(x) - 1/2

# set our initial guess at a root to be 0
root(g, 0)
```

Out[18]:

asks `root` to find a root of the desired function, with an initial guess of 0. This returns an answer very close to the true answer of $\log(2) \approx 0.693$. Of course, in this case we can solve $1 - e^{-x} = \frac{1}{2}$ directly without having to use numerical methods.
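As a sanity check (with the function defined locally and self-contained here), the numerical root should agree with $\log(2)$:

```python
import numpy as np
from scipy.stats import expon
from scipy.optimize import root

# g(x) = F(x) - 1/2, with F the Expo(1) CDF
def g_check(x):
    return expon.cdf(x) - 1/2

sol = root(g_check, 0)
print('numerical median: {}'.format(sol.x[0]))
print('log(2)          : {}'.format(np.log(2)))
```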

☣ 6.9.2. `scipy.optimize.root` is useful, but there is no guarantee that it will find a root (`success` may be `False`). When using `scipy.optimize.root`, please pay attention to the `status`, `success` and `message` fields in the result returned.

An easier way to find the median of the $Expo(1)$ is to use `scipy.stats.expon.median`. The `median` function returns the median of the distribution, for `expon` as well as all other continuous distributions in `scipy.stats`.

In [19]:

```
ans = expon.median()
print('expon.median() = {}'.format(ans))
```

For finding the mode of a continuous distribution, we can use the `fminbound` function in SciPy. For example, let's find the mode of the $Gamma(6, 1)$ distribution, which is an important distribution that we will introduce in the next chapter. Its PDF is proportional to $x^{5} \, e^{-x}$. Using calculus, we can find that the mode is at $x = 5$. Using `scipy.optimize.fminbound`, we can find that the mode is very close to $x = 5$ as follows.

In [20]:

```
from scipy.optimize import fminbound

def h(x):
    """ Since fminbound MINIMIZES the argument function,
    we instead pass in the negative of (a function proportional to)
    the Gamma(6, 1) PDF in order to obtain the MAXIMUM.
    """
    def f(x):
        """ The Gamma(6, 1) PDF is proportional to this function. """
        return x**5 * np.e**-x
    return -f(x)

lower_bound = 0.0
upper_bound = 20.0
ans = fminbound(h, lower_bound, upper_bound)
print('mode of Gamma(6, 1) is approximately {}'.format(ans))
```

If we had wanted to *minimize* instead of *maximize*, we could have simply passed `f` to `fminbound` without composing the nested function `h`.

Next, let's do a discrete example of median and mode. An interesting fact about the $Bin(n, p)$ distribution is that if the mean $np$ is an integer, then the median and mode are also $np$ (even if the distribution is very skewed). To check this fact about the median for the $Bin(50, 0.2)$ distribution, we can use the following code.

In [21]:

```
from scipy.stats import binom
bin_cdf_vector = binom.cdf(np.arange(0,50), 50, 0.2)
median = np.argmax(bin_cdf_vector>=0.5)
print('median of Bin(50, 0.2) is {}'.format(median))
```

The `numpy.argmax` function finds the location of the maximum of a vector, giving the index of the first occurrence of a maximum. Since `True` is encoded as 1 and `False` is encoded as 0, the first maximum in `binom.cdf(np.arange(0,50), 50, 0.2) >= 0.5` is at the first value for which the CDF is at least 0.5. The return value of `numpy.argmax` on this boolean vector is 10, showing that the median is at 10.

Similarly, `numpy.argmax` on `binom.pmf(np.arange(0,50), 50, 0.2)` returns 10, showing that the mode is at 10.

In [22]:

```
bin_pmf_vector = binom.pmf(np.arange(0,50), 50, 0.2)
mode = np.argmax(bin_pmf_vector)
print('mode of Bin(50, 0.2) is {}'.format(mode))
```

The sample median of a vector `x` of data can be found using `numpy.median(x)`. `scipy.stats.mode` can be used to find the sample mode, but it will only return the smallest mode in case there are ties. Instead, we can compose our own `mode` function, using `numpy.where` and `numpy.bincount`.

In [23]:

```
np.random.seed(17711)
bin_rv_vector = binom.rvs(50, 0.2, size=50)

def mode(x):
    counts = np.bincount(x)
    # return ALL values achieving the maximum count, to handle ties
    return np.where(counts == counts.max())[0]

ans = mode(bin_rv_vector)
print('mode(s) in r.v. vector x: {}'.format(ans))
```

In the starred Section 6.7, we showed that in rolling 6 fair dice, the probability of a total of 18 is 3431/6^{6}. But the proof was complicated. If we only need an approximate answer, simulation is a much easier approach. And we already know how to do it! Here is the code for a million repetitions:

In [24]:

```
np.random.seed(28657)
repetitions = 10**6
# counter to track number of 18s seen
c = 0
for _ in range(repetitions):
    if np.random.choice(np.arange(1,7), 6).sum() == 18:
        # increment the counter
        c += 1
ans = c / repetitions
print('probability of total 18 with 6 fair dice (simulated) is {}'.format(ans))
```

In our simulation this yielded 0.073304, which is very close to 3431/6^{6} $\approx$ 0.07354.
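If we do want the exact answer without the proof from Section 6.7, repeated convolution of the single-die PMF with `numpy.convolve` gets there quickly:

```python
import numpy as np

# exact distribution of the total of 6 fair dice via repeated convolution
die = np.ones(6) / 6           # P(die = 1), ..., P(die = 6)
total = die.copy()
for _ in range(5):
    total = np.convolve(total, die)

# total[k] is P(sum = k + 6), since the smallest possible total is 6
p18 = total[18 - 6]
print('exact P(total = 18) = {}'.format(p18))
print('3431/6**6           = {}'.format(3431 / 6**6))
```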

Joseph K. Blitzstein and Jessica Hwang, Harvard University and Stanford University, © 2019 by Taylor and Francis Group, LLC