Chapter 6: Moments

This Jupyter notebook is the Python equivalent of the R code in section 6.9 R, pp. 267 - 272, Introduction to Probability, 2nd Edition, Blitzstein & Hwang.


In [1]:
import numpy as np

Functions

The MGF of an r.v. is a function. As an example of defining and working with functions in Python, let's use the $N(0, 1)$ MGF, which is given by $M(t) = e^{\frac{t^{2}}{2}}$. The code

In [2]:
def M(t):
    """ Return N(0, 1) MGF evalated at t. """
    return np.exp(t**2/2)

print('calling function M with a single value: \nM(0) = {}\n'.format(M(0)))

print('calling M with a vector: \nM(np.arange(1,11)) = {}'.format(M(np.arange(1,11))))
calling function M with a single value: 
M(0) = 1.0

calling M with a vector: 
M(np.arange(1,11)) = [1.64872127e+00 7.38905610e+00 9.00171313e+01 2.98095799e+03
 2.68337287e+05 6.56599691e+07 4.36731791e+10 7.89629602e+13
 3.88084696e+17 5.18470553e+21]

defines M to be this function. The def Python keyword says that we're defining a function. The function is named M, and it takes one variable t (called the argument or parameter of the function). The line declaring the function name and its parameter(s) ends with a colon :, and the body of the function follows on the next line after an indent. Note that a plain Python function will not automatically handle both scalar and vector inputs the way an R function does. However, since our function M relies on numpy.exp, M accepts either a single number or a NumPy array (or a nested sequence convertible to one) as input, and returns a single value or a NumPy array accordingly.
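For instance, here is a minimal sketch contrasting a plain-Python variant (the name M_scalar is ours, not from the book) with the NumPy-based M defined above:

import math

def M_scalar(t):
    """ Hypothetical plain-Python variant: math.exp only accepts a single number. """
    return math.exp(t**2/2)

print(M_scalar(2.0))        # works for a single value
print(M(np.arange(1, 4)))   # the NumPy-based M also handles an array

# M_scalar(np.arange(1, 4)) would raise a TypeError,
# since math.exp cannot operate on NumPy arrays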

Writing

def M(x):
    return np.exp(x**2/2)

would define the same function M, except that now the parameter is named x. Giving the parameters names is helpful for functions of more than one parameter, since Python then saves us from having to remember the order in which to write the parameters, and allows us to assign default values. For example, the $N(\mu, \sigma^2)$ MGF is given by $g(t) = \exp\left(\mu \, t + \frac{\sigma^2 \, t^2}{2} \right)$, which depends on $t$, $\mu$, and $\sigma$. We can define this in Python by

In [3]:
def g(t, mean=0, sd=1):
    """ Return the N(mean, sd) MGF evaluated at t.
        default mean (mu) = 0
        default sd (sigma) = 1
    """
    return np.exp(mean*t + (sd**2 * t**2)/2)

ans = g(1, 2, 3)
print('N(2, 3) MGF evaluated at 1 = {}'.format(ans))
N(2, 3) MGF evaluated at 1 = 665.1416330443618

What is g(1, 2, 3)? It's the $N(2, 3^{2})$ MGF evaluated at 1, but it may be hard to remember which parameter is which, especially when working with many functions with many parameters over the course of many months. So we can also write g(1, mean=2, sd=3) or g(1, sd=3, mean=2), passing mean and sd as keyword arguments. Because the definition of g gives these parameters the form parameter=expression, they also have default values.

Also, when defining g we specified default values of 0 for the mean and 1 for the standard deviation sd, so if we want the $N(0, 5^2)$ MGF evaluated at 3, we can use g(3, sd=5) as shorthand. It would be bad here to write g(3, 5), since that is ambiguous about which argument is omitted; in fact, Python interprets this as g(t=3, mean=5, sd=1).

In [4]:
ans1 = g(1, mean=2, sd=3)
print('g(1, mean=2, sd=3) = {}\t... explicitly using parameter names'.format(ans1))

ans2 = g(3, 5)
print('g(3, 5) = {}\t\t... but which parameter was omitted?'.format(ans2))

ans3 = g(3, mean=5)
print('g(3, mean=5) = {}\t... \'mean\' parameter was omitted'.format(ans3))
g(1, mean=2, sd=3) = 665.1416330443618	... explicitly using parameter names
g(3, 5) = 294267566.0415088		... but which parameter was omitted?
g(3, mean=5) = 294267566.0415088	... 'mean' parameter was omitted

Moments

LOTUS makes it easy to write down any moment of a continuous r.v. as an integral. The scipy.integrate module in SciPy can help us do the integral numerically, using the scipy.integrate.quad function.

In [5]:
from scipy.integrate import quad

# to learn more about scipy.integrate.quad, un-comment out the following line
#print(quad.__doc__)

For example, let's approximate the 6th moment of a $N(0, 1)$ r.v. The code

In [6]:
from scipy.stats import norm

def g(x):
    """ Return the 6th moment of N(0, 1). """
    return x**6 * norm.pdf(x)

y, abserr = quad(g, -np.inf, np.inf)
print('6th moment of N(0,1) = {}, with error = {}'.format(y, abserr))
6th moment of N(0,1) = 15.000000000000004, with error = 4.4229319318339374e-09

asks quad to compute $\int_{-\infty}^{\infty} g(x) \, dx$, where $g(x) = x^6 \, \phi(x)$ with $\phi$ the $N(0, 1)$ PDF. When we ran this, quad reported 15 (the correct answer, as we know from this chapter!) and that the absolute error was less than $4.423 \times 10^{-9}$.

☣ 6.9.1. Numerical integration runs into difficulties for some functions; as usual, checking answers in multiple ways is a good idea. When integrating up to $\infty$, using numpy.inf for parameter b (the upper limit of integration) is preferred to using a large finite number as the upper limit (and likewise, use a = -numpy.inf for a lower limit of $-\infty$). For example, on many systems quad(norm.pdf, 0, 10**6) reports 0.0 while quad(norm.pdf, 0, numpy.inf) reports the correct answer, 0.5.
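As a quick check of that warning (a minimal sketch; the exact behavior may differ across systems and SciPy versions):

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# a huge finite upper limit can fool the adaptive quadrature, which may
# place all of its sample points where the PDF is essentially zero
val_finite, _ = quad(norm.pdf, 0, 10**6)

# numpy.inf triggers a change of variables suited to infinite intervals
val_inf, _ = quad(norm.pdf, 0, np.inf)

print(val_finite, val_inf)   # typically 0.0 and 0.5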

However, all of the continuous distributions supported in scipy.stats have a moment(n, loc=0, scale=1) function that allows you to quickly and easily obtain the nth order non-central moment of the continuous distribution in question. For example:

In [7]:
print('norm.moment(n=6) = {}'.format(norm.moment(n=6)))
norm.moment(n=6) = 15.00000000089533

Similarly, to check that the 2nd moment (and variance) of a $Unif(−1, 1)$ r.v. is 1/3, we can use quad again

In [8]:
from scipy.stats import uniform

def h(x):
    """ Return 2nd moment (var) of Unif(-1, 1).
    
        scipy.stats.uniform is constant between
            a = loc
            b = loc + scale
    """
    a = -1
    b = 1
    loc = a
    scale = b - loc
    return x**2 * uniform.pdf(x, loc=loc, scale=scale)

y, abserr = quad(h, -1, 1)
print('2nd moment of Unif(-1,1) = {}, with error = {}'.format(y, abserr))
2nd moment of Unif(-1,1) = 0.3333333333333333, with error = 3.700743415417188e-15

Alternately, we can either use uniform.moment(n=2, loc=-1, scale=2) or just uniform.var(loc=-1, scale=2), keeping in mind that loc=-1 and scale = interval length = 2.

In [9]:
ans1 = uniform.moment(n=2, loc=-1, scale=2)
print('uniform.moment(n=2, loc=-1, scale=2) = {}'.format(ans1))

ans2 = uniform.var(loc=-1, scale=2)
print('uniform.var(loc=-1, scale=2) = {}'.format(ans2))
uniform.moment(n=2, loc=-1, scale=2) = 0.33333333333333326
uniform.var(loc=-1, scale=2) = 0.3333333333333333

For moments of a discrete r.v., we can use LOTUS and the numpy.sum function. For example, to find the 2nd moment of $X \sim Pois(7)$, we can use

In [10]:
from scipy.stats import poisson

def g(k):
    """ Return the 2nd moment of Pois(7) evaluated at k. """
    return k**2 * poisson.pmf(k, 7)

# we want to sum up to and including 100, so the upper limit is 100+1
ans = np.sum(g(np.arange(0, 100+1)))
print('2nd moment of Pois(7) = {}'.format(ans))
2nd moment of Pois(7) = 55.99999999999994

Here we summed up to 100 since it's clear, after getting a sense of the terms, that the total contribution of all the terms after k = 100 is negligible (having to choose a finite upper limit this way is in contrast to the continuous case, where quad can integrate all the way to $\infty$). The result is extremely close to 56, which is comforting since $E(X^{2}) = Var(X) + (EX)^{2} = 7 + 49 = 56$.
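To see that the truncation point is safe, here is a minimal sketch (using the function g defined above): the Pois(7) PMF decays so quickly that even a large block of terms beyond k = 100 contributes essentially nothing.

# total contribution of the terms k = 101, ..., 200, far beyond the truncation point
tail = np.sum(g(np.arange(101, 200+1)))
print(tail)   # vanishingly small, many orders of magnitude below the answer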

Similar to the continuous case, the discrete distributions in scipy.stats have a moment function as well.

In [11]:
ans = poisson.moment(n=2, mu=7)
print('poisson.moment(n=2, mu=7) = {}'.format(ans))
poisson.moment(n=2, mu=7) = 56

A sample moment can be found easily using NumPy. If x is a vector of data, then numpy.mean(x) gives its sample mean and, more generally, numpy.mean(x**n) gives the nth sample moment for any positive integer n. For example,

In [12]:
# seed the random number generator
np.random.seed(6765)

x = norm.rvs(size=100)
print('sample moment: numpy.mean(x**6) = {}'.format(np.mean(x**6)))
sample moment: numpy.mean(x**6) = 9.336344243424868

gives the 6th sample moment of 100 i.i.d. $N(0, 1)$ r.v.s. How close is it to the true 6th moment? How close are other sample moments to the corresponding true moments?
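One way to explore this is a minimal sketch like the following, which compares the sample moments of x to the true moments of the standard Normal (the odd moments are 0, and the 2nd, 4th, and 6th moments are 1, 3, and 15):

true_moments = {1: 0, 2: 1, 3: 0, 4: 3, 5: 0, 6: 15}
for n, true_val in true_moments.items():
    sample_val = np.mean(x**n)
    print('n = {}: sample moment = {:.4f}, true moment = {}'.format(n, sample_val, true_val))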

The sample variance can also be found easily with NumPy. If x is a vector of data, then using the ddof parameter (delta degrees of freedom), as in numpy.var(x, ddof=1), gives its sample variance; the divisor used in the calculation is len(x) - ddof. If x has length 1, this returns nan (not a number) and issues RuntimeWarning: Degrees of freedom <= 0 for slice, since the $n − 1$ in the denominator is 0 in this case. It makes sense not to return a numerical value in this case, not only because of the definition but also because it would be insane to try to estimate the variability of a population if we only have one observation!
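A quick sketch of that length-1 edge case:

single_obs = np.array([1.5])
# prints nan and issues RuntimeWarning: Degrees of freedom <= 0 for slice
print(np.var(single_obs, ddof=1))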

For a simple demonstration of using the sample mean and sample variance to estimate the true mean and true variance of a distribution, we generate 1000 times from a $N(0, 1)$ distribution and store the values in z. We then compute the sample mean and sample variance with numpy.mean(z) and numpy.var(z, ddof=1).

In [13]:
np.random.seed(10946)

z = norm.rvs(size=1000)

mu_z = np.mean(z)
print('sample mean of z: {}'.format(mu_z))

var_z = np.var(z, ddof=1)
print('sample variance of z: {}'.format(var_z))
sample mean of z: 0.00915384129813555
sample variance of z: 1.073340109262066

We find that numpy.mean(z) is close to 0 and numpy.var(z, ddof=1) is close to 1. You can try this out for a $N(\mu, \sigma^2)$ distribution (or other distribution) of your choosing; just remember that scipy.stats.norm.rvs takes $\sigma$ as the scale parameter, not $\sigma^2$!
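For example, here is a minimal sketch for a $N(3, 2^2)$ distribution (the seed is arbitrary and not from the book); note that the scale argument is $\sigma = 2$, not $\sigma^2 = 4$:

np.random.seed(42)   # arbitrary seed

w = norm.rvs(loc=3, scale=2, size=1000)   # N(3, 2^2): scale is sigma, not sigma^2
print(np.mean(w))          # should be close to 3
print(np.var(w, ddof=1))   # should be close to 4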

The sample standard deviation of x can be found using numpy.std(x, ddof=1). This gives the same result as numpy.sqrt(numpy.var(x, ddof=1)).

In [14]:
sd1 = np.std(z, ddof=1)
print('np.std(z, ddof=1) = {}'.format(sd1))

sd2 = np.sqrt(np.var(z, ddof=1))
print('np.sqrt(np.var(z, ddof=1)) = {}'.format(sd2))
np.std(z, ddof=1) = 1.0360212880351765
np.sqrt(np.var(z, ddof=1)) = 1.0360212880351765

While the scipy.stats module does have functions for skew and kurtosis, both functions rely on the population standard deviation (with $n$ rather than $n-1$ in the denominator).

However, we can easily define our own functions for sample skewness and sample kurtosis by using scipy.stats.moment and numpy.std(z, ddof=1).

In [15]:
from scipy.stats import moment

# to learn more about scipy.stats.moment, un-comment out the following line
#print(moment.__doc__)
In [16]:
def skew(x, use_sample_sd=True):
    """ Return the skew of x.
    
        Default is to use sample standard deviation on the denominator,
        yielding sample skew. 
        
        Specifying use_sample_sd=False is the same as using
        scipy.stats.skew(x).
    """
    ddof = 1 if use_sample_sd else 0
    return moment(x, 3) / np.std(x, ddof=ddof)**3

print('sample skew of z = {}'.format(skew(z)))
sample skew of z = 0.02333931072753422
In [17]:
def kurt(x, use_sample_sd=True):
    """ Return the excess kurtosis of x.
    
        Default is to use sample standard deviation on the denominator,
        yielding sample kurtosis. 
        
        Specifying use_sample_sd=False is the same as using
        scipy.stats.kurtosis(x).
    """
    ddof = 1 if use_sample_sd else 0
    return moment(x, 4) / np.std(x, ddof=ddof)**4 - 3.0

print('sample kurtosis of z = {}'.format(kurt(z)))
sample kurtosis of z = 0.16247912720035718

Medians and modes

To find the median of a continuous r.v. with CDF $F$, we need to solve the equation $F(x) = 1/2$ for $x$, which is equivalent to finding the root (zero) of the function $g$ given by $g(x) = F(x) − 1/2$. This can be done using scipy.optimize.root in SciPy. For example, let's find the median of the Expo(1) distribution. The code

In [18]:
from scipy.stats import expon
from scipy.optimize import root

# to learn more about scipy.optimize.root, un-comment out the following line
#print(root.__doc__)

def g(x):
    """ For F the Expo(1) CDF, define g(x) = F(x) - 1/2;
        the root of g is the median of Expo(1).
    """
    return expon.cdf(x) - 1/2

# set our initial guess at a root to be 0
root(g, 0)
Out[18]:
    fjac: array([[-1.]])
     fun: array([-5.55111512e-17])
 message: 'The solution converged.'
    nfev: 9
     qtf: array([-1.56348268e-11])
       r: array([0.5000001])
  status: 1
 success: True
       x: array([0.69314718])

asks root to find a root of the desired function, with an initial guess of 0. This returns an answer very close to the true answer of $\log(2) \approx 0.693$. Of course, in this case we can solve $1 − e^{−x} = \frac{1}{2}$ directly without having to use numerical methods.

☣ 6.9.2. scipy.optimize.root is useful, but there is no guarantee that it will find a root (success may be False). When using scipy.optimize.root, please pay attention to the status, success, and message fields in the result returned.

An easier way to find the median of the $Expo(1)$ distribution is to use scipy.stats.expon.median. The median function returns the median of the distribution, and is available for expon as well as all other continuous distributions in scipy.stats.

In [19]:
ans = expon.median()
print('expon.median() = {}'.format(ans))
expon.median() = 0.6931471805599453

For finding the mode of a continuous distribution, we can use the fminbound function in SciPy. For example, let's find the mode of the $Gamma(6, 1)$ distribution, which is an important distribution that we will introduce in the next chapter. Its PDF is proportional to $x^{5} \, e^{−x}$. Using calculus, we can find that the mode is at $x = 5$. Using scipy.optimize.fminbound, we can find that the mode is very close to $x = 5$ as follows.

In [20]:
def h(x):
    """ Since fminbound MINIMIZES the argument function,
        we will instead pass in the negative of the Gamma PDF
        in order to obtain the MAXIMUM.
    """
    def f(x):
        """ Gamma PDF is proportional to this function.
        """
        return x**5 * np.e**-x
    return -f(x)

from scipy.optimize import fminbound

lower_bound = 0.0
upper_bound = 20.0

ans = fminbound(h, lower_bound, upper_bound)
print('mode of Gamma(6, 1) is approximately {}'.format(ans))
mode of Gamma(6, 1) is approximately 5.000000043355882

If we had wanted to minimize instead of maximize, we could simply have passed f to fminbound without composing the nested function h.
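As a cross-check (a sketch, not from the book), we can pass the negative of the exact Gamma(6, 1) PDF from scipy.stats to fminbound via a lambda, without writing a nested function:

from scipy.stats import gamma
from scipy.optimize import fminbound

# minimize the negative of the Gamma(6, 1) PDF (shape a=6, scale 1) to find its mode
mode_check = fminbound(lambda x: -gamma.pdf(x, a=6), 0.0, 20.0)
print(mode_check)   # should again be very close to 5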

Next, let's do a discrete example of median and mode. An interesting fact about the $Bin(n, p)$ distribution is that if the mean $np$ is an integer, then the median and mode are also $np$ (even if the distribution is very skewed). To check this fact about the median for the $Bin(50, 0.2)$ distribution, we can use the following code.

In [21]:
from scipy.stats import binom

bin_cdf_vector = binom.cdf(np.arange(0,50), 50, 0.2)
median = np.argmax(bin_cdf_vector>=0.5)
print('median of Bin(50, 0.2) is {}'.format(median))
median of Bin(50, 0.2) is 10

The numpy.argmax function finds the location of the maximum of a vector, giving the index of the first occurrence of a maximum. Since True is encoded as 1 and False is encoded as 0, the first maximum in binom.cdf(np.arange(0,50), 50, 0.2) >= 0.5 is at the first value for which the CDF is at least 0.5. The return value of numpy.argmax on this boolean vector is 10, showing that the median is at 10.

Similarly, numpy.argmax on binom.pmf(np.arange(0,50), 50, 0.2) returns 10, showing that the mode is at 10.

In [22]:
bin_pmf_vector = binom.pmf(np.arange(0,50), 50, 0.2)
mode = np.argmax(bin_pmf_vector)
print('mode of Bin(50, 0.2) is {}'.format(mode))
mode of Bin(50, 0.2) is 10

The sample median of a vector x of data can be found using numpy.median(x). scipy.stats.mode can be used to find the sample mode, but in case of ties it returns only the smallest mode. Instead, we can write our own mode function, using numpy.where and numpy.bincount.

In [23]:
np.random.seed(17711)

bin_rv_vector = binom.rvs(50, 0.2, size=50)

def mode(x):
    """ Return all modal values of x (x must contain non-negative integers). """
    counts = np.bincount(x)
    return np.where(counts == counts.max())[0]

ans = mode(bin_rv_vector)
print('mode(s) in r.v. vector x: {}'.format(ans))
mode(s) in r.v. vector x: [8]
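Note that np.bincount only works for non-negative integer data. For general data, here is a sketch of an alternative (the name mode_general is ours) using numpy.unique:

def mode_general(x):
    """ Return all modal values of x; works for non-integer data too. """
    values, counts = np.unique(x, return_counts=True)
    return values[counts == counts.max()]

print(mode_general(bin_rv_vector))   # same answer as above for these integer data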

Dice simulation

In the starred Section 6.7, we showed that in rolling 6 fair dice, the probability of a total of 18 is $3431/6^6$. But the proof was complicated. If we only need an approximate answer, simulation is a much easier approach. And we already know how to do it! Here is the code for a million repetitions:

In [24]:
np.random.seed(28657)

repetitions = 10**6
# counter to track number of 18s seen
c = 0

for _ in range(repetitions):
    if np.random.choice(np.arange(1,7), 6).sum() == 18:
        # increment the counter
        c += 1

ans = c / repetitions
print('probability of total 18 with 6 fair dice (simulated) is {}'.format(ans))
probability of total 18 with 6 fair dice (simulated) is 0.073304

In our simulation this yielded 0.073304, which is very close to $3431/6^6 \approx 0.07354$.
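A vectorized version (a sketch, not from the book) avoids the explicit Python loop and runs much faster; since the random numbers are drawn differently, the estimate will not match the loop version exactly, but it should also be close to 0.0735.

np.random.seed(28657)

# one row per repetition, one column per die; the high end of randint is exclusive
rolls = np.random.randint(1, 7, size=(10**6, 6))
prob_18 = np.mean(rolls.sum(axis=1) == 18)
print(prob_18)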


Joseph K. Blitzstein and Jessica Hwang, Harvard University and Stanford University, © 2019 by Taylor and Francis Group, LLC