Description: Random number generation in Python. Inverse transform sampling. Monte Carlo simulations.
The topic brings together three different threads: the theory of random variables, pseudo-random number generation, and Monte Carlo simulation.
A real-valued random variable is a function
$$ \tilde{X}: \Omega \rightarrow \mathbb{R} $$
with the property that $ \{\omega \in \Omega: \tilde{X}(\omega) \leq x\} \in \mathfrak{A} $ for each $ x \in \mathbb{R} $. Such a function is said to be $ \mathfrak{A} $-measurable.
$ F(x) = P(\{\tilde{X} \le x\}) $ is the cumulative distribution function (cdf); for a discrete random variable it is stepwise constant, with jump discontinuities.
Realizations of a random variable $ \tilde{X} $ drawn independently one by one:
$$ \{x_1,x_2,\dots,x_n\} \sim F_{\tilde{X}}(x) $$
Virtually every programming language has a rand() function that generates uniform numbers on $ [0,1] $.
In fact, these are pseudo-random number generators, i.e. deterministic algorithms that return a sequence of numbers that looks close enough to being random.
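For illustration, a minimal sketch of one classic deterministic scheme, a linear congruential generator; the constants below are glibc's and are shown purely as an example:
def lcg(seed, n, a=1103515245, c=12345, m=2**31):
    '''Yield n pseudo-random numbers on [0,1) starting from state seed'''
    x = seed
    for _ in range(n):
        x = (a * x + c) % m  # deterministic update of the internal state
        yield x / m          # scale the state to [0,1)
print(list(lcg(seed=42, n=5)))  # the same seed always reproduces the same sequence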
An alternative is to use physical processes as the origin of randomness (RANDOM.ORG, hardware RNGs).
The state of a pseudo-random number generator may be initialized from a true source of randomness.
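For example, Python's standard library can both seed its generator from OS entropy and draw from the OS source directly:
import random
random.seed()                # no argument: seed from OS entropy (os.urandom) when available
print(random.random())
sr = random.SystemRandom()   # bypass the PRNG entirely; draws from os.urandom, cannot be seeded
print(sr.random())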
Modules used below: random (standard library), numpy.random, scipy.stats.
x = random.random() draws a uniform number on $ [0,1] $
# Simulating a scalar random variable (one by one or in bulk)
import random
x = [random.random() for i in range(100)]
print(*x[:10],sep='\n')
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
def hist(data, bins='auto', range=None, theoretical=None, cdf=False):
    '''Draws histogram of data, superimposes a theoretical distribution if given'''
    fig, ax = plt.subplots(figsize=(10,6))
    if cdf:
        # plot empirical CDF instead of histogram
        plt.hist(data, bins=len(data), range=range, cumulative=True,
                 density=True, align='right', histtype='step', color='black')
    else:
        plt.hist(data, bins=bins, range=range, density=True,
                 histtype='bar', color='lightgrey', edgecolor='k')
    if theoretical and len(data) > 0:
        # add theoretical distribution
        x = (np.linspace(range[0], range[-1], 100) if range
             else np.linspace(min(data), max(data), 100))
        y = theoretical(x)
        plt.plot(x, y, 'r-')
import scipy.stats
hist(x,theoretical=scipy.stats.uniform.pdf)
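The cdf=True branch works the same way; for example, the empirical cdf of the sample against the theoretical uniform cdf:
hist(x, theoretical=scipy.stats.uniform.cdf, cdf=True)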
for i in range(1,6):
    n = 10**i
    data = np.random.random(n)  # NumPy's legacy global generator
    hist(data, bins=25, theoretical=scipy.stats.uniform.pdf)
    plt.title('%d realizations' % n)
Many other distributions are available; see the NumPy documentation:
https://numpy.org/doc/stable/reference/random/generator.html
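For instance, a short sketch of the Generator interface documented there, which is recommended over the legacy np.random.* calls used elsewhere in these notes:
rng = np.random.default_rng(seed=123)  # seeding is optional, shown here for reproducibility
u = rng.random(5)                      # uniform on [0,1)
z = rng.standard_normal(5)             # standard normal draws
ln = rng.lognormal(size=5)             # lognormal draws
print(u, z, ln, sep='\n')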
for i in range(1,6):
    n = 10**i
    data = np.random.lognormal(size=n)
    hist(data, bins=25, range=(0,5), theoretical=lambda x: scipy.stats.lognorm.pdf(x, 1.0))
    plt.title('%d realizations' % n)
Let $ F(x) $ be the cdf of the random variable of interest $ \tilde{X} $, with inverse $ F^{(-1)}(x) $.
To simulate $ \tilde{X} $: draw $ \tilde{U} $ from the standard uniform distribution on $ [0,1] $ and apply the inverse cdf.
Indeed, let $ \tilde{X} = F^{(-1)}(\tilde{U}) $, and denote its cdf $ F_{\tilde{X}}(x) $.
Also let $ F_{U}(x) = \min(\max(0,x),1) $ denote the cdf of the standard uniform distribution. Then
$$ F_{\tilde{X}}(x) = P(\{ \tilde{X} \le x \}) = P(\{ F^{(-1)}(\tilde{U}) \le x \}) = P(\{ \tilde{U} \le F(x) \}) = F_U \big(F(x)\big) = F(x) $$
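A minimal sketch of the method for a distribution with a closed-form inverse cdf: the exponential with rate $ \lambda $, where $ F(x) = 1 - e^{-\lambda x} $ and $ F^{(-1)}(u) = -\ln(1-u)/\lambda $:
lam = 2.0
u = np.random.random(10**5)   # uniform draws on [0,1)
xs = -np.log(1 - u) / lam     # apply the inverse cdf to each uniform draw
hist(xs, bins=25, range=(0,3), theoretical=lambda x: scipy.stats.expon.pdf(x, scale=1/lam))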
The state of NumPy's legacy generator can be controlled for reproducibility with seed(), get_state() and set_state().
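A small reproducibility check using these functions:
np.random.seed(2023)           # fix the initial state
a = np.random.random(3)
state = np.random.get_state()  # save the current state of the generator
b = np.random.random(3)
np.random.set_state(state)     # restore the saved state
c = np.random.random(3)        # c replays exactly the draws in b
print(np.allclose(b, c))       # True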
Monte Carlo method: solving deterministic problems using random numbers.
Example: compute $ \pi $.
Approach: draw points uniformly in the unit square; the share of points that fall inside the quarter circle of radius 1 approximates $ \pi/4 $.
def pimc(n=100):
    '''Compute pi using Monte Carlo with sample size n'''
    d = np.random.uniform(size=(2, int(n)))  # n random points in the unit square
    d2 = np.sum(d**2, axis=0)                # squared distance of each point from the origin
    n1 = np.sum(d2 < 1)                      # number of points inside the quarter circle
    s4 = n1 / n                              # fraction of points, approximating pi/4
    return 4 * s4
d = pimc(n=1e6)
print('Estimate of pi is %1.5f, bias %1.3e'%(d,d-np.pi))
data = []
for i in range(8):
    n = 10**i
    d = pimc(n)
    print('Estimate of pi is %1.5f, %1.0e points, bias %1.3e' % (d, n, d-np.pi))
data = []
for i in range(1000):
    data.append(pimc(n=10000))
hist(data)
d = np.mean(data)
print('Estimate of pi is %1.5f, bias %1.3e' % (d, d-np.pi))
Convergence is slow: the standard error of a Monte Carlo estimate decreases as $ O(1/\sqrt{n}) $, so each additional digit of accuracy requires roughly 100 times more points.
The method is better suited for some tasks than for others.
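A quick sketch of the rate: the standard deviation of the estimate across repeated runs should fall roughly as $ 1/\sqrt{n} $, i.e. by a factor of about $ \sqrt{10} \approx 3.16 $ per row below:
for n in (10**2, 10**3, 10**4, 10**5):
    est = [pimc(n) for _ in range(100)]  # 100 independent estimates of pi
    print('n=%1.0e  std of estimate %1.5f' % (n, np.std(est)))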
Let $ f(x) = x ^ k $ and $ g(x) = 1 + \frac{1}{k}\cos(2 \pi x) $.
Proposition to check: does $ g(x)>f(x) $ hold on $ [0,1] $ for all $ k \ge 1.5 $?
More broadly, what are the values of $ k $ when the inequality holds?
x = np.linspace(0,1,1000).reshape([-1,1]) # space for check points on [0,1], column
N = 100 # number of random draws of parameters
k = np.random.uniform(low=1.5,high=10,size=N) # generate k
f = x ** k
g = 1 + np.cos(x*2*np.pi) / k
# plot the functions
fig, ax = plt.subplots()
ax.plot(x,f,color='b',alpha=0.5)
ax.plot(x,g,color='r',alpha=0.5)
plt.show()
check = np.all(g > f, axis=0)  # does the inequality hold at every check point, for each k?
fk = k[np.logical_not(check)]  # values of k for which the proposition fails
answer = np.all(check)
print('The simulated proposition is:', answer)
if not answer:
    print('Failing values of k:', fk, sep=' ')
# plot passing (green) and failing (red) values of k
fig, ax = plt.subplots()
ax.scatter(k[check],np.ones(k[check].shape),color='g')
ax.scatter(fk,np.ones(fk.shape),color='r')
plt.show()
Conclusion: even though with a relatively small $ N=100 $ the statement may appear correct (depending on the lowest realized $ k $), running more simulations, for example $ N=500 $, reveals values of $ k $ close to 1.5 where the statement fails. Running even more simulations can help establish an approximate threshold somewhat above 1.5 such that the inequality holds for all $ k $ above it; see the sketch below.
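A possible follow-up sketch, replacing the random draws of $ k $ with a dense deterministic grid (an assumption of this sketch, not part of the simulation above), to bracket that threshold:
xg = np.linspace(0, 1, 1000).reshape([-1, 1])  # check points on [0,1], column
kg = np.linspace(1.5, 2.0, 501)                # dense grid of k values
holds = np.all(1 + np.cos(2*np.pi*xg)/kg > xg**kg, axis=0)
fails = np.where(np.logical_not(holds))[0]     # grid indices where the inequality fails
if fails.size == 0:
    print('Inequality holds for every k on the grid')
elif fails.max() + 1 < kg.size:
    print('Approximate threshold: inequality holds for k above %1.3f' % kg[fails.max() + 1])
else:
    print('Inequality still fails at the top of the grid; extend the grid')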