sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
 [1] fansi_1.0.4     crayon_1.5.2    digest_0.6.31   utf8_1.2.2
 [5] IRdisplay_1.1   repr_1.1.5      lifecycle_1.0.3 jsonlite_1.8.4
 [9] evaluate_0.20   pillar_1.8.1    rlang_1.0.6     cli_3.6.0
[13] uuid_1.1-0      vctrs_0.5.2     IRkernel_1.3.2  tools_4.2.2
[17] glue_1.6.2      fastmap_1.1.0   compiler_4.2.2  base64enc_0.1-3
[21] pbdZMQ_0.3-9    htmltools_0.5.4
library(plot3D)
library(ggplot2)
#library(magick)
library(animation)
To generate $U_i \stackrel{iid}{\sim} \text{unif}(0, 1)$, a (multiplicative) congruential generator iterates $x_{n+1} = a x_n \mod m$ and returns $u_n = x_n / m$, where:
Modulus $m$ is a large prime number.
Multiplier $a$ is a positive integer between 2 and $m-1$.
Map $f: x \mapsto ax \mod m$ maps $\{1, \dotsc, m-1\}$ onto itself and is one-to-one: if $ax = ay \mod m$, then $a(x - y) = 0 \mod m$, and since $m$ is prime and $0 < a < m$, this forces $x = y \mod m$.
Note that $x_n = a^n x_0 \mod m$. Hence if $a^n = 1 \mod m$ for some $n$, then $$ x_n = x_0 \mod m $$ and the (pseudo)random number generator repeats the sequence. The smallest such $n$ is called the period of the RNG, and $x_0$ is called the seed.
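As a quick worked illustration (the toy values $a = 3$, $m = 7$ are chosen here only for display), the map $x \mapsto 3x \mod 7$ visits all of $\{1, \dotsc, 6\}$ before returning to the seed, so its period is $6$:
# toy congruential generator: a = 3, m = 7, seed x_0 = 1
a <- 3; m <- 7
x <- 1
for (i in 1:6) {
    x <- (a * x) %% m
    cat(x, "")
}
# prints: 3 2 6 4 5 1  -- the sequence returns to the seed after 6 steps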
set.seed(2020) # set 2020th seed; does not mean x_0 = 2020
runif(5)
set.seed(2020) # same seed results in same "random" sequence
runif(5)
For example, the generator with $a = 7^5 = 16807$ and $m = 2^{31} - 1$ (used below) has period $m - 1 = 2{,}147{,}483{,}646$. This RNG was used in early versions of MATLAB (up to v4). Good RNGs should have long periods, and should give samples which appear to be drawn from a uniform distribution. If the size of the sample is much less than the period of the RNG, then the sample should appear random.
# returns a congruential generator with fixed $a$ and $m$
congruential_gen <- function(a, m, seed = 1) {
    # initial seed for g
    if (!hasArg(seed)) seed <- (as.numeric(Sys.time()) * 1000) %% m
    g <- function(n) {
        u <- vector(length = n)
        u[1] <- seed
        for (i in seq_len(n - 1)) {
            u[i + 1] <- (a * u[i]) %% m
        }
        seed <<- (a * seed) %% m  # seed update
        u / m
    }
    g
}
Ideally, a pseudorandom number sequence should look i.i.d. If we take the first $p$ values to form a $p$-dimensional vector, the second $p$ values to form a second $p$-dimensional vector, and so on, then these vectors should fill the $p$-dimensional hypercube uniformly.
However, by construction each value produced by a congruential generator is a deterministic function of the previous one, so successive values tend to be correlated.
It is known that congruential generators tend to give $p$-dimensional vectors that concentrate on lower-dimensional hyperplanes, for some $p$.
# IBM System/360 RANDU generator
# a = 2^16 + 3 = 65539
# m = 2^31
RANDU <- congruential_gen(65539, 2^31) #, seed=1)
RANDU(10)
n <- c(10, 50, 100, 500, 1000, 2000, 5000, 10000, 20000)
saveGIF({
    for (i in seq_along(n)) {
        u <- RANDU(n[i])
        x <- u[1:(n[i] - 2)]
        y <- u[2:(n[i] - 1)]
        z <- u[3:n[i]]
        scatter3D(x, y, z, colvar = NULL, pch = 20, cex = 0.5, theta = 160,
                  main = paste('n = ', n[i]))
    }
}, movie.name = 'RANDU.gif')
Output at: RANDU.gif
# Early MATLAB RNG
# a = 7^5
# m = 2^31 - 1
MATLABRNG <- congruential_gen(7^5, 2^31 - 1)
MATLABRNG(10)
saveGIF({
    for (i in seq_along(n)) {
        u <- MATLABRNG(n[i])
        x <- u[1:(n[i] - 2)]
        y <- u[2:(n[i] - 1)]
        z <- u[3:n[i]]
        scatter3D(x, y, z, colvar = NULL, pch = 20, cex = 0.5, theta = 160,
                  main = paste('n = ', n[i]))
    }
}, movie.name = 'MATLABRNG.gif')
Output at: MATLABRNG.gif
A simple modification is to introduce shuffling in the sequence, which we won't cover in detail.
R uses the Mersenne-Twister as the default RNG. This RNG was developed by Matsumoto and Nishimura in 1998, and is the first algorithm whose period ($2^{19937} - 1$) exceeds the number of electron spin changes since the creation of the Universe ($10^{6000}$ against $10^{120}$)!
Mersenne-Twister guarantees 623 consecutive dimensions to be equidistributed (over the whole period).
RNGkind()
From now on, we assume that the problem of generating uniform random numbers has been solved for practical purposes.
For a random variable $X$, let $F$ be its cumulative distribution function (CDF), that is, $F(x) = P(X \le x)$. Recall that $F$ is right-continuous and nondecreasing. Also, if $F$ is continuous and strictly increasing, the random variable $F(X)$ is uniformly distributed on $[0, 1]$. Below, we generalize this result.
The inverse CDF of $X$ is defined as $$ F^{-1}(u) = \inf\{x: F(x) \ge u\}, \quad 0 < u < 1 , $$ which coincides with the usual inverse of $F$ if $F$ is strictly increasing.
Proposition 1. Let $X$ be a random variable with CDF $F$. Then the following hold.
1. If $F$ is continuous, then $F(X) \sim \text{unif}[0, 1]$.
2. For any $y \in (0, 1)$, $P[F(X) \le y] \le y$.
3. If $U \sim \text{unif}[0, 1]$, then $F^{-1}(U) \stackrel{d}{=} X$.
Proof. We first claim that $$ P[F(X) \le F(t)] = F(t) \tag{*} $$ for any $t$. Suppose for now this is true.
Part 1: Let $u \in (0, 1)$. Then by continuity of $F$, there is $y$ such that $F(y) = u$. By (*),
$$
P[F(X) \le u] = P[F(X) \le F(y)] = F(y) = u.
$$
Part 3: It suffices to show that $\{F^{-1}(U) \le t\} = \{U \le F(t)\}$. If $F^{-1}(u)=\inf\{x: F(x) \ge u\} \le t$, then by the monotonicity and right-continuity of $F$, the set $\{x: F(x) \ge u\}$ is a half-closed interval containing its left endpoint, which is $F^{-1}(u)$. Hence $F(F^{-1}(u)) \ge u$. Since $F^{-1}(u) \le t$, again by monotonicity of $F$, it follows that $u \le F(F^{-1}(u)) \le F(t)$. Conversely, if $u \le F(t)$, then by definition $F^{-1}(u) \le t$.
Part 2: by part 3, $X\stackrel{d}{=}F^{-1}(U)$, where $U$ is uniform on $[0, 1]$, so we may take $X = F^{-1}(U)$. Since $F(F^{-1}(u)) \ge u$ (shown in part 3), we have $U \le F(X)$ almost surely. Hence $\{F(X) \leq y \} \subset \{U \leq y\}$, and therefore $$ P[F(X) \le y] \le P(U \le y) = y. $$
It remains to prove claim (*). Observe that $$ \{F(X) \le F(t) \} = \{X \le t\} \cup [\{X > t\} \cap \{F(X) = F(t)\}] =: A \cup B . $$ Events $A$ and $B$ are disjoint, and event $B$ corresponds to values $x > t$ at which $F(x) = F(t)$, i.e., an interval on which $F$ is constant. Hence $P(B)=\int_B dF(x) = 0$. Therefore, $$ P[F(X) \le F(t)] = P(A) + P(B) = P(A) = P(X \le t) = F(t). $$
# Exp(1) via inverse transform: F^{-1}(u) = -log(1 - u); since 1 - U and U have the same distribution, use x = -log(u)
n <- 1000
u <- runif(n)
x <- -log(u)
expdata <- data.frame(x)
plt <- ggplot(expdata, aes(x=x)) +
    geom_histogram(binwidth=0.25, fill="white", color="black") +
    geom_density(aes(y=0.25 * after_stat(count)), color="purple")
print(plt)
From the density of the Cauchy distribution $$ f(x) = \frac{\beta}{\pi(\beta^2 + (x-\mu)^2)}, $$ its CDF is $F(x) = \frac{1}{2} + \frac{1}{\pi}\tan^{-1}\left(\frac{x-\mu}{\beta}\right)$. Thus $$ F^{-1}(u) = \mu + \beta\tan(\pi u - \pi/2) = \mu - \frac{\beta}{\tan(\pi u)} . $$
# standard Cauchy (beta=1, mu=0)
n <- 1000
u <- runif(n)
x <- -1/tan(pi * u)
#hist(x, breaks=40)
cauchydata <- data.frame(x)
plt <- ggplot(cauchydata, aes(x=x)) +
    geom_histogram(binwidth=10, fill="white", color="black") +
    geom_density(aes(y=10 * after_stat(count)), color="purple") +
    xlim(-200, 200)
print(plt)
Warning message:
“Removed 4 rows containing non-finite values (`stat_bin()`).”
Warning message:
“Removed 4 rows containing non-finite values (`stat_density()`).”
Warning message:
“Removed 2 rows containing missing values (`geom_bar()`).”
$X \sim \text{unif}(\{1, 2, \dotsc, k\})$. It is easy to verify $F(x) = \frac{1}{k}\lfloor x \rfloor$ for $x \in [0, k]$ and $F^{-1}(u)=\lceil ku \rceil$.
k <- 10
n <- 1000
u <- runif(n)
x <- ceiling(k * u)
table(x)
x
   1    2    3    4    5    6    7    8    9   10
 108  108   92   93   97   85   94  118  108   97
If $X \sim \text{geom}(p)$, then its probability mass function is $p(x) = (1-p)^{x-1}p$ for $x = 1, 2, \dotsc$.
For $Y \sim \text{Exp}(\lambda)$, \begin{align*} P(\lceil Y \rceil = k) &= P(k-1 < Y \le k) = F_Y(k) - F_Y(k-1) = (1 - e^{-\lambda k}) - (1 - e^{-\lambda(k-1)}) \\ &= e^{-\lambda(k-1)}(1 - e^{-\lambda}) \\ &= (1 - p)^{k-1} p \end{align*} if $\lambda$ satisfies $p = 1 - e^{-\lambda}$, or $\lambda = -\log(1-p)$.
For this $\lambda$, $X = \lceil Y \rceil = \lceil -\frac{1}{\lambda}\log U \rceil = \lceil \frac{\log U}{\log(1-p)}\rceil$.
gengeom <- function(p, nsamp=1) {
    u <- runif(nsamp)
    y <- log(u) / log(1 - p)
    ceiling(y)
}
nsamp <- 1000
p <- 0.3
x <- gengeom(p, nsamp)
geomdata <- data.frame(x)
plt <- ggplot(geomdata, aes(x=x)) + geom_histogram(binwidth=0.25)
print(plt)
For $X \sim N(0, 1)$, inverse CDF $\Phi^{-1}$ does not have a closed form.
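As an aside, $\Phi^{-1}$ can still be evaluated numerically; R's qnorm() does this, so inverse-transform sampling remains available as a baseline check:
# numerical inverse transform for the standard normal
u <- runif(1000)
z <- qnorm(u)
c(mean(z), sd(z))  # should be close to 0 and 1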
The Box-Muller transform generates $X, Y \stackrel{iid}{\sim} N(0, 1)$ in pairs. The idea is to transform the random Cartesian coordinates $(X, Y)$ to polar coordinates $(R, \Theta)$. Since $(X, Y)=(R\cos\Theta, R\sin\Theta)$, $$ \iint_A f_{XY}(x, y)dxdy = \iint_A \frac{1}{2\pi}\exp\left(-\frac{x^2 + y^2}{2}\right)dxdy = \iint_A \frac{1}{2\pi}\exp\left(-\frac{r^2}{2}\right)rdrd\theta . $$
Hence $R$ has density $f_R(r) = r\exp(-\frac{r^2}{2})$ and $\Theta$ is uniform on $[0, 2\pi]$. Since $$ P(R > \rho) = P(R^2 > \rho^2) = \int_\rho^{\infty} r\exp\left(-\frac{r^2}{2}\right)dr = \exp\left(-\frac{1}{2}\rho^2\right), $$ random variable $R^2$ is exponentially distributed with $\lambda = 1/2$.
Thus for independent $U, V \sim \text{unif}(0, 1)$, set $$ R = (-2\log U)^{1/2}, \quad \Theta = 2\pi V . $$ Then $(X, Y) = (R\cos\Theta, R\sin\Theta)$ are independently $N(0, 1)$.
boxmuller <- function(nsamp) {
    n <- ceiling(nsamp / 2)
    u <- runif(n)
    v <- runif(n)
    r <- sqrt(-2 * log(u))
    theta <- 2 * pi * v
    x <- r * cos(theta)
    y <- r * sin(theta)
    samp <- c(x, y)
    samp[seq_len(nsamp)]
}
#hist(c(x, y))
n <- 1000
normdata1 <- data.frame(x = boxmuller(n))
plt <- ggplot(normdata1, aes(x=x)) +
    geom_histogram(binwidth=0.25, fill="white", color="black") +
    geom_density(aes(y=0.25 * after_stat(count)), color="purple")
print(plt)
From the Box-Muller transformation, we learned that if $R^2 \sim \text{Exp}(1/2)$ and $\Theta\sim\text{unif}(0, 2\pi)$ and they are independent, $(R\cos\Theta, R\sin\Theta)$ are i.i.d. standard normal.
On the other hand, if a random point $(U, V)$ is uniformly distributed on the unit disk and we write $(U, V) = (T\cos\Phi, T\sin\Phi)$, then $$ \iint_A \frac{1}{\pi}\mathbb{I}_{\{u^2 + v^2 \le 1\}}(u, v)\,du\,dv = \iint_A \frac{t}{\pi}\,dt\,d\phi , $$ which means that $T^2 = U^2 + V^2 \sim \text{unif}(0, 1)$, $\Phi \sim \text{unif}(0, 2\pi)$, and they are independent.
Hence, setting $$ X = \sqrt{\frac{-2\log(T^2)}{T^2}}\,U, \quad Y = \sqrt{\frac{-2\log(T^2)}{T^2}}\,V $$ (so that $R = \sqrt{-2\log(T^2)} $, $\cos\Phi = U/T$, $\sin\Phi = V/T$), then $X, Y$ are i.i.d. standard normal.
One way to sample from the unit disk is to sample $(U, V)$ from $\text{unif}[-1, 1]\times\text{unif}[-1, 1]$, and discard the sample if $U^2 + V^2 > 1$ and resample (see acceptance-rejection sampling below).
Algorithm:
1. Generate independent $U, V \sim \text{unif}[-1, 1]$.
2. Compute $T^2 = U^2 + V^2$. If $T^2 > 1$, go back to step 1.
3. Return $X = U\sqrt{-2\log(T^2)/T^2}$ and $Y = V\sqrt{-2\log(T^2)/T^2}$.
This method avoids the trigonometric function evaluations of the Box-Muller transform, but uses $4/\pi$ times as many random pairs on average.
marsaglia <- function(nsamp) {
    n <- ceiling(nsamp / 2)
    it <- 0
    x <- numeric(n)
    y <- numeric(n)
    while (it < n) {
        u <- 2 * runif(1) - 1
        v <- 2 * runif(1) - 1
        tau <- sqrt(u^2 + v^2)
        if (tau > 1) next  # reject points outside the unit disk
        it <- it + 1       # increment first so that x[1], ..., x[n] are filled
        x[it] <- sqrt(-4 * log(tau)) * u / tau
        y[it] <- sqrt(-4 * log(tau)) * v / tau
    }
    samp <- c(x, y)
    samp[seq_len(nsamp)]
}
n <- 1000
normdata2 <- data.frame(x = marsaglia(n))
plt <- ggplot(normdata2, aes(x=x)) +
    geom_histogram(binwidth=0.25, fill="white", color="black") +
    geom_density(aes(y=0.25 * after_stat(count)), color="purple")
print(plt)
Recall that if $X_i \stackrel{iid}{\sim} \text{Exp}(\lambda)$, then $X_i$ can be generated as $-\frac{1}{\lambda}\log U_i$ and is the waiting time between the $(i-1)$st and the $i$th event of the Poisson counting process $N(t) \sim \text{Poi}(\lambda t)$.
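A minimal sketch of this connection (the variable names lam and npois are chosen here only for illustration): accumulating i.i.d. $\text{Exp}(\lambda)$ waiting times and counting how many arrivals land in $[0, 1]$ should produce $\text{Poi}(\lambda)$ counts.
lam <- 3
npois <- 1000
counts <- replicate(npois, {
    t <- 0
    k <- 0
    repeat {
        t <- t - log(runif(1)) / lam  # add an Exp(lam) waiting time
        if (t > 1) break
        k <- k + 1
    }
    k
})
c(mean(counts), var(counts))  # both should be close to lam = 3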
## Binomial random number generation
genbin <- function(n, p) {
    u <- runif(n)
    sum(u < p)  # number of successes among n Bernoulli(p) trials
}
n <- 10; p <- 0.6
nsamp <- 1000
x <- numeric(nsamp)
for (i in seq_len(nsamp)) {
    x[i] <- genbin(n, p)
}
bindata <- data.frame(x)
plt <- ggplot(bindata, aes(x=x)) + geom_histogram(binwidth=0.25)
print(plt)
# Negative binomial random number generation
gengeom <- function(p, nsamp=1) {
    u <- runif(nsamp)
    y <- log(u) / log(1 - p)
    ceiling(y)
}
nsamp <- 1000
p <- 0.6
r <- 5
x <- numeric(nsamp)
for (i in seq_len(r)) {
    x <- x + gengeom(p, nsamp)
}
negbindata <- data.frame(x)
plt <- ggplot(negbindata, aes(x=x)) + geom_histogram(binwidth=0.25)
print(plt)
# Poisson random number generation
genpoi <- function(lam, maxiter=1000) {
    # N ~ Poi(lam) equals the number of Exp(lam) inter-arrival times fitting in
    # [0, 1], i.e., the largest k with U_1 * ... * U_k > exp(-lam)
    u_cum <- 1.0
    k <- 0
    while (u_cum > exp(-lam) && k < maxiter) {
        u <- runif(1)
        u_cum <- u_cum * u
        k <- k + 1
    }
    k - 1
}
lam <- 3 # Poisson rate
nsamp <- 1000
x <- numeric(nsamp)
for (i in seq_len(nsamp)) {
    x[i] <- genpoi(lam)
}
poidata <- data.frame(x)
plt <- ggplot(poidata, aes(x=x)) + geom_histogram(binwidth=0.25)
print(plt)
A $\chi^2(\nu)$ random variable can be generated as the sum of squares of $\nu$ independent standard normal random variables. Alternatively, for even $\nu$: $$ \chi^2(\nu) \stackrel{d}{=} -2\log\left(\prod_{i=1}^{\nu/2} U_i\right), \quad U_i \stackrel{iid}{\sim} \text{unif}(0, 1). $$
This is because $\chi^2(\nu) = \text{Gamma}(\nu/2, 2)$, where 2 is the scale parameter, so a $\chi^2(\nu)$ variable is the sum of $\nu/2$ independent $\text{Exp}(1/2)$ (mean 2) variables, each of which can be generated as $-2\log U_i$.
## chi-square random number generation
genchisq1 <- function(nsamp, nu) {
    z <- matrix(rnorm(nsamp * nu), nrow=nsamp)
    rowSums(z^2)
}
nu <- 6
n <- 1000
chisqdata1 <- data.frame(x = genchisq1(n, nu))
plt <- ggplot(chisqdata1, aes(x=x)) +
    geom_histogram(binwidth=0.25, fill="white", color="black") +
    geom_density(aes(y=0.25 * after_stat(count)), color="purple")
print(plt)
## chi-square random number generation 2
genchisq2 <- function(nsamp, nu) {
    # nu must be even for this method
    u <- matrix(runif(nsamp * nu / 2), nrow=nsamp)
    -2 * log(apply(u, 1, prod))
}
nu <- 6
n <- 1000
chisqdata2 <- data.frame(x = genchisq2(n, nu))
plt <- ggplot(chisqdata2, aes(x=x)) +
    geom_histogram(binwidth=0.25, fill="white", color="black") +
    geom_density(aes(y=0.25 * after_stat(count)), color="purple")
print(plt)
## Student's t random number generation
gent <- function(nsamp, nu) {
    z <- rnorm(nsamp)
    chisq <- genchisq1(nsamp, nu)
    z / sqrt(chisq / nu)
}
nu <- 6
n <- 1000
tdata <- data.frame(x = gent(n, nu))
plt <- ggplot(tdata, aes(x=x)) +
    geom_histogram(binwidth=0.25, fill="white", color="black") +
    geom_density(aes(y=0.25 * after_stat(count)), color="purple")
print(plt)
# F random number generation
genF <- function(nsamp, nu1, nu2) {
    chisq1 <- genchisq1(nsamp, nu1)
    chisq2 <- genchisq1(nsamp, nu2)
    (chisq1 / nu1) / (chisq2 / nu2)
}
nu1 <- 10; nu2 <- 6
n <- 1000
Fdata <- data.frame(x = genF(n, nu1, nu2))
plt <- ggplot(Fdata, aes(x=x)) +
    geom_histogram(binwidth=0.25, fill="white", color="black") +
    geom_density(aes(y=0.25 * after_stat(count)), color="purple")
print(plt)
Suppose we want to sample from a distribution with complicated density $f(x)$. It is not easy to use the above method since neither the cdf nor its inverse is analytically available.
John von Neumann, while working on the Manhattan Project, suggested the following:
1. Find a density $g$ that is easy to sample from and a constant $c > 0$ such that $$ f(x) \le c g(x) $$ for all $x$.
2. Sample a random variable $X$ distributed according to $g$.
3. Accept $X$ as a representative of $f$ if $$ U \le \frac{f(X)}{c g(X)}, \quad U \sim \text{unif}(0, 1) \text{ independent of } X. $$
4. Reject $X$ otherwise. Go to Line 2.
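A generic sketch of this algorithm (the function name ar_sample and its argument names are ours, not from the text): f is the target density, rg() draws from the proposal density g, and c satisfies $f(x) \le c g(x)$ for all $x$.
ar_sample <- function(nsamp, f, rg, g, c) {
    out <- numeric(nsamp)
    i <- 0
    while (i < nsamp) {
        x <- rg(1)                      # Line 2: sample X ~ g
        u <- runif(1)                   # independent unif(0, 1)
        if (u <= f(x) / (c * g(x))) {   # Line 3: accept
            i <- i + 1
            out[i] <- x
        }                               # Line 4: otherwise reject and retry
    }
    out
}
# example: half-normal target with an Exp(1) proposal, c = sqrt(2 * exp(1) / pi)
ftarget <- function(x) sqrt(2 / pi) * exp(-x^2 / 2)
x_hn <- ar_sample(1000, ftarget, rexp, dexp, sqrt(2 * exp(1) / pi))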
Definition (Body of a function). For a nonnegative, integrable function $f$, its body is defined and denoted by $$ B_f = \{(x, y): 0 \le y \le f(x) \}. $$
Theorem 1. Suppose random variable $X$ has density $g$, $U\sim\text{unif}[0, 1]$ is independent of $X$, and there exists $c > 0$ such that $f(x) \le c g(x)$ for all $x$. Then, the random vector $(X, cg(X)U)$ is uniformly distributed over the body $B_{cg}$ of $cg$.
Since $B_f \subset B_{cg}$, conditioning on the acceptance event $\{(X, Y) \in B_f\}$, where $Y = cg(X)U$ (equivalently, $U \le f(X)/(cg(X))$), gives $(X,Y)\mid\{(X, Y)\in B_f\} \sim \text{unif}(B_f)$. This means the accepted sample $(X, Y)$ is drawn according to $\text{unif}(B_f)$.
Thus the accepted sample $X$ is drawn according to $f$: the marginal density of the first coordinate of a uniform draw from $B_f$ is $\int_0^{f(x)} \frac{1}{|B_f|} dy = f(x)$, since $|B_f| = \int f = 1$.
The total acceptance probability is $$ P[(X, Y) \in B_f] = \frac{|B_f|}{|B_{cg}|} = \frac{1}{c} . $$
Proof. We want to show that the joint density of $(X, cg(X)U)$ is $$\frac{1}{|B_{cg}|}\mathbb{I}_{B_{cg}}(x, y).$$
Let $Y = cg(X)U$. Then $Y|\{X=x\} \stackrel{d}{=} cg(x) U \sim \text{unif}[0, cg(x)]$. That is, the conditional density of $Y$ given $X$ is $$ p_{Y|X}(y|x) = \frac{1}{cg(x)}\mathbb{I}_{[0, cg(x)]}(y). $$ By construction, the marginal density of $X$ is given by $p_X(x) = g(x)$. Therefore the joint density of $(X, Y)$ is \begin{align*} p_{XY}(x, y) &= p_X(x)p_{Y|X}(y|x) = \frac{1}{c}\mathbb{I}_{[0, cg(x)]}(y) \\ &= \begin{cases} 1/c, & \text{if } 0 \le y \le c g(x), \\ 0, & \text{otherwise}. \end{cases} \\ &= \frac{1}{c}\mathbb{I}_{B_{cg}}(x,y). \end{align*} Now since $$ 1 = \int_{-\infty}^{\infty}\int_0^{cg(x)}\frac{1}{c}dydx = \frac{1}{c}|B_{cg}|, $$ we have $c = |B_{cg}|$, hence $p_{XY}(x, y) = \frac{1}{|B_{cg}|}\mathbb{I}_{B_{cg}}(x, y)$ as desired.
One way to sample from the unit disk is to sample $(U, V)$ from $\text{unif}[-1, 1]\times\text{unif}[-1, 1]$, and discard the sample if $U^2 + V^2 > 1$ and resample.
We have $X = (U, V)$, proposal density $g(u, v) = \frac{1}{4}\mathbb{I}_{[-1,1]^2}(u, v)$, target density $f(u, v) = \frac{1}{\pi}\mathbb{I}_{\{u^2 + v^2 \le 1\}}(u, v)$, and $c = 4/\pi$, so a proposed point is accepted exactly when it falls inside the disk.
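A minimal sketch of this disk sampler (the function name runifdisk is ours): since $f/(cg)$ is the indicator of the disk, no auxiliary uniform is needed for the acceptance test.
runifdisk <- function(nsamp) {
    out <- matrix(NA_real_, nsamp, 2)
    i <- 0
    while (i < nsamp) {
        u <- 2 * runif(1) - 1
        v <- 2 * runif(1) - 1
        if (u^2 + v^2 <= 1) {   # accept iff the point falls inside the disk
            i <- i + 1
            out[i, ] <- c(u, v)
        }
    }
    out
}
disk <- runifdisk(500)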
Recall that the Gamma distribution with shape parameter $\alpha$ and scale parameter $\beta$ has density $$ f_{\Gamma}(x; \alpha, \beta) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}x^{\alpha-1}e^{-x/\beta} $$ for $x \ge 0$. If $X \sim \text{Gamma}(\alpha, \beta)$, then $cX \sim \text{Gamma}(\alpha, c\beta)$. Hence it suffices to sample from $\text{Gamma}(\alpha, 1)$. Furthermore, if $X\sim \text{Gamma}(\alpha, 1)$ and $\alpha > 1$, then $X \stackrel{d}{=} Y + Z$, where $Y \sim \text{Gamma}(\lfloor \alpha \rfloor, 1)$ and $Z \sim \text{Gamma}(\alpha - \lfloor \alpha \rfloor, 1)$ is independent of $Y$. Here $Y$ can be generated by summing $\lfloor \alpha \rfloor$ independent $\text{Exp}(1)$ random variables. Therefore we only need to sample from $\text{Gamma}(\alpha, 1)$, $\alpha \in (0, 1)$.
If $0 < \alpha < 1$, we see that $$ x^{\alpha - 1}e^{-x} \le \begin{cases} x^{\alpha - 1}, & \text{if } 0 \le x \le 1, \\ e^{-x}, & \text{otherwise}. \end{cases} $$ Thus we choose $$ h(x) = \begin{cases} x^{\alpha - 1}/\Gamma(\alpha), & \text{if } 0 \le x \le 1, \\ e^{-x}/\Gamma(\alpha), & \text{otherwise}. \end{cases} $$ leading to $$ g(x) = \begin{cases} x^{\alpha - 1}/(1/\alpha + 1/e), & \text{if } 0 \le x \le 1, \\ e^{-x}/(1/\alpha + 1/e), & \text{otherwise}. \end{cases} $$ and $$ c = \frac{1}{\Gamma(\alpha)}\left(\frac{1}{\alpha} + \frac{1}{e}\right). $$ Density $g$ has cdf $$ G(x) = \begin{cases} x^{\alpha}/(1 + \alpha/e), & \text{if } 0 \le x \le 1, \\ \frac{1 + \alpha/e - \alpha e^{-x}}{1 + \alpha/e}, & \text{otherwise}. \end{cases} $$ whose inverse is $$ G^{-1}(u) = \begin{cases} [(1 + \alpha/e)u]^{1/\alpha}, & \text{if } 0 \le u \le 1/[1+\alpha/e], \\ -\log(1/\alpha + 1/e) - \log(1 - u), & 1/[1+\alpha/e] \le u < 1. \end{cases} $$
gengamma_ar <- function(nsamp, alph) {
    # sample X from g
    v <- runif(nsamp)  # unif rv for inverse method
    idx <- v > 1 / (1 + alph * exp(-1))
    x <- numeric(nsamp)
    x[idx] <- -log(1 / alph + exp(-1)) - log(1 - v[idx])
    x[!idx] <- ((1 + alph * exp(-1)) * v[!idx])^(1 / alph)
    # test acceptance
    u <- runif(nsamp)
    idx2 <- (x > 1)
    accept <- logical(nsamp)
    accept[idx2] <- (u[idx2] < x[idx2]^(alph - 1))
    accept[!idx2] <- (u[!idx2] < exp(-x[!idx2]))
    x[accept]
}
n <- 2000
alph <- 0.5
x <- gengamma_ar(n, alph)
length(x)
length(x) / n
gamma(0.5) / (1 / alph + exp(-1) ) # acceptance ratio
gamdata <- data.frame(x = x)
plt <- ggplot(gamdata, aes(x=x)) +
    geom_histogram(binwidth=0.25, fill="white", color="black") +
    geom_density(aes(y=0.25 * after_stat(count)), color="purple")
print(plt)
Suppose we want to sample $(X_1, \dotsc, X_k) \sim \text{mult}(n, p_1, \dotsc, p_k)$, where $\sum_{i=1}^k p_i = 1$ and $\sum_{i=1}^k X_i = n$.
One method is to generate the $n$ trials one at a time: assign a trial to category $j$ if $$ \sum_{i=0}^{j-1} p_i < U \le \sum_{i=0}^{j} p_i, $$ where $U \sim \text{unif}[0, 1]$. Take $p_0 = 0$. Repeating this independently for each of the $n$ trials and counting the outcomes in each category yields $(X_1, \dotsc, X_k)$; a sketch follows.
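A minimal sketch of this trial-by-trial method (the function name rmult_cat is ours):
rmult_cat <- function(n, p) {
    u <- runif(n)
    # trial is category j if sum_{i < j} p_i < U <= sum_{i <= j} p_i (p_0 = 0)
    cats <- findInterval(u, c(0, cumsum(p)), left.open = TRUE)
    tabulate(cats, nbins = length(p))
}
rmult_cat(10, c(0.2, 0.3, 0.5))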
Suppose we want to sample $\mathbf{X} = (X_1, \dotsc, X_p)$ from MVN $N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. If we can find $\mathbf{L} \in \mathbb{R}^{p \times p}$ such that $\mathbf{L}^T\mathbf{L} = \boldsymbol{\Sigma}$, then $$ \mathbf{L}^T\mathbf{Z} + \boldsymbol{\mu}, \quad \mathbf{Z} \sim N(0, \mathbf{I}_p) $$ follows the desired distribution.
Random vector $\mathbf{Z}$ consists of $p$ independent standard normal random numbers.
Possible choices of $\mathbf{L}$ are:
- the Cholesky factor of $\boldsymbol{\Sigma}$, i.e., the upper-triangular matrix $\mathbf{L}$ with $\mathbf{L}^T\mathbf{L} = \boldsymbol{\Sigma}$;
- the symmetric square root $\mathbf{U}\boldsymbol{\Lambda}^{1/2}\mathbf{U}^T$ from the eigendecomposition $\boldsymbol{\Sigma} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^T$; such a matrix is symmetric and positive semidefinite, and is often denoted by $\boldsymbol{\Sigma}^{1/2}$.
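A minimal sketch using the Cholesky factor (the function name rmvnorm_chol is ours); R's chol() returns an upper-triangular factor $\mathbf{L}$ with $\mathbf{L}^T\mathbf{L} = \boldsymbol{\Sigma}$, matching the convention above.
rmvnorm_chol <- function(nsamp, mu, Sigma) {
    p <- length(mu)
    L <- chol(Sigma)                             # upper triangular, t(L) %*% L = Sigma
    z <- matrix(rnorm(nsamp * p), nrow = nsamp)  # rows are iid N(0, I_p)
    sweep(z %*% L, 2, mu, "+")                   # each row distributed as L^T z + mu
}
mu <- c(1, -1)
Sigma <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
xs <- rmvnorm_chol(1000, mu, Sigma)
colMeans(xs)
cov(xs)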
A multivariate $t$ distribution with degrees of freedom $\nu$, scale matrix $\boldsymbol{\Sigma}$, and location vector $\boldsymbol{\mu}$ is the distribution of the random vector $$ \mathbf{T} = \frac{1}{\sqrt{\chi^2_{\nu}/\nu}}\mathbf{X} + \boldsymbol{\mu}, $$ where $\mathbf{X} \sim N(0, \boldsymbol{\Sigma})$, and $\chi^2_{\nu}$ is the chi-square random variable with $\nu$ degrees of freedom, independent of $\mathbf{X}$.
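A minimal sketch based directly on this representation (the function name rmvt_simple is ours): divide a $N(0, \boldsymbol{\Sigma})$ draw by an independent $\sqrt{\chi^2_\nu/\nu}$ factor and shift by $\boldsymbol{\mu}$.
rmvt_simple <- function(nsamp, nu, mu, Sigma) {
    p <- length(mu)
    L <- chol(Sigma)                                    # t(L) %*% L = Sigma
    z <- matrix(rnorm(nsamp * p), nrow = nsamp) %*% L   # rows ~ N(0, Sigma)
    w <- sqrt(rchisq(nsamp, df = nu) / nu)              # independent scaling factor
    sweep(z / w, 2, mu, "+")                            # divide each row, shift by mu
}
ts <- rmvt_simple(1000, nu = 5, mu = c(0, 2), Sigma = diag(2))
cov(ts)  # approximately (nu / (nu - 2)) * Sigma for nu > 2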
In many cases we can sample a random vector by sampling each component in turn and conditioning: \begin{align*} p_{X_1, \dotsc, X_p}(x_1, \dotsc, x_p) &= p_{X_1}(x_1)\prod_{j=2}^p p_{X_j|X_1, \dotsc, X_{j-1}}(x_j | x_1, \dotsc, x_{j-1}) \\ &= p_{X_1}(x_1)p_{X_2|X_1}(x_2|x_1) \prod_{j=3}^p p_{X_j|X_1, \dotsc, X_{j-1}}(x_j | x_1, \dotsc, x_{j-1}) \\ &= \dotsb \end{align*}
For $(X_1, \dotsc, X_k) \sim \text{mult}(n, p_1, \dotsc, p_k)$, it is immediate to see that $X_1 \sim B(n, p_1)$. Given $X_1 = x_1$, $(X_2, \dotsc, X_k) \sim \text{mult}(n - x_1, p_2 / (1 - p_1), \dotsc, p_k / (1 - p_1) )$. Hence $X_2 | \{X_1 = x_1\} \sim B(n - x_1, p_2 / (1 - p_1) )$ and so on.
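A minimal sketch of this sequential-binomial scheme (the function name rmult_seq is ours):
rmult_seq <- function(n, p) {
    k <- length(p)
    x <- numeric(k)
    remaining <- n       # trials not yet allocated
    prob_left <- 1       # probability mass not yet allocated
    for (j in seq_len(k - 1)) {
        x[j] <- rbinom(1, remaining, min(1, p[j] / prob_left))
        remaining <- remaining - x[j]
        prob_left <- prob_left - p[j]
    }
    x[k] <- remaining    # the last category takes whatever is left
    x
}
rmult_seq(10, c(0.2, 0.3, 0.5))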
If we want to sample $\mathbf{X} = (X_1, \dotsc, X_p)$ from MVN with mean $\boldsymbol{\mu}=(\mu_1, \dotsc, \mu_p)^T$ and covariance matrix $\boldsymbol{\Sigma} = (\sigma_{ij})$, then note that the first component $X_1 \sim N(\mu_1, \sigma_{11})$. From the conditional distribution formula for multivariate normal, we see that $$ (X_2, \dotsc, X_p) | \{X_1 = x_1\} \sim N(\bar{\boldsymbol{\mu}}, \bar{\boldsymbol{\Sigma}}), \quad \bar{\boldsymbol{\mu}} = \boldsymbol{\mu}_2 + \boldsymbol{\Sigma}_{12}^T(x_1 - \mu_1)/\sigma_{11}, ~ \bar{\boldsymbol{\Sigma}} = \boldsymbol{\Sigma}_{22} - \boldsymbol{\Sigma}_{12}^T\boldsymbol{\Sigma}_{12}/\sigma_{11} $$ if we partition \begin{align*} \boldsymbol{\mu} &= (\mu_1, \boldsymbol{\mu}_2)^T \\ \boldsymbol{\Sigma} &= \begin{bmatrix} \sigma_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{12}^T & \boldsymbol{\Sigma}_{22} \end{bmatrix} . \end{align*}
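A minimal bivariate illustration of this conditional decomposition (the function name rbvnorm_cond is ours): draw $X_1$ from its marginal, then $X_2$ from its conditional distribution given $X_1$.
rbvnorm_cond <- function(nsamp, mu, Sigma) {
    x1 <- rnorm(nsamp, mean = mu[1], sd = sqrt(Sigma[1, 1]))
    cond_mean <- mu[2] + Sigma[1, 2] * (x1 - mu[1]) / Sigma[1, 1]
    cond_var  <- Sigma[2, 2] - Sigma[1, 2]^2 / Sigma[1, 1]
    x2 <- rnorm(nsamp, mean = cond_mean, sd = sqrt(cond_var))
    cbind(x1, x2)
}
xc <- rbvnorm_cond(1000, c(0, 1), matrix(c(2, 0.8, 0.8, 1), 2, 2))
cov(xc)  # approximately the specified Sigma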