sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
 [1] fansi_1.0.4     crayon_1.5.2    digest_0.6.31   utf8_1.2.2
 [5] IRdisplay_1.1   repr_1.1.5      lifecycle_1.0.3 jsonlite_1.8.4
 [9] evaluate_0.20   pillar_1.8.1    rlang_1.0.6     cli_3.6.0
[13] uuid_1.1-0      vctrs_0.5.2     IRkernel_1.3.2  tools_4.2.2
[17] glue_1.6.2      fastmap_1.1.0   compiler_4.2.2  base64enc_0.1-3
[21] pbdZMQ_0.3-9    htmltools_0.5.4
- If $f(\mathbf{x}) = \mathbf{A}\mathbf{x} - \mathbf{b}$, then we now know quite well how to solve this.
- However, if $f$ is nonlinear in $\mathbf{x}$, closed-form solutions are out of reach -- there is no "direct" method.
- Iterative, numerical methods come to the rescue.
Examples:
$$ f(x) = g'(x) = \frac{1 + x^{-1} - \log x}{(1 + x)^2}. $$
Nonlinear equation solving and optimization are closely related (at least when $g$ is differentiable).
Assume $f$ is continuous on $[l, r]$ and $f(l)f(r) \le 0$.
Then by the Intermediate Value Theorem, there exists $x^{\star}$ such that $f(x^{\star})=0$.
Consider the midpoint $m = (l + r)/2$.
If $f(m) = 0$, we are done.
Otherwise, start over with $r \gets m$ if $f(l)f(m) < 0$, or with $l \gets m$ if $f(m)f(r) < 0$.
Algorithm
bisect <- function(f, l, r, tol=1e-6) {
    iter <- 0
    while (r - l > tol) {
        m <- (l + r) / 2
        if (f(l) * f(m) < 0) {
            r <- m
        } else {
            l <- m
        }
        iter <- iter + 1
    }
    list(val = (l + r) / 2, iter = iter)  # return midpoint and iteration count
}
F <- function(x) { pt(x, 5) - 0.95 } # Student's t distribution with 5 df, 95%ile
F(1.291)
F(2.582)
ret <- bisect(F, 1.291, 2.582)
ret$val
ret$iter
qt(0.95, 5)
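As a sanity check (not part of the original run): the bracket halves every iteration, so the iteration count of `bisect` is predictable from the initial width and the tolerance.

```r
# Bisection halves the bracket [l, r] each iteration, so it stops after
# ceiling(log2((r - l) / tol)) iterations -- predictable before running it.
l <- 1.291; r <- 2.582; tol <- 1e-6
predicted <- ceiling(log2((r - l) / tol))
predicted  # 21; this should agree with ret$iter above
```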
$x \in \mathbb{R}$ is a fixed point of function $F: \mathbb{R} \to \mathbb{R}$ if $F(x) = x$.
Fixed-point iteration
- If $F: \mathbb{R}^n \to \mathbb{R}^n$ maps $\mathbf{x} \mapsto \mathbf{R}\mathbf{x} + \mathbf{c}$ for matrix $\mathbf{R}=\mathbf{M}^{-1}\mathbf{K}$, where $\mathbf{M} - \mathbf{K} = \mathbf{A}$ and $\mathbf{c}=\mathbf{M}^{-1}\mathbf{b}$, then FPI reduces to the iterative method for solving the linear system $\mathbf{A}\mathbf{x}=\mathbf{b}$.
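To illustrate this reduction, here is a sketch on an assumed small system (not from the original notes): the Jacobi splitting $\mathbf{M} = \text{diag}(\mathbf{A})$, $\mathbf{K} = \mathbf{M} - \mathbf{A}$ turns solving $\mathbf{A}\mathbf{x} = \mathbf{b}$ into an FPI.

```r
# FPI for a linear system via the Jacobi splitting M = diag(A), K = M - A:
# iterate x <- R x + c with R = M^{-1} K and c = M^{-1} b.
A <- matrix(c(4, 1, 1,
              1, 3, 0,
              1, 0, 2), 3, 3, byrow = TRUE)  # diagonally dominant, so FPI converges
b <- c(6, 4, 3)
M <- diag(diag(A))
R <- solve(M, M - A)
cvec <- solve(M, b)
x <- rep(0, 3)
for (i in 1:100) x <- drop(R %*% x + cvec)
max(abs(A %*% x - b))  # residual is essentially zero
```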
Find a root of $f(x) = x^4 - \sin x$ on $(0, 1]$.
x <- 1.0
for (i in 1:10) {
x <- (sin(x))^0.25
print(x)
}
[1] 0.9577668
[1] 0.9509906
[1] 0.9498498
[1] 0.9496563
[1] 0.9496234
[1] 0.9496178
[1] 0.9496169
[1] 0.9496167
[1] 0.9496167
[1] 0.9496167
x <- 1.0
for (i in 1:10) {
x <- sin(x) / x^3
print(x)
}
[1] 0.841471
[1] 1.251418
[1] 0.4844576
[1] 4.096051
[1] -0.01187393
[1] 7092.524
[1] -2.604367e-12
[1] 1.474333e+23
[1] 1.308064e-70
[1] 5.844432e+139
Theorem. Suppose the function $F$ defined on a closed interval $I \subset \mathbb{R}$ satisfies
1. $F(x) \in I$ for all $x \in I$ ($F$ maps $I$ into itself);
2. $|F(x) - F(y)| \le L|x - y|$ for all $x, y \in I$ ($F$ is Lipschitz with modulus $L$).
Then, if $L \in [0, 1)$, $F$ has a unique fixed point $x^{\star} \in I$. Furthermore, the fixed-point iteration $x^{(k+1)} = F(x^{(k)})$ converges to $x^{\star}$ regardless of the initial point $x^{(0)} \in I$.
Proof. From condition 2, \begin{align*} |x^{(k+1)} - x^{(k)}| &= |F(x^{(k)}) - F(x^{(k-1)})| \\ &\le L|x^{(k)} - x^{(k-1)}| \\ &\vdots \\ &\le L^k|x^{(1)} - x^{(0)}|. \end{align*} Using the triangle inequality, for $m > n$ we have $$ |x^{(m)} - x^{(n)}| \le \sum_{k=n}^{m-1}|x^{(k+1)} - x^{(k)}| \le (L^n + \dotsb + L^{m-1})|x^{(1)} - x^{(0)}| \le \frac{L^n}{1 - L}|x^{(1)} - x^{(0)}|. \tag{*} $$
Thus the sequence $\{x^{(k)}\}$ is Cauchy and convergent. Since the interval $I$ is closed, $x^{\star} = \lim_k x^{(k)} \in I$.
Now by the (Lipschitz) continuity of $F$ and the definition $x^{(k+1)} = F(x^{(k)})$, we see that $x^{\star}$ is a fixed point of $F$.
If there exists another fixed point $y^{\star} \neq x^{\star}$, then $$ |x^{\star} - y^{\star}| = |F(x^{\star}) - F(y^{\star})| \le L|x^{\star} - y^{\star}|, $$ which is a contradiction since $L \in [0, 1)$.
Remark. By sending $m \to \infty$ in inequality (*), we have the bound $$ |x^{(n)} - x^{\star}| \le \frac{L^n}{1 - L}|x^{(1)} - x^{(0)}| $$ which provides the rate of convergence to $x^{\star}$. This geometric rate of convergence is called linear convergence.
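This linear (geometric) rate can be observed numerically. A small diagnostic (using the fixed point found earlier as an assumed reference value): the error ratios for $F_1(x) = (\sin x)^{1/4}$ settle near the constant $|F_1'(x^{\star})| \approx 0.17$.

```r
# Linear convergence check: successive errors |x^(k) - x*| shrink by a nearly
# constant factor, approximately |F1'(x*)| (about 0.17 here).
F1 <- function(x) sin(x)^0.25
xstar <- 0.9496167          # fixed point found in the iteration above
x <- 1.0
e <- abs(x - xstar)
for (i in 1:5) {
  x <- F1(x)
  e_new <- abs(x - xstar)
  cat("error ratio:", e_new / e, "\n")
  e <- e_new
}
```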
Recall our example $F_1(x) = (\sin x)^{1/4}$ and $F_2(x) = \frac{\sin x}{x^3}$.
The Lipschitz modulus of a differentiable function is $L = \sup_{x\in I} |F'(x)|$ (why? Hint: the mean value theorem).
$F_1$ maps $I=[0.2, 1]$ to a smaller interval. $F_1'(x) = \frac{\cos x}{4\sin^{3/4}(x)}$ is decreasing on $I$ and $1 > F_1'(0.2) > F_1'(1) > 0$.
$F_2$ expands $I=[0.9, 1]$ to $[0.841, 1.075]$ and also $F_2'(x) = \frac{x\cos x - 3\sin x}{x^4}$ is increasing on $I$ with $F_2'(1) < -1$.
(Homework)
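The claims about the two Lipschitz moduli can be spot-checked numerically (a rough grid-based sketch, not a proof):

```r
# Estimate L = sup |F'(x)| on a grid: F1 is a contraction on [0.2, 1] (L < 1),
# while F2 is not one on [0.9, 1] (L > 1).
F1prime <- function(x) cos(x) / (4 * sin(x)^0.75)
F2prime <- function(x) (x * cos(x) - 3 * sin(x)) / x^4
grid1 <- seq(0.2, 1, by = 0.001)
grid2 <- seq(0.9, 1, by = 0.001)
max(abs(F1prime(grid1)))  # < 1
max(abs(F2prime(grid2)))  # > 1
```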
Newton's method solves $f(x) = 0$ by iterating $$ x^{(k+1)} = x^{(k)} - \frac{f(x^{(k)})}{f'(x^{(k)})}. $$
Motivation: first-order Taylor expansion
$$ 0 = f(x^{\star}) = f(x^{(k)}) + f'(\tilde{x})(x^{\star} - x^{(k)}) $$ for some $\tilde{x}$ on the line segment connecting $x^{(k)}$ and $x^{\star}$.
Approximating $\tilde{x}$ by $x^{(k)}$ and $x^{\star}$ by the next iterate $x^{(k+1)}$ gives $$ 0 = f(x^{(k)}) + f'(x^{(k)})(x^{(k+1)} - x^{(k)}), $$ yielding the above iteration.
Fixed-point iteration: Newton's method is an FPI with $$ F(x) = x - \frac{f(x)}{f'(x)}, \quad \text{so that} \quad F'(x) = \frac{f(x)f''(x)}{[f'(x)]^2} $$
(assuming $f$ is twice differentiable).
If $f'(x^{\star})\neq 0$ and $|f''(x^\star)| < \infty$, then $F'(x^{\star})=0$. So there is a neighborhood of $x^{\star}$ in which $|F'(x)| < 1$.
With this assumption, apply a 2nd-order Taylor expansion of $F$ around $x^{\star}$: $$ x^{(k)} - x^{\star} = F(x^{(k-1)}) - F(x^{\star}) = F'(x^{\star})(x^{(k-1)} - x^{\star}) + \frac{1}{2}F''(\tilde{x})(x^{(k-1)} - x^{\star})^2 = \frac{1}{2}F''(\tilde{x})(x^{(k-1)} - x^{\star})^2 $$ for $\tilde{x}$ on the line segment connecting $x^{(k-1)}$ and $x^{\star}$, using $F'(x^{\star}) = 0$.
Thus Newton's method has a locally quadratic convergence property (under suitable conditions).
F <- function(x) { pt(x, 5) - 0.95 } # Student's t distribution with 5 df, 95%ile
Fprime <- function(x) { dt(x, 5) } # derivative of F (density function)
#x <- 1.291
x <- 2.582
options(digits=20)
for (i in 1:10) {
x <- x - (F(x) / Fprime(x))
print(x)
}
[1] 1.7338465618414033997
[1] 1.968962641494075072
[1] 2.0136622631340750367
[1] 2.0150470922847292243
[1] 2.0150483733319273227
[1] 2.015048373333023779
[1] 2.015048373333023779
[1] 2.015048373333023779
[1] 2.015048373333023779
[1] 2.015048373333023779
qt(0.95, 5)
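The quadratic rate in the run above can be checked directly: if convergence is quadratic, the ratio $|x^{(k+1)} - x^{\star}|/|x^{(k)} - x^{\star}|^2$ stays bounded (a small diagnostic added here, using `qt(0.95, 5)` as the reference root):

```r
# Quadratic convergence diagnostic for Newton's method: the number of correct
# digits roughly doubles each step, i.e., e_new / e_old^2 stays bounded.
F <- function(x) pt(x, 5) - 0.95
Fprime <- function(x) dt(x, 5)
xstar <- qt(0.95, 5)   # reference root
x <- 2.582
for (i in 1:4) {
  e_old <- abs(x - xstar)
  x <- x - F(x) / Fprime(x)
  cat("e_new / e_old^2 =", abs(x - xstar) / e_old^2, "\n")
}
```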
Goal: to solve \begin{align*} f_1(x_1, \dotsc, x_n) &= 0 \\ \vdots & ~ \\ f_n(x_1, \dotsc, x_n) &=0. \end{align*}
Same idea as univariate version: linearize $f$ around $\mathbf{x}=(x_1, \dotsc, x_n)$.
Iteration: solve the linear system $$ Jf(\mathbf{x}^{(k)})\,\Delta\mathbf{x} = -f(\mathbf{x}^{(k)}), $$
where $$ Jf(\mathbf{x}) = \begin{bmatrix} \frac{\partial f_1(\mathbf{x})}{\partial x_1} & \dotsc & \frac{\partial f_1(\mathbf{x})}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n(\mathbf{x})}{\partial x_1} & \dotsc & \frac{\partial f_n(\mathbf{x})}{\partial x_n} \end{bmatrix} $$
to get $\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \Delta\mathbf{x}$, using LU decomposition, etc., at each iteration.
Modified Newton: replace $Jf(\mathbf{x}^{(k)})$ with $J = Jf(\mathbf{x}^{(0)})$.
Jacobi's method: replace $Jf(\mathbf{x}^{(k)})$ with $\text{diag}(Jf(\mathbf{x}^{(k)}))$.
Gauss-Seidel: replace $Jf(\mathbf{x}^{(k)})$ with $\text{lowertri}(Jf(\mathbf{x}^{(k)}))$.
SOR: replace $Jf(\mathbf{x}^{(k)})$ with $\frac{1}{\omega}\text{diag}(Jf(\mathbf{x}^{(k)})) + \text{strictlowertri}(Jf(\mathbf{x}^{(k)}))$.
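A minimal sketch of the multivariate iteration on an assumed toy system, solving $x_1^2 + x_2^2 = 1$, $x_1 = x_2$, whose positive root is $x_1 = x_2 = 1/\sqrt{2}$:

```r
# Multivariate Newton: at each step solve Jf(x) dx = -f(x), then update x <- x + dx.
f <- function(v) c(v[1]^2 + v[2]^2 - 1,   # f1(x1, x2) = x1^2 + x2^2 - 1
                   v[1] - v[2])           # f2(x1, x2) = x1 - x2
Jf <- function(v) matrix(c(2 * v[1], 2 * v[2],
                           1,        -1), 2, 2, byrow = TRUE)
x <- c(1, 0.5)
for (k in 1:10) {
  dx <- solve(Jf(x), -f(x))  # linear solve (LU) per iteration
  x <- x + dx
}
x  # converges to c(1, 1) / sqrt(2)
```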