versioninfo()
Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i5-8279U CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
using Pkg
Pkg.activate("../..")
Pkg.status()
Activating environment at `~/Dropbox/class/M1399.000200/2021/M1399_000200-2021fall/Project.toml`
Status `~/Dropbox/class/M1399.000200/2021/M1399_000200-2021fall/Project.toml`
  [7d9fca2a] Arpack v0.5.3
  [6e4b80f9] BenchmarkTools v1.1.4
  [1e616198] COSMO v0.8.1
  [f65535da] Convex v0.14.13
  [a93c6f00] DataFrames v1.2.2
  [31c24e10] Distributions v0.24.18
  [e2685f51] ECOS v0.12.3
  [f6369f11] ForwardDiff v0.10.19
  [28b8d3ca] GR v0.58.1
  [c91e804a] Gadfly v1.3.3
  [bd48cda9] GraphRecipes v0.5.7
  [82e4d734] ImageIO v0.5.8
  [6218d12a] ImageMagick v1.2.1
  [916415d5] Images v0.24.1
  [b6b21f68] Ipopt v0.7.0
  [42fd0dbc] IterativeSolvers v0.9.1
  [4076af6c] JuMP v0.21.9
  [b51810bb] MatrixDepot v1.0.4
  [1ec41992] MosekTools v0.9.4
  [76087f3c] NLopt v0.6.3
  [47be7bcc] ORCA v0.5.0
  [a03496cd] PlotlyBase v0.4.3
  [f0f68f2c] PlotlyJS v0.15.0
  [91a5bcdd] Plots v1.21.2
  [438e738f] PyCall v1.92.3
  [d330b81b] PyPlot v2.9.0
  [dca85d43] QuartzImageIO v0.7.3
  [6f49c342] RCall v0.13.12
  [ce6b1742] RDatasets v0.7.5
  [c946c3f1] SCS v0.7.1
  [276daf66] SpecialFunctions v1.6.1
  [2913bbd2] StatsBase v0.33.10
  [b8865327] UnicodePlots v2.0.1
  [0f1e0344] WebIO v0.8.15
  [8f399da3] Libdl
  [2f01184e] SparseArrays
  [10745b16] Statistics
An algorithm is loosely defined as a set of instructions for doing something that terminates in finite time. An algorithm requires input and output.
Donald Knuth lists five properties of an algorithm: (1) finiteness, (2) definiteness, (3) input, (4) output, (5) effectiveness.
A flop (floating point operation) consists of a floating point addition, subtraction, multiplication, division, or comparison, and the usually accompanying fetch and store.
Some books count a multiplication followed by an addition (fused multiply-add, FMA) as one flop. This results in a factor of up to 2 difference in flop counts.
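Beyond flop counting, whether the multiply and add are fused also matters numerically: an FMA rounds once instead of twice. A minimal sketch of this (using Julia's built-in `fma`):

```julia
# A fused multiply-add (one rounding) can give a different, more accurate
# result than a separate multiply and add (two roundings).
a = 1.0 + 2.0^-27
separate = a * a - 1.0      # a*a rounds away the low-order 2^-54 term
fused    = fma(a, a, -1.0)  # computes a*a - 1.0 with a single rounding
println(separate == fused)  # the two results differ in the last bits
```

Here `a^2 = 1 + 2^{-26} + 2^{-54}` exactly; the separate evaluation loses the `2^{-54}` term while the FMA keeps it.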
How to measure the efficiency of an algorithm? Big O notation. If $n$ is the size of a problem, an algorithm has order $O(f(n))$, where the leading term in the number of flops is $c \cdot f(n)$. For example,

- `A * b`, where `A` is $m \times n$ and `b` is $n \times 1$, takes $2mn$ or $O(mn)$ flops;
- `A * B`, where `A` is $m \times n$ and `B` is $n \times p$, takes $2mnp$ or $O(mnp)$ flops.

A hierarchy of computational complexity:
Let $n$ be the problem size.
Classification of data sets by Huber (1994).
| Data Size | Bytes | Storage Mode |
|---|---|---|
| Tiny | $10^2$ | Piece of paper |
| Small | $10^4$ | A few pieces of paper |
| Medium | $10^6$ (megabytes) | A floppy disk |
| Large | $10^8$ | Hard disk |
| Huge | $10^9$ (gigabytes) | Hard disk(s) |
| Massive | $10^{12}$ (terabytes) | RAID storage |
For a massive data set with $n = 10^{12}$, on a machine executing $10^{12}$ flops per second, an $O(n^2)$ algorithm takes about $(10^{12})^2 / 10^{12} = 10^{12}$ seconds, which is approximately 31,710 years!
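The arithmetic can be checked directly:

```julia
# 10^24 flops at 10^12 flops per second, converted to years.
seconds = 1e24 / 1e12
years = seconds / (3600 * 24 * 365)   # 365-day years
println(round(Int, years))            # ≈ 31710
```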
QuickSort and the fast Fourier transform (credited to Cooley and Tukey) are celebrated algorithms that turn $O(n^2)$ operations into $O(n \log n)$. Another example is Strassen's method, which turns $O(n^3)$ matrix multiplication into $O(n^{\log_2 7})$.
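To see the gap in practice, one can pit a textbook $O(n^2)$ insertion sort (a helper written here purely for illustration) against Julia's built-in $O(n \log n)$ sort:

```julia
using Random

# Classic O(n^2) insertion sort, defined here only for the comparison.
function insertionsort!(v::AbstractVector)
    for i in 2:length(v)
        key = v[i]
        j = i - 1
        while j >= 1 && v[j] > key
            v[j + 1] = v[j]
            j -= 1
        end
        v[j + 1] = key
    end
    return v
end

Random.seed!(123)
x = randn(50_000)
t_slow = @elapsed insertionsort!(copy(x))  # O(n^2)
t_fast = @elapsed sort(x)                  # O(n log n)
println("insertion sort: ", t_slow, " s; built-in sort: ", t_fast, " s")
```

On this problem size the built-in sort is typically orders of magnitude faster, and the gap widens quadratically versus $n \log n$ as $n$ grows.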
One goal of this course is to get familiar with the flop counts for some common numerical tasks in statistics.
> The form of a mathematical expression and the way the expression should be evaluated in actual practice may be quite different.
>
> -- James Gentle, *Matrix Algebra*, Springer, New York (2007).
Compare `A * B * x` (evaluated left to right, i.e. `(A * B) * x`) and `A * (B * x)`, where `A` and `B` are matrices and `x` is a vector.

using BenchmarkTools, Random
Random.seed!(123) # seed
n = 1000
A = randn(n, n)
B = randn(n, n)
x = randn(n)
# complexity is n^3 + n^2 = O(n^3)
@benchmark $A * $B * $x
BenchmarkTools.Trial: 301 samples with 1 evaluation.
 Range (min … max):  12.038 ms … 27.045 ms  ┊ GC (min … max): 0.00% … 15.52%
 Time  (median):     15.712 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   16.599 ms ±  3.089 ms  ┊ GC (mean ± σ):  3.23% ± 6.41%
 Memory estimate: 7.64 MiB, allocs estimate: 3.
# complexity is n^2 + n^2 = O(n^2)
@benchmark $A * ($B * $x)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  275.552 μs …  4.424 ms  ┊ GC (min … max): 0.00% … 92.43%
 Time  (median):     288.479 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   318.128 μs ± 73.509 μs  ┊ GC (mean ± σ):  0.13% ± 0.92%
 Memory estimate: 15.88 KiB, allocs estimate: 2.
Why is there such a difference?

FLOPS (floating point operations per second) is a measure of computer performance.
For example, my laptop has the Intel i5-8279U (Coffee Lake) CPU with 4 cores running at 2.4 GHz ($2.4 \times 10^9$ cycles per second).
versioninfo()
Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i5-8279U CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Intel Coffee Lake CPUs can do 16 double-precision flops per cycle and 32 single-precision flops per cycle. Then the theoretical throughput of my laptop is $$ 4 \times 2.4 \times 10^9 \times 16 = 153.6 \text{ GFLOPS} $$ in double precision and $$ 4 \times 2.4 \times 10^9 \times 32 = 307.2 \text{ GFLOPS} $$ in single precision.
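The same calculation in code (using the specs quoted above; substitute your own core count, clock rate, and flops per cycle):

```julia
# Theoretical peak throughput = cores × clock (Hz) × flops per cycle.
cores = 4
clock_hz = 2.4e9
flops_per_cycle_dp = 16   # double precision, Coffee Lake
flops_per_cycle_sp = 32   # single precision

peak_dp = cores * clock_hz * flops_per_cycle_dp   # 153.6 GFLOPS
peak_sp = cores * clock_hz * flops_per_cycle_sp   # 307.2 GFLOPS
println(peak_dp / 1e9, " GFLOPS DP, ", peak_sp / 1e9, " GFLOPS SP")
```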
`LinearAlgebra.peakflops()` computes the peak flop rate of the computer by running `gemm!` (double-precision matrix-matrix multiplication).

using LinearAlgebra
LinearAlgebra.peakflops(2^14) # matrix size 2^14
1.4744965532793915e11
which is about 147.4 GFLOPS DP.
For given data $x \in \mathcal{X}$, the true solution of the problem $f$ is $y = f(x) \in \mathcal{Y}$.
- The problem of solving $Ax=b$ for fixed $b$ is $f: A \mapsto A^{-1}b$ with $\mathcal{X}=\{M\in\mathbb{R}^{n\times n}: M \text{ is invertible} \}$ and $\mathcal{Y} = \mathbb{R}^n$.
For given data $x \in \mathcal{X}$, the solution computed by algorithm $\tilde{f}$ is $\hat{y} = \tilde{f}(x) \in \mathcal{Y}$.
- Example 1: solve $Ax=b$ by GEPP (Gaussian elimination with partial pivoting) followed by forward and backward substitutions on a digital computer.
- Example 2: solve $Ax=b$ by Gauss-Seidel (an iterative method to come) on a digital computer.
- In both cases, the solutions (in $\mathcal{Y}$) are not the same as $A^{-1}b$!
+ We'll learn about these algorithms soon.
- Algorithms will be affected by at least rounding errors.
In words, a stable algorithm gives "nearly the right" answer to a "slightly wrong" question.
In words, a backward stable algorithm gives "exactly the right" answer to a "slightly wrong" question.

- Backward stability implies stability, but not the other way around.
To see this, recall the definition of the condition number
$$
\kappa = \lim_{\delta\to 0}\sup_{\|\tilde{x} - x\|\le \delta \Vert x \Vert}\frac{\|f(\tilde{x}) - f(x)\|/\|f(x)\|}{\|\tilde{x} - x\|/\|x\|}
.
$$
Thus for $\tilde{x} \in \mathcal{X}$ such that $\frac{\Vert\tilde{x} - x\Vert}{\Vert x \Vert} = O(\epsilon)$ and $\tilde{f}(x) = f(\tilde{x})$ as $\epsilon \to 0$, we have
$$
\frac{\|\tilde{f}(x) - f(x)\|}{\|f(x)\|} \le ( \kappa + o(1) )\frac{\|\tilde{x} - x\|}{\|x\|}
= O(\kappa \epsilon)
.
$$
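This $O(\kappa\epsilon)$ bound can be observed numerically. The sketch below constructs a matrix with a prescribed condition number ($\approx 10^{10}$, an arbitrary choice for illustration) and solves $Ax = b$ with Julia's backslash (a backward stable GEPP solve via LAPACK); the relative forward error lands near $\kappa \cdot \epsilon \approx 10^{-6}$, far above $\epsilon \approx 10^{-16}$ itself.

```julia
using LinearAlgebra, Random

Random.seed!(123)
n = 100
U = Matrix(qr(randn(n, n)).Q)            # random orthogonal factor
V = Matrix(qr(randn(n, n)).Q)            # another random orthogonal factor
σ = exp10.(range(0, -10; length = n))    # singular values 1 down to 1e-10
A = U * Diagonal(σ) * V'                 # cond(A) ≈ 1e10 by construction

xtrue = randn(n)
b = A * xtrue
xhat = A \ b                             # backward stable GEPP solve
relerr = norm(xhat - xtrue) / norm(xtrue)
println("cond(A) ≈ ", cond(A), ", relative error ≈ ", relerr)
```

The backward stable algorithm cannot do better than the conditioning of the problem allows: the forward error is amplified by $\kappa$.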
Examples
(Backward) Stability is a property of an algorithm, whereas conditioning is a property of a problem.
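A classic illustration of this distinction (not from the original note): the one-pass "textbook" variance formula $\frac{1}{n}\sum x_i^2 - \bar{x}^2$ is unstable for data with a large mean, while the two-pass formula is stable. The problem itself (variance of well-scaled data) is well-conditioned, so the failure is entirely the algorithm's fault.

```julia
using Statistics, Random

Random.seed!(123)
x = 1e8 .+ randn(10_000)   # large mean, true variance ≈ 1

# One-pass "textbook" formula: catastrophic cancellation between two
# huge, nearly equal quantities of size ≈ 1e16.
v_unstable = sum(abs2, x) / length(x) - mean(x)^2

# Two-pass formula: subtract the mean first, then square. Stable.
v_stable = sum(abs2, x .- mean(x)) / length(x)

println("one-pass: ", v_unstable, "   two-pass: ", v_stable)
```

Both formulas are algebraically identical; only the evaluation order differs, echoing Gentle's quote above.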
*What Is Numerical Stability?* by Nick Higham
This lecture note has evolved from Dr. Hua Zhou's 2019 Spring Statistical Computing course notes available at http://hua-zhou.github.io/teaching/biostatm280-2019spring/index.html.