versioninfo()
Julia Version 1.1.0 Commit 80516ca202 (2019-01-21 21:24 UTC) Platform Info: OS: macOS (x86_64-apple-darwin14.5.0) CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz WORD_SIZE: 64 LIBM: libopenlibm LLVM: libLLVM-6.0.1 (ORCJIT, skylake) Environment: JULIA_EDITOR = code
Methods for solving linear regression $\widehat \beta = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$:
Method | Flops | Remarks | Software | Stability |
---|---|---|---|---|
Sweep | $np^2 + p^3$ | $(X^TX)^{-1}$ available | SAS | less stable |
Cholesky | $np^2 + p^3/3$ | less stable | ||
QR by Householder | $2np^2 - (2/3)p^3$ | R | stable | |
QR by MGS | $2np^2$ | $Q_1$ available | stable | |
QR by SVD | $4n^2p + 8np^2 + 9p^3$ | $X = UDV^T$ | most stable |
Remarks:
There is simply no such thing as a universal 'gold standard' when it comes to algorithms.
using SweepOperator, BenchmarkTools, LinearAlgebra
linreg_cholesky(y::Vector, X::Matrix) = cholesky!(X'X) \ (X'y)
linreg_qr(y::Vector, X::Matrix) = X \ y
function linreg_sweep(y::Vector, X::Matrix)
p = size(X, 2)
xy = [X y]
tableau = xy'xy
sweep!(tableau, 1:p)
return tableau[1:p, end]
end
function linreg_svd(y::Vector, X::Matrix)
xsvd = svd(X)
return xsvd.V * ((xsvd.U'y) ./ xsvd.S)
end
linreg_svd (generic function with 1 method)
using Random
Random.seed!(280) # seed
n, p = 10, 3
X = randn(n, p)
y = randn(n)
# check these methods give same answer
@show linreg_cholesky(y, X)
@show linreg_qr(y, X)
@show linreg_sweep(y, X)
@show linreg_svd(y, X);
linreg_cholesky(y, X) = [0.390365, 0.262759, 0.149047] linreg_qr(y, X) = [0.390365, 0.262759, 0.149047] linreg_sweep(y, X) = [0.390365, 0.262759, 0.149047] linreg_svd(y, X) = [0.390365, 0.262759, 0.149047]
n, p = 1000, 300
X = randn(n, p)
y = randn(n)
@benchmark linreg_cholesky(y, X)
BenchmarkTools.Trial: memory estimate: 708.31 KiB allocs estimate: 8 -------------- minimum time: 1.590 ms (0.00% GC) median time: 1.756 ms (0.00% GC) mean time: 1.821 ms (2.78% GC) maximum time: 39.979 ms (95.16% GC) -------------- samples: 2733 evals/sample: 1
@benchmark linreg_sweep(y, X)
BenchmarkTools.Trial: memory estimate: 2.99 MiB allocs estimate: 7 -------------- minimum time: 5.200 ms (0.00% GC) median time: 7.225 ms (0.00% GC) mean time: 7.300 ms (2.58% GC) maximum time: 48.544 ms (82.00% GC) -------------- samples: 684 evals/sample: 1
@benchmark linreg_qr(y, X)
BenchmarkTools.Trial: memory estimate: 4.04 MiB allocs estimate: 2444 -------------- minimum time: 10.474 ms (0.00% GC) median time: 11.760 ms (0.00% GC) mean time: 12.302 ms (3.93% GC) maximum time: 52.806 ms (76.80% GC) -------------- samples: 406 evals/sample: 1
@benchmark linreg_svd(y, X)
BenchmarkTools.Trial: memory estimate: 8.06 MiB allocs estimate: 16 -------------- minimum time: 31.354 ms (0.00% GC) median time: 32.280 ms (0.00% GC) mean time: 33.278 ms (3.04% GC) maximum time: 84.534 ms (59.10% GC) -------------- samples: 151 evals/sample: 1