versioninfo()
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 8 × Intel(R) Core(TM) i5-8279U CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 2 on 8 virtual cores
using Pkg
Pkg.activate(pwd())
Pkg.instantiate()
Pkg.status()
Activating project at `~/Dropbox/class/M1399.000200/2023/M1300_000200-2023fall/lectures/15-juliaopt`
Status `~/Dropbox/class/M1399.000200/2023/M1300_000200-2023fall/lectures/15-juliaopt/Project.toml`
  [1e616198] COSMO v0.8.8
  [f65535da] Convex v0.15.3
  [31c24e10] Distributions v0.25.103
  [5789e2e9] FileIO v1.16.1
  [f6369f11] ForwardDiff v0.10.36
  [b99e6be6] Hypatia v0.7.3
  [82e4d734] ImageIO v0.6.7
  [916415d5] Images v0.26.0
  [b6b21f68] Ipopt v1.5.1
  [4076af6c] JuMP v1.16.0
  [b8f27783] MathOptInterface v1.22.0
  [6405355b] Mosek v10.1.3
  [1ec41992] MosekTools v0.15.1
  [76087f3c] NLopt v1.0.0
  [91a5bcdd] Plots v1.39.0
  [c946c3f1] SCS v1.3.1
  [276daf66] SpecialFunctions v2.3.1
Getting familiar with good optimization software broadens the scope and scale of problems we are able to solve in statistics. The following table lists some of the best optimization software.
|  | LP | MILP | SOCP | MISOCP | SDP | GP | NLP | MINLP | R | Matlab | Julia | Python | Cost |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| modeling tools | | | | | | | | | | | | | |
| AMPL | x | x | x | x | x | x | x | x | | x | x | x | $ |
| cvx | x | x | x | x | x | x | | | x | x | | | A |
| Convex.jl | x | x | x | x | x | | | | | | x | | O |
| JuMP.jl | x | x | x | x | | | x | x | | | x | | O |
| MathProgBase.jl | x | x | x | x | | | x | x | | | x | | O |
| MathOptInterface.jl | x | x | x | x | | | x | x | | | x | | O |
| convex solvers | | | | | | | | | | | | | |
| Mosek | x | x | x | x | x | x | x | | x | x | x | x | A |
| Gurobi | x | x | x | x | | | | | x | x | x | x | A |
| CPLEX | x | x | x | x | | | | | x | x | x | x | A |
| SCS | x | | x | | x | | | | | x | x | x | O |
| NLP solvers | | | | | | | | | | | | | |
| NLopt | | | | | | | x | | x | x | x | x | O |
| Ipopt | | | | | | | x | | x | x | x | x | O |
| KNITRO | x | x | | | | | x | x | x | x | x | x | $ |

Cost: O = open source, A = free academic license, $ = commercial.
Mosek, Gurobi, CPLEX, SCS, etc., are concrete software implementations of optimization algorithms. Users specify the problem to solve through each solver's application programming interface (API).
AMPL (commercial, https://ampl.com) is an algebraic modeling language that lets users describe an optimization problem in a form close to its mathematical formulation.
cvx (for Matlab) and Convex.jl (for Julia) implement the disciplined convex programming (DCP) paradigm proposed by Grant and Boyd (2008). DCP prescribes a set of simple rules from which users can construct convex optimization problems easily.
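For example, Convex.jl can report the curvature it certifies for an expression built from its atoms (a quick sketch of the DCP rules in action; the expressions below are just illustrations):

using Convex
x = Variable()
# a nonnegative weighted sum of convex atoms is convex by the DCP rules
e = square(x) + 2.0 * abs(x - 1)
vexity(e)                  # Convex.ConvexVexity()
# sqrt(square(x)) equals |x| mathematically, but concave-of-convex
# violates the DCP composition rules, so Convex.jl flags it
vexity(sqrt(square(x)))    # Convex.NotDcp()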
Convex.jl interfaces with actual problem solvers via MathOptInterface, an abstraction layer for mathematical optimization solvers in Julia. Modeling tools usually have the capability to use a variety of solvers; because they are solver agnostic, users do not have to worry about a specific solver's interface.
Standard convex problem classes like LP (linear programming), QP (quadratic programming), SOCP (second-order cone programming), SDP (semidefinite programming), and GP (geometric programming) are becoming a technology.
We illustrate optimization tools in Julia using microbiome analysis as an example.
Diversity of the gut microbiome is believed to be an important factor in human health and disease, such as obesity; the respiratory microbiome affects pulmonary function in an HIV-infected population.
16S microbiome sequencing technology generates sequence counts of various organisms (operational taxonomic units, or OTUs) in samples.
The major interest is the composition of the microbiome in the population, hence for statistical analysis, OTU counts are normalized into proportions for each sample.
For details, see
Lin, W., Shi, P., Feng, R. and Li, H., 2014. Variable selection in regression with compositional covariates. Biometrika, 101(4), pp.785-797. https://doi.org/10.1093/biomet/asu031
The sum-to-zero constrained least squares problem $$ \begin{array}{ll} \text{minimize} & \frac 12 \|\mathbf{y} - \mathbf{X} \boldsymbol{\beta}\|_2^2 \\ \text{subject to} & \mathbf{1}^T\boldsymbol{\beta} = 0 \end{array} $$ is a standard quadratic programming (QP) problem, so it should be solved easily by any QP solver.
For simplicity we ignore intercept and non-OTU covariates in this presentation.
Let's first generate an artificial dataset:
using Random, LinearAlgebra, SparseArrays
Random.seed!(123) # seed
n, p = 100, 50
X = rand(n, p)
lmul!(Diagonal(1 ./ vec(sum(X, dims=2))), X) # normalize rows of X to sum to one (compositions)
β = sprandn(p, 0.1) # sparse vector with about 10% non-zero entries
y = X * β + randn(n);
using Convex
β̂cls = Variable(size(X, 2))
problem = minimize(0.5sumsquares(y - X * β̂cls)) # objective
problem.constraints += sum(β̂cls) == 0; # sum-to-zero constraint
using MosekTools, MathOptInterface
const MOI = MathOptInterface
solver = Mosek.Optimizer()
MOI.set(solver, MOI.RawOptimizerAttribute("LOG"), 1)
@time solve!(problem, solver)
Problem
  Name                   :
  Objective sense        : minimize
  Type                   : CONIC (conic optimization problem)
  Constraints            : 3
  Affine conic cons.     : 2 (104 rows)
  Disjunctive cons.      : 0
  Cones                  : 0
  Scalar variables       : 53
  Matrix variables       : 0
  Integer variables      : 0

Optimizer started.
Presolve started.
Linear dependency checker started.
Linear dependency checker terminated.
Eliminator started.
Freed constraints in eliminator : 0
Eliminator terminated.
Eliminator - tries                  : 1                 time                   : 0.00
Lin. dep.  - tries                  : 1                 time                   : 0.00
Lin. dep.  - primal attempts        : 1                 successes              : 1
Lin. dep.  - dual attempts          : 0                 successes              : 0
Lin. dep.  - primal deps.           : 0                 dual deps.             : 0
Presolve terminated. Time: 0.00
Optimizer  - threads                : 8
Optimizer  - solved problem         : the dual
Optimizer  - Constraints            : 52
Optimizer  - Cones                  : 3
Optimizer  - Scalar variables       : 106               conic                  : 106
Optimizer  - Semi-definite variables: 0                 scalarized             : 0
Factor     - setup time             : 0.00
Factor     - dense det. time        : 0.00              GP order time          : 0.00
Factor     - nonzeros before factor : 1328              after factor           : 1378
Factor     - dense dim.             : 0                 flops                  : 3.11e+05
ITE PFEAS    DFEAS    GFEAS    PRSTATUS   POBJ              DOBJ              MU       TIME
0   2.5e+00  5.0e-01  2.0e+00  0.00e+00   -0.000000000e+00  -1.000000000e+00  1.0e+00  0.00
1   2.5e-01  5.0e-02  5.6e-01  -9.04e-01  2.761646440e+00   8.939118846e+00   9.9e-02  0.01
2   4.1e-02  8.2e-03  1.2e-01  -6.30e-01  1.477236633e+01   2.736777887e+01   1.6e-02  0.01
3   2.0e-03  3.9e-04  6.4e-04  8.73e-01   2.173581519e+01   2.186559731e+01   7.9e-04  0.01
4   1.2e-06  2.3e-07  1.1e-08  1.00e+00   2.178246695e+01   2.178258168e+01   4.7e-07  0.01
5   1.2e-10  2.5e-11  1.6e-14  1.00e+00   2.178268480e+01   2.178268481e+01   4.8e-11  0.01
Optimizer terminated. Time: 0.01

  0.091715 seconds (6.57 k allocations: 1.959 MiB)
# Check the status, optimal value, and minimizer of the problem
problem.status, problem.optval, β̂cls.value
(MathOptInterface.OPTIMAL, 21.78268479990576, [8.821527219381636; 33.70209377680733; … ; -22.361993206260447; 12.468500909230388;;])
After installing Gurobi, set up the environment variables. On my machine, I put the following two lines in the ~/.julia/config/startup.jl file:
ENV["GUROBI_HOME"] = "/Library/gurobi1003/macos_universal2/"
ENV["GRB_LICENSE_FILE"] = "/Users/jhwon/gurobi/gurobi.lic"
Switch to Gurobi solver:
using Gurobi
solver = Gurobi.Optimizer()
MOI.set(solver, MOI.RawOptimizerAttribute("OutputFlag"), 1)
@time solve!(problem, solver)
Set parameter Username
Academic license - for non-commercial use only - expires 2024-11-14
Gurobi Optimizer version 10.0.3 build v10.0.3rc0 (mac64[x86])

CPU model: Intel(R) Core(TM) i5-8279U CPU @ 2.40GHz
Thread count: 4 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 107 rows, 157 columns and 5160 nonzeros
Model fingerprint: 0x2f6aa477
Model has 2 quadratic constraints
Coefficient statistics:
  Matrix range     [1e-05, 2e+00]
  QMatrix range    [1e+00, 1e+00]
  Objective range  [1e+00, 1e+00]
  Bounds range     [0e+00, 0e+00]
  RHS range        [1e-02, 2e+00]
Presolve removed 2 rows and 1 columns
Presolve time: 0.00s
Presolved: 105 rows, 156 columns, 5158 nonzeros
Presolved model has 2 second-order cone constraints
Ordering time: 0.00s

Barrier statistics:
 Free vars  : 50
 AA' NZ     : 5.154e+03
 Factor NZ  : 5.262e+03
 Factor Ops : 3.590e+05 (less than 1 second per iteration)
 Threads    : 1

                  Objective                Residual
Iter       Primal          Dual         Primal    Dual     Compl     Time
   0   1.01905077e+01 -5.01000000e-01  2.07e+01 1.00e-01  1.81e-01     0s
   1   2.85279977e+00 -2.62345360e-01  3.98e+00 2.92e-05  3.52e-02     0s
   2   2.65921342e+00  4.72089921e+00  3.43e+00 3.59e-05  1.39e-02     0s
   3   9.46149510e+00  1.32723341e+01  1.36e+00 4.73e-06  1.54e-02     0s
   4   2.27842360e+01  7.96792805e+00  1.03e-02 7.55e-07  1.40e-01     0s
   5   2.29865080e+01  2.06571541e+01  1.13e-08 4.94e-07  2.20e-02     0s
   6   2.19990895e+01  2.17086651e+01  1.55e-10 1.92e-11  2.74e-03     0s
   7   2.17829284e+01  2.17825422e+01  4.19e-11 5.93e-09  3.63e-06     0s
   8   2.17826852e+01  2.17826847e+01  1.15e-09 1.12e-10  5.20e-09     0s

Barrier solved model in 8 iterations and 0.01 seconds (0.01 work units)
Optimal objective 2.17826852e+01

User-callback calls 72, time in user-callback 0.00 sec
  0.011574 seconds (7.64 k allocations: 2.404 MiB)
# Check the status, optimal value, and minimizer of the problem
problem.status, problem.optval, β̂cls.value
(MathOptInterface.OPTIMAL, 21.7826851961337, [8.821572026387795; 33.70227455280494; … ; -22.362109412726067; 12.468559726470138;;])
# Use SCS solver
using SCS
solver = SCS.Optimizer()
MOI.set(solver, MOI.RawOptimizerAttribute("verbose"), 1)
@time solve!(problem, solver)
------------------------------------------------------------------
               SCS v3.2.3 - Splitting Conic Solver
        (c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 53, constraints m: 107
cones:    z: primal zero / dual free vars: 2
          l: linear vars: 1
          q: soc vars: 104, qsize: 2
settings: eps_abs: 1.0e-04, eps_rel: 1.0e-04, eps_infeas: 1.0e-07
          alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
          max_iters: 100000, normalize: 1, rho_x: 1.00e-06
          acceleration_lookback: 10, acceleration_interval: 10
lin-sys:  sparse-direct-amd-qdldl
          nnz(A): 5056, nnz(P): 0
------------------------------------------------------------------
 iter | pri res | dua res |   gap   |   obj   |  scale  | time (s)
------------------------------------------------------------------
     0| 1.10e+01  5.52e+00  4.15e+01  1.57e+01  1.00e-01  1.32e-03
   150| 3.46e-04  3.18e-05  4.03e-05  2.18e+01  1.00e-01  3.49e-03
------------------------------------------------------------------
status:  solved
timings: total: 3.49e-03s = setup: 1.26e-03s + solve: 2.23e-03s
         lin-sys: 1.97e-03s, cones: 4.26e-05s, accel: 4.61e-05s
------------------------------------------------------------------
objective = 21.783059
------------------------------------------------------------------
  0.005533 seconds (5.04 k allocations: 1.438 MiB)
# Check the status, optimal value, and minimizer of the problem
problem.status, problem.optval, β̂cls.value
(MathOptInterface.OPTIMAL, 21.7830788756544, [8.82152723106786; 33.70209380959717; … ; -22.361993226157978; 12.468500924793684;;])
# Use COSMO solver
using COSMO
solver = COSMO.Optimizer()
MOI.set(solver, MOI.RawOptimizerAttribute("max_iter"), 5000)
@time solve!(problem, solver)
------------------------------------------------------------------
COSMO v0.8.8 - A Quadratic Objective Conic Solver
Michael Garstka
University of Oxford, 2017 - 2022
------------------------------------------------------------------
Problem: x ∈ R^{53},
constraints: A ∈ R^{107x53} (5056 nnz),
matrix size to factor: 160x160,
Floating-point precision: Float64
Sets: SecondOrderCone of dim: 101
SecondOrderCone of dim: 3
ZeroSet of dim: 2
Nonnegatives of dim: 1
Settings: ϵ_abs = 1.0e-05, ϵ_rel = 1.0e-05,
ϵ_prim_inf = 1.0e-04, ϵ_dual_inf = 1.0e-04,
ρ = 0.1, σ = 1e-06, α = 1.6,
max_iter = 5000,
scaling iter = 10 (on),
check termination every 25 iter,
check infeasibility every 40 iter,
KKT system solver: QDLDL
Acc: Anderson Type2{QRDecomp},
Memory size = 15, RestartedMemory,
Safeguarded: true, tol: 2.0
Setup Time: 1.0ms
Iter: Objective: Primal Res: Dual Res: Rho:
1 -2.4722e+00 5.5123e+00 3.3167e+02 1.0000e-01
25 2.3312e+01 3.9932e-01 5.2608e+00 1.0000e-01
50 1.8682e+01 1.6042e-01 2.6250e-03 1.1189e-02
75 1.9742e+01 1.0033e-01 2.6586e-03 1.1189e-02
100 2.0476e+01 6.2798e-02 2.1036e-03 1.1189e-02
125 2.1783e+01 2.4121e-05 4.5987e-08 1.1189e-02
------------------------------------------------------------------
>>> Results
Status: Solved
Iterations: 180 (incl. 55 safeguarding iter)
Optimal objective: 21.78
Runtime: 0.005s (5.5ms)
0.008572 seconds (9.01 k allocations: 4.325 MiB)
# Check the status, optimal value, and minimizer of the problem
problem.status, problem.optval, β̂cls.value
(MathOptInterface.OPTIMAL, 21.783219128578512, [8.821527236654916; 33.70209380229668; … ; -22.36199323318564; 12.46850092930839;;])
# Use Hypatia solver
using Hypatia
solver = Hypatia.Optimizer()
MOI.set(solver, MOI.RawOptimizerAttribute("verbose"), 1)
@time solve!(problem, solver)
 iter        p_obj        d_obj |  abs_gap    x_feas    z_feas |      tau       kap        mu | dir_res     prox     step    alpha
    0   1.0000e+00  -2.0000e+00 | 1.05e+02  1.42e+00  6.10e-01 | 1.00e+00  1.00e+00  1.00e+00 |
    1   5.2739e+00   6.7502e+00 | 1.57e+01  1.10e+00  4.73e-01 | 1.93e-01  8.86e-01  1.50e-01 | 1.6e-15  7.6e-01  co-a  8.50e-01
    2   1.4674e+01   2.1404e+01 | 4.70e+00  8.72e-01  3.76e-01 | 7.31e-02  6.72e-01  4.48e-02 | 1.0e-14  7.7e-01  co-a  7.00e-01
    3   2.6469e+01   3.3048e+01 | 1.87e+00  5.18e-01  2.23e-01 | 4.92e-02  3.96e-01  1.78e-02 | 4.7e-15  8.5e-01  co-a  6.00e-01
    4   3.4927e+01   3.6432e+01 | 5.59e-01  1.62e-01  7.00e-02 | 4.71e-02  9.25e-02  5.32e-03 | 3.4e-15  6.9e-01  co-a  7.00e-01
    5   3.1820e+01   3.2115e+01 | 2.21e-01  5.26e-02  2.26e-02 | 5.82e-02  2.58e-02  2.10e-03 | 4.0e-15  9.1e-01  co-a  6.00e-01
    6   2.9466e+01   2.9625e+01 | 1.52e-01  2.88e-02  1.24e-02 | 7.43e-02  1.79e-02  1.45e-03 | 1.6e-13  8.7e-01  co-a  3.00e-01
    7   2.8219e+01   2.8318e+01 | 1.05e-01  1.83e-02  7.90e-03 | 8.18e-02  1.23e-02  1.00e-03 | 5.3e-14  7.1e-01  co-a  3.00e-01
    8   2.7285e+01   2.7333e+01 | 7.33e-02  1.12e-02  4.83e-03 | 9.35e-02  7.46e-03  6.98e-04 | 8.1e-14  4.5e-01  co-a  3.00e-01
    9   2.6206e+01   2.6220e+01 | 3.64e-02  4.73e-03  2.04e-03 | 1.11e-01  3.11e-03  3.47e-04 | 5.4e-13  9.2e-01  co-a  5.00e-01
   10   2.5648e+01   2.5653e+01 | 1.80e-02  2.09e-03  8.98e-04 | 1.26e-01  1.40e-03  1.72e-04 | 5.1e-13  8.7e-01  co-a  5.00e-01
   11   2.5499e+01   2.5502e+01 | 1.26e-02  1.43e-03  6.16e-04 | 1.28e-01  9.41e-04  1.20e-04 | 1.4e-12  5.8e-01  co-a  3.00e-01
   12   2.5293e+01   2.5294e+01 | 5.00e-03  5.44e-04  2.34e-04 | 1.35e-01  3.57e-04  4.77e-05 | 1.3e-12  6.5e-01  co-a  6.00e-01
   13   2.5216e+01   2.5216e+01 | 2.00e-03  2.13e-04  9.19e-05 | 1.38e-01  1.39e-04  1.90e-05 | 2.5e-12  9.5e-01  co-a  6.00e-01
   14   2.5191e+01   2.5191e+01 | 9.93e-04  1.06e-04  4.58e-05 | 1.38e-01  6.89e-05  9.45e-06 | 2.3e-12  7.2e-01  co-a  5.00e-01
   15   2.5176e+01   2.5176e+01 | 3.95e-04  4.26e-05  1.83e-05 | 1.38e-01  2.75e-05  3.76e-06 | 1.0e-12  8.9e-01  co-a  6.00e-01
   16   2.5170e+01   2.5170e+01 | 1.57e-04  1.71e-05  7.34e-06 | 1.38e-01  1.09e-05  1.50e-06 | 2.5e-12  7.5e-01  co-a  6.00e-01
   17   2.5167e+01   2.5167e+01 | 6.26e-05  6.85e-06  2.95e-06 | 1.37e-01  4.37e-06  5.96e-07 | 2.6e-12  7.2e-01  co-a  6.00e-01
   18   2.5166e+01   2.5166e+01 | 1.87e-05  2.06e-06  8.87e-07 | 1.37e-01  1.31e-06  1.78e-07 | 5.4e-12  6.6e-01  co-a  7.00e-01
   19   2.5166e+01   2.5166e+01 | 5.59e-06  6.20e-07  2.67e-07 | 1.37e-01  3.93e-07  5.33e-08 | 9.9e-11  6.7e-01  co-a  7.00e-01
   20   2.5166e+01   2.5166e+01 | 1.12e-06  1.24e-07  5.35e-08 | 1.36e-01  7.85e-08  1.06e-08 | 1.5e-10  7.6e-01  co-a  8.00e-01
   21   2.5166e+01   2.5166e+01 | 2.22e-07  2.50e-08  1.08e-08 | 1.36e-01  1.57e-08  2.11e-09 | 6.5e-10  7.7e-01  co-a  8.00e-01
   22   2.5166e+01   2.5166e+01 | 3.32e-08  3.74e-09  1.61e-09 | 1.36e-01  2.34e-09  3.16e-10 | 5.6e-10  7.2e-01  co-a  8.50e-01
optimal solution found; terminating

status is Optimal after 22 iterations and 0.282 seconds

  0.295836 seconds (61.44 k allocations: 7.147 MiB)
# Check the status, optimal value, and minimizer of the problem
problem.status, problem.optval, β̂cls.value
(MathOptInterface.OPTIMAL, 21.782684662699275, [8.821527227307735; 33.702093762570016; … ; -22.361993192180634; 12.468500936073355;;])
Suppose we want to know which organisms (OTUs) are associated with the response. We can answer this question using a zero-sum constrained lasso $$ \begin{array}{ll} \text{minimize} & \frac 12 \|\mathbf{y} - \mathbf{X} \boldsymbol{\beta}\|_2^2 + \lambda \|\boldsymbol{\beta}\|_1 \\ \text{subject to} & \mathbf{1}^T\boldsymbol{\beta} = 0. \end{array} $$
Varying $\lambda$ from small to large values will generate a solution path.
# solve at a grid of λ
λgrid = 0:0.01:0.35
β̂path = zeros(length(λgrid), size(X, 2)) # each row is β̂ at a λ
β̂classo = Variable(size(X, 2))
@time for i in 1:length(λgrid)
λ = λgrid[i]
# objective
problem = minimize(0.5sumsquares(y - X * β̂classo) + λ * sum(abs, β̂classo))
# constraint
problem.constraints += sum(β̂classo) == 0 # constraint
# solver = Hypatia.Optimizer() # try different solvers
# MOI.set(solver, MOI.RawOptimizerAttribute("verbose"), 0) # keep silent
solver = COSMO.Optimizer()
MOI.set(solver, MOI.RawOptimizerAttribute("verbose"), false) # keep silent
solve!(problem, solver)
β̂path[i, :] = β̂classo.value
end
1.404733 seconds (1.71 M allocations: 310.095 MiB, 48.15% gc time, 0.96% compilation time)
using Plots; gr()
p = plot(collect(λgrid), β̂path, legend=:none)
xlabel!(p, "lambda")
ylabel!(p, "beta_hat")
title!(p, "Zero-sum Lasso")
Suppose we want to do variable selection not at the OTU level, but at the Phylum level. OTUs are clustered into various Phyla. We can answer this question using a sum-to-zero constrained group lasso $$ \begin{array}{ll} \text{minimize} & \frac 12 \|\mathbf{y} - \mathbf{X} \boldsymbol{\beta}\|_2^2 + \lambda \sum_j \|\boldsymbol{\beta}_j\|_2 \\ \text{subject to} & \mathbf{1}^T\boldsymbol{\beta} = 0, \end{array} $$ where $\boldsymbol{\beta}_j$ is the $j$th partition of the regression coefficients corresponding to the $j$th phylum. This is a second-order cone programming (SOCP) problem readily modeled by Convex.jl.
Let's assume each block of 10 contiguous OTUs belongs to one Phylum.
# solve at a grid of λ
λgrid = 0.1:0.005:0.5
β̂pathgrp = zeros(length(λgrid), size(X, 2)) # each row is β̂ at a λ
β̂classo = Variable(size(X, 2))
@time for i in 1:length(λgrid)
λ = λgrid[i]
# loss
obj = 0.5sumsquares(y - X * β̂classo)
# group lasso penalty term
    for j in 1:(size(X, 2) ÷ 10)  # integer division: number of phyla
βj = β̂classo[(10(j-1)+1):10j]
obj = obj + λ * norm(βj)
end
problem = minimize(obj)
# constraint
problem.constraints += sum(β̂classo) == 0 # constraint
solver = Mosek.Optimizer() # MOSEK this time!
MOI.set(solver, MOI.RawOptimizerAttribute("LOG"), 0) # keep silent
solve!(problem, solver)
β̂pathgrp[i, :] = β̂classo.value
end
0.846394 seconds (679.84 k allocations: 170.177 MiB, 6.09% gc time)
It took Mosek under a second to solve this seemingly hard optimization problem at 81 different $\lambda$ values.
p2 = plot(collect(λgrid), β̂pathgrp, legend=:none)
xlabel!(p2, "lambda")
ylabel!(p2, "beta_hat")
title!(p2, "Zero-Sum Group Lasso")
using ImageIO, FileIO # detect file formats and dispatch to appropriate readers/writers
lena = load("lena128missing.png")
# convert to real matrices
Y = Float64.(lena)
128×128 Matrix{Float64}:
 0.0       0.0       0.635294  0.0       …  0.0       0.0       0.627451
 0.627451  0.623529  0.0       0.611765     0.0       0.0       0.388235
 0.611765  0.611765  0.0       0.0          0.403922  0.219608  0.0
 0.0       0.0       0.611765  0.0          0.223529  0.176471  0.192157
 0.611765  0.0       0.615686  0.615686     0.0       0.0       0.0
 0.0       0.0       0.0       0.619608  …  0.0       0.0       0.2
 0.607843  0.0       0.623529  0.0          0.176471  0.192157  0.0
 0.0       0.0       0.623529  0.0          0.0       0.0       0.215686
 0.619608  0.619608  0.0       0.0          0.2       0.0       0.207843
 0.0       0.0       0.635294  0.635294     0.2       0.192157  0.188235
 0.635294  0.0       0.0       0.0       …  0.192157  0.180392  0.0
 0.631373  0.0       0.0       0.0          0.0       0.0       0.0
 0.0       0.627451  0.635294  0.666667     0.172549  0.0       0.184314
 ⋮                                        ⋱                     ⋮
 0.0       0.129412  0.0       0.541176     0.0       0.286275  0.0
 0.14902   0.129412  0.196078  0.537255     0.345098  0.0       0.0
 0.215686  0.0       0.262745  0.0          0.301961  0.0       0.207843
 0.345098  0.341176  0.356863  0.513725     0.0       0.0       0.231373
 0.0       0.0       0.0       0.0       …  0.0       0.243137  0.258824
 0.298039  0.415686  0.458824  0.0          0.0       0.0       0.258824
 0.0       0.368627  0.4       0.0          0.0       0.0       0.235294
 0.0       0.0       0.34902   0.0          0.0       0.239216  0.207843
 0.219608  0.0       0.0       0.0          0.0       0.0       0.2
 0.0       0.219608  0.235294  0.356863  …  0.0       0.0       0.0
 0.196078  0.207843  0.211765  0.0          0.0       0.270588  0.345098
 0.192157  0.0       0.196078  0.309804     0.266667  0.356863  0.0
We fill in the missing pixels using a matrix completion technique developed by Candès and Tao $$ \begin{array}{ll} \text{minimize} & \|\mathbf{X}\|_* \\ \text{subject to} & x_{ij} = y_{ij} \text{ for all observed entries } (i, j). \end{array} $$
Here $\|\mathbf{X}\|_* = \sum_{i=1}^{\min(m,n)} \sigma_i(\mathbf{X})$ is the nuclear norm. It can be shown that $$ \|\mathbf{X}\|_* = \sup_{\|\mathbf{Y}\|_2 \le 1} \langle \mathbf{X}, \mathbf{Y} \rangle, $$ where $\|\mathbf{Y}\|_2=\sigma_{\max}(\mathbf{Y})$ is the spectral (operator 2-) norm, and $\langle \mathbf{X}, \mathbf{Y} \rangle = \text{tr}(\mathbf{X}^T\mathbf{Y})$. That is, $\|\cdot\|_*$ is the dual norm of $\|\cdot\|_2$.
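To see why, write the SVD $\mathbf{X} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^T$. Then $$ \langle \mathbf{X}, \mathbf{Y} \rangle = \text{tr}(\boldsymbol{\Sigma} \mathbf{U}^T \mathbf{Y} \mathbf{V}) = \sum_i \sigma_i \mathbf{u}_i^T \mathbf{Y} \mathbf{v}_i \le \sum_i \sigma_i(\mathbf{X}) \|\mathbf{Y}\|_2, $$ and the bound is attained by $\mathbf{Y} = \mathbf{U}\mathbf{V}^T$, which has spectral norm one.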
We want the matrix of lowest rank that agrees with the observed entries, but rank minimization is intractable, so we instead seek a matrix with minimal nuclear norm as a convex relaxation.
This is a semidefinite programming (SDP) problem readily modeled by Convex.jl.
This example takes a while to solve because of its high dimensionality.
# Use COSMO solver
using COSMO
solver = COSMO.Optimizer()
## Use SCS solver
# using SCS
# solver = SCS.Optimizer()
## Use Mosek solver
# using Mosek
# solver = Mosek.Optimizer()
## Use Hypatia solver
# using Hypatia
# solver = Hypatia.Optimizer()
# Linear indices of obs. entries
obsidx = findall(Y[:] .≠ 0.0)
# Create optimization variables
X = Convex.Variable(size(Y))
# Set up the optimization problem
problem = minimize(nuclearnorm(X))
problem.constraints += X[obsidx] == Y[obsidx]
# Solve the problem by calling solve
@time solve!(problem, solver)
------------------------------------------------------------------
COSMO v0.8.8 - A Quadratic Objective Conic Solver
Michael Garstka
University of Oxford, 2017 - 2022
------------------------------------------------------------------
Problem: x ∈ R^{49153},
constraints: A ∈ R^{73665x49153} (73793 nnz),
matrix size to factor: 122818x122818,
Floating-point precision: Float64
Sets: ZeroSet of dim: 40769
DensePsdConeTriangle of dim: 32896 (256x256)
Settings: ϵ_abs = 1.0e-05, ϵ_rel = 1.0e-05,
ϵ_prim_inf = 1.0e-04, ϵ_dual_inf = 1.0e-04,
ρ = 0.1, σ = 1e-06, α = 1.6,
max_iter = 5000,
scaling iter = 10 (on),
check termination every 25 iter,
check infeasibility every 40 iter,
KKT system solver: QDLDL
Acc: Anderson Type2{QRDecomp},
Memory size = 15, RestartedMemory,
Safeguarded: true, tol: 2.0
Setup Time: 113.52ms
Iter: Objective: Primal Res: Dual Res: Rho:
1 -1.4426e+03 1.1678e+01 5.9856e-01 1.0000e-01
25 1.4495e+02 5.5360e-02 1.1033e-03 1.0000e-01
50 1.4754e+02 1.2369e-02 1.6744e-03 7.4179e-01
75 1.4797e+02 4.9490e-04 5.5696e-05 7.4179e-01
100 1.4797e+02 1.4115e-05 1.3438e-06 7.4179e-01
------------------------------------------------------------------
>>> Results
Status: Solved
Iterations: 100
Optimal objective: 148
Runtime: 2.193s (2193.48ms)
7.279706 seconds (6.41 M allocations: 966.528 MiB, 5.33% gc time, 73.62% compilation time: 14% of which was recompilation)
using Images
colorview(Gray, X.value)
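To write the completed image back to disk with FileIO, note that entries of the numerical solution can fall slightly outside $[0, 1]$, so clamp them first (a small sketch; the output filename below is arbitrary):

using Images, FileIO
# clamp the solution into the valid gray range before conversion
completed = colorview(Gray, clamp01.(X.value))
save("lena128completed.png", completed)  # arbitrary output path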
We use the MLE of the Gamma distribution to illustrate some rudiments of nonlinear programming (NLP) in Julia.
Let $x_1,\ldots,x_m$ be a random sample from the gamma density with shape parameter $\alpha$ and rate parameter $\beta$: $$ f(x \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x} $$ on $(0,\infty)$. The log-likelihood function is $$ \small L(\alpha, \beta) = m [- \ln \Gamma(\alpha) + \alpha \ln \beta + (\alpha - 1)\overline{\ln x} - \beta \bar x], $$ where $\overline{x} = \frac{1}{m} \sum_{i=1}^m x_i$ and $\overline{\ln x} = \frac{1}{m} \sum_{i=1}^m \ln x_i$.
using Random, Statistics, SpecialFunctions
Random.seed!(280)
function gamma_logpdf(x::Vector, α::Real, β::Real)
m = length(x)
avg = mean(x)
logavg = sum(log, x) / m
m * (- loggamma(α) + α * log(β) + (α - 1) * logavg - β * avg)
end
x = rand(5)
gamma_logpdf(x, 1.0, 1.0)
-2.718862752561273
Many optimization algorithms involve taking derivatives of the objective function. The ForwardDiff.jl package implements forward-mode automatic differentiation. For example, to compute the gradient and Hessian of the log-likelihood with data x at α = 1.0 and β = 1.0:
using ForwardDiff
ForwardDiff.gradient(θ -> gamma_logpdf(x, θ...), [1.0; 1.0])
2-element Vector{Float64}:
 -2.147973668615522
  2.281137247438727
ForwardDiff.hessian(θ -> gamma_logpdf(x, θ...), [1.0; 1.0])
2×2 Matrix{Float64}:
 -8.22467   5.0
  5.0      -5.0
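As a sanity check (gamma_score below is a helper defined here, not part of any package), the analytic score $$ \small \nabla L(\alpha, \beta) = m \begin{pmatrix} -\psi(\alpha) + \ln \beta + \overline{\ln x} \\ \alpha/\beta - \bar{x} \end{pmatrix}, $$ where $\psi$ is the digamma function, should reproduce the ForwardDiff gradient:

using SpecialFunctions, Statistics  # digamma, mean
# analytic gradient of the Gamma log-likelihood derived above
function gamma_score(x::Vector, α::Real, β::Real)
    m = length(x)
    m .* [-digamma(α) + log(β) + sum(log, x) / m, α / β - mean(x)]
end
gamma_score(x, 1.0, 1.0)  # compare with ForwardDiff.gradient above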
using Distributions, Random
Random.seed!(280)
(n, p) = (1000, 2)
(α, β) = 5.0 * rand(p)
x = rand(Gamma(α, β), n)
println("True parameter values:")
println("α = ", α, ", β = ", β)
True parameter values:
α = 2.4692583262857157, β = 4.487606332976687
We use JuMP.jl to model and solve our NLP problem.
Recall that we want to solve the optimization problem: $$ \small \begin{array}{ll} \text{maximize} & L(\alpha, \beta)= m [- \ln \Gamma(\alpha) + \alpha \ln \beta + (\alpha - 1)\overline{\ln x} - \beta \bar x] \\ \text{subject to} & \alpha \ge 0 \\ ~ & \beta \ge 0 \end{array} $$
Observe the similarities to, and differences from, modeling with Convex.jl:
using JuMP
using NLopt # https://nlopt.readthedocs.io/en/latest/
m = Model(NLopt.Optimizer)
set_optimizer_attribute(m, "algorithm", :LD_MMA)
# using Ipopt # https://github.com/coin-or/Ipopt
# m = Model(Ipopt.Optimizer)
# set_optimizer_attribute(m, "print_level", 3)
myf(a, b) = gamma_logpdf(x, a, b)
JuMP.register(m, :myf, 2, myf, autodiff=true)
@variable(m, α >= 1e-8)
@variable(m, β >= 1e-8)
@NLobjective(m, Max, myf(α, β))
print(m)
JuMP.optimize!(m)
println("MLE (JuMP):")
println("α = ", JuMP.value(α), ", β = ", JuMP.value(β))
println("Objective value: ", JuMP.objective_value(m))
MLE (JuMP):
α = 2.482264846433373, β = 0.22245088505669489
Objective value: -3229.1399326327714
Then convert the rate parameter to the scale parameter to compare the result with fit_mle() in the Distributions package:
println("α = ", JuMP.value(α), ", θ = ", 1 / JuMP.value(β))
println("MLE (Distribution package):")
println(fit_mle(Gamma, x))
α = 2.477061702598836, θ = 4.5057945965951935
MLE (Distribution package):
Gamma{Float64}(α=2.4773249047555272, θ=4.505157427525276)
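Since ForwardDiff already gives us the Hessian, here is a short sketch of how one could obtain asymptotic standard errors of the MLE from the observed information, i.e., the negative Hessian of the log-likelihood at the optimum (this step is an illustration, not part of the analysis above):

using ForwardDiff, LinearAlgebra
θ̂ = [JuMP.value(α), JuMP.value(β)]                       # MLE from JuMP
H = ForwardDiff.hessian(θ -> gamma_logpdf(x, θ...), θ̂)   # Hessian at the MLE
se = sqrt.(diag(inv(-H)))                                 # standard errors of (α̂, β̂)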