Note: This package is a proof of concept. While the code will remain working with proper use of a Julia manifest, we can't guarantee ongoing support and maintenance.
Uncomment the line below to install the exact versions of the packages listed in the manifest.
# import Pkg; Pkg.instantiate()
The core package is DifferentiableStateSpaceModels.jl, while DifferenceEquations.jl provides compatible functionality for simulating and calculating likelihoods.
using DifferentiableStateSpaceModels, DifferenceEquations, LinearAlgebra, Zygote, Distributions, Plots, DiffEqBase, Symbolics, BenchmarkTools
The model follows the Schmitt-Grohe and Uribe (2004) timing convention. The system is a nonlinear expectational difference equation, including all first-order conditions for decisions and the system evolution equations,
$$ \mathbb{E}_{t}\mathcal{H}\left(y',y,x',x;p\right)=0 $$where $y$ are the control variables, $x$ are the states, and $p$ is a vector of deep parameters of interest. Expectations are taken over forward-looking variables and an underlying random process $\epsilon'$.
In addition, we consider a (possibly noisy) observation equation, $$ z = Q \cdot \begin{bmatrix}y &x\end{bmatrix}^{\top} + \nu $$ where $\nu$ may or may not be normally distributed, but $\mathbb{E}(\nu) = 0$ and $\mathbb{V}(\nu) = \Omega(p) \Omega(p)^{\top}$.
Assume that this problem has a non-stochastic steady state $y_{ss}, x_{ss}$.
Define the deviation from the non-stochastic steady state as $\hat{x} \equiv x - x_{ss}, \hat{y} \equiv y - y_{ss},$ and $\hat{z} \equiv z - z_{ss}$.
The solution finds the first- or second-order perturbation around that non-stochastic steady state, and yields
$$ x' = h(x; p) + \eta \ \Gamma(p)\ \epsilon' $$where $\eta$ describes how shocks affect the law of motion and $\mathbb{E}(\epsilon') = 0$. Frequently this would be organized such that $\mathbb{V}(\epsilon') = I$, but that is not required. Equivalently, the law of motion can be written as $x' = h(x; p) + \eta \ \epsilon'$ with $\mathbb{V}(\epsilon') = \Gamma(p) \Gamma(p)^{\top}$.
along with the policy equation,
$$ y = g(x; p) $$and finally, substituting into the observation equation,
$$ z = Q \begin{bmatrix} g(x;p) \\ x\end{bmatrix} + \nu $$Perturbation approximates the above model equations, where $h$ and $g$ are not available explicitly, by a Taylor expansion around the steady state. For example, in the case of the first-order model the solution finds
$$ \hat{x}' = A(p)\ \hat{x} + B(p) \epsilon' $$and
$$ \hat{y} = g_x(p) \ \hat{x} $$and
$$ \hat{z} = C(p)\ \hat{x} + \nu $$where $C(p)\equiv Q \begin{bmatrix} g_x(p) \\ I\end{bmatrix}$, $B(p) \equiv \eta \Gamma(p)$, and $\mathbb{V}(\nu) = D(p) D(p)^{\top}$. Normality of $\nu$ or $\epsilon'$ is not required in general.
This is a linear state-space model, and if the priors and shocks are Gaussian, the marginal likelihood can be evaluated with classic methods such as the Kalman filter. The output of the perturbation can be used manually, or in conjunction with DifferenceEquations.jl.
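For intuition, here is a minimal sketch of such manual use, assuming matrices A, B, C in the shapes described above and a vector D of observation-noise standard deviations. The helper simulate_linear is hypothetical, not part of the package API:
using LinearAlgebra
# iterate x̂' = A x̂ + B ε' and record observables z = C x̂ + ν (hypothetical helper)
function simulate_linear(A, B, C, D, x0, T)
    x = x0
    z = Vector{Vector{Float64}}(undef, T)
    for t in 1:T
        x = A * x + B * randn(size(B, 2))      # ε' ~ N(0, I)
        z[t] = C * x + D .* randn(size(C, 1))  # ν with standard deviations in D
    end
    return z
end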
This package also solves for second-order perturbations, with all of the same features. The canonical form of the second-order solution implements state-space pruning as in Andreasen, Fernandez-Villaverde, and Rubio-Ramirez (2017). Given the same formulation of the problem, the model finds the canonical form,
$$ \hat{x}' = A_0(p) + A_1(p)\ \hat{x} + \hat{x}^{\top} A_2(p)\ \hat{x} + B(p) \epsilon' $$and
$$ \hat{y} = g_{\sigma\sigma} + g_x \ \hat{x} + \hat{x}^{\top}\ g_{xx}\ \hat{x} $$and
$$ \hat{z} = C_0(p) + C_1(p)\ \hat{x} + \hat{x}^{\top} C_2(p)\ \hat{x} + \nu $$where $B(p) \equiv \eta \Gamma(p)$, $\mathbb{V}(\nu) = D(p) D(p)^{\top}$, and $C_0, C_1, C_2, A_0, A_1, A_2$ all reflect the state-space pruning.
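To make the notation concrete, $\hat{x}^{\top} A_2(p)\ \hat{x}$ denotes a stacked vector of quadratic forms, one per state. The sketch below of a single step treats A_2 as a three-dimensional array; it illustrates the notation only, not the package's internal representation:
using LinearAlgebra
# one step of x̂' = A_0 + A_1 x̂ + x̂ᵀ A_2 x̂ + B ε' (illustrative only)
function step_quadratic(A_0, A_1, A_2, B, x, ε)
    quad = [dot(x, A_2[i, :, :], x) for i in axes(A_2, 1)]  # one quadratic form per state
    return A_0 + A_1 * x + quad + B * ε
end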
All of the above use standard solution methods. The primary contribution of this package is that all of these model elements are differentiable. Hence, these gradients can be composed for use in applications such as optimization, gradient-based estimation methods, and with DifferenceEquations.jl, which provides differentiable simulations and likelihoods for state-space models. That is, if we think of a perturbation solver as mapping $p$ to solutions (e.g., in first order, $\mathbf{P}(p) \to (A, B, C, D)$), then we can find the gradients $\partial_p \mathbf{P}(p), \partial_p A(p)$, etc. Or, when these gradients are available for use with reverse-mode auto-differentiation, it can take "wobbles" in $A, B, C, D$ and go back to the "wiggles" of the underlying $p$ through $\mathbf{P}$. See ChainRules.jl for more details on AD.
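As a toy illustration of this composition, with a stand-in map instead of a real perturbation solver (P_toy and objective are made up for this example):
using Zygote
P_toy(p) = [p 1.0; 0.0 p^2]                # stand-in for the map p ↦ A(p)
objective(p) = sum(P_toy(p) * [1.0, 2.0])  # downstream function of the "solution"
Zygote.gradient(objective, 0.5)            # reverse-mode: wobbles in A(p) back to wiggles in p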
Models are defined using a Dynare-style DSL built on Symbolics.jl. The primitives, as used in the example below, are:
- H: the system of model equations
- y, x: the control and state variables
- p: the vector of parameters
- steady_states: closed-form steady-state equations (or steady_states_iv for initial values passed to a solver)
- Γ: the Cholesky factor of the shock covariance
- η: the matrix loading shocks into the law of motion
- Q: the observation selection matrix
- Ω: the standard deviations of the (diagonal) observation noise
This example implements the model primitives to build a simple version of the RBC model with two observables.
∞ = Inf
@variables α, β, ρ, δ, σ, Ω_1
@variables t::Integer, k(..), z(..), c(..), q(..)
x = [k, z] # states
y = [c, q] # controls
p = [α, β, ρ, δ, σ, Ω_1] # parameters
H = [1 / c(t) - (β / c(t + 1)) * (α * exp(z(t + 1)) * k(t + 1)^(α - 1) + (1 - δ)),
c(t) + k(t + 1) - (1 - δ) * k(t) - q(t),
q(t) - exp(z(t)) * k(t)^α,
z(t + 1) - ρ * z(t)] # system of model equations
# analytic solutions for the steady state. Alternatively, pass initial values via steady_states_iv and let a solver find the steady state
steady_states = [k(∞) ~ (((1 / β) - 1 + δ) / α)^(1 / (α - 1)),
z(∞) ~ 0,
c(∞) ~ (((1 / β) - 1 + δ) / α)^(α / (α - 1)) -
δ * (((1 / β) - 1 + δ) / α)^(1 / (α - 1)),
q(∞) ~ (((1 / β) - 1 + δ) / α)^(α / (α - 1))]
Γ = [σ;;] # Γ matrix for the single shock. The [;;] notation makes it a matrix rather than a vector in Julia
η = [0; -1;;] # η is the n_x × n_ϵ matrix loading shocks into the law of motion
# observation matrix. order is "y" then "x" variables, so [c,q,k,z] in this example
Q = [1.0 0 0 0; # select c as first "z" observable
0 0 1.0 0] # select k as second "z" observable
# diagonal Cholesky factor of the covariance matrix for observation noise (so these are standard deviations). Non-diagonal observation noise is not currently supported
Ω = [Ω_1, Ω_1]
# Generates the model files and includes them if required. If the model has already been created, it just loads them
overwrite_model_cache = true
model_rbc = @make_and_include_perturbation_model("rbc_notebook_example", H, (; t, y, x, p, steady_states, Γ, Ω, η, Q, overwrite_model_cache)) # Convenience macro. Saves as ".function_cache/rbc_notebook_example.jl"
# After generation could just include the files directly, or add the following to prevent overwriting the models
# isdefined(Main, :rbc_notebook_example) || include(joinpath(pkgdir(DifferentiableStateSpaceModels),
# "test/generated_models/rbc_notebook_example.jl"))
# Alternatively, this is also the same example as a prebuilt example
# model_rbc = @include_example_module(DifferentiableStateSpaceModels.Examples.rbc_observables)
Perturbation Model: n_y = 2, n_x = 2, n_p = 6, n_ϵ = 1, n_z = 2 y = [:c, :q] x = [:k, :z] p = [:α, :β, :ρ, :δ, :σ, :Ω_1]
As commented above, the convenience macro takes this model, generates the files, and includes them. It calls the function make_perturbation_model with the associated options in that file. Once generated, these can be included as normal .jl files. At this point there are no runtime generated functions, though that functionality could be added later. The following queries some of the properties of the loaded model.
model_H_latex(model_rbc)
model_steady_states_latex(model_rbc) # this is the closed-form steady state; could instead pass in steady_states_iv to give approximate initial conditions for an optimizer
@show model_rbc.n_x, model_rbc.n_y, model_rbc.n_ϵ, model_rbc.n_z
(model_rbc.n_x, model_rbc.n_y, model_rbc.n_ϵ, model_rbc.n_z) = (2, 2, 1, 2)
(2, 2, 1, 2)
Given a model definition, you can solve the expectational difference equation for a given set of parameters. All of the p defined above must be in either the p_d or the p_f arguments below. The distinction is that the p_d parameters will be available for differentiation, while the p_f parameters are fixed. That is, for any parameters in p_d, the model will support calculation of gradients of the perturbation solution. The only downside to having extra values in p_d rather than p_f is extra computation. Pass nothing for p_f if you are differentiating all of the parameters.
p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters
p_d = (α = 0.5, β = 0.95) # Pseudo-true values
m = model_rbc # ensure notebook executed above
sol = generate_perturbation(model_rbc, p_d, p_f) # Solution to the first-order RBC
sol_2 = generate_perturbation(model_rbc, p_d, p_f, Val(2)); # Solution to the second-order RBC
@show sol.retcode, sol_2.retcode, verify_steady_state(m, p_d, p_f) # the final call checks that the analytically provided steady-state solution is correct
(sol.retcode, sol_2.retcode, verify_steady_state(m, p_d, p_f)) = (:Success, :Success, true)
(:Success, :Success, true)
The perturbation solution (in the canonical form described in the top section) can be queried from the resulting solution. A few examples for the first order solution are below,
@show sol.y, sol.x # steady states y_ss and x_ss; these are the values such that y ≡ ŷ + sol.y and x ≡ x̂ + sol.x
@show sol.g_x # the policy
@show sol.A, sol.B # the evolution equation of the state, so that x̂' = A x̂ + B ϵ
@show sol.C, sol.D; # the observation equation, so that z = C x̂ + ν with the variance of ν given by D D'.
@show sol.x_ergodic_var; # covariance matrix of the ergodic distribution of x̂, which is mean zero since x̂ ≡ x - x_ss
(sol.y, sol.x) = ([5.936252888048732, 6.884057971014497], [47.39025414828823, 0.0])
sol.g_x = [0.09579643002416627 0.6746869652586192; 0.07263157894736873 6.884057971014506]
(sol.A, sol.B) = ([0.9568351489232028 6.2093710057558855; 1.5076865909646354e-18 0.20000000000000004], [0.0; -0.01;;])
(sol.C, sol.D) = ([0.09579643002416627 0.6746869652586192; 1.0 0.0], [0.0001, 0.0001])
sol.x_ergodic_var = [0.07005411173180152 0.00015997603451513398; 0.00015997603451513398 0.00010416666666666667]
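As a quick sanity check (not part of the package API), the ergodic covariance should satisfy the discrete Lyapunov equation $\Sigma = A \Sigma A^{\top} + B B^{\top}$ implied by the first-order law of motion:
Σ = sol.x_ergodic_var
@show norm(sol.A * Σ * sol.A' + sol.B * sol.B' - Σ) # ≈ 0; norm is from LinearAlgebra, loaded above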
In addition, you can query some metadata about the solution to aid in debugging and generic programming
@show sol.Q, sol.η, sol.Γ; # various matrices for loading of shocks and observations
@show sol.n_y, sol.n_x, sol.n_ϵ, sol.n_z, sol.n_p # sizes of model
@show sol.x_symbols, sol.y_symbols, sol.p_symbols; # the order of symbols for the x, y, and p vectors
(sol.Q, sol.η, sol.Γ) = ([1.0 0.0 0.0 0.0; 0.0 0.0 1.0 0.0], [0; -1;;], [0.01;;])
(sol.n_y, sol.n_x, sol.n_ϵ, sol.n_z, sol.n_p) = (2, 2, 1, 2, 6)
(sol.x_symbols, sol.y_symbols, sol.p_symbols) = ([:k, :z], [:c, :q], [:α, :β, :ρ, :δ, :σ, :Ω_1])
The core feature of this library is to enable gradients of the perturbation solutions with respect to parameters (i.e., anything in the p_d vector). To show this, we will construct a function which uses the resulting law of motion, and then find the gradient of its output with respect to those parameters.
function IRF(p_d, ϵ_0; m, p_f, steps)
sol = generate_perturbation(m, p_d, p_f) # First-order perturbation by default, pass Val(2) as additional argument to do 2nd order.
x = sol.B * ϵ_0 # start after applying impulse with the model's shock η and Γ(p)
for _ in 1:steps
# A note: you cannot use mutating expressions here with most AD code, i.e. x .= sol.A * x would not work
# For more elaborate simulations, you would want to use DifferenceEquations.jl in practice
x = sol.A * x # iterate forward using first-order observation equation
end
return [0, 1]' * sol.C * x # select the second observable using the model's first-order observation equation z = C x̂
end
m = model_rbc # ensure notebook executed above
p_f = (ρ=0.2, δ=0.02, σ=0.01, Ω_1=0.01) # not differentiated
p_d = (α=0.5, β=0.95) # differentiated parameters
steps = 10 # steps ahead to forecast
ϵ_0 = [1.0] # shock size
IRF(p_d, ϵ_0; m, p_f, steps) # Function works on its own, calculating perturbation
-0.052773688889493325
However, the real benefit is that this function can itself be differentiated to find gradients with respect to the deep parameters p_d and the shock ϵ_0.
# Using the Zygote auto-differentiation library already loaded above
p_d = (α=0.5, β=0.95) # differentiated parameters
ϵ_0 = [1.0] # shock size
IRF_grad = gradient((p_d, ϵ_0) -> IRF(p_d, ϵ_0; m, p_f, steps), p_d, ϵ_0) # "closes" over the m, p_f, and steps to create a new function, and differentiates it with respect to other arguments
((α = -0.5810083808068425, β = -1.182312351872585), [-0.05277368888949331])
println("d/dα IRF(p_d, ϵ0) = $(IRF_grad[1].α), d/dβ IRF(p_d, ϵ0) = $(IRF_grad[1].β), d/dϵ0 IRF(p_d, ϵ0) = $(IRF_grad[2])")
d/dα IRF(p_d, ϵ0) = -0.5810083808068425, d/dβ IRF(p_d, ϵ0) = -1.182312351872585, d/dϵ0 IRF(p_d, ϵ0) = [-0.05277368888949331]
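Because IRF is an ordinary Julia function, we can sanity-check one of these gradients with central finite differences (a sketch; the step size h and the name fd_α are arbitrary choices for this example):
h = 1e-6
fd_α = (IRF((α = 0.5 + h, β = 0.95), ϵ_0; m, p_f, steps) -
        IRF((α = 0.5 - h, β = 0.95), ϵ_0; m, p_f, steps)) / (2h)
@show fd_α # should be close to IRF_grad[1].α above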
The calculation of the gradients of the perturbation solution is slow relative to the perturbation itself, and both can allocate a significant amount of memory. If you intend to keep calculating a perturbation a large number of times (e.g., during estimation), then you may want to preallocate this memory and reuse it by passing in a cache.
function IRF_cache(p_d, ϵ_0; m, p_f, steps, cache)
sol = generate_perturbation(m, p_d, p_f; cache) # only difference is passing in a pre-allocated cache to the perturbation solver
x = sol.B * ϵ_0
for _ in 1:steps
x = sol.A * x
end
return [0, 1]' * sol.C * x
end
# To call it
m = model_rbc # ensure notebook executed above
p_d = (α=0.5, β=0.95) # Differentiated parameters
p_f = (ρ=0.2, δ=0.02, σ=0.01, Ω_1=0.01)
steps = 10 # steps ahead to forecast
ϵ_0 = [1.0] # shock size
cache = SolverCache(m, Val(1), p_d) # the p_d passed into the cache just reuses the symbols, and the values in p_d are unused
@show IRF_cache(p_d, ϵ_0; m, p_f, steps, cache)
gradient((p_d, ϵ_0) -> IRF_cache(p_d, ϵ_0; m, p_f, steps, cache), p_d, ϵ_0) # as before, do not try to differentiate with respect to the cache values themselves
IRF_cache(p_d, ϵ_0; m, p_f, steps, cache) = -0.052773688889493325
((α = -0.5810083808068425, β = -1.182312351872585), [-0.05277368888949331])
# However, in a problem this small you will not see much of a difference in performance
@btime IRF($p_d, $ϵ_0; m = $m, p_f = $p_f, steps = $steps)
@btime IRF_cache($p_d, $ϵ_0; m = $m, p_f = $p_f, steps = $steps, cache = $cache)
64.608 μs (214 allocations: 86.59 KiB)
55.104 μs (125 allocations: 75.72 KiB)
-0.052773688889493325
Manual iteration of the state-space model from the perturbation solution is possible, but it can be verbose and it is difficult to achieve efficient gradients. One benefit of this package is that it creates state-space models in a form consistent with DifferenceEquations.jl, which can be easily simulated, visualized, and estimated.
To do this, we will calculate a perturbation solution and then simulate it with an initial condition x_0 drawn from the ergodic distribution.
p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters
p_d = (α = 0.5, β = 0.95) # Pseudo-true values
m = model_rbc # ensure notebook executed above
sol = generate_perturbation(m, p_d, p_f) # Solution to the first-order RBC
# Simulate T observations from a random initial condition
T = 20
# draw from ergodic distribution for the initial condition
x_iv = MvNormal(sol.x_ergodic_var)
problem = LinearStateSpaceProblem(sol, x_iv, (0, T))
plot(solve(problem))
The LinearStateSpaceProblem type is automatically constructed from the underlying perturbation. However, we can override any of these options, or pass in our own noise rather than simulating it, for a particular experiment.
noise = Matrix([1.0; zeros(T-1)]') # the ϵ shocks are "noise" in DifferenceEquations for SciML compatibility
x_iv = [0.0, 0.0] # can pass in a single value rather than a distribution
problem = LinearStateSpaceProblem(sol, x_iv, (0, T); noise)
plot(solve(problem))
To demonstrate the composition of gradients between DifferenceEquations and DifferentiableStateSpaceModels, let's adapt this function to simulate an impulse with fixed noise and look at the final observable.
function last_observable(p_d, noise, x_iv; m, p_f, T)
sol = generate_perturbation(m, p_d, p_f)
problem = LinearStateSpaceProblem(sol, x_iv, (0, T); noise, observables_noise = nothing) # removing observation noise
return solve(problem).z[end][2] # return the second element of the last observable
end
T = 100
noise = Matrix([1.0; zeros(T-1)]') # the ϵ shocks are "noise" in DifferenceEquations for SciML compatibility
x_iv = [0.0, 0.0] # can pass in a single value rather than a distribution
p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters
p_d = (α = 0.5, β = 0.95) # Pseudo-true values
m = model_rbc # ensure notebook executed above
last_observable(p_d, noise, x_iv; m, p_f, T)
-0.001039731618424578
And, as before, we can calculate gradients with respect to the underlying p_d parameters, but also with respect to the noise. This demonstrates a key benefit of these methods: they let us evaluate a joint likelihood over the latent variables in cases where they cannot easily be marginalized out (e.g., non-Gaussian or nonlinear models). Note that the dimensionality of this gradient is over 100.
gradient((p_d, noise, x_iv) -> last_observable(p_d, noise, x_iv; m, p_f, T), p_d, noise, x_iv)
((α = -0.02350208860044909, β = -0.08002797800781031), [-0.0010397316184245775 -0.0010866361040296902 … -0.06209371005755886 0.0], [0.012125846203919335, 0.09948517579554432])
Finally, we can use a simple utility function to investigate an IRF.
p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters
p_d = (α = 0.5, β = 0.95) # Pseudo-true values
m = model_rbc
sol = generate_perturbation(model_rbc, p_d, p_f)
ϵ0 = [1.0]
T = 10
sim = irf(sol, ϵ0, T)
plot(sim)
In addition, a variety of SciML features are supported, for example, parallel simulation of ensembles and associated summary statistics.
# Simulate multiple trajectories with T observations
trajectories = 40
x_iv = MvNormal(sol.x_ergodic_var)
problem = LinearStateSpaceProblem(sol, x_iv, (0, T))
# Solve multiple trajectories and plot an ensemble
ensemble_results = solve(EnsembleProblem(problem), DirectIteration(), EnsembleThreads();
trajectories)
summ = EnsembleSummary(ensemble_results) # see SciML documentation. Calculates median and other quantiles automatically.
summ.med # median values for the "x" simulated ensembles
t: 0:10
u: 11-element Vector{Vector{Float64}}:
 [-0.06849616749168119, 0.002430490689003771]
 [-0.018232243727516317, 4.178362879834265e-5]
 [-0.006130734877711555, -0.0012801506033949567]
 [-0.029692362571749217, 0.0029167124862915397]
 [-0.01418677984731015, 8.880061768457687e-5]
 [-0.012168309195379916, 0.0003679002721679268]
 [-0.02903024314533037, -0.0017796657403380003]
 [-0.017594841603510616, -0.003123852451232536]
 [-0.042044233027366255, 9.676460362403633e-5]
 [-0.03947618577501135, 0.0007834629342420161]
 [-0.053775744337300646, 0.005120733305282458]
plot(summ, fillalpha = 0.2) # plots by default show the median and quantiles of both variables. Modifying transparency as an example
We can use the underlying state-space model to easily simulate states and observables.
# Simulate T observations
T = 20
p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters
p_d = (α = 0.5, β = 0.95) # Pseudo-true values
sol = generate_perturbation(model_rbc, p_d, p_f) # Solution to the first-order RBC
x_iv = MvNormal(sol.x_ergodic_var) # draw initial conditions from the ergodic distribution
problem = LinearStateSpaceProblem(sol, x_iv, (0, T))
sim = solve(problem, DirectIteration())
ϵ = sim.W # store the underlying noise in the simulation
# Collapse to simulated observables as a matrix - as required by current DifferenceEquations.jl likelihood
# see https://github.com/SciML/DifferenceEquations.jl/issues/55 for direct support of this datastructure
z_rbc = hcat(sim.z...)
2×21 Matrix{Float64}:
 -0.0367643  -0.0463864  -0.0366824  …  0.0156731  -0.00214294  0.00295132
 -0.172727   -0.257091   -0.311023      0.0203809   0.0373975   0.0446603
The combination of this package and DifferenceEquations enables gradient-based estimation methods such as NUTS/HMC. We will show this with Julia's Turing package, which provides a probabilistic programming language (PPL) in the same spirit as Stan.
For gradients, we use the Zygote.jl reverse-mode auto-differentiation package, which is compatible with both DSSM and DifferenceEquations.
We will consider two likelihoods: a Kalman filter, where we marginalize out the latent variables (possible for linear-Gaussian problems), and a joint likelihood over the latents and the deep parameters. In both cases, gradients of the underlying likelihoods are provided by the packages.
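Before introducing Turing, note that the marginal likelihood can also be evaluated directly at a candidate parameter value, reusing the patterns from above (a sketch evaluated at the pseudo-true values with the previously simulated z_rbc):
sol = generate_perturbation(model_rbc, (α = 0.5, β = 0.95), p_f)
problem = LinearStateSpaceProblem(sol, zeros(2), (0, size(z_rbc, 2)), observables = z_rbc)
@show solve(problem, KalmanFilter()).logpdf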
using Turing
using Turing: @addlogprob!
Turing.setadbackend(:zygote); # Especially when we sample the latent noise, we will require high-dimensional gradients with reverse-mode AD
This case samples the deep parameters, calculates the state-space model from the first-order perturbation, then evaluates the likelihood using a Kalman Filter.
# Turing model definition
@model function rbc_kalman(z, m, p_f, settings)
α ~ Uniform(0.2, 0.8) # priors
β ~ Uniform(0.5, 0.99)
p_d = (; α, β)
T = size(z, 2)
# also passes in the p_f for fixed values
sol = generate_perturbation(m, p_d, p_f, Val(1); settings) # first-order perturbation
if !(sol.retcode == :Success) # if the perturbation failed, we want to return -infinity for the likelihood and resample
@addlogprob! -Inf
return
end
problem = LinearStateSpaceProblem(sol, zeros(2), (0, T), observables = z) # utility constructs a LinearStateSpaceProblem from the sol.A, sol.B, etc.
@addlogprob! solve(problem, KalmanFilter()).logpdf # Linear-Gaussian so can marginalize with kalman filter. It should automatically choose the KalmanFilter in this case if not provided.
end
cache = SolverCache(model_rbc, Val(1), [:α, :β])
settings = PerturbationSolverSettings(; print_level = 0)
p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters
z = z_rbc # simulated in previous steps
turing_model = rbc_kalman(z, model_rbc, p_f, settings) # passing observables from before
n_samples = 1000
n_adapts = 250
δ = 0.65
alg = NUTS(n_adapts,δ)
println("Sampling. Ignore any 'rejected due to numerical errors' warnings") # At this point, Turing can't turn off those warnings
chain_1_marginal = sample(turing_model, alg, n_samples; progress = true)
┌ Info: Found initial step size
│   ϵ = 0.025
└ @ Turing.Inference /Users/dchilder/.julia/packages/Turing/S4Y4B/src/inference/hmc.jl:188
┌ Warning: The current proposal will be rejected due to numerical error(s).
│   isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
└ @ AdvancedHMC /Users/dchilder/.julia/packages/AdvancedHMC/51xgc/src/hamiltonian.jl:47
Sampling:   7%|██▊                                      |  ETA: 0:00:22
┌ Warning: The current proposal will be rejected due to numerical error(s).
│   isfinite.((θ, r, ℓπ, ℓκ)) = (true, true, false, true)
└ @ AdvancedHMC /Users/dchilder/.julia/packages/AdvancedHMC/51xgc/src/hamiltonian.jl:47
Sampling: 100%|█████████████████████████████████████████| Time: 0:00:17
Sampling. Ignore any 'rejected due to numerical errors' warnings
Chains MCMC chain (1000×14×1 Array{Float64, 3}):

Iterations        = 251:1:1250
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 79.41 seconds
Compute duration  = 79.41 seconds
parameters        = α, β
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std   naive_se      mcse        ess      rhat   e ⋯
      Symbol   Float64   Float64    Float64   Float64    Float64   Float64     ⋯

           α    0.4844    0.0207     0.0007    0.0010   450.9653    1.0088     ⋯
           β    0.9506    0.0050     0.0002    0.0003   346.2868    1.0066     ⋯
                                                                 1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           α    0.4432    0.4714    0.4840    0.4983    0.5260
           β    0.9416    0.9470    0.9505    0.9537    0.9602
The second case instead draws all of the latents and evaluates the joint likelihood. This is slower than the marginal case with a Kalman filter when everything is linear-Gaussian, but it enables non-Gaussian shocks and more flexibility. This approach is used for the second-order perturbation in the next case, rather than marginalizing with a particle filter.
It is possible that the joint likelihood becomes more competitive with the Kalman filter for large models, but for now consider this a demonstration that sampling the high-dimensional latent variables gives the correct answer and is not unusably slow.
# Turing model definition
@model function rbc_1_joint(z, m, p_f, cache, settings)
α ~ Uniform(0.2, 0.8)
β ~ Uniform(0.5, 0.99)
p_d = (; α, β)
T = size(z, 2)
ϵ_draw ~ MvNormal(m.n_ϵ * T, 1.0)
ϵ = reshape(ϵ_draw, m.n_ϵ, T)
sol = generate_perturbation(m, p_d, p_f, Val(1); cache, settings)
if !(sol.retcode == :Success)
@addlogprob! -Inf
return
end
problem = LinearStateSpaceProblem(sol, zeros(2), (0, T), observables = z, noise=ϵ)
@addlogprob! solve(problem, DirectIteration()).logpdf # should choose DirectIteration() by default if not provided
end
cache = SolverCache(model_rbc, Val(1), [:α, :β])
settings = PerturbationSolverSettings(; print_level = 0)
p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters
z = z_rbc # simulated in previous steps
turing_model = rbc_1_joint(z, model_rbc, p_f, cache, settings) # passing observables from before
n_samples = 300
n_adapts = 50
δ = 0.65
alg = NUTS(n_adapts,δ)
chain_1_joint = sample(turing_model, alg, n_samples; progress = true)
┌ Warning: The current proposal will be rejected due to numerical error(s).
│ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
└ @ AdvancedHMC /Users/dchilder/.julia/packages/AdvancedHMC/51xgc/src/hamiltonian.jl:47
┌ Warning: The current proposal will be rejected due to numerical error(s).
│ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
└ @ AdvancedHMC /Users/dchilder/.julia/packages/AdvancedHMC/51xgc/src/hamiltonian.jl:47
┌ Info: Found initial step size
│ ϵ = 0.00625
└ @ Turing.Inference /Users/dchilder/.julia/packages/Turing/S4Y4B/src/inference/hmc.jl:188
Sampling: 100%|█████████████████████████████████████████| Time: 0:04:25
Chains MCMC chain (300×35×1 Array{Float64, 3}):

Iterations        = 51:1:350
Number of chains  = 1
Samples per chain = 300
Wall duration     = 332.18 seconds
Compute duration  = 332.18 seconds
parameters        = α, β, ϵ_draw[1], ϵ_draw[2], ϵ_draw[3], ϵ_draw[4], ϵ_draw[5], ϵ_draw[6], ϵ_draw[7], ϵ_draw[8], ϵ_draw[9], ϵ_draw[10], ϵ_draw[11], ϵ_draw[12], ϵ_draw[13], ϵ_draw[14], ϵ_draw[15], ϵ_draw[16], ϵ_draw[17], ϵ_draw[18], ϵ_draw[19], ϵ_draw[20], ϵ_draw[21]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std   naive_se      mcse        ess      rhat   e ⋯
      Symbol   Float64   Float64    Float64   Float64    Float64   Float64     ⋯

           α    0.5229    0.0181     0.0010    0.0022    69.6873    0.9972     ⋯
           β    0.9477    0.0051     0.0003    0.0004   209.0060    0.9969     ⋯
   ϵ_draw[1]    3.3707    0.5082     0.0293    0.0637    63.0585    0.9988     ⋯
   ϵ_draw[2]    0.2689    0.2019     0.0117    0.0053   314.7188    0.9967     ⋯
   ϵ_draw[3]    0.7577    0.2222     0.0128    0.0183   145.6377    0.9967     ⋯
   ϵ_draw[4]   -0.3051    0.2072     0.0120    0.0125   279.1289    0.9969     ⋯
   ϵ_draw[5]    0.4963    0.2128     0.0123    0.0124   274.8647    0.9975     ⋯
   ϵ_draw[6]    0.2234    0.2034     0.0117    0.0096   319.6311    1.0020     ⋯
   ϵ_draw[7]    0.7579    0.2149     0.0124    0.0157   185.8446    0.9972     ⋯
   ϵ_draw[8]   -1.0655    0.2382     0.0138    0.0249   116.7922    0.9978     ⋯
   ϵ_draw[9]    0.2374    0.2057     0.0119    0.0154   229.3082    0.9969     ⋯
  ϵ_draw[10]    0.0883    0.1932     0.0112    0.0117   405.8351    0.9968     ⋯
  ϵ_draw[11]    0.9973    0.2473     0.0143    0.0185   173.9337    0.9990     ⋯
  ϵ_draw[12]   -0.3812    0.2214     0.0128    0.0099   364.7857    0.9975     ⋯
  ϵ_draw[13]    0.2082    0.2131     0.0123    0.0099   337.4811    0.9979     ⋯
  ϵ_draw[14]   -0.3477    0.1845     0.0107    0.0106   201.1109    0.9971     ⋯
  ϵ_draw[15]   -1.0082    0.2325     0.0134    0.0222   117.0747    0.9984     ⋯
  ϵ_draw[16]   -1.8721    0.3316     0.0191    0.0370    82.6932    0.9968     ⋯
  ϵ_draw[17]    0.0170    0.1936     0.0112    0.0115   218.6334    0.9971     ⋯
  ϵ_draw[18]   -0.4615    0.2105     0.0122    0.0117   282.9331    0.9967     ⋯
  ϵ_draw[19]   -0.1628    0.1887     0.0109    0.0111   326.3768    0.9970     ⋯
  ϵ_draw[20]   -0.0303    0.2101     0.0121    0.0079   398.9178    0.9977     ⋯
  ϵ_draw[21]   -0.0619    0.9117     0.0526    0.1126    63.0430    0.9974     ⋯
                                                                 1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           α    0.4872    0.5094    0.5236    0.5349    0.5554
           β    0.9374    0.9445    0.9476    0.9510    0.9575
   ϵ_draw[1]    2.4592    2.9460    3.3925    3.7234    4.3613
   ϵ_draw[2]   -0.1099    0.1333    0.2589    0.3936    0.6971
   ϵ_draw[3]    0.3833    0.5889    0.7498    0.9022    1.2460
   ϵ_draw[4]   -0.7119   -0.4320   -0.2715   -0.1825    0.1024
   ϵ_draw[5]    0.0757    0.3503    0.4837    0.6273    0.9723
   ϵ_draw[6]   -0.1415    0.0891    0.2006    0.3422    0.6636
   ϵ_draw[7]    0.3246    0.6242    0.7338    0.8935    1.1931
   ϵ_draw[8]   -1.5653   -1.2191   -1.0338   -0.8900   -0.6837
   ϵ_draw[9]   -0.1407    0.0931    0.2308    0.3688    0.6677
  ϵ_draw[10]   -0.3329   -0.0311    0.0836    0.2166    0.4600
  ϵ_draw[11]    0.5293    0.8480    0.9825    1.1291    1.5695
  ϵ_draw[12]   -0.8768   -0.5111   -0.3606   -0.2229    0.0133
  ϵ_draw[13]   -0.1808    0.0588    0.2068    0.3394    0.6536
  ϵ_draw[14]   -0.7252   -0.4677   -0.3344   -0.2255    0.0101
  ϵ_draw[15]   -1.4794   -1.1560   -0.9809   -0.8436   -0.6249
  ϵ_draw[16]   -2.5968   -2.0884   -1.8550   -1.6155   -1.2956
  ϵ_draw[17]   -0.3888   -0.1104    0.0171    0.1467    0.3788
  ϵ_draw[18]   -0.8963   -0.5928   -0.4580   -0.3050   -0.1086
  ϵ_draw[19]   -0.5132   -0.2866   -0.1720   -0.0235    0.2068
  ϵ_draw[20]   -0.4202   -0.1648   -0.0474    0.1009    0.3734
  ϵ_draw[21]   -1.8476   -0.6901   -0.0752    0.4894    2.0249
Finally, this calculates a second-order perturbation and then samples the latents and deep parameters. Of course, the Kalman filter would not provide the correct likelihood here, so the proper comparison is something like a particle filter.
# Turing model definition
@model function rbc_2_joint(z, m, p_f, cache, settings)
α ~ Uniform(0.2, 0.8)
β ~ Uniform(0.5, 0.99)
p_d = (; α, β)
T = size(z, 2)
ϵ_draw ~ MvNormal(m.n_ϵ * T, 1.0) # sample the latent shocks jointly with the parameters
ϵ = reshape(ϵ_draw, m.n_ϵ, T)
sol = generate_perturbation(m, p_d, p_f, Val(2); cache, settings)
if !(sol.retcode == :Success)
@addlogprob! -Inf
return
end
problem = QuadraticStateSpaceProblem(sol, zeros(2), (0, T), observables = z, noise=ϵ)
@addlogprob! solve(problem, DirectIteration()).logpdf
end
cache = SolverCache(model_rbc, Val(2), [:α, :β])
settings = PerturbationSolverSettings(; print_level = 0)
z = z_rbc # simulated in previous steps
p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters
turing_model = rbc_2_joint(z, model_rbc, p_f, cache, settings) # passing observables from before
n_samples = 300
n_adapts = 50
δ = 0.65
alg = NUTS(n_adapts,δ)
chain_2_joint = sample(turing_model, alg, n_samples; progress = true)
┌ Info: Found initial step size
│ ϵ = 0.0125
└ @ Turing.Inference /Users/dchilder/.julia/packages/Turing/S4Y4B/src/inference/hmc.jl:188
Sampling: 100%|█████████████████████████████████████████| Time: 0:03:44
Chains MCMC chain (300×35×1 Array{Float64, 3}):

Iterations        = 51:1:350
Number of chains  = 1
Samples per chain = 300
Wall duration     = 273.11 seconds
Compute duration  = 273.11 seconds
parameters        = α, β, ϵ_draw[1], ϵ_draw[2], ϵ_draw[3], ϵ_draw[4], ϵ_draw[5], ϵ_draw[6], ϵ_draw[7], ϵ_draw[8], ϵ_draw[9], ϵ_draw[10], ϵ_draw[11], ϵ_draw[12], ϵ_draw[13], ϵ_draw[14], ϵ_draw[15], ϵ_draw[16], ϵ_draw[17], ϵ_draw[18], ϵ_draw[19], ϵ_draw[20], ϵ_draw[21]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std   naive_se      mcse        ess      rhat   e ⋯
      Symbol   Float64   Float64    Float64   Float64    Float64   Float64     ⋯

           α    0.5250    0.0180     0.0010    0.0021    48.1201    1.0333     ⋯
           β    0.9477    0.0057     0.0003    0.0003   809.0493    0.9977     ⋯
   ϵ_draw[1]    3.3547    0.5340     0.0308    0.0701    38.0331    1.0474     ⋯
   ϵ_draw[2]    0.2523    0.1901     0.0110    0.0081   267.7918    1.0010     ⋯
   ϵ_draw[3]    0.7605    0.2202     0.0127    0.0166   138.2808    1.0039     ⋯
   ϵ_draw[4]   -0.3092    0.2105     0.0122    0.0153   212.3421    0.9968     ⋯
   ϵ_draw[5]    0.4908    0.2067     0.0119    0.0143   248.2859    0.9973     ⋯
   ϵ_draw[6]    0.2183    0.2034     0.0117    0.0152   236.9790    1.0027     ⋯
   ϵ_draw[7]    0.7239    0.2318     0.0134    0.0221   125.7547    1.0008     ⋯
   ϵ_draw[8]   -1.0306    0.2295     0.0132    0.0222    92.2911    1.0086     ⋯
   ϵ_draw[9]    0.2083    0.2068     0.0119    0.0103   302.6394    0.9968     ⋯
  ϵ_draw[10]    0.0971    0.1909     0.0110    0.0130   259.3403    1.0017     ⋯
  ϵ_draw[11]    0.9924    0.2447     0.0141    0.0209   147.6565    1.0076     ⋯
  ϵ_draw[12]   -0.3877    0.2087     0.0121    0.0134   271.4525    0.9967     ⋯
  ϵ_draw[13]    0.2118    0.2060     0.0119    0.0142   255.3336    1.0000     ⋯
  ϵ_draw[14]   -0.3364    0.2150     0.0124    0.0112   232.4814    0.9986     ⋯
  ϵ_draw[15]   -1.0031    0.2613     0.0151    0.0257    62.9911    1.0307     ⋯
  ϵ_draw[16]   -1.7959    0.3273     0.0189    0.0314    79.7611    1.0077     ⋯
  ϵ_draw[17]    0.0039    0.1958     0.0113    0.0146   207.4630    1.0076     ⋯
  ϵ_draw[18]   -0.4767    0.1978     0.0114    0.0138   205.4055    0.9997     ⋯
  ϵ_draw[19]   -0.1388    0.2154     0.0124    0.0199   223.3459    0.9970     ⋯
  ϵ_draw[20]   -0.0447    0.2332     0.0135    0.0188   206.5141    1.0035     ⋯
  ϵ_draw[21]    0.0016    0.7230     0.0417    0.0844    43.5613    1.0244     ⋯
                                                                 1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           α    0.4921    0.5140    0.5252    0.5375    0.5587
           β    0.9349    0.9442    0.9479    0.9513    0.9580
   ϵ_draw[1]    2.4768    2.9778    3.2626    3.6505    4.5892
   ϵ_draw[2]   -0.1108    0.1226    0.2431    0.3856    0.6346
   ϵ_draw[3]    0.3850    0.5965    0.7473    0.9134    1.2168
   ϵ_draw[4]   -0.7323   -0.4438   -0.3015   -0.1730    0.0512
   ϵ_draw[5]    0.0843    0.3569    0.4806    0.6170    0.9349
   ϵ_draw[6]   -0.1683    0.0915    0.2181    0.3569    0.6012
   ϵ_draw[7]    0.3236    0.5608    0.7098    0.8609    1.2712
   ϵ_draw[8]   -1.5092   -1.1558   -1.0130   -0.8786   -0.6073
   ϵ_draw[9]   -0.1816    0.0750    0.2114    0.3291    0.6813
  ϵ_draw[10]   -0.2711   -0.0368    0.0946    0.2235    0.4602
  ϵ_draw[11]    0.5584    0.8224    0.9656    1.1663    1.4695
  ϵ_draw[12]   -0.7866   -0.5384   -0.3778   -0.2229   -0.0195
  ϵ_draw[13]   -0.1534    0.0597    0.2070    0.3587    0.6425
  ϵ_draw[14]   -0.7686   -0.4789   -0.3317   -0.1854    0.0814
  ϵ_draw[15]   -1.5411   -1.1619   -0.9703   -0.8259   -0.5259
  ϵ_draw[16]   -2.5215   -1.9819   -1.7608   -1.5387   -1.2839
  ϵ_draw[17]   -0.3643   -0.1147   -0.0020    0.1181    0.3983
  ϵ_draw[18]   -0.8702   -0.5914   -0.4742   -0.3403   -0.1091
  ϵ_draw[19]   -0.5752   -0.2878   -0.1399    0.0075    0.3161
  ϵ_draw[20]   -0.5372   -0.1825   -0.0365    0.0931    0.4135
  ϵ_draw[21]   -1.4457   -0.5054    0.0329    0.5084    1.3862
Below is a graph which shows how well the high-dimensional sampled latents do relative to the pseudo-true latent values.
symbol_to_int(s) = parse(Int, string(s)[9:end-1])
ϵ_chain = sort(chain_1_joint[:, [Symbol("ϵ_draw[$a]") for a in 1:21], 1], lt = (x,y) -> symbol_to_int(x) < symbol_to_int(y))
tmp = describe(ϵ_chain)
ϵ_mean = tmp[1][:, 2]
ϵ_std = tmp[1][:, 3]
plot(ϵ_mean[2:end], ribbon=2 * ϵ_std[2:end], label="Posterior mean", title = "First-Order Joint: Estimated Latents")
plot!(ϵ', label="True values")
ϵ_chain = sort(chain_2_joint[:, [Symbol("ϵ_draw[$a]") for a in 1:21], 1], lt = (x,y) -> symbol_to_int(x) < symbol_to_int(y))
tmp = describe(ϵ_chain)
ϵ_mean = tmp[1][:, 2]
ϵ_std = tmp[1][:, 3]
plot(ϵ_mean[2:end], ribbon=2 * ϵ_std[2:end], label="Posterior mean", title = "Second-Order Joint: Estimated Latents")
plot!(ϵ', label="True values")
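As a final check, we can compare the posterior means of the deep parameters across the three chains to the pseudo-true values α = 0.5 and β = 0.95 (a small sketch):
# posterior means of the deep parameters for each of the three chains above
for (name, chain) in [("marginal", chain_1_marginal),
                      ("first-order joint", chain_1_joint),
                      ("second-order joint", chain_2_joint)]
    println(name, ": mean α = ", mean(chain[:α]), ", mean β = ", mean(chain[:β]))
end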