Note: This package is a proof of concept. While the code will remain working with proper use of a Julia manifest, we can't guarantee ongoing support and maintenance.
Uncomment the line below to install the exact versions of the packages listed in the manifest.
# import Pkg; Pkg.instantiate()
The core package is DifferentiableStateSpaceModels.jl, while DifferenceEquations.jl provides compatible functionality for simulating and calculating likelihoods.
using DifferentiableStateSpaceModels, DifferenceEquations, LinearAlgebra, Zygote, Distributions, Plots, DiffEqBase, Symbolics, BenchmarkTools
The model follows the Schmitt-Grohe and Uribe (2004) timing convention. The system is a nonlinear expectational difference equation, including all first-order conditions for decisions and the system evolution equations,
$$ \mathbb{E}_{t}\mathcal{H}\left(y',y,x',x;p\right)=0 $$where $y$ are the control variables, $x$ are the states, and $p$ is a vector of deep parameters of interest. Expectations are taken over forward-looking variables and an underlying random process $\epsilon'$.
In addition, we consider a (possibly noisy) observation equation, $$ z = Q \cdot \begin{bmatrix}y &x\end{bmatrix}^{\top} + \nu $$ where $\nu$ may or may not be normally distributed, but $\mathbb{E}(\nu) = 0$ and $\mathbb{V}(\nu) = \Omega(p) \Omega(p)^{\top}$.
Assume that this problem has a non-stochastic steady state $y_{ss}, x_{ss}$.
Define the deviation from the non-stochastic steady state as $\hat{x} \equiv x - x_{ss}, \hat{y} \equiv y - y_{ss},$ and $\hat{z} \equiv z - z_{ss}$.
The solution finds the first- or second-order perturbation around that non-stochastic steady state, and yields
$$ x' = h(x; p) + \eta \ \Gamma(p)\ \epsilon' $$where $\eta$ describes how shocks affect the law of motion and $\mathbb{E}(\epsilon') = 0$. Frequently this would be organized such that $\mathbb{V}(\epsilon') = I$, but that is not required. Equivalently, the law of motion can be written as $x' = h(x; p) + \eta \ \epsilon'$ with $\mathbb{V}(\epsilon') = \Gamma(p) \Gamma(p)^{\top}$.
along with the policy equation,
$$ y = g(x; p) $$and finally, substituting into the observation equation,
$$ z = Q \begin{bmatrix} g(x;p) \\ x\end{bmatrix} + \nu $$Perturbation approximates the above model equations, where $h$ and $g$ are not available explicitly, by a Taylor expansion around the steady state. For example, in the case of the first-order model the solution finds
$$ \hat{x}' = A(p)\ \hat{x} + B(p) \epsilon' $$and
$$ \hat{y} = g_x(p) \ \hat{x} $$and
$$ \hat{z} = C(p)\ \hat{x} + \nu $$where $C(p)\equiv Q \begin{bmatrix} g_x(p) \\ I\end{bmatrix}$, $B(p) \equiv \eta \Gamma(p)$, and $\mathbb{V}(\nu) = D(p) D(p)^{\top}$. Normality of $\nu$ or $\epsilon'$ is not required in general.
This is a linear state-space model, and if the priors and shocks are Gaussian, the marginal likelihood can be evaluated with classic methods such as the Kalman filter. The output of the perturbation can be used manually, or in conjunction with DifferenceEquations.jl.
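For intuition, here is a minimal sketch of such manual use, assuming matrices A, B, C in the shapes described above and a vector D of observation-noise standard deviations. The helper simulate_linear is hypothetical, not part of the package API:
using LinearAlgebra
# iterate x̂' = A x̂ + B ε' and record observables z = C x̂ + ν (hypothetical helper)
function simulate_linear(A, B, C, D, x0, T)
    x = x0
    z = Vector{Vector{Float64}}(undef, T)
    for t in 1:T
        x = A * x + B * randn(size(B, 2))      # ε' ~ N(0, I)
        z[t] = C * x + D .* randn(size(C, 1))  # ν with standard deviations in D
    end
    return z
end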
This package also solves for second-order perturbations, with all of the same features. The canonical form of the second-order solution implements state-space pruning as in Andreasen, Fernandez-Villaverde, and Rubio-Ramirez (2017). Given the same formulation of the problem, the model finds the canonical form,
$$ \hat{x}' = A_0(p) + A_1(p)\ \hat{x} + \hat{x}^{\top} A_2(p)\ \hat{x} + B(p) \epsilon' $$and
$$ \hat{y} = g_{\sigma\sigma} + g_x \ \hat{x} + \hat{x}^{\top}\ g_{xx}\ \hat{x} $$and
$$ \hat{z} = C_0(p) + C_1(p)\ \hat{x} + \hat{x}^{\top} C_2(p)\ \hat{x} + \nu $$where $B(p) \equiv \eta \Gamma(p)$, $\mathbb{V}(\nu) = D(p) D(p)^{\top}$, and $C_0, C_1, C_2, A_0, A_1, A_2$ all reflect the state-space pruning.
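To make the notation concrete, $\hat{x}^{\top} A_2(p)\ \hat{x}$ denotes a stacked vector of quadratic forms, one per state. The sketch below of a single step treats A_2 as a three-dimensional array; it illustrates the notation only, not the package's internal representation:
using LinearAlgebra
# one step of x̂' = A_0 + A_1 x̂ + x̂ᵀ A_2 x̂ + B ε' (illustrative only)
function step_quadratic(A_0, A_1, A_2, B, x, ε)
    quad = [dot(x, A_2[i, :, :], x) for i in axes(A_2, 1)]  # one quadratic form per state
    return A_0 + A_1 * x + quad + B * ε
end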
All of the above use standard solution methods. The primary contribution of this package is that all of these model elements are differentiable. Hence, these gradients can be composed for use in applications such as optimization, gradient-based estimation methods, and with DifferenceEquations.jl, which provides differentiable simulations and likelihoods for state-space models. That is, if we think of a perturbation solver as mapping $p$ to solutions (e.g., in first order, $\mathbf{P}(p) \to (A, B, C, D)$), then we can find the gradients $\partial_p \mathbf{P}(p), \partial_p A(p)$, etc. Or, when these gradients are available for use with reverse-mode auto-differentiation, it can take "wobbles" in $A, B, C, D$ and go back to the "wiggles" of the underlying $p$ through $\mathbf{P}$. See ChainRules.jl for more details on AD.
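As a toy illustration of this composition, with a stand-in map instead of a real perturbation solver (P_toy and objective are made up for this example):
using Zygote
P_toy(p) = [p 1.0; 0.0 p^2]                # stand-in for the map p ↦ A(p)
objective(p) = sum(P_toy(p) * [1.0, 2.0])  # downstream function of the "solution"
Zygote.gradient(objective, 0.5)            # reverse-mode: wobbles in A(p) back to wiggles in p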
Models are defined using a Dynare-style DSL built on Symbolics.jl. The primitives, as used in the example below, are:
- H: the system of model equations
- y, x: the control and state variables
- p: the vector of parameters
- steady_states: closed-form steady-state equations (or steady_states_iv for initial values passed to a solver)
- Γ: the Cholesky factor of the shock covariance
- η: the matrix loading shocks into the law of motion
- Q: the observation selection matrix
- Ω: the standard deviations of the (diagonal) observation noise
This example implements the model primitives to build a simple version of the RBC model with two observables.
∞ = Inf
@variables α, β, ρ, δ, σ, Ω_1
@variables t::Integer, k(..), z(..), c(..), q(..)
x = [k, z] # states
y = [c, q] # controls
p = [α, β, ρ, δ, σ, Ω_1] # parameters
H = [1 / c(t) - (β / c(t + 1)) * (α * exp(z(t + 1)) * k(t + 1)^(α - 1) + (1 - δ)),
c(t) + k(t + 1) - (1 - δ) * k(t) - q(t),
q(t) - exp(z(t)) * k(t)^α,
z(t + 1) - ρ * z(t)] # system of model equations
# analytic solutions for the steady state. Alternatively, pass initial values via steady_states_iv and let a solver find the steady state
steady_states = [k(∞) ~ (((1 / β) - 1 + δ) / α)^(1 / (α - 1)),
z(∞) ~ 0,
c(∞) ~ (((1 / β) - 1 + δ) / α)^(α / (α - 1)) -
δ * (((1 / β) - 1 + δ) / α)^(1 / (α - 1)),
q(∞) ~ (((1 / β) - 1 + δ) / α)^(α / (α - 1))]
Γ = [σ;;] # Γ matrix for the single shock. The [;;] notation makes it a matrix rather than a vector in Julia
η = [0; -1;;] # η is the n_x × n_ϵ matrix loading shocks into the law of motion
# observation matrix. order is "y" then "x" variables, so [c,q,k,z] in this example
Q = [1.0 0 0 0; # select c as first "z" observable
0 0 1.0 0] # select k as second "z" observable
# diagonal Cholesky factor of the covariance matrix for observation noise (so these are standard deviations). Non-diagonal observation noise is not currently supported
Ω = [Ω_1, Ω_1]
# Generates the model files and includes them if required. If the model has already been created, it just loads them
overwrite_model_cache = true
model_rbc = @make_and_include_perturbation_model("rbc_notebook_example", H, (; t, y, x, p, steady_states, Γ, Ω, η, Q, overwrite_model_cache)) # Convenience macro. Saves as ".function_cache/rbc_notebook_example.jl"
# After generation could just include the files directly, or add the following to prevent overwriting the models
# isdefined(Main, :rbc_notebook_example) || include(joinpath(pkgdir(DifferentiableStateSpaceModels),
# "test/generated_models/rbc_notebook_example.jl"))
# Alternatively, this is also the same example as a prebuilt example
# model_rbc = @include_example_module(DifferentiableStateSpaceModels.Examples.rbc_observables)
Perturbation Model: n_y = 2, n_x = 2, n_p = 6, n_ϵ = 1, n_z = 2 y = [:c, :q] x = [:k, :z] p = [:α, :β, :ρ, :δ, :σ, :Ω_1]
As commented above, the convenience macro takes this model, generates the files, and includes them. It calls the function make_perturbation_model with the associated options in that file. Once generated, these can be included as normal .jl files. At this point there are no runtime generated functions, though that functionality could be added later. The following queries some of the properties of the loaded model.
model_H_latex(model_rbc)
model_steady_states_latex(model_rbc) # this is the closed-form steady state; could instead pass in steady_states_iv to give approximate initial conditions for an optimizer
@show model_rbc.n_x, model_rbc.n_y, model_rbc.n_ϵ, model_rbc.n_z
(model_rbc.n_x, model_rbc.n_y, model_rbc.n_ϵ, model_rbc.n_z) = (2, 2, 1, 2)
(2, 2, 1, 2)
Given a model definition, you can solve the expectational difference equation for a given set of parameters. All of the p defined above must be in either the p_d or the p_f arguments below. The distinction is that the p_d parameters will be available for differentiation, while the p_f parameters are fixed. That is, for any parameters in p_d, the model will support calculation of gradients of the perturbation solution. The only downside to having extra values in p_d rather than p_f is extra computation. Pass nothing for p_f if you are differentiating all of the parameters.
p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters
p_d = (α = 0.5, β = 0.95) # Pseudo-true values
m = model_rbc # ensure notebook executed above
sol = generate_perturbation(model_rbc, p_d, p_f) # Solution to the first-order RBC
sol_2 = generate_perturbation(model_rbc, p_d, p_f, Val(2)); # Solution to the second-order RBC
@show sol.retcode, sol_2.retcode, verify_steady_state(m, p_d, p_f) # the final call checks that the analytically provided steady-state solution is correct
(sol.retcode, sol_2.retcode, verify_steady_state(m, p_d, p_f)) = (:Success, :Success, true)
(:Success, :Success, true)
The perturbation solution (in the canonical form described in the top section) can be queried from the resulting solution. A few examples for the first order solution are below,
@show sol.y, sol.x # steady states y_ss and x_ss; these are the values such that y ≡ ŷ + sol.y and x ≡ x̂ + sol.x
@show sol.g_x # the policy
@show sol.A, sol.B # the evolution equation of the state, so that x̂' = A x̂ + B ϵ
@show sol.C, sol.D; # the observation equation, so that z = C x̂ + ν with the variance of ν given by D D'.
@show sol.x_ergodic_var; # covariance matrix of the ergodic distribution of x̂, which is mean zero since x̂ ≡ x - x_ss
(sol.y, sol.x) = ([5.936252888048732, 6.884057971014497], [47.39025414828823, 0.0])
sol.g_x = [0.09579643002416627 0.6746869652586192; 0.07263157894736873 6.884057971014506]
(sol.A, sol.B) = ([0.9568351489232028 6.2093710057558855; 1.5076865909646354e-18 0.20000000000000004], [0.0; -0.01;;])
(sol.C, sol.D) = ([0.09579643002416627 0.6746869652586192; 1.0 0.0], [0.0001, 0.0001])
sol.x_ergodic_var = [0.07005411173180152 0.00015997603451513398; 0.00015997603451513398 0.00010416666666666667]
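As a quick sanity check (not part of the package API), the ergodic covariance should satisfy the discrete Lyapunov equation $\Sigma = A \Sigma A^{\top} + B B^{\top}$ implied by the first-order law of motion:
Σ = sol.x_ergodic_var
@show norm(sol.A * Σ * sol.A' + sol.B * sol.B' - Σ) # ≈ 0; norm is from LinearAlgebra, loaded above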
In addition, you can query some metadata about the solution to aid in debugging and generic programming
@show sol.Q, sol.η, sol.Γ; # various matrices for loading of shocks and observations
@show sol.n_y, sol.n_x, sol.n_ϵ, sol.n_z, sol.n_p # sizes of model
@show sol.x_symbols, sol.y_symbols, sol.p_symbols; # the order of symbols for the x, y, and p vectors
(sol.Q, sol.η, sol.Γ) = ([1.0 0.0 0.0 0.0; 0.0 0.0 1.0 0.0], [0; -1;;], [0.01;;])
(sol.n_y, sol.n_x, sol.n_ϵ, sol.n_z, sol.n_p) = (2, 2, 1, 2, 6)
(sol.x_symbols, sol.y_symbols, sol.p_symbols) = ([:k, :z], [:c, :q], [:α, :β, :ρ, :δ, :σ, :Ω_1])
The core feature of this library is to enable gradients of the perturbation solutions with respect to parameters (i.e., anything in the p_d vector). To show this, we will construct a function which uses the resulting law of motion, and then find the gradient of its output with respect to those parameters.
function IRF(p_d, ϵ_0; m, p_f, steps)
sol = generate_perturbation(m, p_d, p_f) # First-order perturbation by default, pass Val(2) as additional argument to do 2nd order.
x = sol.B * ϵ_0 # start after applying impulse with the model's shock η and Γ(p)
for _ in 1:steps
# A note: you cannot use mutating expressions here with most AD code, i.e. x .= sol.A * x would not work
# For more elaborate simulations, you would want to use DifferenceEquations.jl in practice
x = sol.A * x # iterate forward using first-order observation equation
end
return [0, 1]' * sol.C * x # select the second observable using the model's first-order observation equation z = C x̂
end
m = model_rbc # ensure notebook executed above
p_f = (ρ=0.2, δ=0.02, σ=0.01, Ω_1=0.01) # not differentiated
p_d = (α=0.5, β=0.95) # differentiated parameters
steps = 10 # steps ahead to forecast
ϵ_0 = [1.0] # shock size
IRF(p_d, ϵ_0; m, p_f, steps) # Function works on its own, calculating perturbation
-0.052773688889493325
However, the real benefit is that this function can itself be differentiated to find gradients with respect to the deep parameters p_d and the shock ϵ_0.
# Using the Zygote auto-differentiation library already loaded above
p_d = (α=0.5, β=0.95) # differentiated parameters
ϵ_0 = [1.0] # shock size
IRF_grad = gradient((p_d, ϵ_0) -> IRF(p_d, ϵ_0; m, p_f, steps), p_d, ϵ_0) # "closes" over the m, p_f, and steps to create a new function, and differentiates it with respect to other arguments
((α = -0.5810083808068425, β = -1.182312351872585), [-0.05277368888949331])
println("d/dα IRF(p_d, ϵ0) = $(IRF_grad[1].α), d/dβ IRF(p_d, ϵ0) = $(IRF_grad[1].β), d/dϵ0 IRF(p_d, ϵ0) = $(IRF_grad[2])")
d/dα IRF(p_d, ϵ0) = -0.5810083808068425, d/dβ IRF(p_d, ϵ0) = -1.182312351872585, d/dϵ0 IRF(p_d, ϵ0) = [-0.05277368888949331]
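Because IRF is an ordinary Julia function, we can sanity-check one of these gradients with central finite differences (a sketch; the step size h and the name fd_α are arbitrary choices for this example):
h = 1e-6
fd_α = (IRF((α = 0.5 + h, β = 0.95), ϵ_0; m, p_f, steps) -
        IRF((α = 0.5 - h, β = 0.95), ϵ_0; m, p_f, steps)) / (2h)
@show fd_α # should be close to IRF_grad[1].α above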
The calculation of the gradients of the perturbation solution is slow relative to the perturbation itself, and both can allocate a significant amount of memory. If you intend to keep calculating a perturbation a large number of times (e.g., during estimation), then you may want to preallocate this memory and reuse it by passing in a cache.
function IRF_cache(p_d, ϵ_0; m, p_f, steps, cache)
sol = generate_perturbation(m, p_d, p_f; cache) # only difference is passing in a pre-allocated cache to the perturbation solver
x = sol.B * ϵ_0
for _ in 1:steps
x = sol.A * x
end
return [0, 1]' * sol.C * x
end
# To call it
m = model_rbc # ensure notebook executed above
p_d = (α=0.5, β=0.95) # Differentiated parameters
p_f = (ρ=0.2, δ=0.02, σ=0.01, Ω_1=0.01)
steps = 10 # steps ahead to forecast
ϵ_0 = [1.0] # shock size
cache = SolverCache(m, Val(1), p_d) # the p_d passed into the cache just reuses the symbols, and the values in p_d are unused
@show IRF_cache(p_d, ϵ_0; m, p_f, steps, cache)
gradient((p_d, ϵ_0) -> IRF_cache(p_d, ϵ_0; m, p_f, steps, cache), p_d, ϵ_0) # as before, do not try to differentiate with respect to the cache values themselves
IRF_cache(p_d, ϵ_0; m, p_f, steps, cache) = -0.052773688889493325
((α = -0.5810083808068425, β = -1.182312351872585), [-0.05277368888949331])
# However, in a problem this small you will not see much of a difference in performance
@btime IRF($p_d, $ϵ_0; m = $m, p_f = $p_f, steps = $steps)
@btime IRF_cache($p_d, $ϵ_0; m = $m, p_f = $p_f, steps = $steps, cache = $cache)
64.608 μs (214 allocations: 86.59 KiB)
55.104 μs (125 allocations: 75.72 KiB)
-0.052773688889493325
Manual iteration of the state-space model from the perturbation solution is possible, but it can be verbose and it is difficult to achieve efficient gradients. One benefit of this package is that it creates state-space models in a form consistent with DifferenceEquations.jl, which can be easily simulated, visualized, and estimated.
To do this, we will calculate a perturbation solution and then simulate it with an initial condition x_0 drawn from the ergodic distribution.
p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters
p_d = (α = 0.5, β = 0.95) # Pseudo-true values
m = model_rbc # ensure notebook executed above
sol = generate_perturbation(m, p_d, p_f) # Solution to the first-order RBC
# Simulate T observations from a random initial condition
T = 20
# draw from ergodic distribution for the initial condition
x_iv = MvNormal(sol.x_ergodic_var)
problem = LinearStateSpaceProblem(sol, x_iv, (0, T))
plot(solve(problem))
The LinearStateSpaceProblem type is automatically constructed from the underlying perturbation. However, we can override any of these options, or pass in our own noise rather than simulating it, for a particular experiment.
noise = Matrix([1.0; zeros(T-1)]') # the ϵ shocks are "noise" in DifferenceEquations for SciML compatibility
x_iv = [0.0, 0.0] # can pass in a single value rather than a distribution
problem = LinearStateSpaceProblem(sol, x_iv, (0, T); noise)
plot(solve(problem))
To demonstrate the composition of gradients between DifferenceEquations and DifferentiableStateSpaceModels, let's adapt this function to simulate an impulse with fixed noise and look at the final observable.
function last_observable(p_d, noise, x_iv; m, p_f, T)
sol = generate_perturbation(m, p_d, p_f)
problem = LinearStateSpaceProblem(sol, x_iv, (0, T); noise, observables_noise = nothing) # removing observation noise
return solve(problem).z[end][2] # return the second element of the last observable
end
T = 100
noise = Matrix([1.0; zeros(T-1)]') # the ϵ shocks are "noise" in DifferenceEquations for SciML compatibility
x_iv = [0.0, 0.0] # can pass in a single value rather than a distribution
p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters
p_d = (α = 0.5, β = 0.95) # Pseudo-true values
m = model_rbc # ensure notebook executed above
last_observable(p_d, noise, x_iv; m, p_f, T)
-0.001039731618424578
And, as before, we can calculate gradients with respect to the underlying p_d parameters, but also with respect to the noise. This demonstrates a key benefit of these methods: they let us evaluate a joint likelihood over the latent variables in cases where they cannot easily be marginalized out (e.g., non-Gaussian or nonlinear models). Note that the dimensionality of this gradient is over 100.
gradient((p_d, noise, x_iv) -> last_observable(p_d, noise, x_iv; m, p_f, T), p_d, noise, x_iv)
((α = -0.02350208860044909, β = -0.08002797800781031), [-0.0010397316184245775 -0.0010866361040296902 … -0.06209371005755886 0.0], [0.012125846203919335, 0.09948517579554432])
Finally, we can use a simple utility function to investigate an IRF.
p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters
p_d = (α = 0.5, β = 0.95) # Pseudo-true values
m = model_rbc
sol = generate_perturbation(model_rbc, p_d, p_f)
ϵ0 = [1.0]
T = 10
sim = irf(sol, ϵ0, T)
plot(sim)
In addition, a variety of SciML features are supported, for example, parallel simulation of ensembles and associated summary statistics.
# Simulate multiple trajectories with T observations
trajectories = 40
x_iv = MvNormal(sol.x_ergodic_var)
problem = LinearStateSpaceProblem(sol, x_iv, (0, T))
# Solve multiple trajectories and plot an ensemble
ensemble_results = solve(EnsembleProblem(problem), DirectIteration(), EnsembleThreads();
trajectories)
summ = EnsembleSummary(ensemble_results) # see SciML documentation. Calculates median and other quantiles automatically.
summ.med # median values for the "x" simulated ensembles
t: 0:10
u: 11-element Vector{Vector{Float64}}:
 [-0.06849616749168119, 0.002430490689003771]
 [-0.018232243727516317, 4.178362879834265e-5]
 [-0.006130734877711555, -0.0012801506033949567]
 [-0.029692362571749217, 0.0029167124862915397]
 [-0.01418677984731015, 8.880061768457687e-5]
 [-0.012168309195379916, 0.0003679002721679268]
 [-0.02903024314533037, -0.0017796657403380003]
 [-0.017594841603510616, -0.003123852451232536]
 [-0.042044233027366255, 9.676460362403633e-5]
 [-0.03947618577501135, 0.0007834629342420161]
 [-0.053775744337300646, 0.005120733305282458]
plot(summ, fillalpha = 0.2) # plots by default show the median and quantiles of both variables. Modifying transparency as an example
We can use the underlying state-space model to easily simulate states and observables.
# Simulate T observations
T = 20
p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters
p_d = (α = 0.5, β = 0.95) # Pseudo-true values
sol = generate_perturbation(model_rbc, p_d, p_f) # Solution to the first-order RBC
x_iv = MvNormal(sol.x_ergodic_var) # draw initial conditions from the ergodic distribution
problem = LinearStateSpaceProblem(sol, x_iv, (0, T))
sim = solve(problem, DirectIteration())
ϵ = sim.W # store the underlying noise in the simulation
# Collapse to simulated observables as a matrix - as required by current DifferenceEquations.jl likelihood
# see https://github.com/SciML/DifferenceEquations.jl/issues/55 for direct support of this datastructure
z_rbc = hcat(sim.z...)
2×21 Matrix{Float64}:
 -0.0367643  -0.0463864  -0.0366824  …  0.0156731  -0.00214294  0.00295132
 -0.172727   -0.257091   -0.311023      0.0203809   0.0373975   0.0446603
The combination of this package and DifferenceEquations enables gradient-based estimation methods such as NUTS/HMC. We will show this with Julia's Turing package, which provides a probabilistic programming language (PPL) in the same spirit as Stan.
For gradients, we use the Zygote.jl reverse-mode auto-differentiation package, which is compatible with both DSSM and DifferenceEquations.
We will consider two likelihoods: a Kalman filter, where we marginalize out the latent variables (possible for linear-Gaussian problems), and a joint likelihood over the latents and the deep parameters. In both cases, gradients of the underlying likelihoods are provided by the packages.
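Before introducing Turing, note that the marginal likelihood can also be evaluated directly at a candidate parameter value, reusing the patterns from above (a sketch evaluated at the pseudo-true values with the previously simulated z_rbc):
sol = generate_perturbation(model_rbc, (α = 0.5, β = 0.95), p_f)
problem = LinearStateSpaceProblem(sol, zeros(2), (0, size(z_rbc, 2)), observables = z_rbc)
@show solve(problem, KalmanFilter()).logpdf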
using Turing
using Turing: @addlogprob!
Turing.setadbackend(:zygote); # Especially when we sample the latent noise, we will require high-dimensional gradients with reverse-mode AD
This case samples the deep parameters, calculates the state-space model from the first-order perturbation, then evaluates the likelihood using a Kalman Filter.
# Turing model definition
@model function rbc_kalman(z, m, p_f, settings)
α ~ Uniform(0.2, 0.8) # priors
β ~ Uniform(0.5, 0.99)
p_d = (; α, β)
T = size(z, 2)
# also passes in the p_f for fixed values
sol = generate_perturbation(m, p_d, p_f, Val(1); settings) # first-order perturbation
if !(sol.retcode == :Success) # if the perturbation failed, we want to return -infinity for the likelihood and resample
@addlogprob! -Inf
return
end
problem = LinearStateSpaceProblem(sol, zeros(2), (0, T), observables = z) # utility constructs a LinearStateSpaceProblem from the sol.A, sol.B, etc.
@addlogprob! solve(problem, KalmanFilter()).logpdf # Linear-Gaussian so can marginalize with kalman filter. It should automatically choose the KalmanFilter in this case if not provided.
end
cache = SolverCache(model_rbc, Val(1), [:α, :β])
settings = PerturbationSolverSettings(; print_level = 0)
p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters
z = z_rbc # simulated in previous steps
turing_model = rbc_kalman(z, model_rbc, p_f, settings) # passing observables from before
n_samples = 1000
n_adapts = 250
δ = 0.65
alg = NUTS(n_adapts,δ)
println("Sampling. Ignore any 'rejected due to numerical errors' warnings") # At this point, Turing can't turn off those warnings
chain_1_marginal = sample(turing_model, alg, n_samples; progress = true)
┌ Info: Found initial step size
│   ϵ = 0.025
└ @ Turing.Inference /Users/dchilder/.julia/packages/Turing/S4Y4B/src/inference/hmc.jl:188
┌ Warning: The current proposal will be rejected due to numerical error(s).
│   isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
└ @ AdvancedHMC /Users/dchilder/.julia/packages/AdvancedHMC/51xgc/src/hamiltonian.jl:47
Sampling:   7%|██▊                                      |  ETA: 0:00:22
┌ Warning: The current proposal will be rejected due to numerical error(s).
│   isfinite.((θ, r, ℓπ, ℓκ)) = (true, true, false, true)
└ @ AdvancedHMC /Users/dchilder/.julia/packages/AdvancedHMC/51xgc/src/hamiltonian.jl:47
Sampling: 100%|█████████████████████████████████████████| Time: 0:00:17
Sampling. Ignore any 'rejected due to numerical errors' warnings
Chains MCMC chain (1000×14×1 Array{Float64, 3}):

Iterations        = 251:1:1250
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 79.41 seconds
Compute duration  = 79.41 seconds
parameters        = α, β
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std   naive_se      mcse        ess      rhat   e ⋯
      Symbol   Float64   Float64    Float64   Float64    Float64   Float64     ⋯

           α    0.4844    0.0207     0.0007    0.0010   450.9653    1.0088     ⋯
           β    0.9506    0.0050     0.0002    0.0003   346.2868    1.0066     ⋯
                                                                 1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           α    0.4432    0.4714    0.4840    0.4983    0.5260
           β    0.9416    0.9470    0.9505    0.9537    0.9602
The second case instead draws all of the latents and evaluates the joint likelihood. This is slower than the marginal case with a Kalman filter when everything is linear-Gaussian, but it enables non-Gaussian shocks and more flexibility. This approach is used for the second-order perturbation in the next case, rather than marginalizing with a particle filter.
It is possible that the joint likelihood becomes more competitive with the Kalman filter for large models, but for now consider this a demonstration that sampling the high-dimensional latent variables gives the correct answer and is not unusably slow.
# Turing model definition
@model function rbc_1_joint(z, m, p_f, cache, settings)
α ~ Uniform(0.2, 0.8)
β ~ Uniform(0.5, 0.99)
p_d = (; α, β)
T = size(z, 2)
ϵ_draw ~ MvNormal(m.n_ϵ * T, 1.0)
ϵ = reshape(ϵ_draw, m.n_ϵ, T)
sol = generate_perturbation(m, p_d, p_f, Val(1); cache, settings)
if !(sol.retcode == :Success)
@addlogprob! -Inf
return
end
problem = LinearStateSpaceProblem(sol, zeros(2), (0, T), observables = z, noise=ϵ)
@addlogprob! solve(problem, DirectIteration()).logpdf # should choose DirectIteration() by default if not provided
end
cache = SolverCache(model_rbc, Val(1), [:α, :β])
settings = PerturbationSolverSettings(; print_level = 0)
p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters
z = z_rbc # simulated in previous steps
turing_model = rbc_1_joint(z, model_rbc, p_f, cache, settings) # passing observables from before
n_samples = 300
n_adapts = 50
δ = 0.65
alg = NUTS(n_adapts,δ)
chain_1_joint = sample(turing_model, alg, n_samples; progress = true)
┌ Warning: The current proposal will be rejected due to numerical error(s).
│ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
└ @ AdvancedHMC /Users/dchilder/.julia/packages/AdvancedHMC/51xgc/src/hamiltonian.jl:47
┌ Warning: The current proposal will be rejected due to numerical error(s).
│ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
└ @ AdvancedHMC /Users/dchilder/.julia/packages/AdvancedHMC/51xgc/src/hamiltonian.jl:47
┌ Info: Found initial step size
│ ϵ = 0.00625
└ @ Turing.Inference /Users/dchilder/.julia/packages/Turing/S4Y4B/src/inference/hmc.jl:188
Sampling: 100%|█████████████████████████████████████████| Time: 0:04:25
Chains MCMC chain (300×35×1 Array{Float64, 3}):

Iterations        = 51:1:350
Number of chains  = 1
Samples per chain = 300
Wall duration     = 332.18 seconds
Compute duration  = 332.18 seconds
parameters        = α, β, ϵ_draw[1], ϵ_draw[2], ϵ_draw[3], ϵ_draw[4], ϵ_draw[5], ϵ_draw[6], ϵ_draw[7], ϵ_draw[8], ϵ_draw[9], ϵ_draw[10], ϵ_draw[11], ϵ_draw[12], ϵ_draw[13], ϵ_draw[14], ϵ_draw[15], ϵ_draw[16], ϵ_draw[17], ϵ_draw[18], ϵ_draw[19], ϵ_draw[20], ϵ_draw[21]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std   naive_se      mcse        ess      rhat   e ⋯
      Symbol   Float64   Float64    Float64   Float64    Float64   Float64     ⋯

           α    0.5229    0.0181     0.0010    0.0022    69.6873    0.9972     ⋯
           β    0.9477    0.0051     0.0003    0.0004   209.0060    0.9969     ⋯
   ϵ_draw[1]    3.3707    0.5082     0.0293    0.0637    63.0585    0.9988     ⋯
   ϵ_draw[2]    0.2689    0.2019     0.0117    0.0053   314.7188    0.9967     ⋯
   ϵ_draw[3]    0.7577    0.2222     0.0128    0.0183   145.6377    0.9967     ⋯
   ϵ_draw[4]   -0.3051    0.2072     0.0120    0.0125   279.1289    0.9969     ⋯
   ϵ_draw[5]    0.4963    0.2128     0.0123    0.0124   274.8647    0.9975     ⋯
   ϵ_draw[6]    0.2234    0.2034     0.0117    0.0096   319.6311    1.0020     ⋯
   ϵ_draw[7]    0.7579    0.2149     0.0124    0.0157   185.8446    0.9972     ⋯
   ϵ_draw[8]   -1.0655    0.2382     0.0138    0.0249   116.7922    0.9978     ⋯
   ϵ_draw[9]    0.2374    0.2057     0.0119    0.0154   229.3082    0.9969     ⋯
  ϵ_draw[10]    0.0883    0.1932     0.0112    0.0117   405.8351    0.9968     ⋯
  ϵ_draw[11]    0.9973    0.2473     0.0143    0.0185   173.9337    0.9990     ⋯
  ϵ_draw[12]   -0.3812    0.2214     0.0128    0.0099   364.7857    0.9975     ⋯
  ϵ_draw[13]    0.2082    0.2131     0.0123    0.0099   337.4811    0.9979     ⋯
  ϵ_draw[14]   -0.3477    0.1845     0.0107    0.0106   201.1109    0.9971     ⋯
  ϵ_draw[15]   -1.0082    0.2325     0.0134    0.0222   117.0747    0.9984     ⋯
  ϵ_draw[16]   -1.8721    0.3316     0.0191    0.0370    82.6932    0.9968     ⋯
  ϵ_draw[17]    0.0170    0.1936     0.0112    0.0115   218.6334    0.9971     ⋯
  ϵ_draw[18]   -0.4615    0.2105     0.0122    0.0117   282.9331    0.9967     ⋯
  ϵ_draw[19]   -0.1628    0.1887     0.0109    0.0111   326.3768    0.9970     ⋯
  ϵ_draw[20]   -0.0303    0.2101     0.0121    0.0079   398.9178    0.9977     ⋯
  ϵ_draw[21]   -0.0619    0.9117     0.0526    0.1126    63.0430    0.9974     ⋯
                                                                 1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           α    0.4872    0.5094    0.5236    0.5349    0.5554
           β    0.9374    0.9445    0.9476    0.9510    0.9575
   ϵ_draw[1]    2.4592    2.9460    3.3925    3.7234    4.3613
   ϵ_draw[2]   -0.1099    0.1333    0.2589    0.3936    0.6971
   ϵ_draw[3]    0.3833    0.5889    0.7498    0.9022    1.2460
   ϵ_draw[4]   -0.7119   -0.4320   -0.2715   -0.1825    0.1024
   ϵ_draw[5]    0.0757    0.3503    0.4837    0.6273    0.9723
   ϵ_draw[6]   -0.1415    0.0891    0.2006    0.3422    0.6636
   ϵ_draw[7]    0.3246    0.6242    0.7338    0.8935    1.1931
   ϵ_draw[8]   -1.5653   -1.2191   -1.0338   -0.8900   -0.6837
   ϵ_draw[9]   -0.1407    0.0931    0.2308    0.3688    0.6677
  ϵ_draw[10]   -0.3329   -0.0311    0.0836    0.2166    0.4600
  ϵ_draw[11]    0.5293    0.8480    0.9825    1.1291    1.5695
  ϵ_draw[12]   -0.8768   -0.5111   -0.3606   -0.2229    0.0133
  ϵ_draw[13]   -0.1808    0.0588    0.2068    0.3394    0.6536
  ϵ_draw[14]   -0.7252   -0.4677   -0.3344   -0.2255    0.0101
  ϵ_draw[15]   -1.4794   -1.1560   -0.9809   -0.8436   -0.6249
  ϵ_draw[16]   -2.5968   -2.0884   -1.8550   -1.6155   -1.2956
  ϵ_draw[17]   -0.3888   -0.1104    0.0171    0.1467    0.3788
  ϵ_draw[18]   -0.8963   -0.5928   -0.4580   -0.3050   -0.1086
  ϵ_draw[19]   -0.5132   -0.2866   -0.1720   -0.0235    0.2068
  ϵ_draw[20]   -0.4202   -0.1648   -0.0474    0.1009    0.3734
  ϵ_draw[21]   -1.8476   -0.6901   -0.0752    0.4894    2.0249
Finally, this calculates a second-order perturbation and then samples the latents and deep parameters. Of course, the Kalman filter would not provide the correct likelihood here, so the proper comparison is something like a particle filter.
# Turing model definition
@model function rbc_2_joint(z, m, p_f, cache, settings)
α ~ Uniform(0.2, 0.8)
β ~ Uniform(0.5, 0.99)
p_d = (; α, β)
T = size(z, 2)
ϵ_draw ~ MvNormal(m.n_ϵ * T, 1.0) # sample the latent shocks jointly with the parameters
ϵ = reshape(ϵ_draw, m.n_ϵ, T)
sol = generate_perturbation(m, p_d, p_f, Val(2); cache, settings)
if !(sol.retcode == :Success)
@addlogprob! -Inf
return
end
problem = QuadraticStateSpaceProblem(sol, zeros(2), (0, T), observables = z, noise=ϵ)
@addlogprob! solve(problem, DirectIteration()).logpdf
end
cache = SolverCache(model_rbc, Val(2), [:α, :β])
settings = PerturbationSolverSettings(; print_level = 0)
z = z_rbc # simulated in previous steps
p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters
turing_model = rbc_2_joint(z, model_rbc, p_f, cache, settings) # passing observables from before
n_samples = 300
n_adapts = 50
δ = 0.65
alg = NUTS(n_adapts,δ)
chain_2_joint = sample(turing_model, alg, n_samples; progress = true)
┌ Info: Found initial step size
│ ϵ = 0.0125
└ @ Turing.Inference /Users/dchilder/.julia/packages/Turing/S4Y4B/src/inference/hmc.jl:188
Sampling: 100%|█████████████████████████████████████████| Time: 0:03:44
Chains MCMC chain (300×35×1 Array{Float64, 3}):

Iterations        = 51:1:350
Number of chains  = 1
Samples per chain = 300
Wall duration     = 273.11 seconds
Compute duration  = 273.11 seconds
parameters        = α, β, ϵ_draw[1], ϵ_draw[2], ϵ_draw[3], ϵ_draw[4], ϵ_draw[5], ϵ_draw[6], ϵ_draw[7], ϵ_draw[8], ϵ_draw[9], ϵ_draw[10], ϵ_draw[11], ϵ_draw[12], ϵ_draw[13], ϵ_draw[14], ϵ_draw[15], ϵ_draw[16], ϵ_draw[17], ϵ_draw[18], ϵ_draw[19], ϵ_draw[20], ϵ_draw[21]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std   naive_se      mcse        ess      rhat   e ⋯
      Symbol   Float64   Float64    Float64   Float64    Float64   Float64     ⋯

           α    0.5250    0.0180     0.0010    0.0021    48.1201    1.0333     ⋯
           β    0.9477    0.0057     0.0003    0.0003   809.0493    0.9977     ⋯
   ϵ_draw[1]    3.3547    0.5340     0.0308    0.0701    38.0331    1.0474     ⋯
   ϵ_draw[2]    0.2523    0.1901     0.0110    0.0081   267.7918    1.0010     ⋯
   ϵ_draw[3]    0.7605    0.2202     0.0127    0.0166   138.2808    1.0039     ⋯
   ϵ_draw[4]   -0.3092    0.2105     0.0122    0.0153   212.3421    0.9968     ⋯
   ϵ_draw[5]    0.4908    0.2067     0.0119    0.0143   248.2859    0.9973     ⋯
   ϵ_draw[6]    0.2183    0.2034     0.0117    0.0152   236.9790    1.0027     ⋯
   ϵ_draw[7]    0.7239    0.2318     0.0134    0.0221   125.7547    1.0008     ⋯
   ϵ_draw[8]   -1.0306    0.2295     0.0132    0.0222    92.2911    1.0086     ⋯
   ϵ_draw[9]    0.2083    0.2068     0.0119    0.0103   302.6394    0.9968     ⋯
  ϵ_draw[10]    0.0971    0.1909     0.0110    0.0130   259.3403    1.0017     ⋯
  ϵ_draw[11]    0.9924    0.2447     0.0141    0.0209   147.6565    1.0076     ⋯
  ϵ_draw[12]   -0.3877    0.2087     0.0121    0.0134   271.4525    0.9967     ⋯
  ϵ_draw[13]    0.2118    0.2060     0.0119    0.0142   255.3336    1.0000     ⋯
  ϵ_draw[14]   -0.3364    0.2150     0.0124    0.0112   232.4814    0.9986     ⋯
  ϵ_draw[15]   -1.0031    0.2613     0.0151    0.0257    62.9911    1.0307     ⋯
  ϵ_draw[16]   -1.7959    0.3273     0.0189    0.0314    79.7611    1.0077     ⋯
  ϵ_draw[17]    0.0039    0.1958     0.0113    0.0146   207.4630    1.0076     ⋯
  ϵ_draw[18]   -0.4767    0.1978     0.0114    0.0138   205.4055    0.9997     ⋯
  ϵ_draw[19]   -0.1388    0.2154     0.0124    0.0199   223.3459    0.9970     ⋯
  ϵ_draw[20]   -0.0447    0.2332     0.0135    0.0188   206.5141    1.0035     ⋯
  ϵ_draw[21]    0.0016    0.7230     0.0417    0.0844    43.5613    1.0244     ⋯
                                                                 1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           α    0.4921    0.5140    0.5252    0.5375    0.5587
           β    0.9349    0.9442    0.9479    0.9513    0.9580
   ϵ_draw[1]    2.4768    2.9778    3.2626    3.6505    4.5892
   ϵ_draw[2]   -0.1108    0.1226    0.2431    0.3856    0.6346
   ϵ_draw[3]    0.3850    0.5965    0.7473    0.9134    1.2168
   ϵ_draw[4]   -0.7323   -0.4438   -0.3015   -0.1730    0.0512
   ϵ_draw[5]    0.0843    0.3569    0.4806    0.6170    0.9349
   ϵ_draw[6]   -0.1683    0.0915    0.2181    0.3569    0.6012
   ϵ_draw[7]    0.3236    0.5608    0.7098    0.8609    1.2712
   ϵ_draw[8]   -1.5092   -1.1558   -1.0130   -0.8786   -0.6073
   ϵ_draw[9]   -0.1816    0.0750    0.2114    0.3291    0.6813
  ϵ_draw[10]   -0.2711   -0.0368    0.0946    0.2235    0.4602
  ϵ_draw[11]    0.5584    0.8224    0.9656    1.1663    1.4695
  ϵ_draw[12]   -0.7866   -0.5384   -0.3778   -0.2229   -0.0195
  ϵ_draw[13]   -0.1534    0.0597    0.2070    0.3587    0.6425
  ϵ_draw[14]   -0.7686   -0.4789   -0.3317   -0.1854    0.0814
  ϵ_draw[15]   -1.5411   -1.1619   -0.9703   -0.8259   -0.5259
  ϵ_draw[16]   -2.5215   -1.9819   -1.7608   -1.5387   -1.2839
  ϵ_draw[17]   -0.3643   -0.1147   -0.0020    0.1181    0.3983
  ϵ_draw[18]   -0.8702   -0.5914   -0.4742   -0.3403   -0.1091
  ϵ_draw[19]   -0.5752   -0.2878   -0.1399    0.0075    0.3161
  ϵ_draw[20]   -0.5372   -0.1825   -0.0365    0.0931    0.4135
  ϵ_draw[21]   -1.4457   -0.5054    0.0329    0.5084    1.3862
Below is a graph which shows how well the high-dimensional sampled latents do relative to the pseudo-true latent values.
symbol_to_int(s) = parse(Int, string(s)[9:end-1])
ϵ_chain = sort(chain_1_joint[:, [Symbol("ϵ_draw[$a]") for a in 1:21], 1], lt = (x,y) -> symbol_to_int(x) < symbol_to_int(y))
tmp = describe(ϵ_chain)
ϵ_mean = tmp[1][:, 2]
ϵ_std = tmp[1][:, 3]
plot(ϵ_mean[2:end], ribbon=2 * ϵ_std[2:end], label="Posterior mean", title = "First-Order Joint: Estimated Latents")
plot!(ϵ', label="True values")
ϵ_chain = sort(chain_2_joint[:, [Symbol("ϵ_draw[$a]") for a in 1:21], 1], lt = (x,y) -> symbol_to_int(x) < symbol_to_int(y))
tmp = describe(ϵ_chain)
ϵ_mean = tmp[1][:, 2]
ϵ_std = tmp[1][:, 3]
plot(ϵ_mean[2:end], ribbon=2 * ϵ_std[2:end], label="Posterior mean", title = "Second-Order Joint: Estimated Latents")
plot!(ϵ', label="True values")
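As a final check, we can compare the posterior means of the deep parameters across the three chains to the pseudo-true values α = 0.5 and β = 0.95 (a small sketch):
# posterior means of the deep parameters for each of the three chains above
for (name, chain) in [("marginal", chain_1_marginal),
                      ("first-order joint", chain_1_joint),
                      ("second-order joint", chain_2_joint)]
    println(name, ": mean α = ", mean(chain[:α]), ", mean β = ", mean(chain[:β]))
end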