Goal
Materials
Mandatory
Optional
References
We begin with a motivating example that requires "intelligent" goal-directed decision making: assume that you are an owl and that you're hungry. What are you going to do?
Have a look at Prof. Karl Friston's answer in this video segment on the cost function for intelligent behavior. (Do watch the video!)
Friston argues that intelligent decision making (behavior, action selection) by an agent requires minimization of a functional of beliefs.
Friston further argues (later in the lecture and his papers) that this functional is a (variational) free energy (to be defined below), thus linking decision making to Bayesian inference.
In fact, Friston's Free Energy Principle (FEP) claims that all biological self-organizing processes (including brain processes) can be described as Free Energy minimization in a probabilistic model.
Taking inspiration from FEP, if we want to develop synthetic "intelligent" agents, we have (only) two issues to consider:
What should the agent's model be modeling? This question was (already) answered by Conant and Ashby (1970) as the good regulator theorem: every good regulator of a system must be a model of that system.
From Conant and Ashby's paper (this statement was later finessed by Friston (2013)):
"The theory has the interesting corollary that the living brain, insofar as it is successful and efficient as a regulator for survival, must proceed, in learning, by the formation of a model (or models) of its environment."
We will follow the idea that an agent needs to hold a generative model for its environment, which is observed through sensory channels. The environmental dynamics can be affected through actions onto the environment.
Agents that follow the FEP and infer actions by inference in a generative model of the environment are engaged in a process called active inference.
Technically, an active inference-based agent comprises:
Let's draw a diagram to show the interactions between an active inference agent and its environment.
In the model above, the hidden variables $\{z_k\}$ of the agent comprise internal states $\{s_k\}$, control variables $\{u_k\}$ (which are "observed" by the environment as actions $\{a_k\}$), and parameters $\{\theta_k\}$.
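To make this concrete, one possible factorization of such an agent model over a lookahead horizon $T$ is sketched below. This particular state-space form is an illustrative assumption (chosen to match the structure of the simulation code later in this lesson), not the only admissible choice:
$$
p(x_{1:T}, s_{0:T}, u_{1:T}, \theta) = \underbrace{p(s_0)\, p(\theta)}_{\text{initial state and parameter priors}} \; \prod_{k=1}^{T} \underbrace{p(x_k \mid s_k, \theta)}_{\text{observation model}}\, \underbrace{p(s_k \mid s_{k-1}, u_k, \theta)}_{\text{state transition model}}\, \underbrace{p(u_k)}_{\text{control prior}}
$$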
In neuroscience/psychology parlance,
We also assume that the agent interacts with an environment, which we represent by a dynamic model
where $a_t$ are actions, $y_t$ are outcomes, and $\tilde{s}_t$ holds the environmental states.
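As a notational sketch (the symbol $R_t$ for the environmental process is an illustrative choice, not notation from the references), such a dynamic model can be written generically as
$$(y_t, \tilde{s}_t) = R_t\left(a_t, \tilde{s}_{t-1}\right)\,,$$
i.e., the environmental process maps the current action and the previous environmental state to an outcome and a new environmental state.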
In the above equations, $u_t$ and $x_t$ are owned by the agent model, whereas $a_t$ and $y_t$ are variables in the environment model.
The agent can push actions $a_t$ onto the environment and measure responses $y_t$, but has no access to the environmental states $\tilde{s}_t$.
Interactions between the agent and environment are described by
In other words, actions are drawn from the posterior over control signals.
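As a concrete (assumed, illustrative) example of such a selection rule, the agent could sample its action from the control posterior, or take a point estimate such as the posterior mean, while its observation is simply set to the environmental outcome:
$$a_t \sim q(u_t) \quad \text{or} \quad a_t = \mathbb{E}_{q(u_t)}[u_t]\,, \qquad x_t = y_t\,.$$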
Biological agents select their observations by controlling their environment. Perception (and learning) serve to improve this data selection process by updating beliefs about the state of the world.
This raises the question: if a (biological) agent seeks out observations, then which observations is the agent interested in? In other words, does the agent have a goal "in mind" when it engages in active data selection?
Yes! Agents encode preferences for future observations by placing prior distributions on those observations!
Thus, the generative model for an active inference agent at time $t$ includes variables at future time steps and can be run forward to make predictions (beliefs) about future observations $x_{t+1:T}$.
Note that the generative model includes future time steps.
In order to infer goal-driven (i.e., purposeful) behavior, we now add prior beliefs $\tilde{p}(x)$ about desired future observations, leading to an extended agent model:
$\tilde{p}(x)$ encodes the agent's prior beliefs about desired future observations.
Goal-directed behavior follows from inference for controls (actions) at $t$, based on expectations (encoded by priors) about future ($>t$) observations.
$\Rightarrow$ Actions fulfill expectations about the future!
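As a sketch (up to normalization, and under the assumption that the goal prior simply multiplies the original model, which is also how the simulation code below implements it), the extended model can be written as
$$p_{\text{ext}}(x, s, u) \;\propto\; p(x, s, u)\, \tilde{p}(x)\,.$$
In the cart parking example below, $\tilde{p}(x)$ is a Gaussian centered on the target parking spot, attached as a prior to each future observation variable.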
using Pkg;Pkg.activate("probprog/workspace");Pkg.instantiate()
IJulia.clear_output();
Here we solve the cart parking problem as stated at the beginning of this lesson. We first specify a generative model for the agent's environment (which generates the observed noisy position of the cart) and then constrain future observations by a prior distribution that is located on the target parking spot. Next, we schedule a message passing-based inference algorithm for the next action. This is followed by executing the "Act-execute-observe --> infer --> slide" procedure to infer a sequence of consecutive actions. Finally, the position of the cart over time is plotted. Note that the cart converges onto the target spot.
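The simulated world itself lives in the helper file environment_1d.jl, which is loaded below but not shown in this notebook. Purely as a hypothetical sketch (an assumption about what such a simulated 1D world could look like, not the actual contents of that file), it might resemble:
# Hypothetical sketch of a 1D cart world (NOT the actual environment_1d.jl):
# the hidden cart position is shifted by the action and observed with noise.
function initializeWorldSketch(; s_init=2.0, obs_noise_std=0.1)
    s = s_init                                # hidden environmental state (cart position)
    execute(a) = (s += a; nothing)            # an action pushes the cart
    observe()  = s + obs_noise_std*randn()    # noisy measurement of the cart position
    return (execute, observe)
end
# usage: (execute, observe) = initializeWorldSketch()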
using PyPlot, LinearAlgebra, ForneyLab
# Load helper functions. Feel free to explore these
include("ai_agent/environment_1d.jl")
include("ai_agent/helpers_1d.jl")
include("ai_agent/agent_1d.jl")
# Internal model parameters
gamma = 100.0 # Transition precision
phi = 10.0 # Observation precision
upsilon = 1.0 # Control prior variance
sigma = 1.0 # Goal prior variance
T = 10 # Lookahead
# Build internal model
fg = FactorGraph()
o = Vector{Variable}(undef, T) # Observed states
s = Vector{Variable}(undef, T) # Internal states
u = Vector{Variable}(undef, T) # Control states
@RV s_t_min ~ GaussianMeanVariance(placeholder(:m_s_t_min),
                                   placeholder(:v_s_t_min)) # Prior state
u_t = placeholder(:u_t)
@RV u[1] ~ GaussianMeanVariance(u_t, tiny)
@RV s[1] ~ GaussianMeanPrecision(s_t_min + u[1], gamma)
@RV o[1] ~ GaussianMeanPrecision(s[1], phi)
placeholder(o[1], :o_t)
s_k_min = s[1]
for k=2:T
    @RV u[k] ~ GaussianMeanVariance(0.0, upsilon) # Control prior
    @RV s[k] ~ GaussianMeanPrecision(s_k_min + u[k], gamma) # State transition model
    @RV o[k] ~ GaussianMeanPrecision(s[k], phi) # Observation model
    GaussianMeanVariance(o[k],
                         placeholder(:m_o, var_id=:m_o_*k, index=k-1),
                         placeholder(:v_o, var_id=:v_o_*k, index=k-1)) # Goal prior
    s_k_min = s[k]
end
# Schedule message passing algorithm
algo = messagePassingAlgorithm(u[2]) # Infer the control for the next time step
source_code = algorithmSourceCode(algo)
eval(Meta.parse(source_code)) # Loads the step!() function for inference
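To make the mechanics a bit more tangible: inference amounts to filling the placeholders with concrete values and executing the generated step!() function. The snippet below is a hypothetical illustration of this pattern (an assumption about what the helper functions in agent_1d.jl roughly do, not their actual contents); the specific numbers are arbitrary.
# Hypothetical illustration (not the actual agent_1d.jl code): fill the
# placeholders with data and run one scheduled message passing sweep.
data = Dict(:m_s_t_min => 0.0, :v_s_t_min => 1e12,   # (vague) belief over the current state
            :u_t => 0.0,                              # last executed action
            :o_t => 2.0,                              # last observation
            :m_o => fill(0.0, T-1),                   # goal prior means (the target spot)
            :v_o => fill(sigma, T-1))                 # goal prior variances
marginals = step!(data)  # returns posterior marginals, including the belief over the next control u[2]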
s_0 = 2.0 # Initial State
N = 20 # Total simulation time
(execute, observe) = initializeWorld() # Let there be a world
(infer, act, slide) = initializeAgent() # Let there be an agent
# Step through action-perception loop
u_hat = Vector{Float64}(undef, N) # Actions
o_hat = Vector{Float64}(undef, N) # Observations
for t=1:N
    u_hat[t] = act()           # Evoke an action from the agent
    execute(u_hat[t])          # The action influences hidden external states
    o_hat[t] = observe()       # Observe the current environmental outcome (update p)
    infer(u_hat[t], o_hat[t])  # Infer beliefs from current model state (update q)
    slide()                    # Prepare for next iteration
end
# Plot active inference results
plotTrajectory(u_hat, o_hat)
;
If interested, here is a link to a more detailed version of the 1D parking problem.
We also have a 2D version of this cart parking problem implemented on a Raspberry Pi-based robot. (Credits for this implementation go to Thijs van de Laar and Burak Ergul.)
Consider the agent's inference task at time step $t$, right after having selected an action $a_t$ and having made an observation $y_t$.
As usual, we record actions and observations by substituting their values into the generative model (in the Act-execute-observe phase):
Note that (future) $x$ is also a latent variable and hence we include $x$ in the recognition model.
This leads to the following free energy functional
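As a sketch of what such a functional looks like under the notation and factorization assumptions used above (the exact form may differ slightly in the references):
$$F[q] = \sum_{x_{>t},\, s,\, u_{>t}} q(x_{>t}, s, u_{>t})\, \log \frac{q(x_{>t}, s, u_{>t})}{p(x_{\leq t}{=}y_{\leq t},\, x_{>t},\, s,\, u_{\leq t}{=}a_{\leq t},\, u_{>t})}\,.$$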
This leads to a decomposition that breaks the FE into a complexity term and a term $F_u[q]$ that is conditioned on the policy $u$.
where we used the approximation $q(s|x_{>t},u) \approx p(s|x_{>t},u)$ to illuminate the link to the mutual information.
Minimizing FE leads (approximately) to mutual information maximization between internal states $s$ and observations $x$. In other words, FEM leads to actions that aim to seek out observations that are maximally informative about the hidden causes of these observations.
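As a sketch of this link (using the goal prior $\tilde{p}$ and the approximation above; constant terms and some conditioning details are suppressed), the policy-conditioned free energy can be rearranged roughly as
$$F_u[q] \;\approx\; \underbrace{-\,\mathbb{E}_{q(x_{>t}\mid u)}\big[\log \tilde{p}(x_{>t})\big]}_{\text{expected goal deviation}} \;-\; \underbrace{I_q\big[\,s;\, x_{>t} \mid u\,\big]}_{\text{mutual information (epistemic value)}}\,,$$
so minimizing $F_u[q]$ over policies simultaneously pulls predicted observations toward the goal prior and maximizes the expected information gain about the hidden states.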
Ambiguous states have uncertain mappings to observations. Minimizing FE leads to actions that try to avoid ambiguous states.
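An alternative, commonly used rearrangement of the same quantity (again a sketch under the same assumptions) makes the ambiguity term explicit:
$$F_u[q] \;\approx\; \underbrace{D_{\mathrm{KL}}\big[\,q(x_{>t}\mid u)\,\big\|\, \tilde{p}(x_{>t})\,\big]}_{\text{risk}} \;+\; \underbrace{\mathbb{E}_{q(s\mid u)}\Big[H\big[p(x_{>t}\mid s)\big]\Big]}_{\text{ambiguity}}\,;$$
states with high-entropy (ambiguous) observation mappings contribute a large second term, so policies that visit them are penalized.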
In short, if the generative model includes variables that represent (yet) unobserved future observations, then action selection by FEM leads to a very sophisticated behavioral strategy that is maximally consistent with goal-directed (exploitative) behavior, information-seeking (explorative) behavior, and ambiguity-avoiding behavior.
All these imperatives are simultaneously represented and automatically balanced against each other in a single time-varying cost function (Free Energy) that needs no tuning parameters.
(Just to be sure, you don't need to memorize these derivations nor are you expected to derive them on-the-spot. We present these decompositions only to provide insight into the multitude of forces that underlie FEM-based action selection.)