This notebook is part of the supplementary material for:

Genewein T., Leibfried F., Grau-Moya J., Braun D.A. (2015) *Bounded rationality, abstraction and hierarchical decision-making: an information-theoretic optimality principle*, Frontiers in Robotics and AI.

More information on how to run the notebook can be found in the accompanying GitHub repository, where you can also find updated versions of the code and notebooks.

This notebook corresponds to Sections 2.2 - 2.4 and reproduces Figures 2, 3 and 4 of the paper.

In notebook 1, corresponding to Section 2.1 of the paper, we introduced a free energy framework for bounded rational decision making. Essentially we formalized a variational problem where we searched for a desired behavior $p^*(a|w)$ that maximizes expected utility while at the same time keeping computational cost low. The parameter $\beta$ (inverse temperature) governs the trade-off. Crucially, we only considered a single environment (or context) $w$.

Now we will extend our problem to find bounded-optimal behavior for many different environments. Additionally, we also search for the best prior over actions (averaged over all environments). Formally, we extend the original variational problem by taking the average over environments and the arg max over the prior as well:

$$\underset{p_0(a)}{\operatorname{arg~max}}~\sum_w p(w)~\left[\underset{p(a|w)}{\operatorname{arg~max}}~\mathbf{E}_{p(a|w)}[U(w,a)] - \frac{1}{\beta} D_{\mathrm{KL}}(p(a|w)||p_0(a))\right].$$

The problem can be rewritten as

$$\underset{p(a|w)}{\operatorname{arg~max}}~\sum_{w,a}p(w,a)U(w,a) - \frac{1}{\beta}I(W;A),$$

which is mathematically identical to the optimization in rate-distortion theory (the information-theoretic framework for lossy compression). $I(W;A)$ denotes the mutual information between the random variables $W$ and $A$ and is computed as the average KL-divergence

$$I(W;A)=\sum_w p(w) \underbrace{\sum_a p(a|w) \log \frac{p(a|w)}{p(a)}}_{D_{\mathrm{KL}}(p(a|w)||p(a))}.$$
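To make the mutual-information term concrete, here is a minimal NumPy sketch (Python, for illustration only; the notebook itself runs Julia) that computes $I(W;A)$ as the average KL-divergence for a small hypothetical distribution $p(w)$ and channel $p(a|w)$ (the numbers are made up for this example):

```python
import numpy as np

# hypothetical toy distributions: 3 world-states, 2 actions
p_w = np.array([0.5, 0.25, 0.25])      # p(w)
p_agw = np.array([[0.9, 0.1],          # p(a|w), one row per w
                  [0.2, 0.8],
                  [0.2, 0.8]])

# marginal: p(a) = sum_w p(w) p(a|w)
p_a = p_w @ p_agw

# I(W;A) = sum_w p(w) * D_KL(p(a|w) || p(a)), here in nats
kl_per_w = np.sum(p_agw * np.log(p_agw / p_a), axis=1)
I_wa = p_w @ kl_per_w
print(I_wa)
```

Because the second and third world-states share the same conditional policy, their KL terms are identical; this overlap is exactly what keeps $I(W;A)$, and thus the processing cost, low.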

The problem has an analytical solution in the form of a set of self-consistent equations:

$$\begin{align} p^*(a|w) &= \frac{1}{Z(w)}p(a) \exp(\beta U(w,a))\\ p(a) &= \sum_w p^*(a|w) p(w) \end{align}$$

with the partition sum $Z(w) = \sum_a p(a) \exp(\beta U(w,a))$.

Similar to the free-energy case, the inverse temperature plays the role of a resource-parameter and governs the trade-off between large expected utility and low computational effort, measured by the mutual information term. For low $\beta$ and thus low computational resources, the average KL-divergence between the specific behavior $p^*(a|w)$ and the marginal (or average) behavior $p(a)$ must be small. This can only be achieved by applying the same or a similar policy to different $w$, which leads to behavioral abstractions where different situations elicit the same response. On the other hand, for large $\beta$ and thus large computational resources, specific behavior can deviate substantially from the average behavior, allowing for very different behavior for different $w$.

The goal in this example is to design a recommender system that observes a bought item $w$ and then recommends another item $a$. The system can recommend either another concrete item, the best-selling item of a certain category, or the best-selling item of a super-category that subsumes several categories. Recommending the correct concrete item leads to the highest chance of a purchase and thus to the highest expected utility. Recommending the best-selling item of the corresponding category or super-category yields a non-zero utility as well (with a lower value, but the same recommendation is applicable to multiple bought items). Below is a table of the super-categories and categories for each item; the corresponding item that has the highest chance of being bought when recommended is indicated with the arrow `->`.

- Electric devices and electronics
  - Computers
    - Laptop -> laptop-sleeve
    - Monitor -> monitor cable
    - Game pad -> video game
  - Small appliances
    - Coffee machine -> coffee capsules
    - Vacuum cleaner -> vacuum cleaner bags
    - Electric toothbrush -> brush heads
- Food and Cooking
  - Fruit
    - Grapes -> cheese
    - Strawberries -> cream
    - Limes -> cane sugar
  - Baking
    - Pancake mix -> maple syrup
    - Baking soda -> vinegar
    - Baker's yeast -> flour
    - Muffin cups -> chocolate chips, flour

The example is summarized in the plot of the utility function $U(a,w)$ below.

In principle, you can use any of the examples presented in the paper (also from Sections 3 or 4) by loading it from the corresponding `.jl` file. However, you might want to change the axis-labeling etc. accordingly, which requires a bit of work.

In [8]:

```
#only run this once
include("RateDistortionDecisionMaking.jl")
```

Out[8]:

In [2]:

```
using RateDistortionDecisionMaking, Gadfly
#make the default plot size a bit larger
set_default_plot_size(15cm, 12cm)
#set up taxonomy example --- use other examples by including corresponding file here
include("TaxonomyExample.jl")
w_vec, w_strings, a_vec, a_strings, p_w, U = setuptaxonomy()
#pre-compute utilities, find maxima
U_pre, Umax = setuputilityarrays(a_vec, w_vec,U)
#visualize utility
plt_utility = visualizeMatrix(U_pre, w_vec, a_vec, w_strings, a_strings, xlabel="Item bought w",
                              ylabel="Recommend a", legendlabel="U(a,w)")
```

Out[2]:

Set the inverse temperature below to a value $\geq 0$. The inverse temperature governs the computational resources (because it sets the price of information processing that is traded off against the expected utility). Low values of $\beta$ correspond to low computational resources and will lead to quite abstract solutions (in the sense that the same policy is used on many different $w$). High values of $\beta$ will in contrast lead to very $w$-specific actions as there is little cost for adapting to the different world-states $w$.

The parameter you set here will be used in the code-cells below, so when changing it make sure to re-run all the cells.

In [3]:

```
β = 1.3 #inverse temperature
```

Out[3]:

The solutions $p^*(a|w)$ and $p(a)$ can be obtained by initializing randomly and then iterating the equations until convergence. This is known as the Blahut-Arimoto algorithm. It is guaranteed to converge to the global solution of the rate distortion variational problem.
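The fixed-point scheme can be sketched compactly. The following is a minimal Python/NumPy illustration of iterating the two self-consistent equations on a hypothetical 3×2 utility matrix; the notebook's own `BAiterations` is the Julia implementation used throughout, with proper convergence bookkeeping and performance measures:

```python
import numpy as np

def ba_iterations(U, p_w, beta, eps=1e-8, maxiter=10000):
    """Minimal Blahut-Arimoto sketch: U[w, a] utility matrix, p_w prior over w."""
    n_w, n_a = U.shape
    p_a = np.ones(n_a) / n_a                       # uniform initialization of p(a)
    for _ in range(maxiter):
        # p*(a|w) ∝ p(a) exp(β U(w,a)), normalized by Z(w)
        p_agw = p_a * np.exp(beta * U)
        p_agw /= p_agw.sum(axis=1, keepdims=True)
        # p(a) = Σ_w p*(a|w) p(w)
        p_a_new = p_w @ p_agw
        if np.max(np.abs(p_a_new - p_a)) < eps:    # converged
            p_a = p_a_new
            break
        p_a = p_a_new
    return p_agw, p_a

# made-up utility: each world-state has one preferred action
U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.2, 0.8]])
p_w = np.full(3, 1/3)
p_agw, p_a = ba_iterations(U, p_w, beta=5.0)
```

With a high $\beta$ as here, the converged $p^*(a|w)$ concentrates on each world-state's preferred action; lowering $\beta$ pulls all rows toward the shared marginal $p(a)$.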

In [4]:

```
ε = 0.00001 #convergence criterion for BAiterations
maxiter = 10000 #maximum number of BA iterations
#initialize p(a) uniformly
num_acts = length(a_vec)
pa_init = ones(num_acts)/num_acts
#Blahut-Arimoto iterations
p_agw, p_a, performance = BAiterations(pa_init, β, U_pre, p_w, ε, maxiter, compute_performance=true,
                                       performance_as_dataframe=true, performance_per_iteration=false)
#visualize solution
#suppress immediate plotting since the stacked plots look bad, rather use display() to show each plot separately
plot_marg, plot_cond = visualizeBAsolution(p_a, p_agw, a_vec, w_vec, a_strings, w_strings,
                                           wlabel="Item bought w", alabel="Recommend a",
                                           legendlabel_marginal="p(a)", legendlabel_conditional="p*(a|w)", suppress_vis=true);
display(plot_marg)
display(plot_cond)
```

Try changing the value of $\beta$ in the code above and re-run the code-cells to see how the bounded-optimal solution changes. Notice that over a certain range of $\beta$ values the solution does not seem to change much, but then suddenly a small change in $\beta$ leads to a drastic change in the solution. These phase transitions are a result of the sharp structure of the utility function - we will take a closer look below.

We will now sweep over a whole range of $\beta$-values and record the corresponding expected utility and information terms (as well as the value of the objective function that trades off utility against information-processing cost). We can then use this to plot the terms as a function of $\beta$ and make the phase transitions visible.

Note that the code below might take a few minutes to run - you can speed it up by using a lower-resolution $\beta$-grid `β_sweep`.

In [5]:

```
#β-sweep
ε = 0.00001 #convergence criterion for BAiterations
maxiter = 20000
β_sweep = collect(0.01:0.005:3)
#β_sweep = [collect(0.1:0.05:2); collect(2:0.5:15)] #coarser alternative grid
nβ = length(β_sweep)
#preallocate
I = zeros(nβ)
Ha = zeros(nβ)
Hagw = zeros(nβ)
EU = zeros(nβ)
RDobj = zeros(nβ)
#sweep over β values
for i=1:nβ
    pagw, pa, I[i], Ha[i], Hagw[i], EU[i], RDobj[i] = BAiterations(pa_init, β_sweep[i], U_pre, p_w, ε, maxiter, compute_performance=true)
end
#plot evolution of performance measures
plot_perf_entropy, plot_perf_util, plot_rateutility = plotperformancemeasures(I, Ha, Hagw, EU, RDobj, β_sweep);
```

In the first panel we see how the information-processing effort $I(W;A)=H(A)-H(A|W)$ evolves with increasing $\beta$. Importantly, we see sharp kinks on all curves; this is where the phase transitions occur. We see similar kinks in the second panel, which shows the expected utility and the value of the objective function. Each kink corresponds to a different level of abstraction (to investigate, set $\beta$ in the first code-block after the utility function and re-run the subsequent two code-cells). The third panel shows the rate-utility curve (in analogy to the rate-distortion curve from information theory). It shows the required computational capacity to achieve a certain utility and, vice versa, the maximally achievable utility given certain information-processing capabilities. The shaded region indicates theoretically impossible systems that cannot be achieved, regardless of the implementation. The white region denotes suboptimal systems that do not use their computational capacity maximally. The line indicates optimal systems (that solve the rate-distortion variational problem given in the equations above).

In [6]:

```
#create a plot where the entropic performance plot is shown and on top of it several solutions of p(a|w)
#for different temperatures (ideally lying in the plateaus)
β_plts = [0.05, 0.5, 0.9, 1.1, 1.3, 1.5, 2]
nβ = length(β_plts)
#preallocate
I = zeros(nβ)
Ha = zeros(nβ)
Hagw = zeros(nβ)
EU = zeros(nβ)
RDobj = zeros(nβ)
plots = Array(Plot, nβ)
for i=1:nβ
    pagw, pa, I[i], Ha[i], Hagw[i], EU[i], RDobj[i] = BAiterations(pa_init, β_plts[i], U_pre, p_w, ε, maxiter, compute_performance=true)
    #except for the last plot, provide an optional argument that will be passed on to the underlying Gadfly Theme()
    #to suppress drawing of the colorkey (legend)
    β_val = β_plts[i]
    if i==1
        pcond = visualizeBAconditional(pagw, a_vec, w_vec, wlabel="Item bought w β=$β_val", alabel="Recommend a",
                                       legendlabel="", key_position = :none, minor_label_font_size = 0pt, major_label_font_size = 9pt,
                                       bar_spacing = 0pt)
    elseif i<nβ
        pcond = visualizeBAconditional(pagw, a_vec, w_vec, wlabel="β=$β_val", alabel="",
                                       legendlabel="", key_position = :none, minor_label_font_size = 0pt, major_label_font_size = 9pt,
                                       bar_spacing = 0pt)
    else
        pcond = visualizeBAconditional(pagw, a_vec, w_vec, wlabel="β=$β_val", alabel="",
                                       legendlabel="p*(a|w)", minor_label_font_size = 0pt, major_label_font_size = 9pt,
                                       key_title_font_size = 9pt, key_label_font_size = 8pt, bar_spacing = 0pt)
    end
    #display(pcond)
    plots[i] = pcond
end
#compose final plot
plot_evolution = vstack(hstack(plots...),plot_perf_entropy)
display(plot_evolution)
```