#!/usr/bin/env python
# coding: utf-8

# (ex_system_internals)=
#
# # Supporting large computational graphs
#
# ## Reference semantics and shared (sub)expressions
#
# The first and most important thing to understand about heyoka.py's expressions is that they implement so-called *reference semantics*. That is, an expression is essentially a handle to an underlying object, and copying the expression will not perform an actual copy; rather, it will return a new reference to the same underlying object.
#
# Before you ask "isn't this how all Python objects work?", let me immediately point out that heyoka.py's expressions are exposed from C++ and that reference semantics is implemented all the way down into the C++ layer. As a concrete example of what this means, consider the following simple expression:

# In[1]:

import heyoka as hy

# Create a couple of variables.
x, y = hy.make_vars("x", "y")

# Create a simple expression.
ex = x + y

# If we attempt to copy ``ex`` via the standard {func}`~copy.copy()` function, we will nominally get a new Python object, as we can see by querying the {func}`id()`:

# In[2]:

from copy import copy

# Make a "copy" of ex.
ex_copy = copy(ex)

# Print the ids.
print(f'Original id: {id(ex)}')
print(f'Copy id    : {id(ex_copy)}')

# However, both ``ex`` and ``ex_copy`` are in reality pointing to the **same** underlying C++ object, which is shared between the two Python objects.
#
# We can use ``ex`` as a building block to create more complicated expressions, e.g.:

# In[3]:

a = hy.sin(ex) + hy.cos(ex)
a

# Because of the use of reference semantics, this expression will not contain two separate copies of $x + y$. Rather, it will contain two *references* to the original expression ``ex``.
#
# If, on the other hand, we do **not** re-use ``ex`` and write instead

# In[4]:

b = hy.sin(x + y) + hy.cos(x + y)
b

# we get an expression ``b`` which is mathematically equivalent to ``a`` but which contains two separate copies of $x + y$, rather than two references to ``ex``. This has a couple of very important consequences.
#
# First of all, the memory footprint of ``b`` will be larger than ``a``'s, because ``b`` (wastefully) stores two copies of the same subexpression $x + y$, rather than two references to the same underlying expression.
#
# Secondly, heyoka.py's symbolic manipulation routines are coded to keep track of shared subexpressions with the goal of avoiding redundant computations. For instance, let us say we want to replace $x$ with $x^2 - 1$ in the expression ``a`` via
# the {func}`~heyoka.subs()` function:

# In[5]:

hy.subs(a, {x: x**2 - 1.})

# In order to perform the substitution, the {func}`~heyoka.subs()` function needs to traverse the expression tree of ``a``. When it encounters the ``ex`` subexpression for the **first** time, it will:
#
# 1. perform the substitution, producing as a result $x^2-1+y$,
# 2. record in an internal bookkeeping structure that performing the substitution on the subexpression ``ex`` produced the result $x^2-1+y$.
#
# Crucially, the **second** time ``ex`` is encountered during the traversal of the expression tree, the {func}`~heyoka.subs()` function will query the bookkeeping structure, detect that the result of the substitution on ``ex`` has already been computed, and fetch the cached result instead of (wastefully) performing the same computation again. Thus, not only did we avoid a redundant calculation, but the two $x^2-1+y$ subexpressions appearing in the final result also point to the same underlying object (rather than being two separate copies of identical subexpressions).
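# To get a feel for how much this caching can matter, here is a quick illustrative sketch: we iteratively double a shared subexpression, obtaining a graph which nominally contains billions of copies of $x + y$. Because each doubling stores two *references* to the previous expression (rather than two copies), and because the substitution on each shared node is computed only once and then cached, both steps should complete almost instantly:
#
# ```python
# >>> big = x + y
# >>> for _ in range(30):
# ...     # Two references to the previous expression, not two copies.
# ...     big = big + big
# >>> # The graph now nominally contains billions of nodes, yet the
# >>> # substitution is fast: the cached result on each shared node
# >>> # is re-used at every level of the doubling.
# >>> res = hy.subs(big, {x: x**2 - 1.})
# ```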
# On the other hand, when we apply the same substitution to the unshared expression ``b`` we get:

# In[6]:

hy.subs(b, {x: x**2 - 1.})

# That is, the result is mathematically identical (obviously), but, because there is no internal subexpression sharing, the substitution $x \rightarrow x^2 - 1$ had to be performed twice (rather than once), and the two $x^2-1+y$ subexpressions appearing in the final result are two separate copies of identical subexpressions.
#
# As a final piece of information, it is important to emphasise that subexpression sharing is not limited to single expressions: it also happens across multiple expressions. For instance, consider the following vector expression consisting of the two components $\sin\left( x + y \right) + \cos\left( x + y \right)$ and $1 + \mathrm{e}^{x+y}$:

# In[7]:

vec_ex = [hy.sin(ex) + hy.cos(ex), 1. + hy.exp(ex)]
vec_ex

# Here the subexpression ``ex`` is shared among the two components of ``vec_ex``, which both contain references to ``ex`` (rather than storing their own separate copies of ``ex``). When we invoke the {func}`~heyoka.subs()` function on ``vec_ex``, the internal bookkeeping structure will be initialised during the traversal of the first component of ``vec_ex``, but it will **also** persist for the traversal of the second component of ``vec_ex``:

# In[8]:

hy.subs(vec_ex, {x: x**2 - 1.})

# Thus, the second component of the result of the substitution contains a reference to the expression $x^2-1+y$ which is shared with the first component of the result.
#
# This point is very important and worth repeating: the fact that heyoka.py's symbolic manipulation functions (such as {func}`~heyoka.subs()`) accept in input both scalar and vector expressions is not only a convenience that reduces typing, it is also crucial in order to ensure maximal subexpression sharing in the result. If you can, you should **always** apply a symbolic manipulation function directly to a vector expression, rather than operating on one component at a time.
#
# ```python
# # Every time you do this, a kitten dies.
# # Please **don't** do this.
# >>> res = [hy.subs(ex, {x: x**2 - 1.}) for ex in vec_ex]
#
# # Do this instead.
# >>> res = hy.subs(vec_ex, {x: x**2 - 1.})
#
# # Also, **don't** do this (here ``s`` is a vector expression,
# # e.g., a six-component state vector).
# >>> r = s[:3]
# >>> v = s[3:]
# >>> new_r = hy.subs(r, {x: x**2 - 1.})
# >>> new_v = hy.subs(v, {x: x**2 - 1.})
#
# # Do this instead.
# >>> new_s = hy.subs(s, {x: x**2 - 1.})
# >>> new_r = new_s[:3]
# >>> new_v = new_s[3:]
# ```
#
# ### Consequences for large computational graphs
#
# These details are, most of the time, of little consequence, and they may just result in small, hardly-detectable inefficiencies. The situation however changes when creating large computational graphs, especially if they are constructed by iteratively feeding expressions as arguments to other expressions.
#
# As a motivating example, we can consider the computational graphs of [feed-forward neural networks (FFNN)](./ffnn.ipynb). In an FFNN, the result of the inference in each layer is fed to the next layer, giving rise to a computational graph that explodes in exponential fashion as the number of layers increases.
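# A rough back-of-the-envelope estimate makes this exponential growth concrete: in the absence of sharing, the expression of each neuron in a dense layer with $n$ neurons contains a full copy of the expression of every neuron in the previous layer, so the size $S_k$ of a single output after $k$ layers satisfies $S_k \approx n \, S_{k-1}$, i.e., $S_k = O\left( n^k \right)$. With $n = 2$ neurons per layer, as in the examples below, we thus expect the output size to roughly double with every added layer.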
# We can begin with a tiny network consisting of two inputs, no hidden layers and two outputs:

# In[9]:

hy.model.ffnn(inputs = [x, y], nn_hidden = [], n_out = 2, activations = [hy.tanh])

# As we increase the number of layers, we can see that the computational graphs quickly grow more complicated:

# In[10]:

hy.model.ffnn(inputs = [x, y], nn_hidden = [2], n_out = 2, activations = [hy.tanh, hy.tanh])

# In[11]:

hy.model.ffnn(inputs = [x, y], nn_hidden = [2, 2], n_out = 2, activations = [hy.tanh, hy.tanh, hy.tanh])

# In the last example we can recognise subexpressions that occur multiple times in the computational graphs, such as ``(p2 * x) + (p3 * y) + p13``. These are the outputs of one layer being fed to the next one.
#
# We can quantify the growth in complexity by looking at how the size (i.e., the number of nodes) of the computational graphs evolves with the number of layers:

# In[12]:

for n_hlayers in range(0, 6):
    ffnn = hy.model.ffnn(inputs = [x, y], nn_hidden = [2] * n_hlayers, n_out = 2,
                         activations = [hy.tanh] * (n_hlayers + 1))

    print(f'Size of the outputs for {n_hlayers} hidden layers: {[len(_) for _ in ffnn]}')

# The size of the outputs roughly doubles every time we add a new layer. Let us see what happens with 32 hidden layers:

# In[13]:

n_hlayers = 32
ffnn = hy.model.ffnn(inputs = [x, y], nn_hidden = [2] * n_hlayers, n_out = 2,
                     activations = [hy.tanh] * (n_hlayers + 1))

print(f'Size of the outputs for {n_hlayers} hidden layers: {[len(_) for _ in ffnn]}')

# The graphs have now exploded to a size of $\sim 10^{10}$ -- yet they were created almost instantly and, if you check, you will see that their memory footprint isn't anywhere near 10GB.
#
# This efficiency is due to the fact that the FFNN constructor is careful to compute the output of each neuron only once and then re-use it in the computation of the outputs of the next layer, so that subexpression sharing is maximised. Because of this high degree of subexpression sharing, operations on FFNNs are efficient despite the large size of the computational graphs.
#
# For instance, if we want to perform inference, we can [compile an FFNN](./compiled_functions.ipynb). Subexpression sharing allows for an efficient process of [common subexpression elimination (CSE)](https://en.wikipedia.org/wiki/Common_subexpression_elimination), and we can show that, as expected, the computational complexity of inference ends up scaling linearly with the number of layers:

# In[15]:

for n_hlayers in range(0, 10):
    ffnn = hy.model.ffnn(inputs = [x, y], nn_hidden = [2] * n_hlayers, n_out = 2,
                         activations = [hy.tanh] * (n_hlayers + 1))

    # Create a compiled function from the FFNN.
    cf = hy.cfunc(ffnn, [x, y])

    print(f'Size of the decomposition for {n_hlayers} hidden layers: {len(cf.dc) - 4}')

# As a reminder, the decomposition of a compiled function is the sequence of elementary operations the original function has been broken into.
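# As a closing usage sketch (see the [compiled functions tutorial](./compiled_functions.ipynb) for the details of the call operator), performing the actual inference is then just a matter of invoking ``cf`` on a set of input values. Note that in this network the weights and biases are runtime parameters (the ``p*`` symbols in the expressions above); ``weights_and_biases`` below is a placeholder for a NumPy array holding one value per parameter:
#
# ```python
# >>> import numpy as np
# >>> # Evaluate the two network outputs at the input point
# >>> # (x, y) = (0.1, 0.2), passing the network parameters
# >>> # via the ``pars`` keyword argument.
# >>> nn_out = cf(np.array([0.1, 0.2]), pars=weights_and_biases)
# ```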