Protein binding & undfolding – a four-state model

In this notebook we will look into a the kinetics of a model system describing competing protein folding, aggregation and ligand binding. Using ChemPy we can define thermodynamic and kinetic parameters, and obtain a representation of a system of ODEs which may be integrated efficiently. Since we use SymPy we can also generate publication quality latex-expressions of our mathematical model directly from our source code. No need to write the equations multiple times in Python/Latex (or even C++ if the integration is to be performed a large number of times such as during parameter estimation).

First we will perform our imports:

In [ ]:
import logging; logger = logging.getLogger('matplotlib'); logger.setLevel(logging.INFO)  # or notebook filled with logging

from collections import OrderedDict, defaultdict
import math
import re
import time
from IPython.display import Image, Latex, display
import matplotlib.pyplot as plt
import sympy
from pyodesys.symbolic import ScaledSys
from pyodesys.native.cvode import NativeCvodeSys
from chempy import Substance, Equilibrium, Reaction, ReactionSystem
from chempy.kinetics.ode import get_odesys
from chempy.kinetics.rates import MassAction
from chempy.printing.tables import UnimolecularTable, BimolecularTable
from chempy.thermodynamics.expressions import EqExpr
from chempy.util.graph import rsys2graph
from chempy.util.pyutil import defaultkeydict
%matplotlib inline

Next we will define our substances. Note how we specify the composition, this will allow ChemPy to raise an error if any of our reactions we enter later would violate mass-conservation. It will also allow us to reduce the number of unknowns in our ODE-system by using the linear invariants from the mass-conservation.

In [ ]:
substances = OrderedDict([
    ('N', Substance('N', composition={'protein': 1}, latex_name='[N]')),
    ('U', Substance('U', composition={'protein': 1}, latex_name='[U]')),
    ('A', Substance('A', composition={'protein': 1}, latex_name='[A]')),
    ('L', Substance('L', composition={'ligand': 1}, latex_name='[L]')),
    ('NL', Substance('NL', composition={'protein': 1, 'ligand': 1}, latex_name='[NL]')),
])

We will model thermodynamic properties using enthalpy (H), entropy (S) and heat capacity (Cp). Kinetic paramaters (rate constants) are assumed to follow the Eyring equation:

In [ ]:
def _gibbs(args, T, R, backend, **kwargs):
    H, S, Cp, Tref = args
    H2 = H + Cp*(T - Tref)
    S2 = S + Cp*backend.log(T/Tref)
    return backend.exp(-(H2 - T*S2)/(R*T))

def _eyring(args, T, R, k_B, h, backend, **kwargs):
    H, S = args
    return k_B/h*T*backend.exp(-(H - T*S)/(R*T))
In [ ]:
Gibbs = EqExpr.from_callback(_gibbs, parameter_keys=('temperature', 'R'), argument_names=('H', 'S', 'Cp', 'Tref'))
Eyring = MassAction.from_callback(_eyring, parameter_keys=('temperature', 'R', 'k_B', 'h'), argument_names=('H', 'S'))

Next we define our free parameters:

In [ ]:
thermo_dis = Gibbs(unique_keys=('He_dis', 'Se_dis', 'Cp_dis', 'Tref_dis'))
thermo_u = Gibbs(unique_keys=('He_u', 'Se_u', 'Cp_u', 'Tref_u'))  # ([He_u_R, Se_u_R, Cp_u_R, Tref])
kinetics_agg = Eyring(unique_keys=('Ha_agg', 'Sa_agg'))  # EyringMassAction([Ha_agg, Sa_agg])
kinetics_as = Eyring(unique_keys=('Ha_as', 'Sa_as'))
kinetics_f = Eyring(unique_keys=('Ha_f', 'Sa_f'))

We will have two reversible reactions, and one irreversible reaction:

In [ ]:
eq_dis = Equilibrium({'NL'}, {'N', 'L'}, thermo_dis, name='ligand-protein dissociation')
eq_u = Equilibrium({'N'}, {'U'}, thermo_u, {'L'}, {'L'}, name='protein unfolding')
r_agg = Reaction({'U'}, {'A'}, kinetics_agg, {'L'}, {'L'}, name='protein aggregation')

We formulate a system of 5 reactions honoring our reversible equilibria and our irreversible reaction:

In [ ]:
rsys = ReactionSystem(
    eq_dis.as_reactions(kb=kinetics_as, new_name='ligand-protein association') +
    eq_u.as_reactions(kb=kinetics_f, new_name='protein folding') +
    (r_agg,), substances, name='4-state CETSA system')

We can query the ReactionSystem instance for what substances contain what components:

In [ ]:
vecs, comp = rsys.composition_balance_vectors()
names = rsys.substance_names()
dict(zip(comp, [dict(zip(names, v)) for v in vecs]))

We can look at our ReactionSystem as a graph if we wish:

In [ ]:
rsys2graph(rsys, '4state.png', save='.', include_inactive=False)
Image('4state.png')

...or as a Table if that suits us better (note that "A" ha green highlighting, denoting it's a terminal product)

In [ ]:
rsys

Try hovering over the names to have them highlighted (this is particularly useful when working with large reaction sets).

We ca also generate tables representing the unimolecular reactions involing each substance, or the matrix showing the bimolecular reactions:

In [ ]:
uni, not_uni = UnimolecularTable.from_ReactionSystem(rsys)
bi, not_bi = BimolecularTable.from_ReactionSystem(rsys)
assert not (not_bi & not_uni), "Only uni- & bi-molecular reactions expected"
In [ ]:
uni
In [ ]:
bi

Exporting expressions as LaTeX is quite straightforward:

In [ ]:
def pretty_replace(s, subs=None):
    if subs is None:
        subs = {
            'Ha_(\w+)': r'\\Delta_{\1}H^{\\neq}',
            'Sa_(\w+)': r'\\Delta_{\1}S^{\\neq}',
            'He_(\w+)': r'\\Delta_{\1}H^\\circ',
            'Se_(\w+)': r'\\Delta_{\1}S^\\circ',
            'Cp_(\w+)': r'\\Delta_{\1}\,C_p',
            'Tref_(\w+)': r'T^{\\circ}_{\1}',
        }
    for pattern, repl in subs.items():
        s = re.sub(pattern, repl, s)
    return s

def mk_Symbol(key):
    if key in substances:
        arg = substances[key].latex_name
    else:
        arg = pretty_replace(key.replace('temperature', 'T'))

    return sympy.Symbol(arg)

autosymbols = defaultkeydict(mk_Symbol)
rnames = {}
for rxn in rsys.rxns:
    rnames[rxn.name] = rxn.name.replace(' ', '~').replace('-','-')
    rate_expr_str = sympy.latex(rxn.rate_expr()(autosymbols, backend=sympy, reaction=rxn))
    lstr = r'$r(\mathrm{%s}) = %s$' % (rnames[rxn.name], rate_expr_str)
    display(Latex(lstr))
In [ ]:
ratexs = [autosymbols['r(\mathrm{%s})' % rnames[rxn.name]] for rxn in rsys.rxns]
rates = rsys.rates(autosymbols, backend=sympy, ratexs=ratexs)
for k, v in rates.items():
    display(Latex(r'$\frac{[%s]}{dt} = %s$' % (k, sympy.latex(v))))
In [ ]:
default_c0 = defaultdict(float, {'N': 1e-9, 'L': 1e-8})

params = dict(
    R=8.314472,  # or N_A & k_B
    k_B=1.3806504e-23,
    h=6.62606896e-34,  # k_B/h == 2.083664399411865e10 K**-1 * s**-1
    He_dis=-45e3,
    Se_dis=-400,
    Cp_dis=1.78e3,
    Tref_dis=298.15,
    He_u=60e3,
    Cp_u=20.5e3,
    Tref_u=298.15,
    Ha_agg=106e3,
    Sa_agg=70,
    Ha_as=4e3,
    Sa_as=-10,
    Ha_f=90e3,
    Sa_f=50,
    temperature=50 + 273.15
)

We have the melting temperature $T_m$ as a free parameter, however, the model is expressed in terms of $\Delta_u S ^\circ$ so will need to derive the latter from the former: $$ \begin{cases} \Delta G = 0 \\ \Delta G = \Delta H - T_m\Delta_u S \end{cases} $$ $$ \begin{cases} \Delta H = \Delta H^\circ + \Delta C_p \left( T_m - T^\circ \right) \\ \Delta S = \Delta S^\circ + \Delta C_p \ln\left( \frac{T_m}{T^\circ} \right) \end{cases} $$ this gives us the following equation: $$ \Delta H^\circ + \Delta C_p \left( T_m - T^\circ \right) = T_m \left( \Delta S^\circ + \Delta C_p \ln\left( \frac{T_m}{T^\circ} \right) \right) $$ Solving for $\Delta S^\circ$: $$ \Delta S^\circ = T_m^{-1}\left( \Delta H^\circ + \Delta C_p \left( T_m - T^\circ \right) \right) - \Delta C_p \ln\left( \frac{T_m}{T^\circ} \right) $$

In [ ]:
def Se0_from_Tm(Tm, token):
    dH0, T0, dCp = params['He_'+token], params['Tref_'+token], params['Cp_'+token]
    return dH0/Tm + (Tm-T0)*dCp/Tm - dCp*math.log(Tm/T0)
params['Se_u'] = Se0_from_Tm(48.2+273.15, 'u')
params['Se_u']

If we want to see the numerical values for the rate of the individual reactions it is quite easy:

In [ ]:
params_c0 = default_c0.copy()
params_c0.update(params)
for rxn in rsys.rxns:
    print('%s: %.5g' % (rxn.name, rxn.rate_expr()(params_c0, reaction=rxn)))

By using pyodesys we can generate a system of ordinary differential equations:

In [ ]:
odesys, extra = get_odesys(rsys, include_params=False, SymbolicSys=ScaledSys, dep_scaling=1e9)
len(odesys.exprs)  # how many (symbolic) expressions are there in this representation?

Numerical integration of ODE systems require a guess for the inital step-size. We can derive an upper bound for an "Euler-forward step" from initial concentrations and restrictions on mass-conservation:

In [ ]:
h0max = extra['max_euler_step_cb'](0, default_c0, params)
h0max

Now let's put our ODE-system to work:

In [ ]:
def integrate_and_plot(system, c0=None, first_step=None, t0=0, stiffness=False, nsteps=9000, **kwargs):
    if c0 is None:
        c0 = default_c0
    if first_step is None:
        first_step = h0max*1e-11
    tend = 3600*24
    t_py = time.time()
    
    kwargs['atol'] = kwargs.get('atol', 1e-11)
    kwargs['rtol'] = kwargs.get('rtol', 1e-11)
    res = system.integrate([t0, tend], c0, params, integrator='cvode', nsteps=nsteps,
                           first_step=first_step, **kwargs)
    t_py = time.time() - t_py
    if stiffness:
        plt.subplot(1, 2, 1)
    _ = system.plot_result(xscale='log', yscale='log')
    _ = plt.legend(loc='best')
    plt.gca().set_ylim([1e-16, 1e-7])
    plt.gca().set_xlim([1e-11, tend])
    
    if stiffness:
        if stiffness is True:
            stiffness = 0
        ratios = odesys.stiffness()
        plt.subplot(1, 2, 2)
        plt.yscale('linear')
        plt.plot(odesys._internal[0][stiffness:], ratios[stiffness:])
    for k in ('time_wall', 'time_cpu'):
        print('%s = %.3g' % (k, res[2][k]), end=', ')
    print('time_python = %.3g' % t_py)
    return res

_, _, info = integrate_and_plot(odesys)
assert info['internal_yout'].shape[1] == 5
{k: v for k, v in info.items() if not k.startswith('internal')}

pyodesys even allows us to generate C++ code which is compiled to a fast native extension module:

In [ ]:
native = NativeCvodeSys.from_other(odesys, first_step_expr=0*odesys.indep)
In [ ]:
_, _, info_native = integrate_and_plot(native)
{k: v for k, v in info_native.items() if not k.startswith('internal')}

Note how much smaller "time_cpu" was here

In [ ]:
info['time_wall']/info_native['time_wall']
In [ ]:
from chempy.kinetics._native import get_native
In [ ]:
native2 = get_native(rsys, odesys, 'cvode')
In [ ]:
_, _, info_native2 = integrate_and_plot(native2, first_step=0.0)
{k: v for k, v in info_native2.items() if not k.startswith('internal')}

We have one complication, due to linear dependencies in our formulation of the system of ODEs our jacobian is singular:

In [ ]:
cses, (jac_in_cse,) = odesys.be.cse(odesys.get_jac())
jac_in_cse
In [ ]:
odesys.jacobian_singular()

Since implicit methods (which are required for stiff cases often encountered in kinetic modelling) uses the Jacboian (or rather I - γJ) in the modified Newton's method we may get failures during integration (depending on step size and scaling). What we can do is to identify linear dependencies based on composition of the materials and exploit the invariants to reduce the dimensionality of the system of ODEs:

In [ ]:
A, comp_names = rsys.composition_balance_vectors()
A, comp_names, list(rsys.substances.keys())

That made sense: two different components can give us (up to) two linear invariants.

Let's look what those invariants looks like symbolically:

In [ ]:
y0 = {odesys[k]: sympy.Symbol(k+'0') for k in rsys.substances.keys()}
analytic_L_N = extra['linear_dependencies'](['L', 'N'])
analytic_L_N(None, y0, None, sympy)
assert len(analytic_L_N(None, y0, None, sympy)) > 0  # ensure the callback is idempotent
analytic_L_N(None, y0, None, sympy), list(enumerate(odesys.names))

one can appreciate that one does not need to enter such expressions manually (at least for larger systems). That is both tedious and error prone.

Let's see how we can use pyodesys to leverage this information on redundancy:

In [ ]:
from pyodesys.symbolic import PartiallySolvedSystem
no_invar = dict(linear_invariants=None, linear_invariant_names=None)
psysLN = PartiallySolvedSystem(odesys, analytic_L_N, **no_invar)
print(psysLN.be.cse(psysLN.get_jac())[1][0])
psysLN['L'], psysLN.jacobian_singular(), len(psysLN.exprs)

above we chose to get rid of 'L' and 'N', but we could also have removed 'A' instead of 'N':

In [ ]:
psysLA = PartiallySolvedSystem(odesys, extra['linear_dependencies'](['L', 'A']), **no_invar)
print(psysLA.be.cse(psysLA.get_jac())[1][0])
psysLA['L'], psysLA.jacobian_singular()
In [ ]:
plt.figure(figsize=(12,4))
plt.subplot(1, 2, 1)
_, _, info_LN = integrate_and_plot(psysLN, first_step=0.0)
assert info_LN['internal_yout'].shape[1] == 3

plt.subplot(1, 2, 2)
_, _, info_LA = integrate_and_plot(psysLA, first_step=0.0)
assert info_LA['internal_yout'].shape[1] == 3

({k: v for k, v in info_LN.items() if not k.startswith('internal')},
 {k: v for k, v in info_LA.items() if not k.startswith('internal')})

We can also have the solver return to use when some precondition is fulfilled, e.g. when the concentraion of 'N' and 'A' are equal:

In [ ]:
from pyodesys.symbolic import SymbolicSys
psys_root = SymbolicSys.from_other(psysLN, roots=[psysLN['N'] - psysLN['A']])
In [ ]:
psys_root.roots
In [ ]:
psysLN['N']
In [ ]:
psysLN.analytic_exprs
In [ ]:
psysLN.names
In [ ]:
psysLN.dep
In [ ]:
tout1, Cout1, info_root = integrate_and_plot(psys_root, first_step=0.0, return_on_root=True)
print('Time at which concnetrations of N & A are equal: %.4g' % (tout1[-1]))

From this point in time onwards we could for example choose to continue our integration using another formulation of the ODE-system:

In [ ]:
xout2, yout2, info_LA = integrate_and_plot(psysLA, first_step=0.0, t0=tout1[-1], c0=dict(zip(odesys.names, Cout1[-1, :])))

Let's compare the total number steps needed for our different approaches:

In [ ]:
print('\troot\tLA\troot+LA\tLN')
for k in 'n_steps nfev njev'.split():
    print('\t'.join(map(str, (k, info_root[k], info_LA[k], info_root[k] + info_LA[k], info_LN[k]))))

In this case it did not earn us much, one reason is that we actually don't need to find the root with as high accuracy as we do. But having the option is still useful.

Using pyodesys and SymPy we can perform a variable transformation and solve the transformed system if we so wish:

In [ ]:
from pyodesys.symbolic import symmetricsys
logexp = lambda x: sympy.log(x + 1e-20), lambda x: sympy.exp(x) - 1e-20
def psimp(exprs):
    return [sympy.powsimp(expr.expand(), force=True) for expr in exprs]
LogLogSys = symmetricsys(logexp, logexp, exprs_process_cb=psimp)
unscaled_odesys, unscaled_extra = get_odesys(rsys, include_params=False)
tsys = LogLogSys.from_other(unscaled_odesys)
unscaledLN = PartiallySolvedSystem(unscaled_odesys, unscaled_extra['linear_dependencies'](['L', 'N']), **no_invar)
unscaledLA = PartiallySolvedSystem(unscaled_odesys, unscaled_extra['linear_dependencies'](['L', 'A']), **no_invar)
assert sorted(unscaledLN.free_names) == sorted(['U', 'A', 'NL'])
assert sorted(unscaledLA.free_names) == sorted(['U', 'N', 'NL'])
tsysLN = LogLogSys.from_other(unscaledLN)
tsysLA = LogLogSys.from_other(unscaledLA)
In [ ]:
_, _, info_t = integrate_and_plot(tsys, first_step=0.0)
{k: info_t[k] for k in ('nfev', 'njev', 'n_steps')}

We can even apply the transformation our reduced systems (doing so by hand is excessively painful and error prone):

In [ ]:
native_tLN = NativeCvodeSys.from_other(tsysLN)
_, _, info_tLN = integrate_and_plot(native_tLN, first_step=1e-9, nsteps=18000, atol=1e-9, rtol=1e-9)
{k: info_tLN[k] for k in ('nfev', 'njev', 'n_steps')}
In [ ]:
_, _, info_tLN = integrate_and_plot(tsysLN, first_step=1e-9, nsteps=18000, atol=1e-8, rtol=1e-8)
{k: info_tLN[k] for k in ('nfev', 'njev', 'n_steps')}
In [ ]:
_, _, info_tLA = integrate_and_plot(tsysLA, first_step=0.0)
{k: info_tLA[k] for k in ('nfev', 'njev', 'n_steps')}

Finally, let's take a look at the C++ code which was generated for us:

In [ ]:
print(open(next(filter(lambda s: s.endswith('.cpp'), native2._native._written_files))).read())