
Using checkpoint_schedules

This notebook presents an illustrative example of how to use checkpoint_schedules for step-based incremental checkpointing of the adjoints to computer models. Although the example is illustrative, the same code can also be employed in real applications.

Managing the forward and adjoint executions with schedules

We first write the CheckpointingManager class, which manages the execution of the forward and adjoint models according to a checkpointing schedule. The CheckpointingManager constructor takes the maximum number of steps in the model's execution, max_n. The attributes index_action (a counter of the executed actions) and list_actions (a list of the executed actions) are used here only for illustration.

The CheckpointingManager has the method execute, which runs the schedule. execute takes the cp_schedule argument, a schedule object provided by the checkpoint_schedules package; iterating over it yields the checkpoint actions. In the execute method, we iterate over cp_schedule using enumerate(cp_schedule), which yields tuples (count, cp_action). Here, count is the index of the action within the schedule, and cp_action is a checkpoint action provided by checkpoint_schedules.

cp_action is passed to a single-dispatch generic function named action, which processes each type of checkpoint action with a dedicated function. The action function is overloaded through its register() attribute, used as a decorator on each specific function. For example, the @action.register(Forward) decorator on action_forward means that action dispatches to action_forward whenever cp_action is a Forward action. Inside action_forward, we can implement the code needed to advance the forward model.

Notes:

The action.register decorator takes a checkpoint_schedules action class as its argument. These actions are presented in more detail in the following sections of this tutorial.
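The single-dispatch mechanism described above can be tried without checkpoint_schedules installed. The sketch below is a minimal illustration, assuming hypothetical stand-in classes Forward and EndForward (plain dataclasses, not the library's actions) and a hand-written list in place of a real schedule; only functools.singledispatch comes from the standard library.

```python
import functools
from dataclasses import dataclass


# Hypothetical stand-ins for checkpoint_schedules actions (not the real classes).
@dataclass
class Forward:
    n0: int
    n1: int


@dataclass
class EndForward:
    pass


log = []


@functools.singledispatch
def action(cp_action):
    # Fallback for any type that has no registered implementation.
    raise TypeError(f"Unexpected action: {cp_action!r}")


@action.register(Forward)
def action_forward(cp_action):
    log.append(f"forward {cp_action.n0} -> {cp_action.n1}")


@action.register(EndForward)
def action_end_forward(cp_action):
    log.append("end forward")


# A hand-written list stands in for iterating over a real schedule.
for count, cp_action in enumerate([Forward(0, 4), EndForward()]):
    action(cp_action)  # dispatches on type(cp_action)

print(log)  # ['forward 0 -> 4', 'end forward']

try:
    action(42)  # int has no registered implementation
except TypeError as exc:
    print(exc)  # Unexpected action: 42
```

The fallback that raises TypeError mirrors the one in CheckpointingManager.execute: any action type without a registered handler fails loudly instead of being silently skipped.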

In [46]:

from checkpoint_schedules import *
import functools


class CheckpointingManager():
    """Manage the executions of the forward and adjoint solvers.

    Attributes
    ----------
    max_n : int
        Total number of steps.
    list_actions : list
        Store the actions. Only used for illustration.
    index_action : int
        Index of the action. Only used for illustration.
    """
    def __init__(self, max_n):
        self.max_n = max_n
        self.list_actions = []
        self.index_action = 0
        
    def execute(self, cp_schedule):
        """Execute forward and adjoint using a checkpointing schedule.

        Parameters
        ----------
        cp_schedule : CheckpointSchedule
            Checkpoint schedule object.

        Notes
        -----
        `cp_schedule` provides the schedule of the actions to be taken and also a
        generator that yields the *checkpoint_schedules* actions.
        """
        @functools.singledispatch
        def action(cp_action):
            raise TypeError("Unexpected action")

        @action.register(Forward)
        def action_forward(cp_action):
            nonlocal step_n
            def illustrate_runtime(a, b, singlestorage):
                # function used to illustrate the runtime of the forward execution   
                if singlestorage:
                    time_exec = ".   "*cp_action.n0 + (a + '--' + b)*(n1-cp_action.n0)
                else:
                    time_exec = ".   "*cp_action.n0 + (a + ('---' + b)*(n1-cp_action.n0))
                return time_exec
            
            n1 = min(cp_action.n1, self.max_n)

            # the symbols used in the illustrations            
            if cp_action.write_ics and cp_action.write_adj_deps:
                singlestorage = True
                a = '\u002b' 
                b = '\u25b6'
            else:
                singlestorage = False
                if cp_action.write_ics and cp_action.storage == StorageType.DISK:
                    a = '+'
                elif cp_action.write_ics and cp_action.storage == StorageType.RAM:
                    a = '*'
                else:
                    a = ''
                if cp_action.write_adj_deps:
                    b = "\u25b6"
                else:
                    b = "\u25b7"
            # Illustration of the forward execution in time
            time_exec = illustrate_runtime(a, b, singlestorage)

            self.list_actions.append([self.index_action, time_exec, str(cp_action)])

            step_n = n1
            if n1 == self.max_n:
                cp_schedule.finalize(n1)

        @action.register(Reverse)
        def action_reverse(cp_action):
            nonlocal step_r
            # Illustration of the adjoint execution in time
            steps = cp_action.n1 - cp_action.n0
            step_r += steps
            time_exec = ".   "*(self.max_n - step_r) + (('\u25c0' + '---')*steps)
                                
            self.list_actions.append([self.index_action, time_exec, str(cp_action)])
            
        @action.register(Copy)
        def action_copy(cp_action):
            self.list_actions.append([self.index_action, " ", str(cp_action)])

        @action.register(Move)
        def action_move(cp_action):
            self.list_actions.append([self.index_action, " ", str(cp_action)])

        @action.register(EndForward)
        def action_end_forward(cp_action):
            assert step_n == self.max_n
            act = "End Forward"
            self.list_actions.append([self.index_action, act, str(cp_action)])
            if cp_schedule._max_n is None:
                cp_schedule._max_n = self.max_n
            
        @action.register(EndReverse)
        def action_end_reverse(cp_action):
            nonlocal step_r, is_exhausted
            # verifying whether the adjoint execution reached the end
            assert step_r == self.max_n
            # Query whether the schedule has been exhausted
            is_exhausted = cp_schedule.is_exhausted
            act = "End Reverse"
            self.list_actions.append([self.index_action, act, str(cp_action)])
            
        step_n = 0 # forward step
        step_r = 0 # adjoint step
        is_exhausted = False # flag to indicate whether the schedule is exhausted
        for count, cp_action in enumerate(cp_schedule):
            self.index_action = count
            action(cp_action)
            if isinstance(cp_action, EndReverse):
                break
        
        # Printing the illustration of the execution
        from tabulate import tabulate
        print(tabulate(self.list_actions, headers=['Action index:', 'Run-time illustration', 
                                                    'Action:']))
        self.list_actions = []

A trivial schedule for forward computation

We start with a trivial checkpoint schedule that executes only the forward solver, without storing any data. Let us define the maximum number of solver time steps, max_n = 4, and the object solver_manager used to manage the solver (in this case, a forward solver).

In [47]:

max_n = 4 # Total number of time steps.
solver_manager = CheckpointingManager(max_n) # manager object

Here, the NoneCheckpointSchedule class provides a checkpoint schedule that runs only the forward model.

In [48]:

cp_schedule = NoneCheckpointSchedule() # Checkpoint schedule object
solver_manager.execute(cp_schedule) # Execute the forward solver by following the schedule.

  Action index:  Run-time illustration    Action:
---------------  -----------------------  -------------------------------------------------------
              0  ---▷---▷---▷---▷         Forward(0, sys.maxsize, False, False, StorageType.NONE)
              1  End Forward              EndForward()

When solver_manager.execute(cp_schedule) is run, the output provides a visual representation of three distinct pieces of information:

An index linked to each action,
A visualisation showing the steps advancing,
The actions associated with each step.

Notice in the output that there are two actions: Forward and EndForward(). The latter indicates that the forward solver has reached the end of the step interval. The Forward action has the general form:

Forward(n0, n1, write_ics, write_adj_deps, storage_type)

This action is read as:

Advance the forward model from the start of step n0 to the start of step n1. In this case, n1 = sys.maxsize is used because NoneCheckpointSchedule does not require n1 to be specified in advance, which leaves the number of steps to be determined during the forward execution.
write_ics and write_adj_deps are booleans that indicate whether the forward solver should store the forward restart data and the forward data required for the adjoint computation, respectively.
storage_type indicates the storage type, which can be StorageType.NONE, StorageType.RAM, StorageType.DISK or StorageType.WORK.

As mentioned above, NoneCheckpointSchedule lets the number of steps be specified during the forward execution. This is done with the finalize method:

cp_schedule.finalize(n1)

where n1 = max_n = 4. This call is made inside action_forward once the forward model reaches max_n.
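The handling of an open-ended n1 can be sketched in isolation. The block below is a minimal sketch, assuming a hypothetical OpenEndedSchedule class and a stand-in Forward dataclass carrying only n0 and n1; neither is the checkpoint_schedules API. It mirrors what action_forward does: clamp sys.maxsize with min(cp_action.n1, max_n) and call finalize once the last step is reached.

```python
import sys
from dataclasses import dataclass


@dataclass
class Forward:
    """Hypothetical stand-in for a Forward action (only n0 and n1)."""
    n0: int
    n1: int


class OpenEndedSchedule:
    """Hypothetical schedule that, like NoneCheckpointSchedule, does not
    know the number of steps until finalize() is called."""

    def __init__(self):
        self.max_n = None

    def __iter__(self):
        # A single open-ended forward action: "advance as far as needed".
        yield Forward(0, sys.maxsize)

    def finalize(self, n):
        # Record the number of steps chosen during the forward run.
        self.max_n = n


def run_forward(schedule, max_n):
    steps_taken = 0
    for cp_action in schedule:
        # Clamp the open-ended bound to the steps actually run.
        n1 = min(cp_action.n1, max_n)
        steps_taken += n1 - cp_action.n0
        if n1 == max_n:
            schedule.finalize(n1)
    return steps_taken


schedule = OpenEndedSchedule()
steps = run_forward(schedule, max_n=4)
print(steps, schedule.max_n)  # 4 4
```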

Trivial schedule for forward and adjoint computation

The following schedule is practical when the user intends to store the forward data required by the adjoint for all steps. This schedule is given by SingleMemoryStorageSchedule.

The SingleMemoryStorageSchedule schedule does not require the n1 step. Analogous to NoneCheckpointSchedule, SingleMemoryStorageSchedule can create its schedule without n1 being specified in advance.

In [49]:

cp_schedule = SingleMemoryStorageSchedule()
solver_manager.execute(cp_schedule)

  Action index:  Run-time illustration    Action:
---------------  -----------------------  ------------------------------------------------------
              0  ---▶---▶---▶---▶         Forward(0, sys.maxsize, False, True, StorageType.WORK)
              1  End Forward              EndForward()
              2  ◀---◀---◀---◀---         Reverse(4, 0, True)
              3  End Reverse              EndReverse()

In this particular case, the Forward action reads:

Advance the forward solver from step n0 = 0 to the start of any step n1.
Do not store the forward restart data, since write_ics is 'False'.
Store the forward data required for the adjoint computation, since write_adj_deps is 'True'.
The storage type is StorageType.WORK, which is the working memory location for the adjoint.

For the adjoint computation, we have the Reverse action, which has the base form:

Reverse(n1, n0, clear_adj_deps)

This is interpreted as follows:

Advance the adjoint model from the start of step n1 to the start of step n0.
Clear the adjoint dependency data if clear_adj_deps is 'True'.

Thus, in the current example, the Reverse action reads:

Advance the adjoint from the start of step 4 to the start of step 0 (i.e. over steps 3, 2, 1 and 0).
Clear the forward data used by the adjoint (clear_adj_deps is 'True').
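The steps covered by a Reverse action can be listed explicitly. This is a small sketch, assuming that the first printed argument of Reverse is the later step (so Reverse(4, 0, True) runs the adjoint from the start of step 4 back to the start of step 0); steps_covered is a hypothetical helper, not part of checkpoint_schedules.

```python
def steps_covered(n1, n0):
    """Steps advanced over by an adjoint action running from the start of
    step n1 back to the start of step n0, in the order they are visited."""
    return list(range(n1 - 1, n0 - 1, -1))


print(steps_covered(4, 0))  # [3, 2, 1, 0]
print(steps_covered(4, 3))  # [3]
```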

Lastly, EndReverse() is the action indicating that the adjoint model execution has finished.
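To make the "store everything" strategy concrete, here is a toy sketch that is not tied to checkpoint_schedules. It assumes a hypothetical scalar model x_{n+1} = x_n^2 with objective J = x_N, whose adjoint needs the input state of each step (the adjoint dependency). The forward loop stores that state for every step in working memory, as Forward(0, ..., write_adj_deps=True, StorageType.WORK) requests, and the reverse loop consumes and clears it, as Reverse(max_n, 0, True) does. For N = 4 the exact gradient is dJ/dx_0 = 16 x_0^15.

```python
import math


def step(x):
    """One step of a hypothetical toy model: x_{n+1} = x_n ** 2."""
    return x * x


def adjoint_step(x_n, lam_next):
    """Adjoint of step(): d x_{n+1} / d x_n = 2 x_n."""
    return 2.0 * x_n * lam_next


def gradient_store_all(x0, max_n):
    # Forward: keep the adjoint dependency (input state) of every step.
    adj_deps = {}
    x = x0
    for n in range(max_n):
        adj_deps[n] = x
        x = step(x)
    # Reverse: visit steps max_n - 1, ..., 0; pop() clears each dependency.
    lam = 1.0  # dJ/dx_N for J = x_N
    for n in reversed(range(max_n)):
        lam = adjoint_step(adj_deps.pop(n), lam)
    return x, lam


J, dJdx0 = gradient_store_all(1.5, 4)
print(J, dJdx0, 16 * 1.5 ** 15)
```

Memory grows linearly with the number of steps here, which is the cost that the checkpointing schedules in the later sections trade against recomputation.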

checkpoint_schedules also allows the forward data to be stored on 'disk'. Storing all the forward data used for the adjoint computation on 'disk' is achieved with SingleDiskStorageSchedule.

In [50]:

cp_schedule = SingleDiskStorageSchedule()
solver_manager.execute(cp_schedule)

  Action index:  Run-time illustration    Action:
---------------  -----------------------  ------------------------------------------------------
              0  ---▶---▶---▶---▶         Forward(0, sys.maxsize, False, True, StorageType.DISK)
              1  End Forward              EndForward()
              2                           Copy(3, StorageType.DISK, StorageType.WORK)
              3  .   .   .   ◀---         Reverse(4, 3, True)
              4                           Copy(2, StorageType.DISK, StorageType.WORK)
              5  .   .   ◀---             Reverse(3, 2, True)
              6                           Copy(1, StorageType.DISK, StorageType.WORK)
              7  .   ◀---                 Reverse(2, 1, True)
              8                           Copy(0, StorageType.DISK, StorageType.WORK)
              9  ◀---                     Reverse(1, 0, True)
             10  End Reverse              EndReverse()

In this case, the forward and adjoint executions with SingleDiskStorageSchedule include Copy actions (see the outputs associated with indices 2, 4, 6 and 8), which indicate copying the forward data from one storage type to another.

The Copy action has the general form:

Copy(n, from_storage, to_storage)

which reads:

Copy the data associated with step n.
The term from_storage denotes the storage type responsible for retaining forward data at step n, while to_storage refers to the designated storage type for storing this forward data.

Hence, considering the Copy action associated with Action index 4, we have:

Copy the data associated with step n = 2, which is stored in StorageType.DISK, to working storage for use by the adjoint model.
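Copy semantics can be modelled with plain dictionaries. In this sketch the storages dict, the copy helper and the string keys "RAM", "DISK" and "WORK" are hypothetical stand-ins for StorageType members, not library code. Because a copy leaves the source entry in place, the same stored data can be restored to working memory again later.

```python
# Hypothetical storages keyed by step; strings stand in for StorageType members.
storages = {"RAM": {}, "DISK": {}, "WORK": {}}


def copy(n, from_storage, to_storage):
    """Copy the data for step n; the source entry stays available."""
    storages[to_storage][n] = storages[from_storage][n]


storages["DISK"][2] = "forward data for step 2"
copy(2, "DISK", "WORK")
del storages["WORK"][2]  # e.g. the adjoint clears its working copy
copy(2, "DISK", "WORK")  # the disk entry can still be reused
print(sorted(storages["DISK"]), sorted(storages["WORK"]))  # [2] [2]
```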

Instead of copying the data, we can move it from one storage type to another. To do so, checkpoint_schedules has a Move action, which indicates that the data, once moved, is no longer accessible in the original storage type. In SingleDiskStorageSchedule, the forward data is moved instead of copied by setting the optional move_data parameter to True.

In [51]:

cp_schedule = SingleDiskStorageSchedule(move_data=True)
solver_manager.execute(cp_schedule)

  Action index:  Run-time illustration    Action:
---------------  -----------------------  ------------------------------------------------------
              0  ---▶---▶---▶---▶         Forward(0, sys.maxsize, False, True, StorageType.DISK)
              1  End Forward              EndForward()
              2                           Move(3, StorageType.DISK, StorageType.WORK)
              3  .   .   .   ◀---         Reverse(4, 3, True)
              4                           Move(2, StorageType.DISK, StorageType.WORK)
              5  .   .   ◀---             Reverse(3, 2, True)
              6                           Move(1, StorageType.DISK, StorageType.WORK)
              7  .   ◀---                 Reverse(2, 1, True)
              8                           Move(0, StorageType.DISK, StorageType.WORK)
              9  ◀---                     Reverse(1, 0, True)
             10  End Reverse              EndReverse()

The Move action follows a basic form:

Move(n, from_storage, to_storage)

which can be read as:

Move the data associated with step n.
The terms from_storage and to_storage are the storage types from and to which the data should be moved, respectively.

Thus, the Move action associated with Action index 4 reads:

Move the data associated with step n = 2, which is stored in StorageType.DISK, to working storage for use by the adjoint model.
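The difference from Copy can be shown with the same kind of dictionary model. Again, storages, move and the string storage names are hypothetical stand-ins. Moving pops the entry from its source, so a second attempt to move the same step fails; this is why a schedule would only move data at its last use.

```python
# Hypothetical storages keyed by step; strings stand in for StorageType members.
storages = {"RAM": {}, "DISK": {}, "WORK": {}}


def move(n, from_storage, to_storage):
    """Move the data for step n; the source entry is removed."""
    storages[to_storage][n] = storages[from_storage].pop(n)


storages["DISK"][2] = "forward data for step 2"
move(2, "DISK", "WORK")
print(sorted(storages["DISK"]), sorted(storages["WORK"]))  # [] [2]

try:
    move(2, "DISK", "WORK")  # nothing left on disk to move
except KeyError:
    print("step 2 is no longer on DISK")
```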

Schedules given by checkpointing algorithms

Here, we start presenting schedules generated by checkpointing algorithms.

Revolve

The Revolve strategy, as introduced in reference [1], generates a schedule that uses only the 'RAM' storage type.

The Revolve class gives a schedule according to two parameters: the total forward steps (max_n = 4) and the number of checkpoints to store in 'RAM' (snaps_in_ram = 2).

In [52]:

from checkpoint_schedules import StorageType
snaps_in_ram = 2 
solver_manager = CheckpointingManager(max_n) # manager object
cp_schedule = Revolve(max_n, snaps_in_ram) 
solver_manager.execute(cp_schedule)

  Action index:  Run-time illustration    Action:
---------------  -----------------------  ---------------------------------------------
              0  *---▷---▷                Forward(0, 2, True, False, StorageType.RAM)
              1  .   .   *---▷            Forward(2, 3, True, False, StorageType.RAM)
              2  .   .   .   ---▶         Forward(3, 4, False, True, StorageType.WORK)
              3  End Forward              EndForward()
              4  .   .   .   ◀---         Reverse(4, 3, True)
              5                           Move(2, StorageType.RAM, StorageType.WORK)
              6  .   .   ---▶             Forward(2, 3, False, True, StorageType.WORK)
              7  .   .   ◀---             Reverse(3, 2, True)
              8                           Copy(0, StorageType.RAM, StorageType.WORK)
              9  ---▷                     Forward(0, 1, False, False, StorageType.WORK)
             10  .   ---▶                 Forward(1, 2, False, True, StorageType.WORK)
             11  .   ◀---                 Reverse(2, 1, True)
             12                           Move(0, StorageType.RAM, StorageType.WORK)
             13  ---▶                     Forward(0, 1, False, True, StorageType.WORK)
             14  ◀---                     Reverse(1, 0, True)
             15  End Reverse              EndReverse()

Employing a checkpointing strategy in an adjoint-based gradient computation requires recomputing the forward solver. As shown in the output above, the Forward action associated with Action index 0 reads as follows:

- Advance from time step 0 to the start of the time step 2.

- Store the forward data required to restart the forward solver from time step 0.

- The storage of the forward restart data is done in RAM.

In the displayed time step illustrations, '*---▷---▷' is associated with

Forward(0, 2, True, False, StorageType.RAM)

The symbolic illustration of the step advancing reads:

'*': Forward data for restarting the forward solver is stored in 'RAM'.
'---▷': Forward data used for the adjoint computation is not stored.
'---▶': Forward data used for the adjoint computation is stored.
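The legend can be checked by pulling the symbol logic of action_forward out into a standalone function. This sketch reproduces that logic, assuming the storage type is passed as a plain string ("RAM", "DISK", "WORK") rather than a StorageType member; illustrate_forward is a hypothetical name.

```python
def illustrate_forward(n0, n1, write_ics, write_adj_deps, storage):
    """Rebuild the run-time illustration string drawn by action_forward."""
    pad = ".   " * n0  # one dot per step already passed
    if write_ics and write_adj_deps:
        # Restart data and adjoint data written together.
        return pad + ("+" + "--" + "\u25b6") * (n1 - n0)
    if write_ics and storage == "DISK":
        a = "+"
    elif write_ics and storage == "RAM":
        a = "*"
    else:
        a = ""
    b = "\u25b6" if write_adj_deps else "\u25b7"  # filled: adjoint data stored
    return pad + a + ("---" + b) * (n1 - n0)


print(illustrate_forward(0, 2, True, False, "RAM"))   # *---▷---▷
print(illustrate_forward(3, 4, False, True, "WORK"))  # .   .   .   ---▶
```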

Multistage checkpointing

The schedule depicted below employs a multistage distribution of checkpoints between 'RAM' and 'disk', as described in [2]. This checkpointing approach allows memory ('RAM') storage only, 'disk' storage only, or a combination of both.

The following code uses two types of storage, 'RAM' and 'disk'.

In [53]:

snaps_in_ram = 1  # maximum number of checkpoints stored in RAM
snaps_on_disk = 1 # maximum number of checkpoints stored in disk
cp_schedule = MultistageCheckpointSchedule(max_n, snaps_in_ram, snaps_on_disk)
solver_manager.execute(cp_schedule)

  Action index:  Run-time illustration    Action:
---------------  -----------------------  ---------------------------------------------
              0  *---▷---▷                Forward(0, 2, True, False, StorageType.RAM)
              1  .   .   +---▷            Forward(2, 3, True, False, StorageType.DISK)
              2  .   .   .   ---▶         Forward(3, 4, False, True, StorageType.WORK)
              3  End Forward              EndForward()
              4  .   .   .   ◀---         Reverse(4, 3, True)
              5                           Move(2, StorageType.DISK, StorageType.WORK)
              6  .   .   ---▶             Forward(2, 3, False, True, StorageType.WORK)
              7  .   .   ◀---             Reverse(3, 2, True)
              8                           Copy(0, StorageType.RAM, StorageType.WORK)
              9  ---▷                     Forward(0, 1, False, False, StorageType.WORK)
             10  .   ---▶                 Forward(1, 2, False, True, StorageType.WORK)
             11  .   ◀---                 Reverse(2, 1, True)
             12                           Move(0, StorageType.RAM, StorageType.WORK)
             13  ---▶                     Forward(0, 1, False, True, StorageType.WORK)
             14  ◀---                     Reverse(1, 0, True)
             15  End Reverse              EndReverse()

The symbol '+' indicates that the forward data necessary for restarting the forward computation from step 2 is stored on 'disk'.

Disk-Revolve

The following code shows the forward and adjoint executions using the Disk-Revolve schedule [3]. This schedule considers two types of storage: memory ('RAM') and 'disk'.

The Disk-Revolve algorithm available in checkpoint_schedules requires the number of checkpoints stored in memory to be greater than zero (snapshots_in_ram > 0). The number of checkpoints stored on 'disk' does not need to be specified, as the algorithm computes it.

The number of checkpoints stored on 'disk' is determined by the costs of advancing the forward and adjoint solvers by a single step and the costs of writing and reading the checkpoints saved on disk. Additional details of the Disk-Revolve algorithm are available in references [3], [4] and [5].

In [54]:

snaps_in_ram = 1 # number of checkpoints stored in RAM
cp_schedule = DiskRevolve(max_n, snapshots_in_ram=snaps_in_ram) # checkpointing schedule object
solver_manager.execute(cp_schedule)

  Action index:  Run-time illustration    Action:
---------------  -----------------------  ---------------------------------------------
              0  *---▷---▷---▷            Forward(0, 3, True, False, StorageType.RAM)
              1  .   .   .   ---▶         Forward(3, 4, False, True, StorageType.WORK)
              2  End Forward              EndForward()
              3  .   .   .   ◀---         Reverse(4, 3, True)
              4                           Copy(0, StorageType.RAM, StorageType.WORK)
              5  ---▷---▷                 Forward(0, 2, False, False, StorageType.WORK)
              6  .   .   ---▶             Forward(2, 3, False, True, StorageType.WORK)
              7  .   .   ◀---             Reverse(3, 2, True)
              8                           Copy(0, StorageType.RAM, StorageType.WORK)
              9  ---▷                     Forward(0, 1, False, False, StorageType.WORK)
             10  .   ---▶                 Forward(1, 2, False, True, StorageType.WORK)
             11  .   ◀---                 Reverse(2, 1, True)
             12                           Move(0, StorageType.RAM, StorageType.WORK)
             13  ---▶                     Forward(0, 1, False, True, StorageType.WORK)
             14  ◀---                     Reverse(1, 0, True)
             15  End Reverse              EndReverse()
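One way to compare schedules is to count how many forward steps each one executes. The (n0, n1) pairs below are transcribed by hand from the Revolve output (snaps_in_ram = 2) and the Disk-Revolve output (snaps_in_ram = 1) above. The sketch counts forward steps only and ignores disk read and write costs, so it is not the cost model Disk-Revolve itself optimises.

```python
# (n0, n1) of every Forward action, transcribed from the outputs above.
revolve_forwards = [(0, 2), (2, 3), (3, 4), (2, 3), (0, 1), (1, 2), (0, 1)]
disk_revolve_forwards = [(0, 3), (3, 4), (0, 2), (2, 3), (0, 1), (1, 2), (0, 1)]


def forward_steps(forwards):
    """Total number of forward steps executed by a list of Forward actions."""
    return sum(n1 - n0 for n0, n1 in forwards)


max_n = 4
for name, forwards in [("Revolve", revolve_forwards),
                       ("Disk-Revolve", disk_revolve_forwards)]:
    total = forward_steps(forwards)
    print(f"{name}: {total} forward steps, {total - max_n} recomputed")
```

With one checkpoint in RAM instead of two, the Disk-Revolve run above pays for two extra recomputed forward steps; both runs perform the same four adjoint steps.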

Periodic Disk Revolve

Periodic Disk Revolve is a hierarchical schedule with two storage types [4]. This strategy requires the maximum number of steps (max_n) and the number of checkpoints stored in memory (snaps_in_ram), and automatically computes the number of checkpoints stored on disk.

The Periodic Disk Revolve schedule is generated with the PeriodicDiskRevolve class. This schedule is constrained to snaps_in_ram > 0.

In [55]:

snaps_in_ram = 1
cp_schedule = PeriodicDiskRevolve(max_n, snaps_in_ram)
solver_manager.execute(cp_schedule)

We use periods of size  3
  Action index:  Run-time illustration    Action:
---------------  -----------------------  ---------------------------------------------
              0  *---▷---▷---▷            Forward(0, 3, True, False, StorageType.RAM)
              1  .   .   .   ---▶         Forward(3, 4, False, True, StorageType.WORK)
              2  End Forward              EndForward()
              3  .   .   .   ◀---         Reverse(4, 3, True)
              4                           Copy(0, StorageType.RAM, StorageType.WORK)
              5  ---▷---▷                 Forward(0, 2, False, False, StorageType.WORK)
              6  .   .   ---▶             Forward(2, 3, False, True, StorageType.WORK)
              7  .   .   ◀---             Reverse(3, 2, True)
              8                           Copy(0, StorageType.RAM, StorageType.WORK)
              9  ---▷                     Forward(0, 1, False, False, StorageType.WORK)
             10  .   ---▶                 Forward(1, 2, False, True, StorageType.WORK)
             11  .   ◀---                 Reverse(2, 1, True)
             12                           Move(0, StorageType.RAM, StorageType.WORK)
             13  ---▶                     Forward(0, 1, False, True, StorageType.WORK)
             14  ◀---                     Reverse(1, 0, True)
             15  End Reverse              EndReverse()

H-Revolve

The following code illustrates the forward and adjoint computations using the checkpointing schedule given by the H-Revolve strategy [5]. This schedule is generated with the HRevolve class, which requires the following parameters: the number of time steps (max_n), the maximum number of checkpoints stored in RAM (snaps_in_ram), and the maximum number of checkpoints stored on disk (snaps_on_disk).

HRevolve requires the number of checkpoints in 'RAM' to be greater than zero (snaps_in_ram > 0).

In [56]:

snaps_on_disk = 1
snaps_in_ram = 1
cp_schedule = HRevolve(max_n, snaps_in_ram, snaps_on_disk)  # checkpointing schedule
solver_manager.execute(cp_schedule) # execute forward and adjoint in time with the schedule

  Action index:  Run-time illustration    Action:
---------------  -----------------------  ---------------------------------------------
              0  *---▷---▷---▷            Forward(0, 3, True, False, StorageType.RAM)
              1  .   .   .   ---▶         Forward(3, 4, False, True, StorageType.WORK)
              2  End Forward              EndForward()
              3  .   .   .   ◀---         Reverse(4, 3, True)
              4                           Copy(0, StorageType.RAM, StorageType.WORK)
              5  ---▷---▷                 Forward(0, 2, False, False, StorageType.WORK)
              6  .   .   ---▶             Forward(2, 3, False, True, StorageType.WORK)
              7  .   .   ◀---             Reverse(3, 2, True)
              8                           Copy(0, StorageType.RAM, StorageType.WORK)
              9  ---▷                     Forward(0, 1, False, False, StorageType.WORK)
             10  .   ---▶                 Forward(1, 2, False, True, StorageType.WORK)
             11  .   ◀---                 Reverse(2, 1, True)
             12                           Move(0, StorageType.RAM, StorageType.WORK)
             13  ---▶                     Forward(0, 1, False, True, StorageType.WORK)
             14  ◀---                     Reverse(1, 0, True)
             15  End Reverse              EndReverse()

Mixed checkpointing

The mixed checkpointing strategy works under the assumption that the data required to restart the forward computation is of the same size as the data required to advance the adjoint model by one step. Further details of the mixed checkpointing schedule are discussed in reference [6].

This specific schedule provides the flexibility to store the forward restart data either in 'RAM' or on 'disk', but not both simultaneously within the same schedule.

In [57]:

snaps_on_disk = 1
max_n = 4
cp_schedule = MixedCheckpointSchedule(max_n, snaps_on_disk)
solver_manager.execute(cp_schedule)

  Action index:  Run-time illustration    Action:
---------------  -----------------------  ---------------------------------------------
              0  +---▷---▷---▷            Forward(0, 3, True, False, StorageType.DISK)
              1  .   .   .   ---▶         Forward(3, 4, False, True, StorageType.WORK)
              2  End Forward              EndForward()
              3  .   .   .   ◀---         Reverse(4, 3, True)
              4                           Copy(0, StorageType.DISK, StorageType.WORK)
              5  ---▷---▷                 Forward(0, 2, False, False, StorageType.WORK)
              6  .   .   ---▶             Forward(2, 3, False, True, StorageType.WORK)
              7  .   .   ◀---             Reverse(3, 2, True)
              8                           Move(0, StorageType.DISK, StorageType.WORK)
              9  ---▶                     Forward(0, 1, False, True, StorageType.DISK)
             10  .   ---▶                 Forward(1, 2, False, True, StorageType.WORK)
             11  .   ◀---                 Reverse(2, 1, True)
             12                           Move(0, StorageType.DISK, StorageType.WORK)
             13  ◀---                     Reverse(1, 0, True)
             14  End Reverse              EndReverse()

In the example above, the forward restart data is stored on 'disk' by default. To store it in 'RAM' instead, the user can pass the MixedCheckpointSchedule argument storage=StorageType.RAM, as displayed below.

In [58]:

snaps_in_ram = 1
cp_schedule = MixedCheckpointSchedule(max_n, snaps_in_ram, storage=StorageType.RAM)
solver_manager.execute(cp_schedule)

  Action index:  Run-time illustration    Action:
---------------  -----------------------  ---------------------------------------------
              0  *---▷---▷---▷            Forward(0, 3, True, False, StorageType.RAM)
              1  .   .   .   ---▶         Forward(3, 4, False, True, StorageType.WORK)
              2  End Forward              EndForward()
              3  .   .   .   ◀---         Reverse(4, 3, True)
              4                           Copy(0, StorageType.RAM, StorageType.WORK)
              5  ---▷---▷                 Forward(0, 2, False, False, StorageType.WORK)
              6  .   .   ---▶             Forward(2, 3, False, True, StorageType.WORK)
              7  .   .   ◀---             Reverse(3, 2, True)
              8                           Move(0, StorageType.RAM, StorageType.WORK)
              9  ---▶                     Forward(0, 1, False, True, StorageType.RAM)
             10  .   ---▶                 Forward(1, 2, False, True, StorageType.WORK)
             11  .   ◀---                 Reverse(2, 1, True)
             12                           Move(0, StorageType.RAM, StorageType.WORK)
             13  ◀---                     Reverse(1, 0, True)
             14  End Reverse              EndReverse()
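The same-size assumption means a single storage slot can hold either kind of data. The sketch below tracks disk occupancy over the actions transcribed by hand from the first MixedCheckpointSchedule output above (storage on 'disk'); Reverse actions do not touch the disk and are omitted. The tuple layout and the track helper are hypothetical, and writing both kinds of data for the same step (which does not occur here) is simplified to the later write.

```python
# Actions transcribed from the first MixedCheckpointSchedule output above.
actions = [
    ("Forward", 0, 3, True, False, "DISK"),
    ("Forward", 3, 4, False, True, "WORK"),
    ("Copy", 0, "DISK", "WORK"),
    ("Forward", 0, 2, False, False, "WORK"),
    ("Forward", 2, 3, False, True, "WORK"),
    ("Move", 0, "DISK", "WORK"),
    ("Forward", 0, 1, False, True, "DISK"),
    ("Forward", 1, 2, False, True, "WORK"),
    ("Move", 0, "DISK", "WORK"),
]


def track(actions, storage):
    """Return the peak number of occupied slots in `storage` and the kinds
    of data ('restart' or 'adjoint') written to it."""
    slots = {}
    peak = 0
    kinds = set()
    for kind, *args in actions:
        if kind == "Forward":
            n0, n1, write_ics, write_adj_deps, where = args
            if where != storage:
                continue
            if write_ics:
                slots[n0] = "restart"  # data to restart from step n0
            if write_adj_deps:
                for n in range(n0, n1):
                    slots[n] = "adjoint"  # adjoint dependencies of step n
        elif kind == "Move" and args[1] == storage:
            slots.pop(args[0])
        # Copy leaves the source untouched, so it needs no handling here.
        kinds.update(slots.values())
        peak = max(peak, len(slots))
    return peak, kinds


peak, kinds = track(actions, "DISK")
print(peak, sorted(kinds))  # 1 ['adjoint', 'restart']
```

The single disk slot (snaps_on_disk = 1) first holds the restart data for step 0 and later the adjoint data for step 0, which is only possible because both are assumed to have the same size.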

Two-level binomial

The two-level binomial schedule was presented in reference [6] and applied in [7].

The two-level binomial checkpointing stores the forward restart data periodically, with a user-defined period. In this schedule, we can also limit the additional forward restart data stored while the adjoint model is advanced. The default storage type is 'disk'.

The two-level binomial schedule is provided by TwoLevelCheckpointSchedule. To obtain this schedule, we need the period, period = 3, and the number of additional forward restart checkpoints, add_snaps = 1.
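The periodic level of this schedule can be sketched on its own. The helpers below are hypothetical and model only the periodic checkpoints, ignoring the additional binomial checkpoints written during the adjoint run. With period = 3 and max_n = 4 they give periodic checkpoints at steps 0 and 3, consistent with the Forward(0, 3, True, ...) and Forward(3, 6, True, ...) actions in the output below.

```python
def periodic_checkpoints(max_n, period):
    """Steps at which the periodic level writes forward restart data."""
    return list(range(0, max_n, period))


def period_start(n, period):
    """First step of the period containing step n: the periodic checkpoint
    from which step n can be recomputed (binomial checkpoints ignored)."""
    return (n // period) * period


max_n, period = 4, 3
print(periodic_checkpoints(max_n, period))  # [0, 3]
print([period_start(n, period) for n in reversed(range(max_n))])  # [3, 0, 0, 0]
```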

In [59]:

add_snaps = 1 # additional storage of the forward restart data
period = 3
revolver = TwoLevelCheckpointSchedule(period, add_snaps)
solver_manager.execute(revolver)

  Action index:  Run-time illustration    Action:
---------------  -----------------------  ---------------------------------------------
              0  +---▷---▷---▷            Forward(0, 3, True, False, StorageType.DISK)
              1  .   .   .   +---▷        Forward(3, 6, True, False, StorageType.DISK)
              2  End Forward              EndForward()
              3                           Copy(3, StorageType.DISK, StorageType.WORK)
              4  .   .   .   ---▶         Forward(3, 4, False, True, StorageType.WORK)
              5  .   .   .   ◀---         Reverse(4, 3, True)
              6                           Copy(0, StorageType.DISK, StorageType.WORK)
              7  ---▷                     Forward(0, 1, False, False, StorageType.WORK)
              8  .   +---▷                Forward(1, 2, True, False, StorageType.DISK)
              9  .   .   ---▶             Forward(2, 3, False, True, StorageType.WORK)
             10  .   .   ◀---             Reverse(3, 2, True)
             11                           Move(1, StorageType.DISK, StorageType.WORK)
             12  .   ---▶                 Forward(1, 2, False, True, StorageType.WORK)
             13  .   ◀---                 Reverse(2, 1, True)
             14                           Copy(0, StorageType.DISK, StorageType.WORK)
             15  ---▶                     Forward(0, 1, False, True, StorageType.WORK)
             16  ◀---                     Reverse(1, 0, True)
             17  End Reverse              EndReverse()

The output above shows the forward and adjoint executions with two-level binomial checkpointing. The action with index 8, Forward(1, 2, True, False, StorageType.DISK), stores the additional forward restart data on 'disk'.

We can also keep the additional forward restart data in 'RAM' by setting the optional argument binomial_storage = StorageType.RAM. In the output below, the action with index 8 now stores the forward restart data in 'RAM'. The action with index 11 then moves that data from 'RAM' to the working storage.

In [60]:

revolver = TwoLevelCheckpointSchedule(period, binomial_snapshots=add_snaps,
                                      binomial_storage=StorageType.RAM)
solver_manager.execute(revolver)

  Action index:  Run-time illustration    Action:
---------------  -----------------------  ---------------------------------------------
              0  +---▷---▷---▷            Forward(0, 3, True, False, StorageType.DISK)
              1  .   .   .   +---▷        Forward(3, 6, True, False, StorageType.DISK)
              2  End Forward              EndForward()
              3                           Copy(3, StorageType.DISK, StorageType.WORK)
              4  .   .   .   ---▶         Forward(3, 4, False, True, StorageType.WORK)
              5  .   .   .   ◀---         Reverse(4, 3, True)
              6                           Copy(0, StorageType.DISK, StorageType.WORK)
              7  ---▷                     Forward(0, 1, False, False, StorageType.WORK)
              8  .   *---▷                Forward(1, 2, True, False, StorageType.RAM)
              9  .   .   ---▶             Forward(2, 3, False, True, StorageType.WORK)
             10  .   .   ◀---             Reverse(3, 2, True)
             11                           Move(1, StorageType.RAM, StorageType.WORK)
             12  .   ---▶                 Forward(1, 2, False, True, StorageType.WORK)
             13  .   ◀---                 Reverse(2, 1, True)
             14                           Copy(0, StorageType.DISK, StorageType.WORK)
             15  ---▶                     Forward(0, 1, False, True, StorageType.WORK)
             16  ◀---                     Reverse(1, 0, True)
             17  End Reverse              EndReverse()
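In a user implementation, the storage argument of the Forward, Copy and Move actions decides where forward restart data lives. The sketch below uses a hypothetical CheckpointStore class and a local Storage enum. Neither is part of checkpoint_schedules. The class keeps 'RAM' checkpoints in a dictionary and 'disk' checkpoints as pickle files in a temporary directory. Loading with delete=False plays the role of a Copy, which keeps the checkpoint for later reuse, as happens for step 0 above. Loading with delete=True plays the role of a Move, which frees the storage, as happens for step 1.

```python
import copy
import pickle
import tempfile
from enum import Enum
from pathlib import Path


class Storage(Enum):
    # Hypothetical local enum for this sketch; not checkpoint_schedules.StorageType.
    RAM = "ram"
    DISK = "disk"


class CheckpointStore:
    """Forward restart data kept in a dict (RAM) or in pickle files (DISK)."""

    def __init__(self, directory):
        self._ram = {}
        self._dir = Path(directory)

    def _path(self, n):
        return self._dir / f"checkpoint_{n}.pkl"

    def write(self, n, data, storage):
        if storage is Storage.RAM:
            self._ram[n] = copy.deepcopy(data)  # avoid aliasing the working state
        else:
            with open(self._path(n), "wb") as f:
                pickle.dump(data, f)

    def load(self, n, storage, delete=False):
        """Return checkpoint n; delete=False acts like Copy, delete=True like Move."""
        if storage is Storage.RAM:
            return self._ram.pop(n) if delete else copy.deepcopy(self._ram[n])
        path = self._path(n)
        with open(path, "rb") as f:
            data = pickle.load(f)
        if delete:
            path.unlink()
        return data

    def contains(self, n, storage):
        if storage is Storage.RAM:
            return n in self._ram
        return self._path(n).exists()


with tempfile.TemporaryDirectory() as tmp:
    store = CheckpointStore(tmp)
    store.write(0, {"u": [1.0, 2.0]}, Storage.DISK)  # first-level (periodic) checkpoint
    store.write(1, {"u": [1.5, 2.5]}, Storage.RAM)   # additional checkpoint kept in RAM
    u1 = store.load(1, Storage.RAM, delete=True)      # analogous to Move(1, RAM, WORK)
    u0 = store.load(0, Storage.DISK)                  # analogous to Copy(0, DISK, WORK)
    moved_out = not store.contains(1, Storage.RAM)
    kept = store.contains(0, Storage.DISK)

print(u1, u0, moved_out, kept)
```

This sketch deep-copies data on RAM writes and on RAM copies. That design choice stops later in-place updates of the working state from silently changing a stored checkpoint. Pickle files give the same isolation for 'disk' checkpoints.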

Final remarks

This notebook gave a visual illustration of how to use the schedules available in the checkpoint_schedules package. The action-specific functions (e.g. action_forward) only illustrate the step execution of the forward and adjoint models. In a real application, the user replaces this illustrative code with code that advances the forward and adjoint models and that copies and moves the checkpoint data.
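As an example of replacing the illustrative code, here is a minimal sketch of action-specific functions that actually advance a model and its adjoint. It makes several assumptions. The model is a hypothetical scalar recurrence x_{n+1} = x_n + sin(x_n). Forward, Reverse, Copy and Move are local dataclasses that mirror the printed actions; they are not the checkpoint_schedules classes. 'RAM' and 'disk' checkpoints share one dictionary. The action list is replayed from the first two-level output above (max_n = 4, End actions omitted) rather than generated by TwoLevelCheckpointSchedule. As with the action function used throughout this notebook, the handlers are registered on a functools.singledispatch generic function.

```python
import math
from dataclasses import dataclass
from functools import singledispatch

# Hypothetical stand-ins mirroring the printed actions, not the checkpoint_schedules classes.
@dataclass
class Forward:
    n0: int
    n1: int
    write_ics: bool
    write_adj_deps: bool
    storage: str

@dataclass
class Reverse:
    n1: int
    n0: int
    clear_adj_deps: bool

@dataclass
class Copy:
    n: int
    from_storage: str
    to_storage: str

@dataclass
class Move:
    n: int
    from_storage: str
    to_storage: str

def step(x):
    # Hypothetical scalar forward model: x_{n+1} = x_n + sin(x_n).
    return x + math.sin(x)

def step_derivative(x):
    # d x_{n+1} / d x_n, needed by the adjoint of one step.
    return 1.0 + math.cos(x)

class ToyModel:
    def __init__(self, x0, max_n):
        self.max_n = max_n
        self.n, self.x = 0, x0  # current step and working (WORK) state
        self.checkpoints = {}   # forward restart data; DISK and RAM share one dict here
        self.adj_deps = {}      # x_n kept for the adjoint of step n
        self.adj = 1.0          # adjoint variable, seeded with d x_max_n / d x_max_n = 1

@singledispatch
def action(cp_action, model):
    raise TypeError(f"Unexpected action: {cp_action!r}")

@action.register(Forward)
def action_forward(cp_action, model):
    assert model.n == cp_action.n0, "working state must be at step n0"
    n1 = min(cp_action.n1, model.max_n)  # the schedule may request steps past max_n
    if cp_action.write_ics:
        model.checkpoints[cp_action.n0] = model.x
    for n in range(cp_action.n0, n1):
        if cp_action.write_adj_deps:
            model.adj_deps[n] = model.x
        model.x = step(model.x)
    model.n = n1

@action.register(Reverse)
def action_reverse(cp_action, model):
    for n in range(cp_action.n1 - 1, cp_action.n0 - 1, -1):
        model.adj *= step_derivative(model.adj_deps[n])  # chain rule through step n
        if cp_action.clear_adj_deps:
            del model.adj_deps[n]

@action.register(Copy)
def action_copy(cp_action, model):
    model.n, model.x = cp_action.n, model.checkpoints[cp_action.n]  # checkpoint kept

@action.register(Move)
def action_move(cp_action, model):
    model.n, model.x = cp_action.n, model.checkpoints.pop(cp_action.n)  # checkpoint freed

# Non-End actions of the first two-level output above, for max_n = 4.
ACTIONS = [
    Forward(0, 3, True, False, "DISK"), Forward(3, 6, True, False, "DISK"),
    Copy(3, "DISK", "WORK"), Forward(3, 4, False, True, "WORK"), Reverse(4, 3, True),
    Copy(0, "DISK", "WORK"), Forward(0, 1, False, False, "WORK"),
    Forward(1, 2, True, False, "DISK"), Forward(2, 3, False, True, "WORK"),
    Reverse(3, 2, True), Move(1, "DISK", "WORK"), Forward(1, 2, False, True, "WORK"),
    Reverse(2, 1, True), Copy(0, "DISK", "WORK"), Forward(0, 1, False, True, "WORK"),
    Reverse(1, 0, True),
]

model = ToyModel(x0=0.3, max_n=4)
for cp_action in ACTIONS:
    action(cp_action, model)

# Reference gradient d x_4 / d x_0 from a plain sweep that keeps every state.
xs = [0.3]
for _ in range(4):
    xs.append(step(xs[-1]))
reference = math.prod(step_derivative(x) for x in xs[:-1])
print(model.adj, reference)
```

Both the adjoint computed through the schedule and the reference are the product of the step derivatives at x_0, ..., x_3, so they should agree up to rounding. The Forward handler in this sketch truncates n1 at max_n because the output above shows that the schedule can request Forward(3, 6) for a 4-step run.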

References

[1] Griewank, A., & Walther, A. (2000). Algorithm 799: revolve: an implementation of checkpointing for the reverse or adjoint mode of computational differentiation. ACM Transactions on Mathematical Software (TOMS), 26(1), 19-45. https://doi.org/10.1145/347837.347846

[2] Stumm, P., & Walther, A. (2009). Multistage approaches for optimal offline checkpointing. SIAM Journal on Scientific Computing, 31(3), 1946-1967. https://doi.org/10.1137/080718036

[3] Aupy, G., Herrmann, J., Hovland, P., & Robert, Y. (2016). Optimal multistage algorithm for adjoint computation. SIAM Journal on Scientific Computing, 38(3), C232-C255.

[4] Aupy, G., & Herrmann, J. (2017). Periodicity in optimal hierarchical checkpointing schemes for adjoint computations. Optimization Methods and Software, 32(3), 594-624. https://doi.org/10.1080/10556788.2016.1230612

[5] Herrmann, J., & Pallez (Aupy), G. (2020). H-Revolve: a framework for adjoint computation on synchronous hierarchical platforms. ACM Transactions on Mathematical Software (TOMS), 46(2), 1-25. https://doi.org/10.1145/3378672

[6] Maddison, J. R. (2023). On the implementation of checkpointing with high-level algorithmic differentiation. arXiv preprint arXiv:2305.09568. https://doi.org/10.48550/arXiv.2305.09568

[7] Pringle, G. C., Jones, D. C., Goswami, S., Narayanan, S. H. K., & Goldberg, D. (2016). Providing the ARCHER community with adjoint modelling tools for high-performance oceanographic and cryospheric computation. https://nora.nerc.ac.uk/id/eprint/516314

[8] Goldberg, D. N., Smith, T. A., Narayanan, S. H., Heimbach, P., & Morlighem, M. (2020). Bathymetric Influences on Antarctic Ice-Shelf Melt Rates. Journal of Geophysical Research: Oceans, 125(11), e2020JC016370. https://doi.org/10.1029/2020JC016370