This notebook presents an illustrative example of how to use checkpoint_schedules for step-based incremental checkpointing of the adjoints to computer models. Although the example is illustrative, the code can also be employed in real applications.
We first write the CheckpointingManager class, which manages the execution of forward and adjoint models using a checkpointing schedule. The CheckpointingManager constructor takes the maximum number of steps in the model's execution, max_n. index_action is a counter of the actions executed, and list_actions is a list of the actions executed. The attributes index_action and list_actions are used here only for illustration.
The CheckpointingManager class has the method execute, which runs the schedule. execute takes the cp_schedule argument, which is expected to be a schedule (a generator of actions) provided by the checkpoint_schedules package. In the execute method, we iterate over the elements of cp_schedule using enumerate(cp_schedule), which yields a tuple (count, cp_action). Here, count is the index of the action within the schedule, and cp_action is a checkpoint action provided by checkpoint_schedules.
cp_action is the argument to a single-dispatch generic function named action. This function processes each type of checkpoint action with a dedicated function. The overloads of action are registered through its register() attribute, used as a decorator on the specific functions; e.g., the action_forward function carries the @action.register(Forward) decorator. Thus, action dispatches to action_forward if cp_action is a Forward action. Inside action_forward, we can implement the code needed to step the forward model.
Notes: the action.register decorator takes checkpoint_schedules actions as its arguments. These actions are presented in more detail in the following sections of this tutorial.

from checkpoint_schedules import *
import functools
from tabulate import tabulate


class CheckpointingManager():
    """Manage the executions of the forward and adjoint solvers.

    Attributes
    ----------
    max_n : int
        Total number of steps.
    list_actions : list
        Store the actions. Only used for illustration.
    index_action : int
        Index of the action. Only used for illustration.
    """
    def __init__(self, max_n):
        self.max_n = max_n
        self.list_actions = []
        self.index_action = 0

    def execute(self, cp_schedule):
        """Execute forward and adjoint using a checkpointing schedule.

        Parameters
        ----------
        cp_schedule : CheckpointSchedule
            Checkpoint schedule object.

        Notes
        -----
        `cp_schedule` provides the schedule of the actions to be taken and is
        also a generator that yields the *checkpoint_schedules* actions.
        """
        @functools.singledispatch
        def action(cp_action):
            raise TypeError("Unexpected action")

        @action.register(Forward)
        def action_forward(cp_action):
            nonlocal step_n

            def illustrate_runtime(a, b, singlestorage):
                # function used to illustrate the runtime of the forward execution
                if singlestorage:
                    time_exec = ". "*cp_action.n0 + (a + '--' + b)*(n1-cp_action.n0)
                else:
                    time_exec = ". "*cp_action.n0 + (a + ('---' + b)*(n1-cp_action.n0))
                return time_exec

            # n1 may be open-ended (sys.maxsize), so clamp it to the model length
            n1 = min(cp_action.n1, self.max_n)
            # the symbols used in the illustrations
            if cp_action.write_ics and cp_action.write_adj_deps:
                singlestorage = True
                a = '\u002b'
                b = '\u25b6'
            else:
                singlestorage = False
                if cp_action.write_ics and cp_action.storage == StorageType.DISK:
                    a = '+'
                elif cp_action.write_ics and cp_action.storage == StorageType.RAM:
                    a = '*'
                else:
                    a = ''
                if cp_action.write_adj_deps:
                    b = "\u25b6"
                else:
                    b = "\u25b7"
            # Illustration of the forward execution in time
            time_exec = illustrate_runtime(a, b, singlestorage)
            self.list_actions.append([self.index_action, time_exec, str(cp_action)])
            step_n = n1
            if n1 == self.max_n:
                # inform the schedule that the final forward step was reached
                cp_schedule.finalize(n1)

        @action.register(Reverse)
        def action_reverse(cp_action):
            nonlocal step_r
            # Illustration of the adjoint execution in time
            steps = (cp_action.n1-cp_action.n0)
            step_r += cp_action.n1 - cp_action.n0
            time_exec = ". "*(self.max_n - step_r) + (('\u25c0' + '---')*steps)
            self.list_actions.append([self.index_action, time_exec, str(cp_action)])

        @action.register(Copy)
        def action_copy(cp_action):
            self.list_actions.append([self.index_action, " ", str(cp_action)])

        @action.register(Move)
        def action_move(cp_action):
            self.list_actions.append([self.index_action, " ", str(cp_action)])

        @action.register(EndForward)
        def action_end_forward(cp_action):
            assert step_n == self.max_n
            act = "End Forward"
            self.list_actions.append([self.index_action, act, str(cp_action)])
            if cp_schedule._max_n is None:
                cp_schedule._max_n = self.max_n

        @action.register(EndReverse)
        def action_end_reverse(cp_action):
            nonlocal step_r, is_exhausted
            # verifying whether the adjoint execution reached the end
            assert step_r == self.max_n
            # recording whether the schedule is exhausted
            is_exhausted = cp_schedule.is_exhausted
            act = "End Reverse"
            self.list_actions.append([self.index_action, act, str(cp_action)])

        step_n = 0  # forward step
        step_r = 0  # adjoint step
        is_exhausted = False  # flag to indicate whether the schedule is exhausted
        for count, cp_action in enumerate(cp_schedule):
            self.index_action = count
            action(cp_action)
            if isinstance(cp_action, EndReverse):
                break

        # Printing the illustration of the execution
        print(tabulate(self.list_actions, headers=['Action index:', 'Run-time illustration',
                                                   'Action:']))
        self.list_actions = []
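The single-dispatch mechanism used inside execute can be seen in isolation in the following minimal sketch. The classes DemoForward and DemoReverse are hypothetical stand-ins, not the checkpoint_schedules actions; only functools.singledispatch is the real mechanism.

```python
import functools
from dataclasses import dataclass


# Hypothetical stand-ins for checkpoint actions (assumptions for illustration).
@dataclass
class DemoForward:
    n0: int
    n1: int


@dataclass
class DemoReverse:
    n0: int
    n1: int


@functools.singledispatch
def action(cp_action):
    # Fallback used when no handler is registered for the argument type.
    raise TypeError(f"Unexpected action: {cp_action!r}")


@action.register(DemoForward)
def action_forward(cp_action):
    return f"forward over steps {cp_action.n0}..{cp_action.n1 - 1}"


@action.register(DemoReverse)
def action_reverse(cp_action):
    return f"adjoint over steps {cp_action.n0 - 1}..{cp_action.n1}"


# The handler is selected from the type of the first argument.
results = [action(a) for a in (DemoForward(0, 4), DemoReverse(4, 0))]
print(results)
```

Dispatching on the action type keeps each handler small and lets an unknown action fail loudly through the fallback.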
We start with a trivial checkpoint schedule that executes only the forward solver, without storing any data. We define the maximum number of solver time steps, max_n = 4, and the object solver_manager used to manage the solver (in this case, a forward solver).
max_n = 4 # Total number of time steps.
solver_manager = CheckpointingManager(max_n) # manager object
In this case, the NoneCheckpointSchedule class provides the checkpoint schedule, which runs only the forward model.
cp_schedule = NoneCheckpointSchedule() # Checkpoint schedule object
solver_manager.execute(cp_schedule) # Execute the forward solver by following the schedule.
  Action index:  Run-time illustration    Action:
---------------  -----------------------  -------------------------------------------------------
              0  ---▷---▷---▷---▷         Forward(0, sys.maxsize, False, False, StorageType.NONE)
              1  End Forward              EndForward()
When executing solver_manager.execute(cp_schedule), the output provides a visual representation of three distinct pieces of information:
- An index linked to each action,
- A visualisation showing the steps advancing,
- The actions associated with each step.
Notice that the output has two actions: Forward and EndForward(). The latter indicates that the forward solver has reached the end of the step interval. The Forward action has the general form:
Forward(n0, n1, write_ics, write_adj_deps, storage_type)
This action is read as:
- Advance the forward from the start of step n0 to the start of step n1. In this case, n1 = sys.maxsize is used because NoneCheckpointSchedule does not require n1 to be specified in advance, which gives the flexibility to determine the number of steps during the forward execution.
- write_ics and write_adj_deps are booleans that indicate whether the forward solver should store the forward restart data and the forward data required for the adjoint computation, respectively.
- storage_type indicates the type of storage, which can be StorageType.NONE, StorageType.RAM, StorageType.DISK or StorageType.WORK.
As mentioned above, the NoneCheckpointSchedule schedule lets us determine the number of steps during the forward execution. We do so with the finalize method, as shown below:
cp_schedule.finalize(n1)
where n1 = max_n = 4. This line is part of action_forward.
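How an open-ended n1 turns into a concrete list of steps can be sketched as follows. ForwardAction and steps_to_run are hypothetical names introduced only for this illustration; the clamping with min mirrors what action_forward does before calling finalize.

```python
import sys
from typing import NamedTuple


class ForwardAction(NamedTuple):
    # Hypothetical stand-in mirroring
    # Forward(n0, n1, write_ics, write_adj_deps, storage_type).
    n0: int
    n1: int
    write_ics: bool
    write_adj_deps: bool
    storage: str


def steps_to_run(cp_action, max_n):
    """Return the step indices the forward solver advances over."""
    # Clamp an open-ended n1 (sys.maxsize) to the model length.
    n1 = min(cp_action.n1, max_n)
    return list(range(cp_action.n0, n1))


open_ended = ForwardAction(0, sys.maxsize, False, False, "NONE")
steps = steps_to_run(open_ended, max_n=4)
print(steps)
```

Once the clamped n1 equals max_n, the manager knows the forward run is complete and can report the final step to the schedule.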
The following code is practical for cases where the user intends to store the forward data for all steps. This schedule is given by SingleMemoryStorageSchedule.
Analogous to NoneCheckpointSchedule, SingleMemoryStorageSchedule builds its schedule without requiring the final step n1 to be specified.
cp_schedule = SingleMemoryStorageSchedule()
solver_manager.execute(cp_schedule)
  Action index:  Run-time illustration    Action:
---------------  -----------------------  ------------------------------------------------------
              0  ---▶---▶---▶---▶         Forward(0, sys.maxsize, False, True, StorageType.WORK)
              1  End Forward              EndForward()
              2  ◀---◀---◀---◀---         Reverse(4, 0, True)
              3  End Reverse              EndReverse()
In this particular case, the Forward action reads:
- Advance the forward solver from step n0 = 0 to the start of any step n1.
- Do not store the forward restart data, since write_ics is False.
- Store the forward data required for the adjoint computation, since write_adj_deps is True.
- The storage type is StorageType.WORK, which is the working memory location for the adjoint.
For the adjoint computation, we have the Reverse action, which has the general form:
Reverse(n0, n1, clear_adj_deps)
This is interpreted as follows:
- Advance the adjoint model from the start of step n0 to the start of step n1.
- Clear the adjoint dependency data if clear_adj_deps is True.
Thus, in the current example, the Reverse action reads:
- Advance the adjoint from the start of step 4 to the start of step 0 (i.e. over steps 3, 2, 1 and 0).
- Clear the forward data used by the adjoint (clear_adj_deps is True).
Lastly, EndReverse() is an action that signals the end of the adjoint model execution.
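The steps covered by a Reverse action can be listed with a short helper. The function reverse_steps is a hypothetical name used only for this sketch; it assumes the reading given above, where the adjoint moves from the start of step n0 back to the start of step n1.

```python
def reverse_steps(n0, n1):
    """Steps visited by the adjoint for Reverse(n0, n1, ...), latest first."""
    # The adjoint starts at the start of step n0, so the first step it
    # crosses is n0 - 1, and it stops once step n1 has been crossed.
    return list(range(n0 - 1, n1 - 1, -1))


full_sweep = reverse_steps(4, 0)   # Reverse(4, 0, True)
single_step = reverse_steps(4, 3)  # Reverse(4, 3, True)
print(full_sweep, single_step)
```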
checkpoint_schedules also allows the forward data to be stored on 'disk'. Storing all forward data used for the adjoint computation on 'disk' is achieved with SingleDiskStorageSchedule.
cp_schedule = SingleDiskStorageSchedule()
solver_manager.execute(cp_schedule)
  Action index:  Run-time illustration    Action:
---------------  -----------------------  ------------------------------------------------------
              0  ---▶---▶---▶---▶         Forward(0, sys.maxsize, False, True, StorageType.DISK)
              1  End Forward              EndForward()
              2                           Copy(3, StorageType.DISK, StorageType.WORK)
              3  . . . ◀---               Reverse(4, 3, True)
              4                           Copy(2, StorageType.DISK, StorageType.WORK)
              5  . . ◀---                 Reverse(3, 2, True)
              6                           Copy(1, StorageType.DISK, StorageType.WORK)
              7  . ◀---                   Reverse(2, 1, True)
              8                           Copy(0, StorageType.DISK, StorageType.WORK)
              9  ◀---                     Reverse(1, 0, True)
             10  End Reverse              EndReverse()
In this case, the forward and adjoint executions with SingleDiskStorageSchedule include the Copy action (see the outputs associated with indexes 2, 4, 6 and 8), which indicates copying the forward data from one storage type to another.
The Copy action has the general form:
Copy(n, from_storage, to_storage)
which reads:
- Copy the data associated with step n.
- from_storage denotes the storage type that holds the forward data at step n, while to_storage is the storage type to which this forward data is copied.
Hence, the Copy action associated with Action index: 4 reads:
- Copy the data associated with step n = 2, which is stored in StorageType.DISK, to the working storage for use by the adjoint model.
Instead of copying the data, we can move it from one storage type to another. To do so, checkpoint_schedules has a Move action, which indicates that the data, once moved, is no longer accessible in the original storage type. In SingleDiskStorageSchedule, we can move the forward data by setting the optional move_data parameter to True.
cp_schedule = SingleDiskStorageSchedule(move_data=True)
solver_manager.execute(cp_schedule)
  Action index:  Run-time illustration    Action:
---------------  -----------------------  ------------------------------------------------------
              0  ---▶---▶---▶---▶         Forward(0, sys.maxsize, False, True, StorageType.DISK)
              1  End Forward              EndForward()
              2                           Move(3, StorageType.DISK, StorageType.WORK)
              3  . . . ◀---               Reverse(4, 3, True)
              4                           Move(2, StorageType.DISK, StorageType.WORK)
              5  . . ◀---                 Reverse(3, 2, True)
              6                           Move(1, StorageType.DISK, StorageType.WORK)
              7  . ◀---                   Reverse(2, 1, True)
              8                           Move(0, StorageType.DISK, StorageType.WORK)
              9  ◀---                     Reverse(1, 0, True)
             10  End Reverse              EndReverse()
The Move action has the general form:
Move(n, from_storage, to_storage)
which can be read as:
- Move the data associated with step n.
- from_storage and to_storage are the storage types from and to which the data should be moved, respectively.
Thus, the Move action associated with Action index: 4 reads:
- Move the data associated with step n = 2, which is stored in StorageType.DISK, to the working storage for use by the adjoint model.
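The difference between Copy and Move can be sketched with two plain dictionaries standing in for the storage locations. The names copy_data and move_data and the stored strings are hypothetical; a real application would read and write actual solver data.

```python
# Hypothetical model of two storage locations, keyed by step number.
disk = {2: "forward data at step 2", 3: "forward data at step 3"}
work = {}


def copy_data(n, from_storage, to_storage):
    # Copy: the data stays available in the original storage.
    to_storage[n] = from_storage[n]


def move_data(n, from_storage, to_storage):
    # Move: the data is removed from the original storage.
    to_storage[n] = from_storage.pop(n)


copy_data(3, disk, work)  # analogous to Copy(3, DISK, WORK)
move_data(2, disk, work)  # analogous to Move(2, DISK, WORK)
print(sorted(disk), sorted(work))
```

Moving frees the source storage as soon as the data is no longer needed there, while copying keeps the checkpoint available for a later restart.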
Here, we start to present schedules obtained by checkpointing algorithms.
The Revolve strategy, introduced in reference [1], generates a schedule that uses only the 'RAM' storage type.
The Revolve class gives a schedule according to two parameters: the total number of forward steps (max_n = 4) and the number of checkpoints to store in 'RAM' (snaps_in_ram = 2).
from checkpoint_schedules import StorageType
snaps_in_ram = 2
solver_manager = CheckpointingManager(max_n) # manager object
cp_schedule = Revolve(max_n, snaps_in_ram)
solver_manager.execute(cp_schedule)
  Action index:  Run-time illustration    Action:
---------------  -----------------------  ---------------------------------------------
              0  *---▷---▷                Forward(0, 2, True, False, StorageType.RAM)
              1  . . *---▷                Forward(2, 3, True, False, StorageType.RAM)
              2  . . . ---▶               Forward(3, 4, False, True, StorageType.WORK)
              3  End Forward              EndForward()
              4  . . . ◀---               Reverse(4, 3, True)
              5                           Move(2, StorageType.RAM, StorageType.WORK)
              6  . . ---▶                 Forward(2, 3, False, True, StorageType.WORK)
              7  . . ◀---                 Reverse(3, 2, True)
              8                           Copy(0, StorageType.RAM, StorageType.WORK)
              9  ---▷                     Forward(0, 1, False, False, StorageType.WORK)
             10  . ---▶                   Forward(1, 2, False, True, StorageType.WORK)
             11  . ◀---                   Reverse(2, 1, True)
             12                           Move(0, StorageType.RAM, StorageType.WORK)
             13  ---▶                     Forward(0, 1, False, True, StorageType.WORK)
             14  ◀---                     Reverse(1, 0, True)
             15  End Reverse              EndReverse()
Using checkpointing strategies in an adjoint-based gradient computation requires recomputation of the forward solver. In the output above, the Forward action associated with Action index: 0 reads as follows:
- Advance from time step 0 to the start of time step 2.
- Store the forward data required to restart the forward solver from time step 0.
- The storage of the forward restart data is done in 'RAM'.
The symbolic illustration of the step advancing, '*---▷---▷', is associated with Forward(0, 2, True, False, StorageType.RAM) and reads:
- '*': Forward data for restarting the forward solver is stored in 'RAM'.
- '---▷': Forward data used for adjoint computation is not stored.
- '---▶': Forward data used for adjoint computation is stored.
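The amount of forward recomputation implied by the Revolve schedule can be tallied directly from the printed actions. The list below is transcribed by hand from the Forward rows of the Revolve output above; the variable names are assumptions made for this sketch.

```python
max_n = 4

# (n0, n1) of each Forward action, transcribed from the Revolve output
# (action indexes 0, 1, 2, 6, 9, 10 and 13).
forward_actions = [(0, 2), (2, 3), (3, 4), (2, 3), (0, 1), (1, 2), (0, 1)]

# Every Forward(n0, n1, ...) advances the forward solver n1 - n0 steps.
total_forward_steps = sum(n1 - n0 for n0, n1 in forward_actions)

# Steps beyond the first sweep of max_n steps are recomputations.
recomputed_steps = total_forward_steps - max_n
print(total_forward_steps, recomputed_steps)
```

For this small case the forward solver takes 8 steps in total, so 4 steps are recomputed in exchange for holding at most two restart checkpoints in 'RAM'.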
The schedule depicted below employs a multistage distribution of checkpoints between 'RAM' and 'disk', as described in [2]. This checkpointing approach allows 'RAM'-only storage, 'disk'-only storage, or a combination of both.
The following code uses both types of storage, 'RAM' and 'disk'.
snaps_in_ram = 1 # maximum number of checkpoints stored in RAM
snaps_on_disk = 1 # maximum number of checkpoints stored on disk
cp_schedule = MultistageCheckpointSchedule(max_n, snaps_in_ram, snaps_on_disk)
solver_manager.execute(cp_schedule)
  Action index:  Run-time illustration    Action:
---------------  -----------------------  ---------------------------------------------
              0  *---▷---▷                Forward(0, 2, True, False, StorageType.RAM)
              1  . . +---▷                Forward(2, 3, True, False, StorageType.DISK)
              2  . . . ---▶               Forward(3, 4, False, True, StorageType.WORK)
              3  End Forward              EndForward()
              4  . . . ◀---               Reverse(4, 3, True)
              5                           Move(2, StorageType.DISK, StorageType.WORK)
              6  . . ---▶                 Forward(2, 3, False, True, StorageType.WORK)
              7  . . ◀---                 Reverse(3, 2, True)
              8                           Copy(0, StorageType.RAM, StorageType.WORK)
              9  ---▷                     Forward(0, 1, False, False, StorageType.WORK)
             10  . ---▶                   Forward(1, 2, False, True, StorageType.WORK)
             11  . ◀---                   Reverse(2, 1, True)
             12                           Move(0, StorageType.RAM, StorageType.WORK)
             13  ---▶                     Forward(0, 1, False, True, StorageType.WORK)
             14  ◀---                     Reverse(1, 0, True)
             15  End Reverse              EndReverse()
The symbol '+' indicates that the forward data necessary for restarting the forward computation is stored on 'disk'. In the output above this is the restart data for step 2 (Action index: 1).
The following code shows the step advancing under the Disk-Revolve schedule [3]. This schedule considers two types of storage: memory ('RAM') and 'disk'.
The Disk-Revolve algorithm, available in checkpoint_schedules, requires the number of checkpoints stored in memory to be greater than 0 (snaps_in_ram > 0). Specifying the number of checkpoints stored on 'disk' is not required, as the algorithm itself calculates this value.
The number of checkpoints stored on 'disk' is determined according to the costs of advancing the backward and forward solvers by a single step and the costs of writing and reading the checkpoints saved on disk. Additional details of the Disk-Revolve algorithm are available in references [3], [4] and [5].
snaps_in_ram = 1 # number of checkpoints stored in RAM
cp_schedule = DiskRevolve(max_n, snapshots_in_ram=snaps_in_ram) # checkpointing schedule object
solver_manager.execute(cp_schedule)
  Action index:  Run-time illustration    Action:
---------------  -----------------------  ---------------------------------------------
              0  *---▷---▷---▷            Forward(0, 3, True, False, StorageType.RAM)
              1  . . . ---▶               Forward(3, 4, False, True, StorageType.WORK)
              2  End Forward              EndForward()
              3  . . . ◀---               Reverse(4, 3, True)
              4                           Copy(0, StorageType.RAM, StorageType.WORK)
              5  ---▷---▷                 Forward(0, 2, False, False, StorageType.WORK)
              6  . . ---▶                 Forward(2, 3, False, True, StorageType.WORK)
              7  . . ◀---                 Reverse(3, 2, True)
              8                           Copy(0, StorageType.RAM, StorageType.WORK)
              9  ---▷                     Forward(0, 1, False, False, StorageType.WORK)
             10  . ---▶                   Forward(1, 2, False, True, StorageType.WORK)
             11  . ◀---                   Reverse(2, 1, True)
             12                           Move(0, StorageType.RAM, StorageType.WORK)
             13  ---▶                     Forward(0, 1, False, True, StorageType.WORK)
             14  ◀---                     Reverse(1, 0, True)
             15  End Reverse              EndReverse()
Periodic Disk Revolve is a hierarchical schedule with two types of storage [4]. This strategy requires the maximum number of steps (max_n) and the number of checkpoints stored in memory (snaps_in_ram), and automatically computes the number of checkpoints stored on 'disk'.
The Periodic Disk Revolve schedule is generated with the PeriodicDiskRevolve class. This schedule is constrained to snaps_in_ram > 0.
snaps_in_ram = 1
cp_schedule = PeriodicDiskRevolve(max_n, snaps_in_ram)
solver_manager.execute(cp_schedule)
We use periods of size 3
  Action index:  Run-time illustration    Action:
---------------  -----------------------  ---------------------------------------------
              0  *---▷---▷---▷            Forward(0, 3, True, False, StorageType.RAM)
              1  . . . ---▶               Forward(3, 4, False, True, StorageType.WORK)
              2  End Forward              EndForward()
              3  . . . ◀---               Reverse(4, 3, True)
              4                           Copy(0, StorageType.RAM, StorageType.WORK)
              5  ---▷---▷                 Forward(0, 2, False, False, StorageType.WORK)
              6  . . ---▶                 Forward(2, 3, False, True, StorageType.WORK)
              7  . . ◀---                 Reverse(3, 2, True)
              8                           Copy(0, StorageType.RAM, StorageType.WORK)
              9  ---▷                     Forward(0, 1, False, False, StorageType.WORK)
             10  . ---▶                   Forward(1, 2, False, True, StorageType.WORK)
             11  . ◀---                   Reverse(2, 1, True)
             12                           Move(0, StorageType.RAM, StorageType.WORK)
             13  ---▶                     Forward(0, 1, False, True, StorageType.WORK)
             14  ◀---                     Reverse(1, 0, True)
             15  End Reverse              EndReverse()
The following code illustrates the forward and adjoint computations using the checkpointing given by the H-Revolve strategy [5]. This checkpointing schedule is generated with the HRevolve class, which requires the following parameters: the maximum number of checkpoints stored in 'RAM' (snaps_in_ram), the maximum number of checkpoints stored on 'disk' (snaps_on_disk), and the number of time steps (max_n).
HRevolve requires the number of checkpoints in 'RAM' to be greater than zero (snaps_in_ram > 0).
snaps_on_disk = 1
snaps_in_ram = 1
cp_schedule = HRevolve(max_n, snaps_in_ram, snaps_on_disk) # checkpointing schedule
solver_manager.execute(cp_schedule) # execute forward and adjoint in time with the schedule
  Action index:  Run-time illustration    Action:
---------------  -----------------------  ---------------------------------------------
              0  *---▷---▷---▷            Forward(0, 3, True, False, StorageType.RAM)
              1  . . . ---▶               Forward(3, 4, False, True, StorageType.WORK)
              2  End Forward              EndForward()
              3  . . . ◀---               Reverse(4, 3, True)
              4                           Copy(0, StorageType.RAM, StorageType.WORK)
              5  ---▷---▷                 Forward(0, 2, False, False, StorageType.WORK)
              6  . . ---▶                 Forward(2, 3, False, True, StorageType.WORK)
              7  . . ◀---                 Reverse(3, 2, True)
              8                           Copy(0, StorageType.RAM, StorageType.WORK)
              9  ---▷                     Forward(0, 1, False, False, StorageType.WORK)
             10  . ---▶                   Forward(1, 2, False, True, StorageType.WORK)
             11  . ◀---                   Reverse(2, 1, True)
             12                           Move(0, StorageType.RAM, StorageType.WORK)
             13  ---▶                     Forward(0, 1, False, True, StorageType.WORK)
             14  ◀---                     Reverse(1, 0, True)
             15  End Reverse              EndReverse()
The Mixed checkpointing strategy works under the assumption that the data required to restart the forward computation is of the same size as the data required to advance the adjoint model by one step. Further details of the Mixed checkpointing schedule are discussed in reference [6].
This schedule provides the flexibility to store the forward restart data either in 'RAM' or on 'disk', but not both within the same schedule.
snaps_on_disk = 1
max_n = 4
cp_schedule = MixedCheckpointSchedule(max_n, snaps_on_disk)
solver_manager.execute(cp_schedule)
  Action index:  Run-time illustration    Action:
---------------  -----------------------  ---------------------------------------------
              0  +---▷---▷---▷            Forward(0, 3, True, False, StorageType.DISK)
              1  . . . ---▶               Forward(3, 4, False, True, StorageType.WORK)
              2  End Forward              EndForward()
              3  . . . ◀---               Reverse(4, 3, True)
              4                           Copy(0, StorageType.DISK, StorageType.WORK)
              5  ---▷---▷                 Forward(0, 2, False, False, StorageType.WORK)
              6  . . ---▶                 Forward(2, 3, False, True, StorageType.WORK)
              7  . . ◀---                 Reverse(3, 2, True)
              8                           Move(0, StorageType.DISK, StorageType.WORK)
              9  ---▶                     Forward(0, 1, False, True, StorageType.DISK)
             10  . ---▶                   Forward(1, 2, False, True, StorageType.WORK)
             11  . ◀---                   Reverse(2, 1, True)
             12                           Move(0, StorageType.DISK, StorageType.WORK)
             13  ◀---                     Reverse(1, 0, True)
             14  End Reverse              EndReverse()
In the example above, the forward restart data is stored on 'disk' by default. To change the storage type to 'RAM', the user can set the MixedCheckpointSchedule argument storage = StorageType.RAM, as shown below.
snaps_in_ram = 1 # number of checkpoints, now stored in RAM
cp_schedule = MixedCheckpointSchedule(max_n, snaps_in_ram, storage=StorageType.RAM)
solver_manager.execute(cp_schedule)
  Action index:  Run-time illustration    Action:
---------------  -----------------------  ---------------------------------------------
              0  *---▷---▷---▷            Forward(0, 3, True, False, StorageType.RAM)
              1  . . . ---▶               Forward(3, 4, False, True, StorageType.WORK)
              2  End Forward              EndForward()
              3  . . . ◀---               Reverse(4, 3, True)
              4                           Copy(0, StorageType.RAM, StorageType.WORK)
              5  ---▷---▷                 Forward(0, 2, False, False, StorageType.WORK)
              6  . . ---▶                 Forward(2, 3, False, True, StorageType.WORK)
              7  . . ◀---                 Reverse(3, 2, True)
              8                           Move(0, StorageType.RAM, StorageType.WORK)
              9  ---▶                     Forward(0, 1, False, True, StorageType.RAM)
             10  . ---▶                   Forward(1, 2, False, True, StorageType.WORK)
             11  . ◀---                   Reverse(2, 1, True)
             12                           Move(0, StorageType.RAM, StorageType.WORK)
             13  ◀---                     Reverse(1, 0, True)
             14  End Reverse              EndReverse()
The two-level binomial schedule was presented in reference [6], and it was applied in the work [7].
The two-level binomial checkpointing stores the forward restart data periodically, based on a user-defined period. In this schedule, we can also define a limit on the additional storage of forward restart data during the step advancing of the adjoint model. The default storage type is 'disk'.
The two-level binomial schedule is provided by TwoLevelCheckpointSchedule. To obtain this schedule we need the period, period = 3, and the number of additional forward restart checkpoints, add_snaps = 1.
add_snaps = 1 # additional storage of the forward restart data
period = 3
revolver = TwoLevelCheckpointSchedule(period, add_snaps)
solver_manager.execute(revolver)
  Action index:  Run-time illustration    Action:
---------------  -----------------------  ---------------------------------------------
              0  +---▷---▷---▷            Forward(0, 3, True, False, StorageType.DISK)
              1  . . . +---▷              Forward(3, 6, True, False, StorageType.DISK)
              2  End Forward              EndForward()
              3                           Copy(3, StorageType.DISK, StorageType.WORK)
              4  . . . ---▶               Forward(3, 4, False, True, StorageType.WORK)
              5  . . . ◀---               Reverse(4, 3, True)
              6                           Copy(0, StorageType.DISK, StorageType.WORK)
              7  ---▷                     Forward(0, 1, False, False, StorageType.WORK)
              8  . +---▷                  Forward(1, 2, True, False, StorageType.DISK)
              9  . . ---▶                 Forward(2, 3, False, True, StorageType.WORK)
             10  . . ◀---                 Reverse(3, 2, True)
             11                           Move(1, StorageType.DISK, StorageType.WORK)
             12  . ---▶                   Forward(1, 2, False, True, StorageType.WORK)
             13  . ◀---                   Reverse(2, 1, True)
             14                           Copy(0, StorageType.DISK, StorageType.WORK)
             15  ---▶                     Forward(0, 1, False, True, StorageType.WORK)
             16  ◀---                     Reverse(1, 0, True)
             17  End Reverse              EndReverse()
The output above shows the forward and adjoint executions using the two-level binomial checkpointing. Notice that the action associated with Action index: 8 shows that the additional forward restart data is stored on 'disk'.
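The periodic part of this schedule can be sketched with a few lines of plain Python. This is only an illustration consistent with the first two Forward actions in the output above (restart data written at steps 0 and 3); the function name periodic_segments is hypothetical, and the sketch is not the library's implementation.

```python
def periodic_segments(max_n, period):
    """Return (n0, n1) forward segments that each start with a restart checkpoint."""
    # A restart checkpoint is written at every multiple of the period.
    starts = range(0, max_n, period)
    # The schedule does not know max_n in advance, so n1 may overshoot it;
    # the solver manager clamps n1 with min(n1, max_n).
    return [(n0, n0 + period) for n0 in starts]


max_n, period = 4, 3
segments = periodic_segments(max_n, period)
clamped = [(n0, min(n1, max_n)) for n0, n1 in segments]
print(segments, clamped)
```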
We can also store the additional forward restart checkpoint in 'RAM' by setting the optional argument binomial_storage = StorageType.RAM. In the output below, the action associated with Action index: 8 shows that the forward restart data is stored in 'RAM'.
revolver = TwoLevelCheckpointSchedule(period, binomial_snapshots=add_snaps,
                                      binomial_storage=StorageType.RAM)
solver_manager.execute(revolver)
  Action index:  Run-time illustration    Action:
---------------  -----------------------  ---------------------------------------------
              0  +---▷---▷---▷            Forward(0, 3, True, False, StorageType.DISK)
              1  . . . +---▷              Forward(3, 6, True, False, StorageType.DISK)
              2  End Forward              EndForward()
              3                           Copy(3, StorageType.DISK, StorageType.WORK)
              4  . . . ---▶               Forward(3, 4, False, True, StorageType.WORK)
              5  . . . ◀---               Reverse(4, 3, True)
              6                           Copy(0, StorageType.DISK, StorageType.WORK)
              7  ---▷                     Forward(0, 1, False, False, StorageType.WORK)
              8  . *---▷                  Forward(1, 2, True, False, StorageType.RAM)
              9  . . ---▶                 Forward(2, 3, False, True, StorageType.WORK)
             10  . . ◀---                 Reverse(3, 2, True)
             11                           Move(1, StorageType.RAM, StorageType.WORK)
             12  . ---▶                   Forward(1, 2, False, True, StorageType.WORK)
             13  . ◀---                   Reverse(2, 1, True)
             14                           Copy(0, StorageType.DISK, StorageType.WORK)
             15  ---▶                     Forward(0, 1, False, True, StorageType.WORK)
             16  ◀---                     Reverse(1, 0, True)
             17  End Reverse              EndReverse()
This notebook focused on a visual illustration of the schedules available in the checkpoint_schedules package. The specific functions (e.g. action_forward) were written to illustrate the step execution of the forward and adjoint models. In a real application, the user can replace the illustrative code with the code needed to step the forward and adjoint models and to copy and move the data.
[1] Griewank, A., & Walther, A. (2000). Algorithm 799: revolve: an implementation of checkpointing for the reverse or adjoint mode of computational differentiation. ACM Transactions on Mathematical Software (TOMS), 26(1), 19-45. https://doi.org/10.1145/347837.347846
[2] Stumm, P., & Walther, A. (2009). Multistage approaches for optimal offline checkpointing. SIAM Journal on Scientific Computing, 31(3), 1946-1967. https://doi.org/10.1137/080718036
[3] Aupy, G., Herrmann, J., Hovland, P., & Robert, Y. (2016). Optimal multistage algorithm for adjoint computation. SIAM Journal on Scientific Computing, 38(3), C232-C255. https://doi.org/10.1137/15M1019222
[4] Aupy, G., & Herrmann, J. (2017). Periodicity in optimal hierarchical checkpointing schemes for adjoint computations. Optimization Methods and Software, 32(3), 594-624. https://doi.org/10.1080/10556788.2016.1230612
[5] Herrmann, J., & Pallez (Aupy), G. (2020). H-Revolve: a framework for adjoint computation on synchronous hierarchical platforms. ACM Transactions on Mathematical Software (TOMS), 46(2), 1-25. https://doi.org/10.1145/3378672
[6] Maddison, J. R. (2023). On the implementation of checkpointing with high-level algorithmic differentiation. arXiv preprint arXiv:2305.09568. https://doi.org/10.48550/arXiv.2305.09568
[7] Pringle, G. C., Jones, D. C., Goswami, S., Narayanan, S. H. K., & Goldberg, D. (2016). Providing the ARCHER community with adjoint modelling tools for high-performance oceanographic and cryospheric computation. https://nora.nerc.ac.uk/id/eprint/516314
[8] Goldberg, D. N., Smith, T. A., Narayanan, S. H., Heimbach, P., & Morlighem, M. (2020). Bathymetric Influences on Antarctic Ice‐Shelf Melt Rates. Journal of Geophysical Research: Oceans, 125(11), e2020JC016370. https://doi.org/10.1029/2020JC016370