# -*- coding: utf-8 -*-
# Copyright 2019 - 2024 United Kingdom Research and Innovation
# Copyright 2019 - 2024 Technical University of Denmark
# Copyright 2019 - 2022 The University of Manchester
# Copyright 2019 - 2022 The University of Bath
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Authored by: Claire Delplancke (University of Bath)
# Evangelos Papoutsellis (UKRI-STFC)
# Gemma Fardell (UKRI-STFC)
# Laura Murgatroyd (UKRI-STFC)
# Margaret Duff (UKRI-STFC)
# Hannah Robarts (UKRI-STFC)
In this demo, we learn how to use the Stochastic Primal-Dual Hybrid Gradient (SPDHG) algorithm introduced in Chambolle et al.; see also Ehrhardt et al., Papoutsellis et al. and Gutierrez et al. This algorithm is a stochastic version of the Primal-Dual Hybrid Gradient (PDHG) algorithm by Chambolle & Pock. We focus on tomography reconstruction under an edge-preserving prior, namely Total Variation (TV) regularisation, see ROF.
In the previous demo, we presented PDHG for tomography reconstruction.
SPDHG is a stochastic version of PDHG. Like PDHG, SPDHG can solve convex, not necessarily smooth, optimisation problems and is provably convergent.
The central improvement over PDHG is that SPDHG can work with subsets of the data, which makes it computationally more efficient and able to handle larger datasets.
For CT and PET for example, the data can be divided into a collection of partial sinograms, each of them corresponding to a subset of the total set of angles. Accordingly, the projection operator $A: X \rightarrow Y$ can be decomposed as $A=(A_1, \dots, A_n): X \rightarrow Y=Y_1\times \dots \times Y_n$ where $n$ is the number of subsets and each $A_i:X \rightarrow Y_i$ is a partial projection.
At each iteration, SPDHG computes only one partial forward and back-projection $A_i,\,A_i^*$ for a randomly chosen index $i$, whereas PDHG computes the full $A$ and $A^*$. Hence each SPDHG iteration requires only a fraction of the computation of a PDHG iteration, making it correspondingly faster.
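To make the decomposition concrete, here is a small illustrative sketch in plain NumPy (independent of the CIL code below, with hypothetical numbers) showing how a set of projection angles could be split into sequential subsets:
import numpy as np

# Split 160 hypothetical projection angles into n = 10 sequential subsets,
# mirroring the decomposition A = (A_1, ..., A_n) described above
angles = np.arange(160)
n = 10
subsets = np.array_split(angles, n)
print([len(s) for s in subsets])   # 10 subsets of 16 angles each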
Pre-requisites: the PDHG notebook from the previous demo.
The minimisation problem for Total Variation Tomography reconstruction with TV is
$$ \underset{u}{\operatorname{argmin}} \mathcal{O}(u)= \frac{1}{2}\| A u - g\|^{2} + \alpha\,\mathrm{TV}(u) + \mathbb{I}_{\{u>0\}}(u) \tag{1}$$ where:
$A$ is the tomographic projection operator,
$g$ is the acquired sinogram data,
$\alpha$ is the regularisation parameter,
$\mathbb{I}_{\{u>0\}}(u)$ is a positivity constraint for the minimiser $u$.
In order to use the SPDHG algorithm for the problem above, we need to express our minimisation problem in the following form:
$$\min_{u} \sum_{i=1}^n F_i(K_i u) + G(u) \tag{2}$$ where we assume that:
$F_i$, $G$ are convex functionals,
$F_i: Y_i \rightarrow \mathbb{R}$,
$G: X \rightarrow \mathbb{R}$,
$K_i$ is a continuous linear operator acting from a space $X$ to another space $Y_i$,
with operator norm defined as $$\| K_i \| = \max\{ \|K_i x\|_{Y_i} : \|x\|_{X}\leq 1 \}.$$
Note: we use the Power Method to approximate the norms of the $K_i$'s and, if needed, of $\nabla$.
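For illustration only, here is a minimal power-method sketch in plain NumPy (a toy example, not CIL's implementation) approximating the operator norm of a small matrix, in the same spirit as the .norm() calls used later in this notebook:
import numpy as np

def power_method_estimate(M, n_iter=100, seed=0):
    # Repeatedly apply M^T M to a random vector and normalise; the vector
    # aligns with the leading singular direction, so ||M x|| tends to ||M||
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(M.shape[1])
    for _ in range(n_iter):
        x = M.T @ (M @ x)
        x /= np.linalg.norm(x)
    return np.linalg.norm(M @ x)

M = np.array([[3.0, 0.0], [0.0, 1.0]])
print(power_method_estimate(M))   # approximately 3.0, the largest singular value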
# Import libraries
from cil.optimisation.algorithms import PDHG, SPDHG
from cil.optimisation.operators import GradientOperator, BlockOperator
from cil.optimisation.functions import IndicatorBox, BlockFunction, L2NormSquared, MixedL21Norm
from cil.io import ZEISSDataReader
from cil.processors import Slicer, Binner, TransmissionAbsorptionConverter
from cil.plugins.astra.operators import ProjectionOperator
from cil.plugins.ccpi_regularisation.functions import FGP_TV
from cil.utilities.display import show2D
from cil.utilities.jupyter import islicer
import numpy as np
import matplotlib.pyplot as plt
import os
In this demo, we use the walnut dataset described in Jørgensen et al. In total, there are 6 individual micro computed tomography datasets in the native Zeiss TXRM/TXM format. The six datasets were acquired at the 3D Imaging Center at the Technical University of Denmark in 2014 (HDTomo3D in 2016) as part of the ERC-funded project High-Definition Tomography (HDTomo) headed by Prof. Per Christian Hansen.
This example requires the dataset walnut.zip from https://zenodo.org/record/4822516 :
https://zenodo.org/record/4822516/files/walnut.zip
If running locally, please download the data and update the filename variable below.
filename = "/mnt/materials/SIRF/Fully3D/CIL/Walnut/valnut_2014-03-21_643_28/tomo-A/valnut_tomo-A.txrm"
reader = ZEISSDataReader()
reader.set_up(file_name=filename)
data3D = reader.read()
# reorder data to match default order for Astra/Tigre operator
data3D.reorder('astra')
# Get Image and Acquisition geometries
ag3D = data3D.geometry
ig3D = ag3D.get_ImageGeometry()
# Extract vertical slice
data2D = data3D.get_slice(vertical='centre')
# Select every 10th angle
sliced_data = Slicer(roi={'angle':(0,1601,10)})(data2D)
# Crop the background regions from the detector edges and bin by 2 horizontally
binned_data = Binner(roi={'horizontal':(120,-120,2)})(sliced_data)
# Create absorption data
data = TransmissionAbsorptionConverter()(binned_data)
# Remove circular artifacts
data -= np.mean(data.as_array()[80:100,0:30])
# Get Image and Acquisition geometries for one slice
ag2D = data.geometry
ag2D.set_angles(ag2D.angles, initial_angle=0.2, angle_unit='radian')
ig2D = ag2D.get_ImageGeometry()
A = ProjectionOperator(ig2D, ag2D, device = "gpu")
In order to set up and run SPDHG, we need to define the operator K, the function F and the function G appearing in (2).
The basic setup and run of SPDHG is:
spdhg = SPDHG(f = F, g = G, operator = K)
spdhg.run(number_of_iterations)
We can express our problem in the form (2) if we let
$K = (A_1,\dots,A_n) \quad \Longleftrightarrow \quad $ K = BlockOperator(A_1, ..., A_n)
$F_i: Y_i \rightarrow \mathbb{R}, \text{ with } F_i(y_i) := \frac{1}{2}\| y_i - g_i \|^{2}, \quad \Longleftrightarrow \quad$ for all $i$, F_i = 0.5*L2NormSquared(b=partitioned_data[i])
and F = BlockFunction(F_1, ..., F_n)
$G: X \rightarrow \mathbb{R}, \text{ with } G(x) := \alpha\|\nabla x\|_{2,1} + \mathbb{I}_{\{x>0\}}(x), \quad \Longleftrightarrow \quad$ G = alpha * FGP_TV(nonnegativity=True).
# Define number of subsets
n_subsets = 10
# Sequentially partition the data into the number of subsets using the data partitioner
partitioned_data = data.partition(n_subsets, 'sequential')
# Initialize the lists containing the F_i's
f_subsets = []
# Define F_i's
for i in range(n_subsets):
# Define F_i and put into list
fi = 0.5*L2NormSquared(b = partitioned_data[i])
f_subsets.append(fi)
# Define F
F = BlockFunction(*f_subsets)
# Create the projection operator, K, for the partitioned data
ageom_subset = partitioned_data.geometry
K = ProjectionOperator(ig2D, ageom_subset)
# Define G (set the constraint that the solution must be positive)
alpha = 0.025
G = alpha * FGP_TV(nonnegativity=True)
Note: in this example, there are some parameters which we don't pass, so SPDHG will use its default, pre-defined values. These include the sampling probabilities prob (uniform over the subsets by default) and the step-sizes sigma and tau (chosen automatically, see the Algorithm section at the end of this notebook). A sketch of passing some of them explicitly is shown below.
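For illustration, a minimal sketch of passing these arguments explicitly; the uniform probabilities and gamma = 1.0 shown here are assumed to match the defaults, and the keyword names prob and gamma are the ones used elsewhere in this notebook:
# Sketch only: pass the (assumed) default sampling probabilities and gamma explicitly
spdhg_with_explicit_defaults = SPDHG(f = F, g = G, operator = K,
                                     prob = [1/n_subsets] * n_subsets,   # uniform sampling over the subsets
                                     gamma = 1.0,                        # primal/dual step-size balance (assumed default)
                                     update_objective_interval = 10)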
# Setup and run SPDHG for 50 iterations
spdhg = SPDHG(f = F, g = G, operator = K,
update_objective_interval = 10)
spdhg.run(50)
spdhg_recon = spdhg.solution
show2D(spdhg_recon, cmap='inferno')
PDHG and SPDHG converge to the same set of solutions, but SPDHG is faster per iteration. How can we compare PDHG and SPDHG? We want to compare them for the same amount of computational effort. Remember that at each iteration, PDHG computes one entire forward and back-projection, while SPDHG computes only a partial one. For example, if we have 10 subsets, $10$ iterations of SPDHG have the same computational cost as one iteration of PDHG.
In order to have a fair comparison between both algorithms, we define the notion of an epoch, where one epoch corresponds to one full forward and back-projection. Hence, one epoch of PDHG corresponds to one iteration, while one epoch of SPDHG corresponds in our case to $10$ iterations.
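As a quick worked example of this bookkeeping (purely illustrative):
# Converting epochs to iterations for each algorithm
n_epoch = 5
print("PDHG iterations in", n_epoch, "epochs :", n_epoch * 1)           # 1 full projection per iteration
print("SPDHG iterations in", n_epoch, "epochs:", n_epoch * n_subsets)   # n_subsets partial projections = 1 full one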
Let's set-up and run PDHG and SPDHG for the same number of epochs and compare the results.
n_epoch = 5
# Setup and run PDHG
normK = K.norm()
tau = 1/normK
sigma = 1/normK
pdhg = PDHG(f = F, g = G, operator = K,
update_objective_interval = 1, tau=tau, sigma=sigma)
pdhg.run(n_epoch)
pdhg_recon = pdhg.solution
# Setup and run SPDHG
spdhg = SPDHG(f = F, g = G, operator = K,
update_objective_interval = 1 * n_subsets)
spdhg.run(n_epoch * n_subsets)
spdhg_recon = spdhg.solution
show2D([pdhg_recon, spdhg_recon], cmap='inferno', title = ["PDHG, {} epochs ".format(n_epoch),"SPDHG, {} epochs ".format(n_epoch)])
After 5 epochs, the SPDHG reconstruction appears visually closer to convergence than the PDHG reconstruction.
To compare the speed of PDHG and SPDHG further, let's use a quantitative measure of convergence, given by the value of the objective function $\mathcal{O}(u)$.
nepoch = 20
pdhg.run(nepoch)
pdhg_measures = pdhg.objective
spdhg.run(nepoch*n_subsets)
spdhg_measures = spdhg.objective
# plot
epoch_range = np.arange(0,pdhg.iteration+pdhg.update_objective_interval, pdhg.update_objective_interval)
plt.semilogy(epoch_range, pdhg_measures, label='PDHG')
plt.semilogy(epoch_range, spdhg_measures, label='SPDHG')
plt.xlabel('Epochs')
plt.ylabel('Objective function')
plt.legend()
As for PDHG, there are various ways to put our reconstruction problem $$ \underset{u}{\operatorname{argmin}} \mathcal{O}(u)= \frac{1}{2}\| A u - g\|^{2} + \alpha\,\mathrm{TV}(u) + \mathbb{I}_{\{u>0\}}(u) $$ into the form (2) required by SPDHG: $$\min_{u} \sum_{i=1}^n F_i(K_i u) + G(u).$$
Implicit set-up
In the first part of the tutorial, we chose the so-called implicit set-up, where
$K = (A_1,\dots,A_n) \quad \Longleftrightarrow \quad $ K = BlockOperator(A_1, ..., A_n)
$F_i: Y_i \rightarrow \mathbb{R}, \text{ with } F_i(y_i) := \frac{1}{2}\| y_i - g_i \|^{2}, \quad \Longleftrightarrow \quad$ for all $i$, F_i = 0.5*L2NormSquared(b=partitioned_data[i])
and F = BlockFunction(F_1, ..., F_n)
$G: X \rightarrow \mathbb{R}, \text{ with } G(x) := \alpha\|\nabla x\|_{2,1} + \mathbb{I}_{\{x>0\}}(x), \quad \Longleftrightarrow \quad$ G = alpha * FGP_TV(nonnegativity=True).
Explicit set-up
As for PDHG, another way is to include the gradient operator $\nabla$ from the TV term $\alpha\,\mathrm{TV}(u)=\alpha \|\nabla u\|_{2,1}$ in the operator $K$:
$K = (A_1,\dots,A_n, \nabla) \quad \Longleftrightarrow \quad $ K = BlockOperator(A_1, ..., A_n, Grad)
For all $1\leq i \leq n$, $F_i: Y_i \rightarrow \mathbb{R}, \text{ with } F_i(y_i) := \frac{1}{2}\| y_i - g_i \|^{2}$, and $F_{n+1}(z)=\alpha\|z\|_{2,1}$, $\quad \Longleftrightarrow \quad$
for all $1\leq i \leq n$, F_i = 0.5*L2NormSquared(b=partitioned_data[i]),
F_{n+1} = alpha * MixedL21Norm()
and F = BlockFunction(F_1, ..., F_{n+1})
$G(x) = \mathbb{I}_{\{x>0\}}(x) \quad \Longleftrightarrow \quad$ G = IndicatorBox(lower=0).
This time, we also use non-uniform sampling: half of the time we update the dual variable corresponding to the TV term, so that $p_{n+1}=1/2$, and half of the time we update one of the dual variables corresponding to a subset, so that $p_i=1/(2n)$ for $1\leq i \leq n$.
Comment 1: if you don't want to use the default uniform sampling, the probabilities you specify need to sum to $1$.
Comment 2: the set-up and the sampling influence how many iterations make up one epoch. With the sampling defined above, a partial forward operator is used only half of the time on average, so that $1$ epoch (that is, the equivalent of one entire forward and back-projection) corresponds to twice the number of subsets, i.e. $2n$ iterations.
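As a quick sanity check of this accounting, here is a small, purely illustrative Python snippet:
# Check that the probabilities sum to 1 and count how often a partial projection A_i is used
n = 10                          # number of subsets
p = [1/(2*n)] * n + [1/2]       # p_i = 1/(2n) for the subsets, p_{n+1} = 1/2 for the TV term
print(sum(p))                   # 1.0
print(sum(p[:n]))               # 0.5: probability that an iteration uses some A_i
print(int(n / sum(p[:n])))      # 20 = 2*n iterations, on average, per epoch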
# Define number of subsets
n_subsets = 10
# We sequentially partition the data into the number of subsets using the data partitioner
partitioned_data = data.partition(n_subsets, 'sequential')
# Initialize the lists containing the F_i's and A_i's
f_subsets = []
A_subsets = []
# Define F_i's and A_i's - data fit part
for i in range(n_subsets):
# Define F_i and put into list
fi = 0.5 * L2NormSquared(b = partitioned_data[i])
f_subsets.append(fi)
# Define A_i and put into list
ageom_subset = partitioned_data[i].geometry
Ai = ProjectionOperator(ig2D, ageom_subset)
A_subsets.append(Ai)
# Define F_{n+1} and A_{n+1} - regularization part
f_reg = ig2D.spacing[0] * alpha * MixedL21Norm() # take into account the pixel size with ig2D.spacing
Grad = GradientOperator(A_subsets[0].domain, backend='c', correlation='SpaceChannel')
# Define F and K
F = BlockFunction(*f_subsets, f_reg)
K = BlockOperator(*A_subsets, Grad)
# Define G
G = IndicatorBox(lower=0)
# Define probabilities
prob = [1 / (2 * n_subsets)] * n_subsets + [1 / 2]
# Setup and run SPDHG for 5 epochs
spdhg_explicit = SPDHG(f = F, g = G, operator = K,
update_objective_interval = 2 * n_subsets, prob=prob )
spdhg_explicit.run(5 * 2 * n_subsets)
# Show result and compare to implicit framework
spdhg_explicit_recon = spdhg_explicit.get_output()
show2D([spdhg_explicit_recon, spdhg_recon], cmap='inferno', title= ['Explicit 5 epochs', 'Implicit 5 epochs'])
# Comparison of objective function
epoch_range = np.arange(0,6)
plt.semilogy(epoch_range, spdhg_explicit.objective, label='explicit SPDHG')
plt.semilogy(epoch_range, spdhg_measures[0:len(epoch_range)], label='implicit SPDHG')
plt.xlabel('Epochs')
plt.ylabel('Objective function')
plt.legend()
You can observe that the explicit reconstruction converges more slowly than the implicit one. However, note that the block operator $K$ used in the explicit framework is formed by partial operators $A_{\text{subsets}}[i]$ and $\nabla$ with potentially very different norms. Empirically, it therefore makes sense to renormalise the partial operators so that they all have norm one. To leave the objective function unchanged, we also need to rescale the functions $F_i$ according to the following identities: $$ \| K u - g\|_2^2 = \|K\|^2 \cdot \left\| \frac{K}{\|K\|} u - \frac{g}{\|K\|}\right\|_2^2$$ $$ \| \nabla u\|_{2,1} = \|\nabla \| \cdot \left\| \frac{\nabla}{\|\nabla\|} u\right\|_{2,1}$$
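As a quick numerical sanity check of the first identity, here is a plain NumPy sketch with arbitrary random data, unrelated to the CIL objects below:
import numpy as np

# Verify ||K u - g||^2 == ||K||^2 * ||(K/||K||) u - g/||K|| ||^2 on random data
rng = np.random.default_rng(0)
K_mat = rng.standard_normal((20, 5))
u = rng.standard_normal(5)
g = rng.standard_normal(20)
normK = np.linalg.norm(K_mat, 2)                  # spectral norm, i.e. the operator norm
lhs = np.sum((K_mat @ u - g)**2)
rhs = normK**2 * np.sum((K_mat / normK @ u - g / normK)**2)
print(np.isclose(lhs, rhs))                       # True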
# Define number of subsets
n_subsets = 10
# We sequentially partition the data into the number of subsets using the data partitioner
partitioned_data = data.partition(n_subsets, 'sequential')
# Initialize the lists containing the F_i's and A_i's
f_subsets = []
A_subsets = []
# Define F_i's and A_i's - data fit part
for i in range(n_subsets):
ageom_subset = partitioned_data[i].geometry
Ai = ProjectionOperator(ig2D, ageom_subset)
normi = Ai.norm()
A_subsets.append((1 / normi) * Ai)
# Define rescaled F_i and put into list
fi = 0.5 * normi**2 * L2NormSquared(b = (1 / normi) * partitioned_data[i])
f_subsets.append(fi)
# Define F_{n+1} and A_{n+1} - regularization part
Grad = GradientOperator(A_subsets[0].domain, backend='c', correlation='SpaceChannel')
normGrad = Grad.norm()
Grad_renorm = (1 / normGrad) * Grad
f_reg = normGrad * ig2D.spacing[0] * alpha * MixedL21Norm() # take into account the pixel size with ig2D.spacing
# Define F and K
F = BlockFunction(*f_subsets, f_reg)
K = BlockOperator(*A_subsets, Grad_renorm)
# Setup and run SPDHG for 5 epochs
spdhg_explicit = SPDHG(f = F, g = G, operator = K,
update_objective_interval = 2 * n_subsets, prob=prob)
spdhg_explicit.run(5 * 2 * n_subsets)
# Show result and compare to implicit framework
spdhg_explicit_recon = spdhg_explicit.get_output()
show2D([spdhg_explicit_recon, spdhg_recon], cmap='inferno', title= ['Renormalized Explicit 5 epochs','Implicit 5 epochs'])
# Comparison of objective function
epoch_range = np.arange(0,6)
plt.semilogy(epoch_range, spdhg_explicit.objective, label='Renormalized Explicit SPDHG')
plt.semilogy(epoch_range, spdhg_measures[0:len(epoch_range)], label='Implicit SPDHG')
plt.xlabel('Epochs')
plt.ylabel('Objective function')
plt.legend()
Set-up: the implicit framework is generally easier to set up.
Time-per-iteration: the time per iteration is greater in the implicit framework; indeed, at each iteration one needs to compute the proximal operator of the TV term, which is done in a (hidden) inner loop. This does not occur in the explicit framework, where all the operators have explicit expressions (hence its name); a rough timing sketch is given below.
Speed of convergence: there is no theoretical result as to which framework is fastest. In the example above, one can observe that renormalised explicit SPDHG converges at roughly the same speed as implicit SPDHG.
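To illustrate the time-per-iteration point, here is a rough, hypothetical timing sketch: it simply continues the two existing algorithm objects for roughly one more epoch each and measures wall-clock time, so the numbers depend entirely on your hardware and should only be read qualitatively.
import time

# Continue each algorithm for about one epoch's worth of iterations and time it
for name, algo, its in [('implicit SPDHG', spdhg, n_subsets),
                        ('explicit SPDHG', spdhg_explicit, 2 * n_subsets)]:
    t0 = time.time()
    algo.run(its)
    print(name, ':', (time.time() - t0) / its, 's per iteration')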
Algorithm
SPDHG is described as follows:
Input: step-sizes $\tau$, $(\sigma_i)_{1\leq i \leq n}$; probabilities $(p_i)_{1\leq i \leq n}$
Initialize: $x=0$, $y=0$, $z = \bar{z} = K^* y = 0$
Iterate for $m=1, \dots$ (following Chambolle et al.):
$$\begin{aligned}
x_{m} &= \mathrm{prox}_{\tau G}\left(x_{m-1} - \tau\, \bar{z}_{m-1}\right)\\
&\text{select } i \in \{1,\dots,n\} \text{ at random with probability } p_i\\
y_i^{m} &= \mathrm{prox}_{\sigma_i F_i^*}\left(y_i^{m-1} + \sigma_i K_i\, x_{m}\right), \qquad y_j^{m} = y_j^{m-1} \text{ for } j \neq i\\
\Delta z &= K_i^*\left(y_i^{m} - y_i^{m-1}\right)\\
z_{m} &= z_{m-1} + \Delta z\\
\bar{z}_{m} &= z_{m} + \tfrac{1}{p_i}\,\Delta z
\end{aligned}$$
Admissible step-sizes
SPDHG converges to the set of solutions of the saddle-point problem (in terms of Bregman distance [ref] and almost surely [ref1, ref2]) if the following condition on the step-sizes $\tau, (\sigma_i)_{1\leq i \leq n}$ is satisfied: $$ \tau \, \sigma_i \, \|K_i\|^2 < p_i^2, \quad 1 \leq i \leq n$$
The CIL default implementation is $\sigma_i = \rho/\|K_i\|$ for all $i$ and $\tau =\rho\, \min_i \left(p_i^2/\|K_i\|\right)$ with $\rho=0.99$.
Another admissible choice would be $\sigma_i = \gamma\,\rho/\|K_i\|$ for all $i$ and $\tau =\frac{\rho}{\gamma}\, \min_i \left(p_i^2/\|K_i\|\right)$ with $\rho=0.99$. Changing the value of $\gamma$ changes the balance between primal and dual convergence. It can be set in the following way:
spdhg = SPDHG(f = F, g = G, operator = K, gamma = gamma)
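As a quick numerical check of the step-size condition above, here is a sketch that rebuilds the (assumed) default rule from the formulas just stated, using the operator norms and probabilities from the renormalised explicit set-up; it does not read the values back from the SPDHG object:
# Verify tau * sigma_i * ||K_i||^2 < p_i^2 for the default rule described above
rho = 0.99
operators = [*A_subsets, Grad_renorm]          # renormalised blocks, all with norm close to 1
norms = [op.norm() for op in operators]
sigma = [rho / nk for nk in norms]
tau = rho * min(pi**2 / nk for pi, nk in zip(prob, norms))
print(all(tau * si * nk**2 < pi**2 for si, nk, pi in zip(sigma, norms, prob)))   # True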