# -*- coding: utf-8 -*-
# Copyright 2019 - 2022 United Kingdom Research and Innovation
# Copyright 2019 - 2022 The University of Manchester
# Copyright 2019 - 2022 Technical University of Denmark
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Authored by: Jakob S. Jørgensen (DTU)
# Gemma Fardell (UKRI-STFC)
# Laura Murgatroyd (UKRI-STFC)
This exercise introduces you to regularised reconstruction. By using prior knowledge of the sample we can choose the most suitable regulariser for the problem. As we introduce different priors, we need different algorithms to solve the resulting optimisation problems.
Learning objectives are:
- set up and solve a least-squares reconstruction problem with the gradient descent (GD) algorithm;
- build a Tikhonov-regularised objective by combining CIL Functions and Operators;
- use FISTA for non-smooth regularisers such as a non-negativity constraint, L1-norm sparsity and Total Variation;
- set up the same TV-regularised problem with PDHG and compare the convergence of the two algorithms.
First we import everything we need:
# Import algorithms, operators and functions from CIL optimisation module
from cil.optimisation.algorithms import GD, FISTA, PDHG
from cil.optimisation.operators import BlockOperator, GradientOperator
from cil.optimisation.functions import IndicatorBox, MixedL21Norm, L2NormSquared, \
BlockFunction, L1Norm, LeastSquares, \
OperatorCompositionFunction, TotalVariation, \
ZeroFunction
# Import CIL Processors for preprocessing
from cil.processors import CentreOfRotationCorrector, Slicer, TransmissionAbsorptionConverter
# Import CIL display function
from cil.utilities.display import show2D
# Import from CIL ASTRA plugin
from cil.plugins.astra import ProjectionOperator
# Import FBP from CIL recon class
from cil.recon import FBP
# Import Total Variation from the regularisation toolkit plugin
from cil.plugins.ccpi_regularisation.functions import FGP_TV
# All external imports
import matplotlib.pyplot as plt
Exactly as in the notebook 1_Introduction/03_preprocessing, we load the steel-wire demonstration data provided as part of CIL, carry out some preprocessing and FBP reconstructions for reference:
# Load the example data set
from cil.utilities.dataexample import SYNCHROTRON_PARALLEL_BEAM_DATA
data_sync = SYNCHROTRON_PARALLEL_BEAM_DATA.get()
# Preprocessing
scale = data_sync.get_slice(vertical=20).mean()
data_sync = data_sync/scale
data_sync = TransmissionAbsorptionConverter()(data_sync)
data_sync = CentreOfRotationCorrector.xcorrelation(slice_index='centre')(data_sync)
# Crop data and reorder for ASTRA backend
data90 = Slicer(roi={'angle':(0,90),
'horizontal':(20,140,1)})(data_sync)
data90.reorder(order='astra')
# Set up and run FBP for 90-angle dataset
recon90 = FBP(data90, backend='astra').run(verbose=0)
# Set up and run FBP for 15-angle dataset
data15 = Slicer(roi={'angle': (0,90,6)})(data90)
recon15 = FBP(data15, backend='astra').run(verbose=0)
# Define custom parameters for show2D for visualizing all reconstructions consistently
sx = 44
sz = 103
ca1 = -0.01
ca2 = 0.11
slices = [('horizontal_x',sx),('vertical',sz)]
Show slices of the 90-degree FBP reconstruction:
show2D(recon90,
slice_list=slices, cmap='inferno', fix_range=(ca1,ca2), origin='upper-left')
And the 15-projection FBP reconstruction, which contains severe streak artifacts:
show2D(recon15,
slice_list=slices, cmap='inferno', fix_range=(ca1,ca2), origin='upper-left')
Optimisation-based reconstruction is based on a fully discretized model that is conventionally assumed to be linear: $$Au = b$$ where $A$ is the linear operator known as the system matrix representing forward projection of an image to its sinogram, $b$ is the sinogram data, and $u$ is the unknown image to be reconstructed.
The first thing we are going to need is the LinearOperator representing forward and back-projections. We set up the ProjectionOperator from the ASTRA plugin by passing the 15-projection acquisition geometry, and image geometry:
ag = data15.geometry
ig = ag.get_ImageGeometry()
A = ProjectionOperator(ig, ag, device="gpu")
We choose to work with the 15-projection dataset here and refer to it by b for convenience:
b = data15
Unfortunately, as we normally have noise, model errors and other inconsistencies in the data, we cannot expect a solution exists to $Au = b$. We therefore relax the problem and aim to find a solution that is as close as possible to fitting the data. This is conventionally measured in a least-squares sense in that we solve the least-squares problem $$ \min_u \|Au - b \|_2^2 $$ where $$\|y \|_2^2 = \sum_i y_i^2.$$ The function that is to be minimized is called the objective function.
The reconstruction is the image $u$ that solves the optimisation problem, i.e., that gives the lowest possible value of the objective function, in this case the (squared) residual norm $\|Au - b \|_2^2$. To find the solution we use an iterative optimisation algorithm. Many exist; perhaps the most basic is the gradient descent method, which is available in CIL as the GD algorithm. To set it up we need to specify the objective function in terms of a CIL Function. For LeastSquares this is simply:
f1 = LeastSquares(A, b)
For iterative algorithms we must provide an initial point from which to start; here we choose the zero image.
x0 = ig.allocate(0.0)
f1 is a CIL Function, and CIL Functions can be evaluated at particular input images. For example, we can evaluate it at x0, which gives the starting objective value of the optimisation problem:
f1(x0)
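As a quick sanity check (a minimal sketch, not part of the original exercise): since x0 is the zero image the residual reduces to -b, so the value above should match the squared norm of the data:
# Sketch: for the zero image the residual Au - b is simply -b,
# so the least-squares objective equals the squared norm of the data
print(f1(x0), b.squared_norm())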
CIL Functions provide a number of methods that are used by optimisation algorithms. Most notably, if a function is smooth (continuously differentiable), a CIL Function provides the gradient method, with which the gradient of the function can be evaluated at a particular input image. For example, we can evaluate the gradient at x0, and since it contains an element for each voxel we can display it as an image:
show2D(f1.gradient(x0),slice_list=slices,origin='upper-left')
To set up the gradient descent algorithm, we specify:
initial - the initial point
objective_function - the objective function
step_size - whether to use a fixed step size or a back-tracking line search (None)
max_iteration - the maximal number of iterations to run
update_objective_interval - how often to evaluate the objective function value
myGD_LS = GD(initial=x0,
objective_function=f1,
step_size=None,
max_iteration=1000,
update_objective_interval=10)
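With step_size=None, GD uses a back-tracking line search. As a hedged side sketch (not used in the remainder of this notebook, and the value is illustrative only), a fixed numeric step size could be supplied instead:
# Sketch: gradient descent with a fixed (illustrative) step size
# instead of the back-tracking line search
myGD_fixed = GD(initial=x0,
                objective_function=f1,
                step_size=1e-3,
                max_iteration=1000,
                update_objective_interval=10)
We continue with myGD_LS, which uses the line search.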
Once the algorithm is set up, we can run it for a specified number of iterations, here using verbose=1 to print out progress information:
myGD_LS.run(300, verbose=1)
Once done, the latest image/solution in the algorithm can be shown as an image:
show2D(myGD_LS.solution,
slice_list=slices, cmap='inferno', fix_range=(ca1,ca2), origin='upper-left')
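The objective values recorded every update_objective_interval iterations can also be inspected directly; as a small sketch (these same lists are plotted later in this notebook):
# Sketch: inspect the last recorded iteration number and objective value
print(myGD_LS.iterations[-1], myGD_LS.objective[-1])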
This was a basic least-squares example. We can specify more involved optimisation problems by combining multiple CIL Functions through addition and scalar multiplication, as well as composition with CIL Operators. For example, as an alternative to using CGLS, we can solve the Tikhonov problem with gradient operator $D$, i.e.,
$$\min_u \|Au-b\|_2^2 + \alpha^2\|Du\|_2^2$$
directly with gradient descent. Tikhonov regularisation is more explicit in that a regularisation term, specifically a squared 2-norm, is added to the least-squares fitting term. This is covered in detail in the next notebook, 02_tikhonov_block_framework.ipynb.
We can set this objective function up step by step. First, we set again the least-squares data fitting term as before:
f1 = LeastSquares(A, b)
Next we specify the operator D in the regularisation term and the value of the regularisation parameter alpha:
D = GradientOperator(ig)
alpha = 10.0
We can construct the regularisation term by composing the squared L2-norm with the operator D as:
f2 = OperatorCompositionFunction(L2NormSquared(),D)
Finally we form the full optimisation problem from the components defined:
f = f1 + (alpha**2)*f2
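Since f is itself a CIL Function we can evaluate it as before. As a small sketch, its value at x0 should agree with evaluating the two terms separately:
# Sketch: the combined objective is the sum of its weighted terms
print(f(x0), f1(x0) + alpha**2 * f2(x0))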
As before we can set up a gradient descent algorithm to solve this problem:
myGD_tikh = GD(initial=x0,
objective_function=f,
step_size=None,
max_iteration=1000,
update_objective_interval = 10)
myGD_tikh.run(200, verbose=1)
show2D(myGD_tikh.solution,
slice_list=slices, cmap='inferno', fix_range=(ca1,ca2), origin='upper-left')
In the next notebook we solve the Tikhonov problem using CGLS. As an exercise you can compare the result and performance of the two algorithms.
Many useful reconstruction methods involve minimisation of functions that are NOT smooth, and in those cases we need dedicated optimisation algorithms for non-smooth problems. In this notebook we consider optimisation problems that can be written in the form $$\min_u f(u) + g(u) $$ where $f$ is a smooth function as before, but $g$ may now be non-smooth. $g$ further needs to be "simple" in a certain sense, namely it should have a proximal mapping that is easy to evaluate. The proximal mapping can be thought of as a generalisation of the gradient for a non-smooth function.
For this problem class the algorithm FISTA (Fast iterative shrinkage thresholding algorithm) can be employed. It is also known as the accelerated proximal gradient method.
We consider a couple of examples for different functions $g$. First we consider again least-squares but with a non-negativity constraint on all pixels. This problem can be written
$$\min_u \|Au-b\|_2^2 + I_{C}(u) $$
where $I_{C}(u)$ is a special convex function known as an indicator function, which takes on the value 0 in its convex domain $C$ (which we here take to be the set of images with only nonnegative pixel values), and the (extended) value $\infty$ outside its domain. This can be specified in CIL using an IndicatorBox function:
F = LeastSquares(A, b)
G = IndicatorBox(lower=0.0)
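As a brief illustrative sketch (the variable name below is only for demonstration), the proximal mapping of this indicator function projects an image onto the non-negative set, i.e. it clips negative values to zero. We can verify this on the gradient descent solution obtained earlier:
# Sketch: the proximal mapping of IndicatorBox(lower=0.0) clips negative values to zero
clipped = G.proximal(myGD_LS.solution, tau=1.0)
print(myGD_LS.solution.min(), clipped.min())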
A FISTA algorithm instance can be set up similarly to the GD algorithm but specifying the $f$ and $g$ functions separately:
myFISTANN = FISTA(f=F,
g=G,
initial=x0,
max_iteration=1000,
update_objective_interval = 10)
We run it and display the resulting solution:
myFISTANN.run(300, verbose=1)
show2D(myFISTANN.solution,
slice_list=slices, cmap='inferno', fix_range=(ca1,ca2), origin='upper-left')
We see that the non-negativity constraint, as expected, prevents any negative values. Furthermore, this has the positive side effect of removing some of the streak artifacts in the background.
Another possibility is sparsity regularisation, which we can achieve by choosing $g$ as the L1-norm multiplied by a regularisation parameter $\alpha$ to balance the strength of fitting to the data against enforcing regularisation: $$g(u) = \alpha\|u\|_1 = \alpha\sum_i |u_i| $$
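As a minimal sketch (purely illustrative; the names and values below are not part of the exercise), alpha * L1Norm() is itself a CIL Function whose proximal mapping is soft-thresholding: entries smaller in magnitude than alpha times the step are set to zero and the rest are shrunk towards zero:
# Sketch: the proximal mapping of the scaled L1 norm is soft-thresholding
g_l1 = 30 * L1Norm()
shrunk = g_l1.proximal(myGD_LS.solution, tau=0.001)   # threshold 30*0.001 = 0.03
print(myGD_LS.solution.abs().max(), shrunk.abs().max())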
Exercise 1
We will set up the FISTA algorithm as we did before, but replace IndicatorBox() with alpha * L1Norm():
alpha = 30
myFISTAL1 = FISTA(...)
Uncomment and run the cells to see the solution:
# %load 'snippets/01_ex1_a.py'
Now run 300 iterations of myFISTAL1
...
# %load 'snippets/01_ex1_b.py'
We show the solution of L1 regularised least-squares produced by FISTA:
show2D(myFISTAL1.solution,
slice_list=slices, cmap='inferno', fix_range=(ca1,ca2), origin='upper-left')
Here, all small values of the background, and the lowest-density parts of the sample, have been forced to zero by the sparsity regularisation term, keeping only the pixel values of the largest magnitude. Sparsity regularisation does not directly enforce smoothness: in the non-zero part of the image, neighbouring pixel values remain rather different.
Sometimes a better option is to enforce sparsity of the image gradient. This is known as Total Variation (TV) regularisation and tends to enforce piecewise constant areas separated by sharp edges. Recall that, for example, Tikhonov regularisation may reduce noise but tends to blur edges, so TV may have an advantage if the image to be reconstructed is known to consist of relatively homogeneous areas separated by sharp edges. In CIL, TV is available as the TotalVariation function and, from the CCPi regularisation toolkit, as FGP_TV, which can be run on the GPU. We can set up and solve the TV-regularised problem in the same way as before:
Exercise 2
Change the strength of the regularisation and see what effect high and very low values of alpha have on the reconstruction. Try to find a value that smooths the streaks but preserves the features.
alpha = ...
# %load snippets/01_ex2.py
GTV = alpha*FGP_TV(device='gpu')
myFISTATV = FISTA(f=F,
g=GTV,
initial=x0 ,
max_iteration=1000,
update_objective_interval = 10)
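As a hedged side note (not run here), CIL's native TotalVariation function could be used in place of FGP_TV; the inner iteration count below is illustrative only, and this implementation is slower than the GPU FGP_TV used in this notebook:
# Sketch: alternative TV regularisation term using CIL's native implementation
GTV_native = alpha * TotalVariation(max_iteration=100)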
Note that the proximal mapping of Total Variation is not simple but needs to be evaluated numerically; this is handled internally by the TotalVariation and FGP_TV functions. It does, however, take a while to run, which is why we use the GPU implementation on real data (approximately 3 minutes):
myFISTATV.run(200, verbose=1)
Show the slices of the TV reconstruction by FISTA:
show2D(myFISTATV.solution,
       slice_list=slices, cmap='inferno', fix_range=(ca1,ca2), origin='upper-left')
We can see that TV-regularisation successfully compensates for the streak artifacts caused by few projections, suppresses noise and preserves sharp edges in the reconstruction.
if myFISTATV.g.alpha == 0.02 and myFISTATV.iteration > 199:
    print("Good job, carry on!")
else:
    raise ValueError("Try again: run 200 iterations with alpha of 0.02")
An even more flexible algorithm for non-smooth problems is the Primal Dual Hybrid Gradient (PDHG) algorithm, which also goes under other names such as the Chambolle-Pock algorithm. In PDHG we can split complicated functionals into simpler parts for which the proximal mapping can be evaluated. PDHG will be covered in more detail in a separate notebook, 03_PDHG.ipynb; here we demonstrate how to set up the same TV-regularised problem we just solved with FISTA. Note how a BlockFunction and a BlockOperator are used to specify multiple terms/operators:
alpha = 0.02
F = BlockFunction(L2NormSquared(b=b),
alpha*MixedL21Norm())
K = BlockOperator(A,
D)
G = ZeroFunction()
myPDHG = PDHG(f=F,
g=G,
operator=K,
max_iteration=10000,
update_objective_interval = 10)
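As a small sketch (only to illustrate the splitting, not needed for the reconstruction), applying the BlockOperator to an image returns a block of two outputs, a sinogram-like part and a gradient-like part, on which the BlockFunction evaluates its two terms:
# Sketch: evaluate the PDHG objective F(Kx) + G(x) at the zero image
y = K.direct(x0)       # BlockDataContainer holding (A x0, D x0)
print(F(y) + G(x0))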
Run the algorithm for a specified number of iterations with increased verbosity/amount of printing to screen.
Here we initially run for 500 iterations.
myPDHG.run(500,verbose=2)
Show the TV-regularised solution obtained by the PDHG algorithm and note that it is essentially identical to the one from FISTA:
show2D(myPDHG.solution,
slice_list=slices, cmap='inferno', fix_range=(ca1,ca2), origin='upper-left')
CIL Algorithms can record the history of objective values (primal and dual for PDHG) for monitoring convergence:
plt.figure()
plt.loglog(myFISTATV.iterations[1:], myFISTATV.objective[1:])
plt.loglog(myPDHG.iterations[1:], myPDHG.objective[1:])
plt.loglog(myPDHG.iterations[1:], myPDHG.dual_objective[1:])
plt.loglog(myPDHG.iterations[1:], myPDHG.primal_dual_gap[1:])
plt.ylim((1e0,1e5))
plt.legend(['FISTA','PDHG primal','PDHG dual','PDHG gap'])
plt.grid()
plt.xlabel('Number of iterations')
plt.ylabel('Objective value')
We observe that the two algorithms achieve approximately the same (primal) objective value. PDHG in addition supplies the dual objective and the primal-dual gap (the difference of primal and dual objectives), which helps with monitoring convergence. The primal-dual gap tends to zero as the algorithm approaches the solution.
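As a small sketch, the most recently recorded primal-dual gap can be read off the algorithm object directly:
# Sketch: inspect the latest recorded primal-dual gap
print(myPDHG.primal_dual_gap[-1])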
Although in practice fewer iterations are often sufficient for a visually converged image, to see a very well converged primal-dual gap we need to run for more iterations. Running an additional 4500 iterations will take approximately 5 minutes.
myPDHG.update_objective_interval = 100
myPDHG.run(4500,verbose=2)
show2D(myPDHG.solution,
slice_list=slices, cmap='inferno', fix_range=(ca1,ca2), origin='upper-left')
plt.figure()
plt.loglog(myFISTATV.iterations[1:], myFISTATV.objective[1:])
plt.loglog(myPDHG.iterations[1:], myPDHG.objective[1:])
plt.loglog(myPDHG.iterations[1:], myPDHG.dual_objective[1:])
plt.loglog(myPDHG.iterations[1:], myPDHG.primal_dual_gap[1:])
plt.ylim((1e0,1e5))
plt.legend(['FISTA','PDHG primal','PDHG dual','PDHG gap'])
plt.grid()
plt.xlabel('Number of iterations')
plt.ylabel('Objective value')