Wavelet Denoising

This numerical tour uses wavelets to perform non-linear image denoising. $\newcommand{\dotp}[2]{\langle #1, #2 \rangle}$ $\newcommand{\qandq}{\quad\text{and}\quad}$ $\newcommand{\qwhereq}{\quad\text{where}\quad}$ $\newcommand{\qifq}{ \quad \text{if} \quad }$ $\newcommand{\ZZ}{\mathbb{Z}}$ $\newcommand{\RR}{\mathbb{R}}$ $\newcommand{\pa}[1]{\left(#1\right)}$ $\newcommand{\si}{\sigma}$ $\newcommand{\Nn}{\mathcal{N}}$ $\newcommand{\Bb}{\mathcal{B}}$ $\newcommand{\EE}{\mathbb{E}}$ $\newcommand{\norm}[1]{\|#1\|}$ $\newcommand{\abs}[1]{\left|#1\right|}$ $\newcommand{\choice}[1]{ \left\{ \begin{array}{l} #1 \end{array} \right. }$ $\newcommand{\al}{\alpha}$ $\newcommand{\la}{\lambda}$ $\newcommand{\ga}{\gamma}$ $\newcommand{\Ga}{\Gamma}$ $\newcommand{\La}{\Lambda}$ $\newcommand{\si}{\sigma}$ $\newcommand{\Si}{\Sigma}$ $\newcommand{\be}{\beta}$ $\newcommand{\de}{\delta}$ $\newcommand{\De}{\Delta}$ $\newcommand{\phi}{\varphi}$ $\newcommand{\th}{\theta}$ $\newcommand{\om}{\omega}$ $\newcommand{\Om}{\Omega}$

Important: Please read the installation page for details about how to install the toolboxes.

In [2]:
options(warn=-1) # turns off warnings, to turn on: "options(warn=0)"


for (f in list.files(path="nt_toolbox/toolbox_general/", pattern="*.R")) {
    source(paste("nt_toolbox/toolbox_general/", f, sep=""))

for (f in list.files(path="nt_toolbox/toolbox_signal/", pattern="*.R")) {
    source(paste("nt_toolbox/toolbox_signal/", f, sep=""))

options(repr.plot.width=3.5, repr.plot.height=3.5)

Image Denoising

We consider a simple generative model of noisy images $F = f_0+W$ where $f_0 \in \RR^N$ is a deterministic image of $N$ pixels, and $W$ is a Gaussian white noise distributed according to $\Nn(0,\si^2 \text{Id}_N)$, where $\si^2$ is the variance of noise.

The goal of denoising is to define an estimator $\tilde F$ of $f_0$ that depends only on $F$, i.e. $\tilde F = \phi(F)$ where $\phi : \RR^N \rightarrow \RR^N$ is a potentially non-linear mapping.

Note that while $f_0$ is a deterministic image, both $F$ and $\tilde F$ are random variables (hence the capital letters).

The goal of denoising is to reduce as much as possible the denoising error given some prior knowledge on the (unknown) image $f_0$. A mathematical way to measure this error is to bound the quadratic risk $\EE_w(\norm{\tilde F - f_0}^2)$, where the expectation is computed with respect to the distribution of the noise $W$.

Image loading and adding Gaussian Noise For real life applications, one does not have access to the underlying image $f_0$. In this tour, we however assume that $f_0$ is known, and $f = f_0 + w\in \RR^N$ is generated using a single realization of the noise $w$ that is drawn from $W$. We define the estimated deterministic image as $\tilde f = \phi(f)$ which is a realization of the random vector $\tilde F$.

First we load an image $f_0 \in \RR^N$ where $N=n \times n$ is the number of pixels, and then display it.

In [2]:
n <- 256
name <- 'nt_toolbox/data/flowers.png'
f0 <- load_image(name, n)
imageplot(f0, 'Original image')

Standard deviation $\si$ of the noise.

In [3]:
sigma <- 0.08

Then we add Gaussian noise $w$ to obtain $f=f_0+w$.

In [4]:
f <- f0 + sigma*as.cimg(rnorm(n**2))

Display the noisy image.

In [5]:
imageplot(clamp(f), paste('Noisy, SNR=', snr(f0,f)) )

Hard Thresholding in Wavelet Bases

A simple but efficient non-linear denoising estimator is obtained by thresholding the coefficients of $f$ in a well chosen orthogonal basis $\Bb = \{\psi_m\}_m$ of $\RR^N$.

In the following, we will focuss on a wavelet basis, which is efficient to denoise piecewise regular images.

The hard thresholding operator with threshold $T \geq 0$ applied to some image $f$ is defined as $$ S_T^0(f) = \sum_{\abs{\dotp{f}{\psi_m}}>T} \dotp{f}{\psi_m} \psi_m = \sum_m s_T^0(\dotp{f}{\psi_m}) \psi_m $$ where the hard thresholding operator is $$ s_T^0(\alpha) = \choice{ \alpha \qifq \abs{\al}>T, \\ 0 \quad\text{otherwise}. }$ $$

The denoising estimator is then defined as $$ \tilde f = S_T^0(f). $$

In [6]:
options(repr.plot.width=4, repr.plot.height=4)

thresh_hard <- function(u,t) { return(u*(abs(u)>t)) }
alpha <- seq(-3,3, length=1000)
plot(alpha, sapply(alpha, thresh_hard, t=1), 'l')

Parameters for the orthogonal wavelet transform.

In [7]:
h <- c(0, .482962913145, .836516303738, .224143868042, -.129409522551)
h <- h/norm(h)
Jmin <- 2

First we compute the wavelet coefficients $a$
of the noisy image $f$.

In [8]:
a <- perform_wavortho_transf(f,Jmin,+1,h)

Display the noisy coefficients.

In [9]:

Select the threshold value, that should be proportional to the noise level $\si$.

In [10]:
T <- 3*sigma

Hard threshold the coefficients below the noise level to obtain $a_T(m)=s_T^0(a_m)$.

In [11]:
aT <- thresh_hard(a,T)

Display the thresholded coefficients.

In [12]:

Reconstruct the image $\tilde f$ from these noisy coefficients.

In [13]:
fHard <- perform_wavortho_transf(aT,Jmin,-1,h)

Display the denoising result.

In [14]:
imageplot(clamp(fHard), paste('Hard, SNR=', snr(f0,fHard)) )

Wavelet Denoising with Soft Thesholding

The estimated image $\tilde f$ using hard thresholding. suffers from many artifacts. It is possible to improve the result by using soft thresholding, defined as $$ \tilde f = S_T^1(f) = \sum_m s_T^1(\dotp{f}{\psi_m}) \psi_m $$ $$ \qwhereq s_T^1(\alpha) = \max\pa{0, 1 - \frac{T}{\abs{\alpha}}}\alpha. $$

Display the soft thresholding function $s_T^1(\al)$.

In [15]:
thresh_soft <- function(u,t){ return(max(1-t/abs(u), 0)*u) }

alpha <- seq(-3,3, length=1000)
plot(alpha, sapply(alpha, thresh_soft, t=1), 'l')

Select the threshold.

In [16]:
T <- 3/2*sigma

Perform the soft thresholding.

In [17]:
aT <- matrix(sapply(a, thresh_soft, t=T), c(n,n))

To slightly improve the soft thresholding performance, we do not threshold the coefficients corresponding to coarse scale wavelets.

In [18]:
aT[seq(1,n,2^Jmin),seq(1,n,2^Jmin)] <- a[seq(1,n,2^Jmin),seq(1,n,2^Jmin)]

Re-construct the soft thresholding estimator $\tilde f$.

In [19]:
fSoft <- perform_wavortho_transf(aT,Jmin,-1,h)

Display the soft thresholding denoising result.

In [20]:
imageplot(clamp(fSoft), paste('Soft, SNR=', snr(f0,fSoft)) )

One can prove that if the non-linear approximation error $ \norm{f_0-S_T(f_0)}^2 $ decays fast toward zero when $T$ decreases, then the quadratic risk $ \EE_w( \norm{f-S_T(f)}^2 ) $ also decays fast to zero when $\si$ decays. For this result to hold, it is required to select the threshold value according to the universal threshold rule $$ T = \si \sqrt{2\log(N)}. $$

Determine the best threshold $T$ for both hard and soft thresholding. Test several (T) values in $[.8\sigma, 4.5\sigma[$, and display the empirical SNR $-10\log_{10}(\norm{f_0-\tilde f}/\norm{f_0})$ What can you conclude from these results ? Test with another image.

In [21]:
options(repr.plot.width=6, repr.plot.height=4)


Translation Invariant Denoising with Cycle Spinning

Orthogonal wavelet transforms are not translation invariant. It means that the processing of an image and of a translated version of the image give different results.

Any denoiser can be turned into a translation invariant denoiser by performing a cycle spinning. The denoiser is applied to several shifted copies of the image, then the resulting denoised image are shifted back to the original position, and the results are averaged.

This corresponds to defining the estimator as $$ \tilde f = \frac{1}{M} \sum_{i=1}^{M}$ T_{-\de_i} \circ S_T(f) \circ T_{\de_i}$$ where $S_T$ is either the hard or soft thresholding, and $T_\de(f)(x) = f(x-\de)$ is the translation operator, using periodic boundary conditions. Here $(\de_i)_i$ is a set of $M$ discrete translation. Perfect invariance is obtained by using all possible $N$ translatation, but usually a small number $M \ll N$ of translation is used to obtain approximate invariance.

Number $m$ of translations along each direction so that $M = m^2$.

In [22]:
m <- 4

Generate a set of shifts $(\de_i)_i$.

In [23]:
dY <- meshgrid_2d(1:m, 1:m)$X ; dX <- meshgrid_2d(1:m, 1:m)$Y
delta <- cbind( array(dY, c(m*m,1)), array(dX, c(m*m,1)) )

To avoid storing all the translates in memory, one can perform a progressive averaging of the translates by defining $f^{(0)}=0$ and then $$ \forall\, i=1,\ldots,M, \quad f^{(i)} = \pa{1-\frac{i}{n}} f^{(i-1)} + \frac{i}{n} T_{-\de_i} \circ S_T(f) \circ T_{\de_i} $$ One then has $\tilde f = f^{(M)} $ after $M$ steps.

Initialize the denoised image $f^{(0)}=0$.

In [24]:
fTI <- array(0, c(n,n))
T <- 3*sigma

for (i in 1:(m*m)){
    fS <- circshift(f, delta[i,])
    a <- perform_wavortho_transf(fS,Jmin,1,h)
    aT <- array(sapply(a, thresh_hard, t=T), c(n,n))
    fS <- perform_wavortho_transf(aT,Jmin,-1,h)
    fS <- circshift(fS,-delta[i,])
    fTI <- (i-1)/i*fTI + 1/i*fS
In [25]:
imageplot(clamp(fTI), paste('TI, SNR=', snr(f0,fTI)) )


Exercise 1: Study the influence of the number $m$ of shift on the denoising quality.

Exercise 2: Select the best threshold $T$ for the translation invariant denoising.

Exercise 3: Implement a block-thresholding method, that threshold coefficient in square blocks according to the energy of each of these blocks.