This notebook introduces you to the analysis of deep multi-temporal SAR image data stacks within the framework of Jupyter Notebooks. The Jupyter Notebook environment is easy to launch in any web browser for interactive data exploration with provided or new training data. Notebooks combine executable Python code with text written in Markdown formatting, including LaTeX-style mathematical equations. Another advantage of Jupyter Notebooks is that they can easily be expanded, changed, and shared with new data sets or newly available time series steps. They therefore provide an excellent basis for collaborative and repeatable data analysis.
This notebook covers the following data analysis concepts:
Important Notes about JupyterHub
Your JupyterHub server will automatically shut down when left idle for more than 1 hour. Your notebooks will not be lost, but you will have to restart their kernels and re-run them from the beginning. You will not be able to seamlessly continue running a partially run notebook.
import url_widget as url_w
notebookUrl = url_w.URLWidget()
display(notebookUrl)
from IPython.display import Markdown
from IPython.display import display
notebookUrl = notebookUrl.value
user = !echo $JUPYTERHUB_USER
env = !echo $CONDA_PREFIX
if env[0] == '':
    env[0] = 'Python 3 (base)'

if env[0] != '/home/jovyan/.local/envs/rtc_analysis':
    display(Markdown(f'<text style=color:red><strong>WARNING:</strong></text>'))
    display(Markdown(f'<text style=color:red>This notebook should be run using the "rtc_analysis" conda environment.</text>'))
    display(Markdown(f'<text style=color:red>It is currently using the "{env[0].split("/")[-1]}" environment.</text>'))
    display(Markdown(f'<text style=color:red>Select the "rtc_analysis" environment from the "Change Kernel" submenu of the "Kernel" menu.</text>'))
    display(Markdown(f'<text style=color:red>If the "rtc_analysis" environment is not present, use <a href="{notebookUrl.split("/user")[0]}/user/{user[0]}/notebooks/conda_environments/Create_OSL_Conda_Environments.ipynb"> Create_OSL_Conda_Environments.ipynb </a> to create it.</text>'))
    display(Markdown(f'<text style=color:red>Note that you must restart your server after creating a new environment before it is usable by notebooks.</text>'))
In this notebook we will use the following scientific libraries:
%%capture
from pathlib import Path
import pandas as pd # for DatetimeIndex
from osgeo import gdal # for GetRasterBand, Open, ReadAsArray
gdal.UseExceptions()
import numpy as np #for log10, mean, percentile, power
%matplotlib inline
import matplotlib.pylab as plb # for add_patch, add_subplot, figure, hist, imshow, set_title, xaxis.set_label_text, text
import matplotlib.pyplot as plt # for add_subplot, axis, figure, imshow, legend, plot, set_axis_off, set_data,
# set_title, set_xlabel, set_ylabel, set_ylim, subplots, title, twinx
import matplotlib.patches as patches # for Rectangle
import matplotlib.animation as an # for FuncAnimation
from matplotlib import rc
import opensarlab_lib as asfn
asfn.jupytertheme_matplotlib_format()
from IPython.display import HTML
This notebook will be using a 70-image deep L-band SAR data stack over Nepal for a first experience with time series processing. The L-band data were acquired by the ALOS PALSAR sensor and are available to us through the services of the Alaska Satellite Facility.
Nepal is an interesting site for this analysis due to the significant seasonality of precipitation that is characteristic of this region. Nepal is said to have five seasons: spring, summer, monsoon, autumn, and winter. Precipitation is low in the winter (November - March) and peaks dramatically in the summer, with top rain rates in July, August, and September (see figure to the right). As SAR is sensitive to changes in soil moisture, these weather patterns have a noticeable impact on the radar cross section ($\sigma^0$) time series information.
We will analyze the variation of $\sigma^0$ values over time and will interpret them in the context of rainfall rates in the imaged area.
Before we get started, let's first create a working directory for this analysis and change into it:
path = Path("/home/jovyan/notebooks/SAR_Training/English/Master/data_time_series_example")
if not path.exists():
    path.mkdir()
We will retrieve the relevant data from an Amazon Web Service (AWS) cloud storage bucket using the following command:
s3_path = 's3://asf-jupyter-data-west/time_series.zip'
time_series_path = path/Path(s3_path).name
!aws --region=us-west-2 --no-sign-request s3 cp $s3_path $time_series_path
Now, let's unzip the file (overwriting previous extractions) and clean up after ourselves:
if time_series_path.exists():
    asfn.asf_unzip(str(path), str(time_series_path))
    time_series_path.unlink()
The following lines set the path variables needed for data processing. This step is not strictly necessary, but it saves a lot of extra typing later. We define variables for the main data directory as well as for the files containing the date and image information:
datadirectory = path/'time_series/S32644X696260Y3052060sS1-EBD'
datefile = datadirectory/'S32644X696260Y3052060sS1_D_vv_0092_mtfil.dates'
imagefile = datadirectory/'S32644X696260Y3052060sS1_D_vv_0092_mtfil.vrt'
imagefile_cross = datadirectory/'S32644X696260Y3052060sS1_D_vh_0092_mtfil.vrt'
#!ls *.vrt #Uncomment this line to see a List of the files
Before we start analyzing the available image data, we want to examine the content of our data stack. First, we read the image acquisition dates for all files in the time series and create a pandas date index.
if datefile.exists():
    with open(str(datefile), 'r') as f:
        dates = f.readlines()
    tindex = pd.DatetimeIndex(dates)
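A minimal sketch of this step, using hypothetical date strings in the YYYYMMDD format (one date per line, as a .dates file would hold them):

```python
import pandas as pd

# Hypothetical lines as they might be read from a .dates file
lines = ["20070607\n", "20070723\n", "20080609\n"]
tindex = pd.DatetimeIndex([line.strip() for line in lines])
print(tindex[0].date())  # 2007-06-07
```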
From the date index, we make and print a lookup table for band numbers and dates:
if imagefile.exists():
    j = 1
    print('Bands and dates for', imagefile)
    for i in tindex:
        print("{:4d} {}".format(j, i.date()), end=' ')
        j += 1
        if j % 5 == 1:
            print()
Open the image file using the gdal.Open() function. This returns a variable (img) that can be used for further interactions with the file:
if imagefile.exists():
    img = gdal.Open(str(imagefile))
To explore the image (number of bands, pixels, lines), you can use several functions associated with the image object (img) created in the last code cell:
print(img.RasterCount) # Number of Bands
print(img.RasterXSize) # Number of Pixels
print(img.RasterYSize) # Number of Lines
To access any band in the image, use GDAL's GetRasterBand(x) function. Replace the band_num value with the number of the band you wish to access.
band_num = 70
band = img.GetRasterBand(band_num)
Once a band is selected, several functions associated with the band are available for further processing, e.g., band.ReadAsArray(xoff=0, yoff=0, xsize=None, ysize=None)
Let's read the entire raster layer for the band:
raster = band.ReadAsArray()
Because of the potentially large data volume when dealing with time series data stacks, it may be prudent to read only a subset of data.
Using GDAL's ReadAsArray() function, subsets can be requested by defining pixel offsets and subset size:
img.ReadAsArray(xoff=0, yoff=0, xsize=None, ysize=None)
For example, we can read a 50x50 pixel subset with an offset of 5 pixels and 20 lines:
raster_sub = band.ReadAsArray(5, 20, 50, 50)
The result is a two-dimensional numpy array in the datatype the data were stored in. We can inspect these data in Python by typing the array name on the command line:
raster_sub
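Conceptually, band.ReadAsArray(xoff, yoff, xsize, ysize) corresponds to slicing the full band array, with rows as lines (y) and columns as pixels (x). A sketch on a synthetic uint16 array standing in for a real band:

```python
import numpy as np

# Synthetic stand-in for a full band of uncalibrated uint16 DN values
full = np.arange(100 * 100, dtype=np.uint16).reshape(100, 100)
xoff, yoff, xsize, ysize = 5, 20, 50, 50
raster_sub = full[yoff:yoff + ysize, xoff:xoff + xsize]
print(raster_sub.shape, raster_sub.dtype)  # (50, 50) uint16
```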
From the lookup table we know that bands 20 and 27 in the Nepal data stack are from mid February and late August. Let's take a look at these images.
raster_1 = img.GetRasterBand(20).ReadAsArray()
raster_2 = img.GetRasterBand(27).ReadAsArray()
4.3.1 Write a Plotting Function
Matplotlib's plotting functions allow for powerful options to display imagery. We are following some standard approaches for setting up figures. First, we are looking at a raster band and its associated histogram.
Our function, show_image_histogram(), takes several parameters:
Note: By default, data will be linearly stretched between vmin and vmax.
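As a sketch of what this linear stretch does (using synthetic data; the 2% and 98% percentile defaults mirror those used in the function below):

```python
import numpy as np

# Synthetic raster of uncalibrated DN values
rng = np.random.default_rng(42)
raster = rng.integers(0, 10000, size=(100, 100)).astype(float)
vmin = np.percentile(raster, 2)
vmax = np.percentile(raster, 98)
# Linear stretch: vmin maps to 0 (black), vmax to 1 (white), values outside are clipped
stretched = np.clip((raster - vmin) / (vmax - vmin), 0.0, 1.0)
```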
We will only call this function once in this notebook, but it is a useful utility method that can be copied and pasted for use in other analyses.
def show_image_histogram(raster, tindex, band_nbr, vmin=None, vmax=None, output_filename=None):
    assert 'plb' in globals(), 'Error: matplotlib.pylab must be imported as "plb"'
    assert 'plt' in globals(), 'Error: matplotlib.pyplot must be imported as "plt"'
    # determine default stretch limits before plotting so they are actually applied
    vmin = np.percentile(raster, 2) if vmin is None else vmin
    vmax = np.percentile(raster, 98) if vmax is None else vmax
    fig = plb.figure(figsize=(16, 8))
    ax1 = fig.add_subplot(121)
    ax2 = fig.add_subplot(122)
    # plot image
    ax1.imshow(raster, cmap='gray', vmin=vmin, vmax=vmax)
    ax1.set_title('Image Band {} {}'.format(band_nbr, tindex[band_nbr-1].date()))
    ax1.xaxis.set_label_text('Linear stretch Min={} Max={}'.format(vmin, vmax))
    # plot histogram
    h = ax2.hist(raster.flatten(), bins=200, range=(0, 10000))
    ax2.xaxis.set_label_text('Amplitude (Uncalibrated DN Values)')
    ax2.set_title('Histogram Band {} {}'.format(band_nbr, tindex[band_nbr-1].date()))
    if output_filename:
        plt.savefig(output_filename, dpi=300, transparent=True)
We won't be calling our new function elsewhere in this notebook, so let's test it now:
# Change the last argument to decide where to store the image.
show_image_histogram(raster_1, tindex, 20, vmin=2000, vmax=10000, output_filename=path/'show_time_series.png')
This section introduces you to the handling and analysis of SAR time series stacks. A focus will be put on time series visualization, which allows us to inspect time series in more depth. Note that HTML animations are not exported into a PDF file but will display interactively.
Let's read an image subset (offset 400, 400 / size 600, 600) of the entire time series data stack. The data are linearly scaled amplitudes represented as unsigned 16-bit integers.
We use the GDAL ReadAsArray(xoff,yoff,xsize,ysize) function where xoff is the offset in pixels from upper left; yoff is the offset in lines from upper left; xsize is the number of pixels and ysize is the number of lines of the subset.
If ReadAsArray() is called without any parameters, the entire image data stack is read.
Let's first define a subset and make sure it is in the right geographic location.
# Open the image and read the first raster band
band = img.GetRasterBand(1)
# Define the subset
subset = (400, 400, 600, 600)
Let's first plot one band together with the outline of the selected subset to verify its geographic location. Then we will extract this subset from all slices of the data stack.
# Plot one band together with the outline of the selected subset to verify its geographic location.
raster = band.ReadAsArray()
vmin = np.percentile(raster.flatten(), 5)
vmax = np.percentile(raster.flatten(), 95)
fig = plb.figure(figsize=(10, 10))
ax = fig.add_subplot(111)
ax.imshow(raster, cmap='gray', vmin=vmin, vmax=vmax)
# plot the subset as rectangle
_ = ax.add_patch(patches.Rectangle((subset[0], subset[1]), subset[2], subset[3], fill=False, edgecolor='red'))
raster0 = band.ReadAsArray(*subset)
bandnbr = 0 # Needed for updates
rasterstack = img.ReadAsArray(*subset)
Close img, as it is no longer needed in the notebook:
img = None
Focused SAR image data natively come in uncalibrated digital numbers (DN) and need to be calibrated to correspond to proper radar cross section information.
Calibration coefficients for SAR data are often defined in the decibel (dB) scale due to the high dynamic range of the imaging system. For the L-band ALOS PALSAR data at hand, the conversion from uncalibrated DN values to calibrated radar cross section values in dB scale is performed by applying a standard calibration factor of -83 dB.
$\gamma^0_{dB} = 20 \cdot \log_{10}(DN) - 83$
The data at hand are radiometrically terrain corrected images, which are often expressed as terrain-flattened $\gamma^0$ backscattering coefficients. For forest and land cover monitoring applications, $\gamma^0$ is the preferred metric.
Let's apply the calibration constant for our data and export it in dB scale:
caldB = 20*np.log10(rasterstack) - 83
While dB-scaled images are often "visually pleasing", they are often not a good basis for mathematical operations on data. For instance, when we compute the mean of observations, it makes a difference whether we do that in power or dB scale. Since dB scale is a logarithmic scale, we cannot simply average data in that scale.
Please note that the correct scale in which operations need to be performed is the power scale. This is critical, e.g. when speckle filters are applied, spatial operations like block averaging are performed, or time series are analyzed.
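A small numeric example (with illustrative values) of why the averaging scale matters:

```python
import numpy as np

pwr = np.array([0.001, 0.1])            # two gamma^0 samples in power scale
dB = 10 * np.log10(pwr)                 # -30 dB and -10 dB
correct = 10 * np.log10(pwr.mean())     # average in power, then convert: ~ -12.97 dB
incorrect = dB.mean()                   # averaging the dB values gives -20 dB
```

The two results differ by about 7 dB, which is why spatial or temporal averaging must be done in power scale.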
To convert from dB to power, apply: $\gamma^o_{pwr} = 10^{\frac{\gamma^o_{dB}}{10}}$
calPwr = np.power(10., caldB/10.)
First, create a directory in which to store our plots:
product_path = path/'plots_and_animations'
if not product_path.exists():
    product_path.mkdir()
Now we are ready to create a time series animation from the calibrated SAR data.
%%capture
fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(111)
ax.axis('off')
vmin = np.percentile(caldB.flatten(), 5)
vmax = np.percentile(caldB.flatten(), 95)
r0dB = 20*np.log10(raster0) - 83
im = ax.imshow(r0dB,cmap='gray', vmin=vmin, vmax=vmax)
ax.set_title("{}".format(tindex[0].date()))
def animate(i):
    ax.set_title("{}".format(tindex[i].date()))
    im.set_data(caldB[i])
# Interval is given in milliseconds
ani = an.FuncAnimation(fig, animate, frames=rasterstack.shape[0], interval=400)
Configure matplotlib's RC settings for the animation:
rc('animation', embed_limit=40971520.0) # Increase the embed limit so the entire animation can be displayed
Create a javascript animation of the time-series running inline in the notebook:
HTML(ani.to_jshtml())
Save the animation (animation.gif):
ani.save(f'{product_path}/animation.gif', writer='pillow', fps=2)
To create the time series of means, we will go through the following steps:
1. Compute the means in power scale.
2. Convert the means to dB scale for visualization.
3. Plot the time series of means.
Compute the means:
rs_means_pwr = np.mean(calPwr,axis=(1, 2))
Convert the resulting mean value time-series to dB scale for visualization and check that we got the means over time:
rs_means_dB = 10.*np.log10(rs_means_pwr)
rs_means_pwr.shape
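The axis=(1, 2) spatial averaging can be sketched on a toy stack (4 hypothetical dates of 3x3 pixels):

```python
import numpy as np

stack = np.ones((4, 3, 3))              # all power values 1.0
stack[2] *= 4.0                         # one brighter acquisition date
means = np.mean(stack, axis=(1, 2))     # one spatial mean per date
print(means)  # [1. 1. 4. 1.]
```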
Plot and save the time series of means (time_series_means.png):
# 3. Now let's plot the time series of means
fig = plt.figure(figsize=(16, 4))
ax1 = fig.add_subplot(111)
ax1.plot(tindex, rs_means_pwr)
ax1.set_xlabel('Date')
ax1.set_ylabel(r'$\overline{\gamma^o}$ [power]')
ax2 = ax1.twinx()
ax2.plot(tindex, rs_means_dB, color='red')
ax2.set_ylabel(r'$\overline{\gamma^o}$ [dB]')
fig.legend(['power', 'dB'], loc=1)
plt.title(r'Time series profile of average band backscatter $\gamma^o$ ')
plt.savefig(f'{product_path}/time_series_means.png', dpi=72, transparent=True)
We use a few Matplotlib functions to create a side-by-side animation of the dB-scaled imagery and the respective global means $\mu_{\gamma^0_{dB}}$.
%%capture
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 4), gridspec_kw={'width_ratios':[1, 3]})
vmin = np.percentile(rasterstack.flatten(), 5)
vmax = np.percentile(rasterstack.flatten(), 95)
im = ax1.imshow(raster0, cmap='gray', vmin=vmin, vmax=vmax)
ax1.set_title("{}".format(tindex[0].date()))
ax1.set_axis_off()
ax2.axis([tindex[0].date(), tindex[-1].date(), rs_means_dB.min(), rs_means_dB.max()])
ax2.set_ylabel(r'$\overline{\gamma^o}$ [dB]')
ax2.set_xlabel('Date')
ax2.set_ylim((-10, -5))
l, = ax2.plot([], [])
def animate(i):
    ax1.set_title("{}".format(tindex[i].date()))
    im.set_data(rasterstack[i])
    ax2.set_title("{}".format(tindex[i].date()))
    l.set_data(tindex[:(i+1)], rs_means_dB[:(i+1)])
# Interval is given in milliseconds
ani = an.FuncAnimation(fig, animate, frames=rasterstack.shape[0], interval=400)
Create a javascript animation of the time-series running inline in the notebook:
HTML(ani.to_jshtml())
Save the animated time series with the accompanying mean-backscatter plot (animation_histogram.gif):
ani.save(f'{product_path}/animation_histogram.gif', writer='pillow', fps=2)
Time_Series_Example.ipynb - Version 1.5.3 - February 2024
Version Changes