This notebook introduces you to the analysis of deep multi-temporal SAR image data stacks within the framework of Jupyter Notebooks. The Jupyter Notebook environment is easy to launch in any web browser for interactive data exploration with provided or new training data. Notebooks combine executable Python code with text written in Markdown formatting, including LaTeX-style mathematical equations. Another advantage of Jupyter Notebooks is that they can easily be expanded, changed, and shared with new data sets or newly available time series steps. They therefore provide an excellent basis for collaborative and repeatable data analysis.
This notebook covers the following data analysis concepts:
Important Notes about JupyterHub
Your JupyterHub server will automatically shut down when left idle for more than 1 hour. Your notebooks will not be lost, but you will have to restart their kernels and re-run them from the beginning. You will not be able to seamlessly continue running a partially run notebook.
import url_widget as url_w
notebookUrl = url_w.URLWidget()
display(notebookUrl)
from IPython.display import Markdown
from IPython.display import display
notebookUrl = notebookUrl.value
user = !echo $JUPYTERHUB_USER
env = !echo $CONDA_PREFIX
if env[0] == '':
    env[0] = 'Python 3 (base)'

if env[0] != '/home/jovyan/.local/envs/rtc_analysis':
    display(Markdown(f'<text style=color:red><strong>WARNING:</strong></text>'))
    display(Markdown(f'<text style=color:red>This notebook should be run using the "rtc_analysis" conda environment.</text>'))
    display(Markdown(f'<text style=color:red>It is currently using the "{env[0].split("/")[-1]}" environment.</text>'))
    display(Markdown(f'<text style=color:red>Select the "rtc_analysis" environment from the "Change Kernel" submenu of the "Kernel" menu.</text>'))
    display(Markdown(f'<text style=color:red>If the "rtc_analysis" environment is not present, use <a href="{notebookUrl.split("/user")[0]}/user/{user[0]}/notebooks/conda_environments/Create_OSL_Conda_Environments.ipynb"> Create_OSL_Conda_Environments.ipynb </a> to create it.</text>'))
    display(Markdown(f'<text style=color:red>Note that you must restart your server after creating a new environment before it is usable by notebooks.</text>'))
In this notebook we will use the following scientific libraries:
%%capture
from pathlib import Path
import pandas as pd # for DatetimeIndex
from osgeo import gdal # for GetRasterBand, Open, ReadAsArray
gdal.UseExceptions()
import numpy as np #for log10, mean, percentile, power
%matplotlib inline
import matplotlib.pylab as plb # for add_patch, add_subplot, figure, hist, imshow, set_title, xaxis.set_label_text, text
import matplotlib.pyplot as plt # for add_subplot, axis, figure, imshow, legend, plot, set_axis_off, set_data,
# set_title, set_xlabel, set_ylabel, set_ylim, subplots, title, twinx
import matplotlib.patches as patches # for Rectangle
import matplotlib.animation as an # for FuncAnimation
from matplotlib import rc
import opensarlab_lib as asfn
asfn.jupytertheme_matplotlib_format()
from IPython.display import HTML
This notebook will be using a 70-image deep L-band SAR data stack over Nepal for a first experience with time series processing. The L-band data were acquired by the ALOS PALSAR sensor and are available to us through the services of the Alaska Satellite Facility.
Nepal is an interesting site for this analysis due to the significant seasonality of precipitation that is characteristic of this region. Nepal is said to have five seasons: spring, summer, monsoon, autumn, and winter. Precipitation is low in the winter (November - March) and peaks dramatically in the summer, with top rain rates in July, August, and September (see figure to the right). As SAR is sensitive to changes in soil moisture, these weather patterns have a noticeable impact on the radar cross section ($\sigma^0$) time series information.
We will analyze the variation of $\sigma^0$ values over time and will interpret them in the context of rainfall rates in the imaged area.
Before we get started, let's first create a working directory for this analysis and change into it:
path = Path("/home/jovyan/notebooks/SAR_Training/English/Master/data_time_series_example")
if not path.exists():
    path.mkdir()
We will retrieve the relevant data from an Amazon Web Service (AWS) cloud storage bucket using the following command:
s3_path = 's3://asf-jupyter-data-west/time_series.zip'
time_series_path = path/Path(s3_path).name
!aws --region=us-west-2 --no-sign-request s3 cp $s3_path $time_series_path
Now, let's unzip the file (overwriting previous extractions) and clean up after ourselves:
if time_series_path.exists():
    asfn.asf_unzip(str(path), str(time_series_path))
    time_series_path.unlink()
The following lines set the path variables needed for data processing. This step is not strictly necessary, but it saves a lot of extra typing later. We define variables for the main data directory as well as for the files containing the date and image information:
datadirectory = path/'time_series/S32644X696260Y3052060sS1-EBD'
datefile = datadirectory/'S32644X696260Y3052060sS1_D_vv_0092_mtfil.dates'
imagefile = datadirectory/'S32644X696260Y3052060sS1_D_vv_0092_mtfil.vrt'
imagefile_cross = datadirectory/'S32644X696260Y3052060sS1_D_vh_0092_mtfil.vrt'
#!ls *.vrt #Uncomment this line to see a List of the files
Before we start analyzing the available image data, we want to examine the content of our data stack. First, we read the image acquisition dates for all files in the time series and create a pandas date index.
if datefile.exists():
    with open(str(datefile), 'r') as f:
        dates = f.readlines()
    tindex = pd.DatetimeIndex(dates)
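A minimal sketch of this step, using hypothetical date strings in the YYYYMMDD format (one date per line, as a .dates file would hold them):

```python
import pandas as pd

# Hypothetical lines as they might be read from a .dates file
lines = ["20070607\n", "20070723\n", "20080609\n"]
tindex = pd.DatetimeIndex([line.strip() for line in lines])
print(tindex[0].date())  # 2007-06-07
```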
From the date index, we make and print a lookup table for band numbers and dates:
if imagefile.exists():
    j = 1
    print('Bands and dates for', imagefile)
    for i in tindex:
        print("{:4d} {}".format(j, i.date()), end=' ')
        j += 1
        if j % 5 == 1:
            print()
Open the image file using the gdal.Open() function. This returns a variable (img) that can be used for further interactions with the file:
if imagefile.exists():
    img = gdal.Open(str(imagefile))
To explore the image (number of bands, pixels, lines), you can use several functions associated with the image object (img) created in the last code cell:
print(img.RasterCount) # Number of Bands
print(img.RasterXSize) # Number of Pixels
print(img.RasterYSize) # Number of Lines
To access any band in the image, use GDAL's GetRasterBand(x) function. Replace the band_num value with the number of the band you wish to access.
band_num = 70
band = img.GetRasterBand(band_num)
Once a band is selected, several functions associated with the band are available for further processing, e.g., band.ReadAsArray(xoff=0, yoff=0, xsize=None, ysize=None)
Let's read the entire raster layer for the band:
raster = band.ReadAsArray()
Because of the potentially large data volume when dealing with time series data stacks, it may be prudent to read only a subset of data.
Using GDAL's ReadAsArray() function, subsets can be requested by defining pixel offsets and subset size:
img.ReadAsArray(xoff=0, yoff=0, xsize=None, ysize=None)
For example, we can read a 50x50 pixel subset with an offset of 5 pixels and 20 lines:
raster_sub = band.ReadAsArray(5, 20, 50, 50)
The result is a two-dimensional numpy array in the datatype the data were stored in. We can inspect these data in Python by typing the array name on the command line:
raster_sub
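Conceptually, band.ReadAsArray(xoff, yoff, xsize, ysize) corresponds to slicing the full band array, with rows as lines (y) and columns as pixels (x). A sketch on a synthetic uint16 array standing in for a real band:

```python
import numpy as np

# Synthetic stand-in for a full band of uncalibrated uint16 DN values
full = np.arange(100 * 100, dtype=np.uint16).reshape(100, 100)
xoff, yoff, xsize, ysize = 5, 20, 50, 50
raster_sub = full[yoff:yoff + ysize, xoff:xoff + xsize]
print(raster_sub.shape, raster_sub.dtype)  # (50, 50) uint16
```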
From the lookup table we know that bands 20 and 27 in the Nepal data stack are from mid February and late August. Let's take a look at these images.
raster_1 = img.GetRasterBand(20).ReadAsArray()
raster_2 = img.GetRasterBand(27).ReadAsArray()
4.3.1 Write a Plotting Function
Matplotlib's plotting functions allow for powerful options to display imagery. We are following some standard approaches for setting up figures. First, we are looking at a raster band and its associated histogram.
Our function, show_image_histogram(), takes several parameters:
Note: By default, data will be linearly stretched between vmin and vmax.
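As a sketch of what this linear stretch does (using synthetic data; the 2% and 98% percentile defaults mirror those used in the function below):

```python
import numpy as np

# Synthetic raster of uncalibrated DN values
rng = np.random.default_rng(42)
raster = rng.integers(0, 10000, size=(100, 100)).astype(float)
vmin = np.percentile(raster, 2)
vmax = np.percentile(raster, 98)
# Linear stretch: vmin maps to 0 (black), vmax to 1 (white), values outside are clipped
stretched = np.clip((raster - vmin) / (vmax - vmin), 0.0, 1.0)
```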
We will only call this function once in this notebook, but it is a useful utility method that can be copied and pasted for use in other analyses.
def show_image_histogram(raster, tindex, band_nbr, vmin=None, vmax=None, output_filename=None):
    assert 'plb' in globals(), 'Error: matplotlib.pylab must be imported as "plb"'
    assert 'plt' in globals(), 'Error: matplotlib.pyplot must be imported as "plt"'
    # determine default stretch limits before plotting so they are actually applied
    vmin = np.percentile(raster, 2) if vmin is None else vmin
    vmax = np.percentile(raster, 98) if vmax is None else vmax
    fig = plb.figure(figsize=(16, 8))
    ax1 = fig.add_subplot(121)
    ax2 = fig.add_subplot(122)
    # plot image
    ax1.imshow(raster, cmap='gray', vmin=vmin, vmax=vmax)
    ax1.set_title('Image Band {} {}'.format(band_nbr, tindex[band_nbr-1].date()))
    ax1.xaxis.set_label_text('Linear stretch Min={} Max={}'.format(vmin, vmax))
    # plot histogram
    h = ax2.hist(raster.flatten(), bins=200, range=(0, 10000))
    ax2.xaxis.set_label_text('Amplitude (Uncalibrated DN Values)')
    ax2.set_title('Histogram Band {} {}'.format(band_nbr, tindex[band_nbr-1].date()))
    if output_filename:
        plt.savefig(output_filename, dpi=300, transparent=True)
We won't be calling our new function elsewhere in this notebook, so let's test it now:
# Change the last argument to decide where to store the image.
show_image_histogram(raster_1, tindex, 20, vmin=2000, vmax=10000, output_filename=path/'show_time_series.png')
This section introduces you to the handling and analysis of SAR time series stacks. A focus will be put on time series visualization, which allows us to inspect time series in more depth. Note that HTML animations are not exported into a PDF file but will display interactively.
Let's read an image subset (offset 400, 400 / size 600, 600) of the entire time series data stack. The data are linearly scaled amplitudes represented as unsigned 16-bit integers.
We use the GDAL ReadAsArray(xoff,yoff,xsize,ysize) function where xoff is the offset in pixels from upper left; yoff is the offset in lines from upper left; xsize is the number of pixels and ysize is the number of lines of the subset.
If ReadAsArray() is called without any parameters, the entire image data stack is read.
Let's first define a subset and make sure it is in the right geographic location.
# Open the image and read the first raster band
band = img.GetRasterBand(1)
# Define the subset
subset = (400, 400, 600, 600)
Let's first plot one band together with the outline of the selected subset to verify its geographic location. Then we will extract this subset from all slices of the data stack.
# Plot one band together with the outline of the selected subset to verify its geographic location.
raster = band.ReadAsArray()
vmin = np.percentile(raster.flatten(), 5)
vmax = np.percentile(raster.flatten(), 95)
fig = plb.figure(figsize=(10, 10))
ax = fig.add_subplot(111)
ax.imshow(raster, cmap='gray', vmin=vmin, vmax=vmax)
# plot the subset as rectangle
_ = ax.add_patch(patches.Rectangle((subset[0], subset[1]), subset[2], subset[3], fill=False, edgecolor='red'))
raster0 = band.ReadAsArray(*subset)
bandnbr = 0 # Needed for updates
rasterstack = img.ReadAsArray(*subset)
Close img, as it is no longer needed in the notebook:
img = None
Focused SAR image data natively come in uncalibrated digital numbers (DN) and need to be calibrated to correspond to proper radar cross section information.
Calibration coefficients for SAR data are often defined in the decibel (dB) scale due to the high dynamic range of the imaging system. For the L-band ALOS PALSAR data at hand, the conversion from uncalibrated DN values to calibrated radar cross section values in dB scale is performed by applying a standard calibration factor of -83 dB.
$\gamma^0_{dB} = 20 \cdot \log_{10}(DN) - 83$
The data at hand are radiometrically terrain corrected images, which are often expressed as terrain-flattened $\gamma^0$ backscattering coefficients. For forest and land cover monitoring applications, $\gamma^0$ is the preferred metric.
Let's apply the calibration constant for our data and export it in dB scale:
caldB = 20*np.log10(rasterstack) - 83
While dB-scaled images are often "visually pleasing", they are often not a good basis for mathematical operations on data. For instance, when we compute the mean of observations, it makes a difference whether we do that in power or dB scale. Since dB scale is a logarithmic scale, we cannot simply average data in that scale.
Please note that the correct scale in which operations need to be performed is the power scale. This is critical, e.g. when speckle filters are applied, spatial operations like block averaging are performed, or time series are analyzed.
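A small numeric example (with illustrative values) of why the averaging scale matters:

```python
import numpy as np

pwr = np.array([0.001, 0.1])            # two gamma^0 samples in power scale
dB = 10 * np.log10(pwr)                 # -30 dB and -10 dB
correct = 10 * np.log10(pwr.mean())     # average in power, then convert: ~ -12.97 dB
incorrect = dB.mean()                   # averaging the dB values gives -20 dB
```

The two results differ by about 7 dB, which is why spatial or temporal averaging must be done in power scale.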
To convert from dB to power, apply: $\gamma^o_{pwr} = 10^{\frac{\gamma^o_{dB}}{10}}$
calPwr = np.power(10., caldB/10.)
First, create a directory in which to store our plots:
product_path = path/'plots_and_animations'
if not product_path.exists():
    product_path.mkdir()
Now we are ready to create a time series animation from the calibrated SAR data.
%%capture
fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(111)
ax.axis('off')
vmin = np.percentile(caldB.flatten(), 5)
vmax = np.percentile(caldB.flatten(), 95)
r0dB = 20*np.log10(raster0) - 83
im = ax.imshow(r0dB,cmap='gray', vmin=vmin, vmax=vmax)
ax.set_title("{}".format(tindex[0].date()))
def animate(i):
    ax.set_title("{}".format(tindex[i].date()))
    im.set_data(caldB[i])
# Interval is given in milliseconds
ani = an.FuncAnimation(fig, animate, frames=rasterstack.shape[0], interval=400)
Configure matplotlib's RC settings for the animation:
rc('animation', embed_limit=40971520.0) # Increase the embed limit so the entire animation can be displayed
Create a javascript animation of the time-series running inline in the notebook:
HTML(ani.to_jshtml())
Save the animation (animation.gif):
ani.save(f'{product_path}/animation.gif', writer='pillow', fps=2)
To create the time series of means, we will go through the following steps:
1. Compute the means in power scale.
2. Convert the means to dB scale for visualization.
3. Plot the time series of means.
Compute the means:
rs_means_pwr = np.mean(calPwr,axis=(1, 2))
Convert the resulting mean value time-series to dB scale for visualization and check that we got the means over time:
rs_means_dB = 10.*np.log10(rs_means_pwr)
rs_means_pwr.shape
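The axis=(1, 2) spatial averaging can be sketched on a toy stack (4 hypothetical dates of 3x3 pixels):

```python
import numpy as np

stack = np.ones((4, 3, 3))              # all power values 1.0
stack[2] *= 4.0                         # one brighter acquisition date
means = np.mean(stack, axis=(1, 2))     # one spatial mean per date
print(means)  # [1. 1. 4. 1.]
```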
Plot and save the time series of means (time_series_means.png):
# 3. Now let's plot the time series of means
fig = plt.figure(figsize=(16, 4))
ax1 = fig.add_subplot(111)
ax1.plot(tindex, rs_means_pwr)
ax1.set_xlabel('Date')
ax1.set_ylabel(r'$\overline{\gamma^o}$ [power]')
ax2 = ax1.twinx()
ax2.plot(tindex, rs_means_dB, color='red')
ax2.set_ylabel(r'$\overline{\gamma^o}$ [dB]')
fig.legend(['power', 'dB'], loc=1)
plt.title(r'Time series profile of average band backscatter $\gamma^o$ ')
plt.savefig(f'{product_path}/time_series_means.png', dpi=72, transparent=True)
We use a few Matplotlib functions to create a side-by-side animation of the dB-scaled imagery and the respective global means $\mu_{\gamma^0_{dB}}$.
%%capture
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 4), gridspec_kw={'width_ratios':[1, 3]})
vmin = np.percentile(rasterstack.flatten(), 5)
vmax = np.percentile(rasterstack.flatten(), 95)
im = ax1.imshow(raster0, cmap='gray', vmin=vmin, vmax=vmax)
ax1.set_title("{}".format(tindex[0].date()))
ax1.set_axis_off()
ax2.axis([tindex[0].date(), tindex[-1].date(), rs_means_dB.min(), rs_means_dB.max()])
ax2.set_ylabel(r'$\overline{\gamma^o}$ [dB]')
ax2.set_xlabel('Date')
ax2.set_ylim((-10, -5))
l, = ax2.plot([], [])
def animate(i):
    ax1.set_title("{}".format(tindex[i].date()))
    im.set_data(rasterstack[i])
    ax2.set_title("{}".format(tindex[i].date()))
    l.set_data(tindex[:(i+1)], rs_means_dB[:(i+1)])
# Interval is given in milliseconds
ani = an.FuncAnimation(fig, animate, frames=rasterstack.shape[0], interval=400)
Create a javascript animation of the time-series running inline in the notebook:
HTML(ani.to_jshtml())
Save the animated time series with the accompanying mean-backscatter plot (animation_histogram.gif):
ani.save(f'{product_path}/animation_histogram.gif', writer='pillow', fps=2)
Time_Series_Example.ipynb - Version 1.5.3 - February 2024
Version Changes