This notebook introduces you to the analysis of deep multi-temporal SAR image data stacks in the framework of Jupyter Notebooks. The Jupyter Notebook environment is easy to launch in any web browser for interactive data exploration with provided or new training data. Notebooks are comprised of text written in a combination of executable python code and markdown formatting including latex style mathematical equations. Another advantage of Jupyter Notebooks is that they can easily be expanded, changed, and shared with new data sets or newly available time series steps. Therefore, they provide an excellent basis for collaborative and repeatable data analysis.
This notebook covers the following data analysis concepts:
Important Note about JupyterHub
Your JupyterHub server will automatically shutdown when left idle for more than 1 hour. Your notebooks will not be lost but you will have to restart their kernels and re-run them from the beginning. You will not be able to seamlessly continue running a partially run notebook.
import url_widget as url_w
notebookUrl = url_w.URLWidget()
display(notebookUrl)
from IPython.display import Markdown
from IPython.display import display
notebookUrl = notebookUrl.value
user = !echo $JUPYTERHUB_USER
env = !echo $CONDA_PREFIX
if env[0] == '':
env[0] = 'Python 3 (base)'
if env[0] != '/home/jovyan/.local/envs/rtc_analysis':
display(Markdown(f'<text style=color:red><strong>WARNING:</strong></text>'))
display(Markdown(f'<text style=color:red>This notebook should be run using the "rtc_analysis" conda environment.</text>'))
display(Markdown(f'<text style=color:red>It is currently using the "{env[0].split("/")[-1]}" environment.</text>'))
display(Markdown(f'<text style=color:red>Select the "rtc_analysis" from the "Change Kernel" submenu of the "Kernel" menu.</text>'))
display(Markdown(f'<text style=color:red>If the "rtc_analysis" environment is not present, use <a href="{notebookUrl.split("/user")[0]}/user/{user[0]}/notebooks/conda_environments/Create_OSL_Conda_Environments.ipynb"> Create_OSL_Conda_Environments.ipynb </a> to create it.</text>'))
display(Markdown(f'<text style=color:red>Note that you must restart your server after creating a new environment before it is usable by notebooks.</text>'))
In this notebook we will use the following scientific libraries:
Pandas is a Python library that provides high-level data structures and a vast variety of tools for analysis. The great feature of this package is the ability to translate rather complex operations with data into one or two commands. Pandas contains many built-in methods for filtering and combining data, as well as the time-series functionality.
GDAL is a software library for reading and writing raster and vector geospatial data formats. It includes a collection of programs tailored for geospatial data processing. Most modern GIS systems (such as ArcGIS or QGIS) use GDAL in the background.
NumPy is one of the principal packages for scientific applications of Python. It is intended for processing large multidimensional arrays and matrices, and an extensive collection of high-level mathematical functions and implemented methods makes it possible to perform various operations with these objects.
Matplotlib is a low-level library for creating two-dimensional diagrams and graphs. With its help, you can build diverse charts, from histograms and scatterplots to non-Cartesian coordinates graphs. Moreover, many popular plotting libraries are designed to work in conjunction with matplotlib.
%%capture
from pathlib import Path
import pandas as pd # for DatetimeIndex
from osgeo import gdal # for GetRasterBand, Open, ReadAsArray
gdal.UseExceptions()
import numpy as np #for log10, mean, percentile, power
%matplotlib inline
import matplotlib.pylab as plb # for add_patch, add_subplot, figure, hist, imshow, set_title, xaxis,_label, text
import matplotlib.pyplot as plt # for add_subplot, axis, figure, imshow, legend, plot, set_axis_off, set_data,
# set_title, set_xlabel, set_ylabel, set_ylim, subplots, title, twinx
import matplotlib.patches as patches # for Rectangle
import matplotlib.animation as an # for FuncAnimation
from matplotlib import rc
import opensarlab_lib as asfn
asfn.jupytertheme_matplotlib_format()
from IPython.display import HTML
plt.rcParams.update({'font.size': 12})
This notebook will be using a 70-image deep C-band SAR data stack over Nepal for a first experience with time series processing. The C-band data were acquired by the Sentinel-1 sensor and are available to us through the services of the Alaska Satellite Facility.
Nepal is an interesting site for this analysis due to the significant seasonality of precipitation that is characteristic for this region. Nepal is said to have five seasons: spring, summer, monsoon, autumn and winter. Precipitation is low in the winter (November - March) and peaks dramatically in the summer, with top rain rates in July, August, and September (see figure to the right). As SAR is sensitive to changes in soil moisture, and vegetation structure, these weather patterns have a noticeable impact on the Radar Cross Section ($\sigma$) time series information.
We will analyze the variation of $\sigma$ values over time and will interpret them in the context of the weather information shown in the figure to the right.
Before we get started, let's first create a working directory for this analysis and change into it:
path = Path.cwd()/"data_time_series_example"
if not path.exists():
path.mkdir()
We will retrieve the relevant data from an Amazon Web Service (AWS) cloud storage bucket using the following command:
s3_path = 's3://asf-jupyter-data-west/time_series.zip'
time_series_path = Path(s3_path).name
!aws --region=us-west-2 --no-sign-request s3 cp $s3_path $time_series_path
Now, let's unzip the file (overwriting previous extractions) and clean up after ourselves:
if Path(time_series_path).exists():
asfn.asf_unzip(str(path), time_series_path)
Path(time_series_path).unlink()
The following lines set path variables needed for data processing. This step is not necessary but it saves a lot of extra typing later. Define variables for the main data directory as well as for the files containing data and image information:
datadirectory = path/'time_series/S32644X696260Y3052060sS1-EBD'
datefile = datadirectory/'S32644X696260Y3052060sS1_D_vv_0092_mtfil.dates'
imagefile = datadirectory/'S32644X696260Y3052060sS1_D_vv_0092_mtfil.vrt'
imagefile_cross = datadirectory/'S32644X696260Y3052060sS1_D_vh_0092_mtfil.vrt'
.vrt
files exist:¶# !ls {datadirectory}/*vrt #Uncomment this line to see a List of the files
Before we start analyzing the available image data, we want to examine the content of our data stack. First, we read the image acquisition dates for all files in the time series and create a pandas date index.
if datefile.exists():
with open(str(datefile), 'r') as f:
dates = f.readlines()
tindex = pd.DatetimeIndex(dates)
From the date index, we print the band numbers and dates:
if imagefile.exists():
print('Bands and dates for', imagefile)
for i, d in enumerate(tindex):
print("{:4d} {}".format(i+1, d.date()),end=' ')
if (i+1)%5 == 1: print()
To open an image file using the gdal.Open() function. This returns a variable (img) that can be used for further interactions with the file:
if imagefile.exists():
img = gdal.Open(str(imagefile))
To explore the image (number of bands, pixels, lines), you can use several functions associated with the image object (img) created in the last code cell:
print(img.RasterCount) # Number of Bands
print(img.RasterXSize) # Number of Pixels
print(img.RasterYSize) # Number of Lines
To access any band in the image, use GDAL's GetRasterBand(x) function. Replace the band_num value with the number of the band you wish to access.
band_num = 70
band = img.GetRasterBand(band_num)
Once a band is seleted, several functions associated with the band are available for further processing, e.g., band.ReadAsArray(xoff=0,yoff=0,xsize=None,ysize=None)
Let's read the entire raster layer for the band:
raster = band.ReadAsArray()
Because of the potentially large data volume when dealing with time series data stacks, it may be prudent to read only a subset of data.
Using GDAL's ReadAsArray()
function, subsets can be requested by defining pixel offsets and subset size:
img.ReadAsArray(xoff=0, yoff=0, xsize=None, ysize=None)
xoff, yoff
are the offsets from the upper left corner in pixel/line coordinates.xsize, ysize
specify the size of the subset in x-direction (left to right) and y-direction (top to bottom).For example, we can read only a subset of 5x5 pixels with an offset of 5 pixels and 20 lines:
raster_sub = band.ReadAsArray(5, 20, 50, 50)
The result is a two dimensional numpy array in the datatpye the data were stored in. We can inspect these data in python by typing the array name on the commandline:
raster_sub
From the lookup table we know that bands 20 and 27 in the Nepal data stack are from mid February and late August. Let's take look at these images.
raster_1 = img.GetRasterBand(20).ReadAsArray()
raster_2 = img.GetRasterBand(27).ReadAsArray()
4.3.1 Write a Plotting Function
Matplotlib's plotting functions allow for powerful options to display imagery. We are following some standard approaches for setting up figures. First we are looking at a raster band and it's associated histogram.
Our function, show_image()
takes several parameters:
raster
= a numpy two dimensional arraytindex
= a panda index array for datesbandnbr
= the band number the corresponds to the rastervmin
= minimim value to displayvmax
= maximum value to displayoutput_filename
= name of output file, if saving the plotPreconditions: matplotlib.pyplot
must be imported as plb
and matplotlib.pyplot
must be imported as plt
.
Note: By default, data will be linearly stretched between vmin
and vmax
.
We won't use this function in this notebook but it is a useful utility method, which can be copied and pasted for use in other analyses
def show_image_histogram(raster, tindex, band_nbr, vmin=None, vmax=None, output_filename=None):
assert 'plb' in globals(), 'Error: matplotlib.pylab must be imported as "plb"'
assert 'plt' in globals(), 'Error: matplotlib.pyplot must be imported as "plt"'
fig = plb.figure(figsize=(16, 8))
ax1 = fig.add_subplot(121)
ax2 = fig.add_subplot(122)
plt.rcParams.update({'font.size': 14})
# plot image
ax1.imshow(raster, cmap='gray', vmin=vmin, vmax=vmax)
ax1.set_title(f'Image Band {band_nbr} {tindex[band_nbr-1].date()}')
vmin = np.percentile(raster, 2) if vmin==None else vmin
vmax = np.percentile(raster, 98) if vmax==None else vmax
ax1.xaxis.set_label_text(f'Linear stretch Min={vmin} Max={vmax}')
#plot histogram
h = ax2.hist(raster.flatten(), bins=200, range=(0, 10000))
ax2.xaxis.set_label_text('Amplitude (Uncalibrated DN Values)')
ax2.set_title(f'Histogram Band {band_nbr} {tindex[band_nbr-1].date()}')
if output_filename:
plt.savefig(output_filename, dpi=300, transparent='true')
We won't be calling our new function elsewhere in this notebook, so test it now:
show_image_histogram(raster_1, tindex, 20, vmin=2000, vmax=10000)
This section introduces you to the handling and analysis of SAR time series stacks. A focus will be put on time series visualization, which allow us to inspect time series in more depth. Note that html animations are not exported into the pdf file, but will display interactively.
Let's read an image subset (offset 400, 400 / size 600, 600) of the entire time series data stack. The data are linearly scaled amplitudes represented as unsigned 16 bit integer.
We use the GDAL ReadAsArray(xoff,yoff,xsize,ysize)
function where xoff
is the offset in pixels from upper left; yoff
is the offset in lines from upper left; xsize
is the number of pixels and ysize
is the number of lines of the subset.
If ReadAsArray()
is called without any parameters, the entire image data stack is read.
Let's first define a subset and make sure it is in the right geographic location.
# Open the image and read the first raster band
band = img.GetRasterBand(1)
# Define the subset
subset = (400, 400, 600, 600)
Now we are ready to extract this subset from all slices of the data stack.
# Plot one band together with the outline of the selected subset to verify its geographic location.
raster = band.ReadAsArray()
vmin = np.percentile(raster.flatten(), 5)
vmax = np.percentile(raster.flatten(), 95)
fig = plb.figure(figsize=(10, 10))
ax = fig.add_subplot(111)
ax.imshow(raster, cmap='gray', vmin=vmin, vmax=vmax)
# plot the subset as rectangle
_ = ax.add_patch(patches.Rectangle((subset[0], subset[1]), subset[2], subset[3], fill=False, edgecolor='red'))
raster0 = band.ReadAsArray(*subset)
bandnbr = 0 # Needed for updates
rasterstack = img.ReadAsArray(*subset)
Close img, as it is no longer needed in the notebook:
img = None
Focused SAR image data natively come in uncalibrated digital numbers (DN) and need to be calibrated to correspond to proper radar cross section information.
Calibration coefficients for SAR data are often defined in the decibel (dB) scale due to the high dynamic range of the imaging system. For the L-band ALOS PALSAR data at hand, the conversion from uncalibrated DN values to calibrated radar cross section values in dB scale is performed by applying a standard calibration factor of -83 dB.
$\gamma^0_{dB} = 20 \cdot log10(DN) -83$
The data at hand are radiometrically terrain corrected images, which are often expressed as terrain flattened $\gamma^0$ backscattering coefficients. For forest and land cover monitoring applications $\gamma^o$ is the preferred metric.
Let's apply the calibration constant for our data and export it in dB scale:
caldB = 20*np.log10(rasterstack) - 83
While dB-scaled images are often "visually pleasing", they are often not a good basis for mathematical operations on data. For instance, when we compute the mean of observations, it makes a difference whether we do that in power or dB scale. Since dB scale is a logarithmic scale, we cannot simply average data in that scale.
Please note that the correct scale in which operations need to be performed is the power scale. This is critical, e.g. when speckle filters are applied, spatial operations like block averaging are performed, or time series are analyzed.
To convert from dB to power, apply: $\gamma^o_{pwr} = 10^{\frac{\gamma^o_{dB}}{10}}$
calPwr = np.power(10., caldB/10.)
calAmp = np.sqrt(calPwr)
Let's explore how a band looks in the various image scales.
Let's choose a band number and find the associated imaging date:
bandnbr = 20
tindex[bandnbr-1]
Below is the python code to create a four-part figure comparing the effect of the representation of the backscatter values in the DN, amplitude, power and dB scale.
fig = plt.figure(figsize=(16,16))
ax1 = fig.add_subplot(221)
ax2 = fig.add_subplot(222)
ax3 = fig.add_subplot(223)
ax4 = fig.add_subplot(224)
ax1.imshow(rasterstack[bandnbr],cmap='gray',
vmin=np.percentile(rasterstack,10),
vmax=np.percentile(rasterstack,90))
ax2.imshow(calAmp[bandnbr],cmap='gray',
vmin=np.percentile(calAmp,10),
vmax=np.percentile(calAmp,90))
ax3.imshow(calPwr[bandnbr],cmap='gray',
vmin=np.percentile(calPwr,10),
vmax=np.percentile(calPwr,90))
ax4.imshow(caldB[bandnbr],cmap='gray',
vmin=np.percentile(caldB,10),
vmax=np.percentile(caldB,90))
ax1.set_title('DN Scaled (Uncalibrated)')
ax2.set_title('Calibrated (Amplitude Scaled)')
ax3.set_title('Calibrated (Power Scaled)')
_ = ax4.set_title('Calibrated (dB Scaled)')
The following code cell calculates the histograms for the differently scaled data. You should see significant differences in the data distributions.
# Setup for three part figure
fig = plt.figure(figsize=(16,4))
fig.suptitle('Comparison of Histograms of SAR Backscatter in Different Scales',fontsize=14)
ax1 = fig.add_subplot(131)
ax2 = fig.add_subplot(132)
ax3 = fig.add_subplot(133)
# Important to "flatten" the 2D raster image to produce a historgram
ax1.hist(calAmp[bandnbr].flatten(),bins=100,range=(0.,0.7))
ax2.hist(calPwr[bandnbr].flatten(),bins=100,range=(0.,0.5))
ax3.hist(caldB[bandnbr].flatten(),bins=100,range=(-25,0))
# Means, medians and stddev
amp_mean = calAmp[bandnbr].mean()
amp_std = calAmp[bandnbr].std()
pwr_mean = calPwr[bandnbr].mean()
pwr_std = calPwr[bandnbr].std()
dB_mean = caldB[bandnbr].mean()
dB_std = caldB[bandnbr].std()
# Some lines for mean and median
ax1.axvline(amp_mean,color='red')
ax1.axvline(np.median(calAmp[bandnbr]),color='blue')
ax2.axvline(pwr_mean,color='red',label='Mean')
ax2.axvline(np.median(calPwr[bandnbr]),color='blue',label='Median')
ax3.axvline(dB_mean,color='red')
ax3.axvline(np.median(caldB[bandnbr]),color='blue')
# Lines for 1 stddev
ax1.axvline(amp_mean-amp_std,color='gray')
ax1.axvline(amp_mean+amp_std,color='gray')
ax2.axvline(pwr_mean-pwr_std,color='gray',label=r'1 $\sigma$')
ax2.axvline(pwr_mean+pwr_std,color='gray')
ax3.axvline(dB_mean-dB_std,color='gray')
ax3.axvline(dB_mean+dB_std,color='gray')
ax1.set_title('Amplitude Scaled')
ax2.set_title('Power Scaled')
ax3.set_title('dB Scaled')
_ = ax2.legend()
We will look at the backscatter characteristics in co-polarized (same transmit and reveive polarzation, hh
or vv
) and cross-polarized (vh
or hv
polarization) SAR data. For this, we read a timestep in both polarizations, plot the histograms, and display the images in dB scale. First, we open the images, pick the bands from the same acquisition date, read the raster bands and convert them to dB scale.
# Open the Images
img_like = gdal.Open(str(imagefile))
img_cross = gdal.Open(str(imagefile_cross))
# Pick the bands, read rasters and convert to dB
bandnbr_like = 20
bandnbr_cross = 20
rl = img_like.GetRasterBand(bandnbr_like).ReadAsArray()
rc2 = img_cross.GetRasterBand(bandnbr_cross).ReadAsArray()
rl_dB = 20.*np.log10(rl)-83
rc_dB = 20.*np.log10(rc2)-83
Now, we explore the differences in the polarizations by plotting the images with their histograms. We look at the db ranges over which the histograms spread, and can adjust the linear scaling in the image display accordingly to enhace contrast. In the case below
Thus, we note that the cross-polarized data exhibit a larger dynamic range of about 2.5 dB.
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(16, 16))
fig.suptitle('Comaprison of Like- and Cross-Polarized Sentinel-1 C-band Data',
fontsize=14)
ax[0][0].set_title('C-VV Image')
ax[0][1].set_title('C-VH Image')
ax[1][0].set_title('C-VV Histogram')
ax[1][1].set_title('C-VH Histogram')
ax[0][0].axis('off')
ax[0][1].axis('off')
ax[0][0].imshow(rl_dB, vmin=-25, vmax=-5, cmap='gray')
ax[0][1].imshow(rc_dB, vmin=-25, vmax=-5, cmap='gray')
ax[1][0].hist(rl_dB.flatten(), range=(-25, 0), bins=100)
ax[1][1].hist(rc_dB.flatten(), range=(-25, 0), bins=100)
fig.tight_layout() # Use the tight layout to make the figure more compact
First, Create a directory in which to store our plots and move into it:
product_path = path/'plots_and_animations'
if not product_path.exists():
product_path.mkdir()
Now we are ready to create a time series animation from the calibrated SAR data.
%%capture
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(111)
ax.axis('off')
vmin = np.percentile(caldB.flatten(), 5)
vmax = np.percentile(caldB.flatten(), 95)
r0dB = 20*np.log10(raster0) - 83
im = ax.imshow(r0dB,cmap='gray', vmin=vmin, vmax=vmax)
ax.set_title("{}".format(tindex[0].date()))
def animate(i):
ax.set_title("{}".format(tindex[i].date()))
im.set_data(caldB[i])
# Interval is given in milliseconds
ani = an.FuncAnimation(fig, animate, frames=caldB.shape[0], interval=400)
Configure matplotlib's RC settings for the animation:
rc('animation', embed_limit=40971520.0) # We need to increase the limit maybe to show the entire animation
Create a javascript animation of the time-series running inline in the notebook:
HTML(ani.to_jshtml())
Save the animation (animation.gif):
ani.save(product_path/'animation.gif', writer='pillow', fps=2)
To create the time series of means, we will go through the following steps:
Compute the means:
rs_means_pwr = np.mean(calPwr,axis=(1, 2))
Convert the resulting mean value time-series to dB scale for visualization and check that we got the means over time:
rs_means_dB = 10.*np.log10(rs_means_pwr)
rs_means_pwr.shape
Plot and save the time series of means (time_series_means.png):
# 3. Now let's plot the time series of means
fig = plt.figure(figsize=(16, 4))
ax1 = fig.add_subplot(111)
ax1.plot(tindex, rs_means_pwr)
ax1.set_xlabel('Date')
ax1.set_ylabel(r'$\overline{\gamma^o}$ [power]')
ax2 = ax1.twinx()
ax2.plot(tindex, rs_means_dB, color='red')
ax2.set_ylabel(r'$\overline{\gamma^o}$ [dB]')
fig.legend(['power', 'dB'], loc=1)
plt.title(r'Time series profile of average band backscatter $\gamma^o$ ')
plt.savefig(product_path/'time_series_means', dpi=72, transparent='true')
We use a few Matplotlib functions to create a side-by-side animation of the dB-scaled imagery and the respective global means $\mu_{\gamma^0_{dB}}$.
%%capture
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 4), gridspec_kw={'width_ratios':[1, 3]})
vmin = np.percentile(rasterstack.flatten(), 5)
vmax = np.percentile(rasterstack.flatten(), 95)
im = ax1.imshow(raster0, cmap='gray', vmin=vmin, vmax=vmax)
ax1.set_title("{}".format(tindex[0].date()))
ax1.set_axis_off()
ax2.axis([tindex[0].date(), tindex[-1].date(), rs_means_dB.min(), rs_means_dB.max()])
ax2.set_ylabel('$\overline{\gamma^o}$ [dB]')
ax2.set_xlabel('Date')
ax2.set_ylim((-10, -5))
l, = ax2.plot([], [])
def animate(i):
ax1.set_title("{}".format(tindex[i].date()))
im.set_data(rasterstack[i])
ax2.set_title("{}".format(tindex[i].date()))
l.set_data(tindex[:(i+1)], rs_means_dB[:(i+1)])
# Interval is given in milliseconds
ani = an.FuncAnimation(fig, animate, frames=rasterstack.shape[0], interval=400)
Create a javascript animation of the time-series running inline in the notebook:
HTML(ani.to_jshtml())
Save the animated time-series and histogram (animation_histogram.gif):
ani.save(product_path/'animation_histogram.gif', writer='pillow', fps=2)
SAR Training Materials - Version 1.5.3- February 2024
Version Changes: