This notebook introduces you to a some popular change detection methods that can be applied on SAR time series data. SAR is an excellent tool for change detection. As SAR sensors are weather and illumination independent, and as SAR's carry their own illumination source (active sensor), differences between repeated images are a direct indication of changes on the surface. This fact is exploited by the change detection methods introduced below.
The exercise is done in the framework of Jupyter Notebooks. The Jupyter Notebook environment is easy to launch in any web browser for interactive data exploration with provided or new training data. Notebooks are comprised of text written in a combination of executable python code and markdown formatting including latex style mathematical equations. Another advantage of Jupyter Notebooks is that they can easily be expanded, changed, and shared with new data sets or newly available time series steps. Therefore, they provide an excellent basis for collaborative and repeatable data analysis.
This notebook covers the following data analysis concepts:
Note: The next notebook will add Time Series Change Point Detection as an additional useful method.
Important Note about JupyterHub
Your JupyterHub server will automatically shutdown when left idle for more than 1 hour. Your notebooks will not be lost but you will have to restart their kernels and re-run them from the beginning. You will not be able to seamlessly continue running a partially run notebook.
import url_widget as url_w
notebookUrl = url_w.URLWidget()
display(notebookUrl)
from IPython.display import Markdown
from IPython.display import display
notebookUrl = notebookUrl.value
user = !echo $JUPYTERHUB_USER
env = !echo $CONDA_PREFIX
if env[0] == '':
env[0] = 'Python 3 (base)'
if env[0] != '/home/jovyan/.local/envs/rtc_analysis':
display(Markdown(f'<text style=color:red><strong>WARNING:</strong></text>'))
display(Markdown(f'<text style=color:red>This notebook should be run using the "rtc_analysis" conda environment.</text>'))
display(Markdown(f'<text style=color:red>It is currently using the "{env[0].split("/")[-1]}" environment.</text>'))
display(Markdown(f'<text style=color:red>Select the "rtc_analysis" from the "Change Kernel" submenu of the "Kernel" menu.</text>'))
display(Markdown(f'<text style=color:red>If the "rtc_analysis" environment is not present, use <a href="{notebookUrl.split("/user")[0]}/user/{user[0]}/notebooks/conda_environments/Create_OSL_Conda_Environments.ipynb"> Create_OSL_Conda_Environments.ipynb </a> to create it.</text>'))
display(Markdown(f'<text style=color:red>Note that you must restart your server after creating a new environment before it is usable by notebooks.</text>'))
In this notebook we will use the following scientific libraries:
Our first step is to import them:
%%capture
from pathlib import Path
from os import system
import numpy as np
import pandas as pd # for DatetimeIndex
from osgeo import gdal # for Info
gdal.UseExceptions()
%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib import animation
from matplotlib import rc
from IPython.display import HTML
import opensarlab_lib as asfn
asfn.jupytertheme_matplotlib_format()
This notebook will again be using a 78-image deep dual-polarization C-band SAR data stack over Madre de Dios in Peru to demonstrate how to create color composites from multi temporal and dual-polarization SAR data and how to derive higher level parameters such as the multi-temporal mean, the coefficient of variation, and others. The C-band data were acquired by ESA's Sentinel-1 SAR sensor constellation and are available to you through the services of the Alaska Satellite Facility.
The site in question is interesting as it has experienced extensive logging over the last 10 years (see image to the right; Monitoring of the Andean Amazon Project). Since the 1980s, people have been clearing forests in this area for farming, cattle ranching, logging, and (recently) gold mining. Creating RGB color composites is an easy way to visualize ongoing changes in the landscape.
Before we get started, let's first create a working directory for this analysis and change into it:
path = Path("/home/jovyan/notebooks/SAR_Training/English/Ecosystems/data_Ex2-4_S1-MadreDeDios")
if not path.exists():
path.mkdir()
We will retrieve the relevant data from an Amazon Web Service (AWS) cloud storage bucket using the following command:
time_series_path = 's3://asf-jupyter-data-west/MadreDeDios.zip'
time_series = Path(time_series_path).name
!aws --region=us-west-2 --no-sign-request s3 cp $time_series_path $time_series
Now, let's unzip the file (overwriting previous extractions) and clean up after ourselves:
if Path(time_series).exists():
asfn.asf_unzip(str(path), time_series)
Path(time_series).unlink()
We are defining two helper functions for this notebook:
def create_geotiff(name, array, data_type, ndv, bandnames=None, ref_image=None,
geo_t=None, projection=None):
# If it's a 2D image we fake a third dimension:
if len(array.shape) == 2:
array = np.array([array])
if ref_image == None and (geo_t == None or projection == None):
raise RuntimeWarning('ref_image or settings required.')
if bandnames != None:
if len(bandnames) != array.shape[0]:
raise RuntimeError(f'Need {array.shape[0]} bandnames. {len(bandnames)} given')
else:
bandnames = [f'Band {i+1}' for i in range(array.shape[0])]
if ref_image != None:
refimg = gdal.Open(ref_image)
geo_t = refimg.GetGeoTransform()
projection = refimg.GetProjection()
driver = gdal.GetDriverByName('GTIFF')
array[np.isnan(array)] = ndv
DataSet = driver.Create(name, array.shape[2],
array.shape[1], array.shape[0],
data_type)
DataSet.SetGeoTransform(geo_t)
DataSet.SetProjection(projection)
for i, image in enumerate(array, 1):
DataSet.GetRasterBand(i).WriteArray( image )
DataSet.GetRasterBand(i).SetNoDataValue(ndv)
DataSet.SetDescription(bandnames[i-1])
DataSet.FlushCache()
return name
def timeseries_metrics(raster, ndv=0):
# Make us of numpy nan functions
# Check if type is a float array
if not raster.dtype.name.find('float') > -1:
raster = raster.astype(np.float32)
# Set ndv to nan
if ndv != np.nan:
raster[np.equal(raster, ndv)] = np.nan
# Build dictionary of the metrics
tsmetrics = {}
rperc = np.nanpercentile(raster, [5, 50, 95], axis=0)
tsmetrics['mean'] = np.nanmean(raster, axis=0)
tsmetrics['max'] = np.nanmax(raster, axis=0)
tsmetrics['min'] = np.nanmin(raster, axis=0)
tsmetrics['range'] = tsmetrics['max'] - tsmetrics['min']
tsmetrics['median'] = rperc[1]
tsmetrics['p5'] = rperc[0]
tsmetrics['p95'] = rperc[2]
tsmetrics['prange'] = rperc[2] - rperc[0]
tsmetrics['var'] = np.nanvar(raster, axis=0)
tsmetrics['cov'] = tsmetrics['var'] / tsmetrics['mean']
return tsmetrics
Create a variable containing the VRT filename and the image acquisition dates:
!gdalbuildvrt -separate {path}/raster_stack.vrt {path}/tiffs/*_VH.tiff
image_file_VH = path/"raster_stack.vrt"
Create an index of timedelta64 data with Pandas:
!ls {path}/tiffs/*_VH.tiff | sort | sed 's/[^0-9]*//g' | cut -c 4-11 > {path}/raster_stack_VH.dates
datefile_VH = path/'raster_stack_VH.dates'
with open(str(datefile_VH)) as f:
dates_VH = f.readlines()
tindex_VH = pd.DatetimeIndex(dates_VH)
Print the bands and dates for all images in the virtual raster table (VRT):
print(f"Bands and dates for {image_file_VH}")
for i, t in enumerate(tindex_VH):
print("{:4d} {}".format(i+1, t.date()), end=' ')
if (i+1) % 5 == 1:
print()
img = gdal.Open(str(image_file_VH))
band = img.GetRasterBand(1)
raster0 = band.ReadAsArray()
band_number = 0 # Needed for updates
rasterstack_VH = img.ReadAsArray()
Print the bands, pixels, and lines:
print(f"Number of bands: {img.RasterCount}")
print(f"Number of pixels: {img.RasterXSize}")
print(f"Number of lines: {img.RasterYSize}")
Note, that if your data were generated by HyP3, this step is not necessary! HyP3 performs the full data calibration and provides you with calibrated data in power scale.
If, your data is from a different source, however, calibration may be necessary to ensure that image gray values correspond to proper radar cross section information.
Calibration coefficients for SAR data are often defined in the decibel (dB) scale due to the high dynamic range of the imaging system. For the L-band ALOS PALSAR data at hand, the conversion from uncalibrated DN values to calibrated radar cross section values in dB scale is performed by applying a standard calibration factor of -83 dB.
$\gamma^0_{dB} = 20 \cdot log10(DN) -83$
The data at hand are radiometrically terrain corrected images, which are often expressed as terrain flattened $\gamma^0$ backscattering coefficients. For forest and land cover monitoring applications $\gamma^o$ is the preferred metric.
To apply the calibration constant for your data and export in dB scale, uncomment the following code cell:
#caldB=20*np.log10(rasterstack)-83
While dB-scaled images are often "visually pleasing", they are often not a good basis for mathematical operations on data. For instance, when we compute the mean of observations, it makes a difference whether we do that in power or dB scale. Since dB scale is a logarithmic scale, we cannot simply average data in that scale.
Please note that the correct scale in which operations need to be performed is the power scale. This is critical, e.g. when speckle filters are applied, spatial operations like block averaging are performed, or time series are analyzed.
To convert from dB to power, apply: $\gamma^o_{pwr} = 10^{\frac{\gamma^o_{dB}}{10}}$
#calPwr=np.power(10.,caldB/10.)
Create and move into a directory in which to store our plots and animations:
product_path = path/'plots_and_animations'
if not product_path.exists():
product_path.mkdir()
Now we can create the information needed to animate our data:
%%capture
fig = plt.figure(figsize=(14, 7))
ax = fig.subplots()
ax.axis('off')
vmin = np.percentile(rasterstack_VH.flatten(), 5)
vmax = np.percentile(rasterstack_VH.flatten(), 95)
r0dB = 20 * np.log10(raster0) - 83
im = ax.imshow(raster0, cmap='gray', vmin=vmin, vmax=vmax)
ax.set_title("{}".format(tindex_VH[0].date()))
def animate(i):
ax.set_title("{}".format(tindex_VH[i].date()))
im.set_data(rasterstack_VH[i])
# Interval is given in milliseconds
ani = animation.FuncAnimation(fig, animate, frames=rasterstack_VH.shape[0], interval=400)
Configure matplotlib's RC settings for the animation:
rc('animation', embed_limit=40971520.0) # We need to increase the limit maybe to show the entire animation
Create a JavaScript animation of the time-series running inline in the notebook:
HTML(ani.to_jshtml())
Delete the dummy png that was saved to the current working directory while generating the JavaScript animation in the last code cell.
try:
tmp = product_path/'None0000000.png'
tmp.unlink()
except FileNotFoundError:
pass
Save the animation (animation.gif
):
ani.save(product_path/'animation_VH.gif', writer='pillow', fps=2)
Once a time-series was constructed, we can compute a set of metrics that will aid us later in applications such as change detection and active agriculture detectionc. In the next code cells, we will compute the following variables for each pixel in the stack:
First, we mask out pixels without relevant information to be unbiased in statical number we calculate later. Then we calculate the time series metrics:
mask = rasterstack_VH == 0
rasterPwr = np.ma.array(rasterstack_VH, mask=mask, dtype=np.float32)
metrics = timeseries_metrics(rasterPwr.filled(np.nan), ndv=np.nan)
metric_keys = metrics.keys()
print(metric_keys)
Let's look at the histograms for the time series variance and coeficient of variation to aid displaying those images:
fig, ax = plt.subplots(1, 2, figsize=(16, 4))
ax[0].hist(metrics['var'].flatten(), bins=100, range=(0, 0.0001))
ax[1].hist(metrics['cov'].flatten(), bins=100, range=(0, 0.004))
_ = ax[0].set_title('Variance')
_ = ax[1].set_title('Coefficient of Variation')
We use thresholds determined from those histograms to set the scaling in the time series visualization. For the backscatter metrics we choose a typical range appropriate for this ecosystem, the C-band radar sensor, and the VH polarization. A typical range is -30 dB (0.0001) to -10 dB (0.1).
# List the metrics keys you want to plot
fig = plt.figure(figsize=(16, 40))
for i, key in enumerate(metric_keys):
ax = fig.add_subplot(5, 2, i+1)
if key == 'var':
vmin, vmax = (0.0, 0.0001)
elif key == 'cov':
vmin, vmax = (0.0, 0.002)
else:
vmin, vmax = (0.0, 0.1)
ax.imshow(metrics[key], vmin=vmin, vmax=vmax, cmap='gray')
ax.set_title(key.upper())
ax.axis('off')
This section will introduce you to the following popular and simple change detection methods:
In this method we find thresholds on the $95^{th}$ and $5^{th}$ percentile difference or the temporal pixel-by-pixel gray value cariance. Let's start with the $95^{th}$ and $5^{th}$ percentile difference. The advantage to look at percentiles verus maximum minus minimum is that outliers and extremas in the time series are not influencing the result.
For our example, the historgram and the Cumulative Distribution Function (CDF) of the $95^{th}$ and $5^{th}$ percentile difference image looks like this:
plt.rcParams.update({'font.size': 14})
fig = plt.figure(figsize=(14, 4)) # Initialize figure with a size
ax1 = fig.add_subplot(121) # 121 determines: 2 rows, 2 plots, first plot
ax2 = fig.add_subplot(122)
h = ax1.hist(metrics['range'].flatten(), bins=200, range=(0, 0.1))
ax1.xaxis.set_label_text('Percentile Range Metric')
ax1.set_title('Histogram')
n, bins, patches = ax2.hist(metrics['range'].flatten(),
bins=200, range=(0, 0.1),
cumulative='True', density='True',
histtype='step', label='Empirical')
ax2.xaxis.set_label_text('Percentile Range Metric')
ax2.set_title('CDF')
plt.savefig(product_path/'thresh_percentilerange_histogram.png',
dpi=200, transparent='true')
outind = np.where(n > 0.95)
threshind = np.min(outind)
thresh = bins[threshind]
ax1.axvline(thresh, color='red')
ax2.axvline(thresh, color='red')
Let's visualize the 5% of all pixels with the largest (95th - 5th percentile) difference in the time series. We will refer to the pixels (x,y) that exceed this threshold $t$ as likely change pixels (cp):
${cp}_{x,y} = P_{x,y}^{95th} - P_{x,y}^{5th} > t$
If we define $t$ to correspond to the 5% of pixels with highest (95th - 5th percentile) difference, the image looks like:
plt.figure(figsize=(8, 8))
mask = metrics['range'] < thresh # For display we prepare the inverse mask
maskpdiffrange=~mask # Store this for later output
plt.imshow(mask, cmap='gray')
_ = plt.title('Threshold Classifier on Range > %1.3f' % thresh)
plt.savefig(product_path/'changes_percentilerange_threshold.png', dpi=200, transparent='true')
Discuss what you see in this figure. What are the black areas? What are the white areas? What kind of changes may be included in this map?
Instead of applying a threshold on the 95th - 5th percentile difference data, we can also attempt to threshold other metrics. The Variance variable seems a useful indicator for change as it identifies pixels for which radar brightness has changed strongly within the time series. Hence, in the following we use this metric for change identification according to:
${cp}_{x,y} = \sigma^2 > t$
with $t=CDF_{\sigma^2} > 0.95$ (5% pixels with highest variance):
plt.rcParams.update({'font.size': 14})
fig = plt.figure(figsize=(14, 4)) # Initialize figure with a size
ax1 = fig.add_subplot(121) # 121 determines: 2 rows, 2 plots, first plot
ax2 = fig.add_subplot(122)
h = ax1.hist(metrics['var'].flatten(), bins=200, range=(0, 0.0001))
ax1.xaxis.set_label_text('Variance Metric')
ax1.set_title('Histogram')
n, bins, patches = ax2.hist(metrics['var'].flatten(), bins=200, range=(0, 0.0001), cumulative='True', density='True', histtype='step', label='Empirical')
ax2.xaxis.set_label_text('Variance Metric')
ax2.set_title('CDF')
plt.savefig(product_path/'thresh_var_histogram.png', dpi=200, transparent='true')
outind = np.where(n > 0.95)
threshind = np.min(outind)
threshvar = bins[threshind]
ax1.axvline(threshvar,color='red')
_ = ax2.axvline(threshvar,color='red')
plt.figure(figsize=(8, 8))
mask = metrics['var'] < threshvar # For display we prepare the inverse mask
maskvar = ~mask # Store this for later output
plt.imshow(mask, cmap='gray')
_ = plt.title('Threshold Classifier on Variance > %1.5f' % threshvar )
plt.savefig(product_path/'changes_var_threshold.png', dpi=200, transparent='true')
Discuss what you see in this figure. What kind of changes are detected? How does this change map compare to the previous one?
We can also set a threshold $t$ for the coefficient of variation image to classify change in the time series:
${cp}_{x,y} = \frac{\sigma_{x,y}^2}{\overline{X}_{x,y}} > t$
Let's look at the histogram and the Cumulative Distribution Function (CDF) of the coefficient of variation:
plt.rcParams.update({'font.size': 14})
fig = plt.figure(figsize=(14, 4)) # Initialize figure with a size
ax1 = fig.add_subplot(121) # 121 determines: 2 rows, 2 plots, first plot
ax2 = fig.add_subplot(122)
h = ax1.hist(metrics['cov'].flatten(), bins=200, range=(0, 0.002))
ax1.xaxis.set_label_text('Coefficient of Variation Metric')
ax1.set_title('Histogram')
n, bins, patches = ax2.hist(metrics['cov'].flatten(), bins=200, range=(0, 0.002), cumulative='True', density='True', histtype='step', label='Empirical')
ax2.xaxis.set_label_text('Coefficient of Variation Metric')
ax2.set_title('CDF')
plt.savefig(product_path/'thresh_cov_histogram.png', dpi=200, transparent='true')
outind = np.where(n > 0.95)
threshind = np.min(outind)
threshcov = bins[threshind]
ax1.axvline(threshcov,color='red')
_ = ax2.axvline(threshcov,color='red')
With a threshold of $t=CDF_{cov^2} > 0.95$ (5% pixels with highest variance) the change pixels would look like the following image:
plt.figure(figsize=(8,8))
mask = metrics['cov'] < threshcov # For display we prepare the inverse mask
maskcov = ~mask # Store this for later output
plt.imshow(mask, cmap='gray')
_ = plt.title('Threshold Classifier on Coefficient of Variation > %1.3f' % threshcov )
plt.savefig(product_path/'changes_CV_threshold.png', dpi=200, transparent='true')
Discuss what you see in this figure. What kind of changes are detected? How is this change map different from the previous two? What are the similarities?
In this method, we compare two images from the same season in different years. First we look at global means of the backscatter images building a time series object of acquisition dates and global image means of backscatter.
tsmean = 10*np.log10(np.nanmean(rasterPwr.filled(np.nan), axis=(1, 2)))
We make a time series object to list the dates, mean backscatter in dB and band index number for the rasterPwr
array:
ts = pd.Series(tsmean, index=tindex_VH)
for i, v in ts.items():
print(f"{i} {v}")
To compare two dates for change detection with the log ratio approach we pick two dates of relatively similar backscatter and from similar times of the year. Two such candidate dates are:
For our cross-polarized data
# # Cross-polarized
cross_pol_ref = rasterPwr[21] # Reference Image
cross_pol_new = rasterPwr[51] # New Image
The Log ratio between the images is
$r = log_{10}(\frac{X_i}{X_r})$
r = np.log10(cross_pol_new / cross_pol_ref)
To find a threshold for change, we can display the absolute ratio image $abs(r)$ and the historgram of $r$. We adjust the scale factors for the display to enhance visualization of change areas with largest backscatter change between this image pair. Brighter values show larger change.
# Display r
fig, ax = plt.subplots(2, 2, figsize=(16, 16))
ax[0][0].axis('off')
ax[0][1].axis('off')
ax[1][0].axis('off')
ax[0][0].set_title('C-VH')
ax[0][0].imshow(cross_pol_ref, vmin=0, vmax=0.1, cmap='gray')
ax[0][1].imshow(cross_pol_new, vmin=0, vmax=0.1, cmap='gray')
ax[1][0].imshow(np.abs(r), vmin=0, vmax=0.3, cmap='gray')
_ = ax[1][1].hist(r.flatten(), bins=100, range=(-0.4, 0.4))
Let's define change pixels as those falling outside the range of three times the standard deviation of the ration image $\sigma_r$ from the image mean $\bar{r}$:
${cp}_{x,y} = (r_{x,y} < \overline{r} - 3\sigma_r) \ \textrm{or} \ (r_{x,y} > \overline{r} + 3\sigma_r)$
We are using the numpy masking to set the non-changing pixels inside the range:
stddev = np.std(r)
thres = 3 * stddev
mask = np.logical_and(r>-1*thres, r<thres)
masklr = ~mask
Let's display pixels that fall outside 3 times the standard deviation.
fig,ax = plt.subplots(figsize=(8,16))
ax.imshow(mask, cmap='gray')
ax.xaxis.set_ticks([])
ax.yaxis.set_ticks([])
_ = ax.set_title('Log Ratio Classifier of the September 2017/2018 Log Ratio Images')
plt.savefig(product_path/'changes_LR_threshold.png', dpi=200, transparent='true')
Discuss what you see in this figure. What kind of changes are detected? How is this change map different from the previous three? What are the similarities? Note that this change map is based on pairs of images and not on the full time series as the previous three. How does this affect your results?
proj = img.GetProjection()
geotrans = list(img.GetGeoTransform())
geotrans
We use the root of the time series data stack name and append a ts_metrics<metric>.tif
ending as filenames:
# Time Series Metrics as image image:
# We make a new subdirectory where we will store the images
dirname = image_file_VH.parent/f'{image_file_VH.stem}_tsmetrics'
if not dirname.exists():
dirname.mkdir()
Now we can output the individual metrics as GeoTIFF images:
names=[] # List to keep track of all the names
for i in metrics:
name = dirname/(str(image_file_VH.name).replace('.vrt',f'_{i}.tif'))
create_geotiff(str(name), metrics[i], gdal.GDT_Float32, np.nan,[i], geo_t=geotrans, projection=proj)
names.append(str(name))
To tie the images into one new raster stack of time series metrics we build a virtual raster table with all the metrics.
Trick: Use ' '.join(Names)
to build one long string of names separated by a space as input to gdalbuildvrt
:
print(dirname)
cmd = 'gdalbuildvrt -separate -overwrite -vrtnodata nan ' + \
f'{dirname}' + '.vrt ' + ' '.join(names)
# print(cmd)
system(cmd)
print('Time Series Metrics VRT File:\n', f"{dirname}.vrt")
We are going to write GeoTIFF output files that stores the results from the classifiers:
imagenamepdiff = product_path/(str(image_file_VH.name).replace('.vrt', '_pdiff_thresholds.tif'))
imagenamevar = product_path/str(image_file_VH.name).replace('.vrt', '_var_thresholds.tif')
imagenamecov = product_path/str(image_file_VH.name).replace('.vrt', '_cov_thresholds.tif')
imagenamelr = product_path/str(image_file_VH.name).replace('.vrt', '_lr_thresholds.tif')
create_geotiff(str(imagenamepdiff), maskpdiffrange, gdal.GDT_Byte, np.nan, [i], geo_t=geotrans, projection=proj)
create_geotiff(str(imagenamevar), maskvar, gdal.GDT_Byte, np.nan, [i], geo_t=geotrans, projection=proj)
create_geotiff(str(imagenamecov), maskcov, gdal.GDT_Byte, np.nan, [i], geo_t=geotrans, projection=proj)
_ = create_geotiff(str(imagenamelr), masklr, gdal.GDT_Byte, np.nan, [i], geo_t=geotrans, projection=proj)
Thresholds for the three methods are necessarily site dependent and need to be identified with calibration data or visual post-classification interpretation, and can subsequently be adjusted to maximize classification accuracy. Also, some methods will have advantages in different scenarios.
For a bit more information on change detection and SAR in general, please look at the recently published SAR Handbook: Comprehensive Methodologies for Forest Monitoring and Biomass Estimation.
Explore the 78 image deep data stack a bit more. Pick different time steps for the log-ratio change detection to create additional change maps. Answer the following questions for yourself:
Exercise4A-SARChangeDetectionMethods.ipynb - Version 1.4.2 - February 2024 Version Changes: