The Leafmap library provides a suite of tools for interactive mapping and visualization in Jupyter notebooks. Leafmap version 0.30.0 and later offers tools specifically for accessing NASA Earthdata by building on the newly developed NASA earthaccess library. Earthaccess provides streamlined access to NASA Earthdata and simplifies the authentication and querying process over previously developed approaches. This notebook leverages tools within earthaccess and Leafmap to facilitate easier access and visualization of OPERA data products for a user-specified area of interest (AOI).
See the OPERA DSWx product suite page: https://www.jpl.nasa.gov/go/opera/products/dswx-product-suite
import earthaccess
import leafmap
import rasterio
from rasterio.crs import CRS
from rasterio.plot import show
from rasterio.mask import mask
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import matplotlib.image as image
from PIL import Image
from shapely import box
from datetime import datetime
import numpy as np
A NASA Earthdata Login account is required to download the data used in this tutorial. You can create an account at https://urs.earthdata.nasa.gov/. After establishing an account, the code in the next cell will verify authentication. If this is your first time running the notebook, you will be prompted to enter your Earthdata login credentials, which will be saved in ~/.netrc.
leafmap.nasa_data_login()
A tab-separated values (TSV) file, made available through the opengeos GitHub repository, catalogues metadata for more than 9,000 datasets available through NASA Earthdata. In the next cell we load the TSV into a pandas DataFrame and view the metadata for the first five (5) Earthdata products.
### View Earthdata datasets
earthdata_url = 'https://github.com/opengeos/NASA-Earth-Data/raw/main/nasa_earth_data.tsv'
earthdata_df = pd.read_csv(earthdata_url, sep='\t')
# earthdata_df.head()
Note above that earthdata_df contains a number of columns with metadata about each available product. The ShortName column will be used to produce a new dataframe containing only OPERA products. Let's view the available products and their metadata.
opera_df = earthdata_df[earthdata_df['ShortName'].str.contains('OPERA', case=False)]
# opera_df
Define an area of interest (AOI) for the flood event
### This cell initializes the AOI and TOI.
AOI = f_bounds = (-118.179447, 34.182380, -118.164887, 34.208998)  # W, S, E, N; this notebook looks at the Arroyo Seco next to JPL
AOI_box = box(*AOI)
df_aoi = gpd.GeoDataFrame(geometry=[AOI_box], crs=CRS.from_epsg(4326))
# m = df_aoi.exterior.explore()
StartDate = "2022-01-01T00:00:00"  # image start date
EndDate = "2024-08-30T23:59:59"    # image end date
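Before issuing any query, it is worth sanity-checking the AOI and time window, since swapped coordinates or malformed dates fail silently or return empty results. Below is a minimal, stdlib-only sketch; the `validate_aoi_toi` helper is illustrative, not part of Leafmap or earthaccess, and the values mirror those defined above.

```python
from datetime import datetime

def validate_aoi_toi(aoi, start, end):
    """Validate a (W, S, E, N) bounding box and an ISO-8601 time window."""
    w, s, e, n = aoi
    assert -180 <= w < e <= 180, "longitudes must satisfy -180 <= W < E <= 180"
    assert -90 <= s < n <= 90, "latitudes must satisfy -90 <= S < N <= 90"
    t0 = datetime.fromisoformat(start)
    t1 = datetime.fromisoformat(end)
    assert t0 < t1, "start date must precede end date"
    return True

# Same AOI and dates as defined above
validate_aoi_toi((-118.179447, 34.182380, -118.164887, 34.208998),
                 "2022-01-01T00:00:00", "2024-08-30T23:59:59")
```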
The earthaccess library makes it simple to quickly query NASA's Common Metadata Repository (CMR) and return the associated metadata as a GeoDataFrame. Leafmap has recently added functionality that builds on earthaccess to enable interactive viewing of this data.
In the next cell, specify the OPERA product and the date range of interest. The AOI defined previously is used as the boundary in the query.
### Print the available OPERA datasets
print('Available OPERA datasets:', opera_df['ShortName'].values)
dswx_results, dswx_gdf = leafmap.nasa_data_search(
    short_name='OPERA_L3_DSWX-HLS_V1',
    cloud_hosted=True,
    bounding_box=AOI,
    temporal=(StartDate, EndDate),
    count=-1,  # use -1 to return all datasets
    return_gdf=True,
    # cloud_cover=10,
    # bbox_geom=AOI,
)
dswx_gdf
Functionality within earthaccess enables more aesthetic views of the available layers, as well as displaying the thumbnail. These links are clickable and will download in the browser when clicked.
dswx_results[0] #Note this just shows a single MGRS/HLS tile
dswx_gdf.head()
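The exact columns of dswx_gdf depend on what leafmap/earthaccess return, but if a begin-time column is present, the scenes can be narrowed to a specific flood window with ordinary pandas operations. The sketch below uses a toy DataFrame so it runs stand-alone; the column name BeginningDateTime and the January 2023 window are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for dswx_gdf; 'BeginningDateTime' is an assumed column name
gdf = pd.DataFrame({
    'ShortName': ['OPERA_L3_DSWX-HLS_V1'] * 3,
    'BeginningDateTime': ['2022-12-20T18:30:00', '2023-01-05T18:30:00', '2023-02-10T18:30:00'],
})
gdf['BeginningDateTime'] = pd.to_datetime(gdf['BeginningDateTime'])

# Keep only scenes inside a hypothetical flood window
flood = gdf[(gdf['BeginningDateTime'] >= '2023-01-01') &
            (gdf['BeginningDateTime'] <= '2023-01-31')]
print(len(flood))  # 1
```

The same boolean-mask pattern applies directly to the real dswx_gdf once you have confirmed its date column name.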
### Plot the location of the tiles
dswx_gdf.explore(fill=False)
df_aoi.explore(fill=False)
Let's download the data from one of our above queries. In the cell below we specify data from the DSWx-HLS.
This will be where the files are downloaded. It will be a subdirectory inside a directory called data, and the directory name will be the date that it was created.
import os
from datetime import datetime
def create_data_directory():
    # Get the current date and time
    # current_datetime = datetime.now().strftime("%m_%d_%Y_%H_%M_%S")
    current_datetime = datetime.now().strftime("%m_%d_%Y")

    # Define the base directory
    base_directory = "data"

    # Create the full path for the new directory
    new_directory_path = os.path.join(base_directory, f"data_{current_datetime}")

    # Create the new directory
    os.makedirs(new_directory_path, exist_ok=True)

    print(f"Directory '{new_directory_path}' created successfully.")
    return new_directory_path
directory_path = create_data_directory()
The cell below will download the data to your newly created subdirectory. Look on your file system for a directory data/data_<date>/, where <date> is the date the directory was created.
dswx_data = leafmap.nasa_data_download(dswx_results, out_dir=directory_path)
We load in data from only the DSWx-WTR layer below. If you'd like to load data from a different layer, change the B01 to suit your needs.
Included layers:
OPERA_L3_DSWx-HLS_*B01_WTR.tif
OPERA_L3_DSWx-HLS_*B02_BWTR.tif
OPERA_L3_DSWx-HLS_*B03_CONF.tif
OPERA_L3_DSWx-HLS_*B04_DIAG.tif
OPERA_L3_DSWx-HLS_*B05_WTR-1.tif
OPERA_L3_DSWx-HLS_*B06_WTR-2.tif
OPERA_L3_DSWx-HLS_*B07_LAND.tif
OPERA_L3_DSWx-HLS_*B08_SHAD.tif
OPERA_L3_DSWx-HLS_*B09_CLOUD.tif
OPERA_L3_DSWx-HLS_*B10_DEM.tif
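The band-to-layer mapping above can be captured in a small dictionary, which makes the ImageLayer choice below self-documenting. The codes and layer names are taken directly from the file list above; the parenthetical comments are the conventional expansions of those names.

```python
# DSWx-HLS band codes and their layer names, per the file list above
DSWX_LAYERS = {
    'B01': 'WTR',    # water classification
    'B02': 'BWTR',   # binary water
    'B03': 'CONF',   # confidence
    'B04': 'DIAG',   # diagnostic
    'B05': 'WTR-1',  # intermediate water layer 1
    'B06': 'WTR-2',  # intermediate water layer 2
    'B07': 'LAND',   # land
    'B08': 'SHAD',   # shadow
    'B09': 'CLOUD',  # cloud / cloud shadow
    'B10': 'DEM',    # digital elevation model
}

print(DSWX_LAYERS['B01'])  # WTR
```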
import os
ImageLayer = 'B01'  # B01 corresponds to WTR (see above)
# Get the current directory
current_directory = os.getcwd()
# Construct the path to the data directory
data_directory = os.path.join(current_directory, directory_path)
# Create a list of file paths and a list of corresponding dates
images = [os.path.join(data_directory, filename) for filename in os.listdir(data_directory) if os.path.isfile(os.path.join(data_directory, filename)) and ImageLayer in filename]
image_dates = [image[25:33] for image in os.listdir(data_directory) if ImageLayer in image]  # positional slice of the acquisition date (YYYYMMDD) in the OPERA filename
images
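The fixed image[25:33] slice above assumes the acquisition date always starts at character 25 of the filename. A regex keyed to the timestamp pattern in OPERA granule names is more robust to naming changes. This is a sketch; the sample filename is illustrative, following the DSWx-HLS naming convention.

```python
import re

def extract_acquisition_date(filename):
    """Pull the YYYYMMDD acquisition date out of an OPERA DSWx-HLS granule name.

    Matches the first '_<YYYYMMDD>T<HHMMSS>Z_' timestamp, which follows
    the MGRS tile ID. Returns None when the pattern is absent.
    """
    m = re.search(r'_(\d{8})T\d{6}Z_', filename)
    return m.group(1) if m else None

# Illustrative filename following the OPERA DSWx-HLS convention
sample = 'OPERA_L3_DSWx-HLS_T11SLT_20230110T182821Z_20230112T060829Z_S2A_30_v1.0_B01_WTR.tif'
print(extract_acquisition_date(sample))  # 20230110
```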
filename_merged = 'Merged.tif'
merged_raster = leafmap.merge_rasters(
    data_directory,
    os.path.join(data_directory, filename_merged),
    input_pattern='*' + ImageLayer + '*.tif',
    output_format='GTiff',
    output_nodata=None,
)
m = leafmap.Map(basemap="Esri.WorldImagery")
m.add_raster(os.path.join(data_directory, filename_merged), opacity=1, nodata=0)
m.zoom_to_bounds(AOI)

legend_dict = {
    # 'Not Water': '#ffffff',
    'Open Surface Water': '#0000ff',
    'Partial Surface Water': '#b4d5f4',
    'HLS snow/ice mask': '#00ffff',
    'HLS cloud/cloud shadow mask': '#afafaf'
}

# Add the legend to the map
m.add_legend(legend_title="Legend Title", legend_dict=legend_dict)
m
from rasterio.warp import transform_bounds
from rasterio.crs import CRS

with rasterio.open(images[0]) as src:
    # Print raster metadata
    print("Metadata:")
    print(src.meta)

    # Reproject the AOI bounds from EPSG:4326 into the raster's UTM CRS
    # (done inside the with block so the dataset is still open)
    utm_bounds = transform_bounds(CRS.from_epsg(4326), src.crs, *f_bounds)

print(utm_bounds)
output_dir = os.path.join(data_directory, 'cropped')
os.makedirs(output_dir, exist_ok=True)
def crop_raster_to_roi(raster_path, utm_bounds, output_path):
    """Crop raster to the region of interest (ROI) and save it."""
    with rasterio.open(raster_path) as src:
        # Define bounding box from ROI
        bbox_geom = box(utm_bounds[0], utm_bounds[1], utm_bounds[2], utm_bounds[3])
        bbox_gdf = gpd.GeoDataFrame({'geometry': [bbox_geom]}, crs=src.crs)

        # Reproject ROI to the raster's CRS
        bbox_gdf = bbox_gdf.to_crs(src.crs)

        # Mask the raster using the bounding box
        out_image, out_transform = mask(src, bbox_gdf.geometry, crop=True)
        out_meta = src.meta.copy()
        out_meta.update({
            'height': out_image.shape[1],
            'width': out_image.shape[2],
            'transform': out_transform
        })

        # # Check for NaN values in the cropped image across all bands
        # if np.isnan(out_image).any():
        #     print(f'Skipping {os.path.basename(raster_path)} due to NaN values in the cropped area.')
        #     return

        # Check for NoData values in the cropped image
        nodata = src.nodata
        if nodata is not None:
            total_pixels = out_image.size
            nodata_pixels = (out_image == nodata).sum()
            nodata_percentage = (nodata_pixels / total_pixels) * 100
            if nodata_percentage > 90:
                print(f'Skipping {os.path.basename(raster_path)} due to more than 90% NoData values in the cropped area.')
                return

        # Check for cloud/cloud-shadow values in the cropped image
        cloud = 253  # cloud/cloud-shadow pixel value
        total_pixels = out_image.size
        cloud_pixels = (out_image == cloud).sum()
        cloud_percentage = (cloud_pixels / total_pixels) * 100
        if cloud_percentage > 50:
            print(f'Skipping {os.path.basename(raster_path)} due to more than 50% clouds in the cropped area.')
            return

        # Save the colormap
        colormap = src.colormap(1) if src.count == 1 else None

        # Save the cropped raster
        with rasterio.open(output_path, 'w', **out_meta) as dest:
            dest.write(out_image)
            if colormap:
                dest.write_colormap(1, colormap)
for raster_file in images:
    file_name = os.path.basename(raster_file)
    output_path = os.path.join(output_dir, file_name)
    crop_raster_to_roi(raster_file, utm_bounds, output_path)

    if os.path.exists(output_path):
        print(f'Cropped and saved {file_name} to {output_path}')
    else:
        print(f'Skipped {file_name}')

print('Processing complete.')
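The NoData and cloud skip thresholds in crop_raster_to_roi can be exercised on a toy array to confirm the percentage logic. This is a stand-alone sketch of the same checks; the 253 cloud value and the 90%/50% cutoffs come from the function above, while the 255 NoData default is an assumption for illustration.

```python
import numpy as np

def skip_reason(arr, nodata=255, cloud=253):
    """Return why a cropped array would be skipped, or None to keep it."""
    total = arr.size
    if nodata is not None and (arr == nodata).sum() / total * 100 > 90:
        return 'nodata'
    if (arr == cloud).sum() / total * 100 > 50:
        return 'cloud'
    return None

mostly_cloud = np.full((10, 10), 253)
mostly_cloud[:2, :] = 1  # 20% water pixels, 80% cloud
print(skip_reason(mostly_cloud))  # cloud

clear = np.zeros((10, 10))
print(skip_reason(clear))  # None
```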
m = leafmap.Map(basemap="Esri.WorldImagery")
m.zoom_to_bounds(AOI)
# Define the directory containing your raster files and get a list of file paths
raster_files = sorted([os.path.join(output_dir, f) for f in os.listdir(output_dir) if f.endswith('.tif')])
# Optionally, you can define a dictionary of labels for the time slider
time_labels = {i: f'Image {i+1}' for i in range(len(raster_files))}
# Add the rasters with a time slider
m.add_time_slider(
    raster_files,
    time_interval=0.25,
    time_labels=time_labels,
    label_position="topcenter",
    nodata=0
)

legend_dict = {
    # 'Not Water': '#ffffff',
    'Open Surface Water': '#0000ff',
    'Partial Surface Water': '#b4d5f4',
    'HLS snow/ice mask': '#00ffff',
    'HLS cloud/cloud shadow mask': '#afafaf'
}

# Add the legend to the map
m.add_legend(legend_title="Legend Title", legend_dict=legend_dict)
m.add_gdf(df_aoi, fill_opacity=0, layer_name="AOI")
# Display the map
m