Accessing MODIS data on Azure

MODIS provides Earth observation data in a wide spectral range, from 1999 to the present. The MODIS satellites image the Earth every one to two days, though individual products derived from MODIS data may have lower temporal resolutions. MODIS is administered by the National Aeronautics and Space Administration (NASA) and the US Geological Survey (USGS).

This dataset is stored in the East US Azure region, so this notebook will run most efficiently on Azure compute located in the same region. If you are using this data for environmental science applications, consider applying for an AI for Earth grant to support your compute requirements.

This notebook demonstrates access to MODIS surface reflectance data on Azure, but numerous other MODIS products are hosted on Azure and documented at aka.ms/ai4edata-modis.

Environment setup

In [6]:
import tempfile
import wget
import numpy as np
import matplotlib.pyplot as plt
import os

from azure.storage.blob import ContainerClient

modis_account_name = 'modissa'
modis_container_name = 'modis-006'
modis_account_url = 'https://' + modis_account_name + '.blob.core.windows.net/'
modis_blob_root = modis_account_url + modis_container_name + '/'

# This file is provided by NASA; it indicates the lat/lon extents of each
# MODIS tile.
#
# The file originally comes from:
#
# https://modis-land.gsfc.nasa.gov/pdf/sn_bound_10deg.txt
modis_tile_extents_url = modis_blob_root + 'sn_bound_10deg.txt'

temp_dir = os.path.join(tempfile.gettempdir(),'modis')
os.makedirs(temp_dir,exist_ok=True)
fn = os.path.join(temp_dir,modis_tile_extents_url.split('/')[-1])
wget.download(modis_tile_extents_url, fn)

# Load this file into a table, where each row is (v,h,lonmin,lonmax,latmin,latmax)
modis_tile_extents = np.genfromtxt(fn,
                     skip_header = 7, 
                     skip_footer = 3)

modis_container_client = ContainerClient(account_url=modis_account_url, 
                                         container_name=modis_container_name,
                                         credential=None)

Functions

In [2]:
def lat_lon_to_modis_tile(lat,lon):
    """
    Get the modis tile indices (h,v) for a given lat/lon
    
    https://www.earthdatascience.org/tutorials/convert-modis-tile-to-lat-lon/
    """
    
    found_matching_tile = False
    i = 0
    while(not found_matching_tile):
        found_matching_tile = lat >= modis_tile_extents[i, 4] \
        and lat <= modis_tile_extents[i, 5] \
        and lon >= modis_tile_extents[i, 2] and lon <= modis_tile_extents[i, 3]
        i += 1
        
    v = int(modis_tile_extents[i-1, 0])
    h = int(modis_tile_extents[i-1, 1])
    
    return h,v


def list_blobs_in_folder(container_name,folder_name):
    """
    List all blobs in a virtual folder in an Azure blob container
    """
    
    files = []
    generator = modis_container_client.list_blobs(name_starts_with=folder_name)
    for blob in generator:
        files.append(blob.name)
    return files
        
    
def list_hdf_blobs_in_folder(container_name,folder_name):
    """"
    List .hdf files in a folder
    """
    
    files = list_blobs_in_folder(container_name,folder_name)
    files = [fn for fn in files if fn.endswith('.hdf')]
    return files             

Access and plot a MODIS tile

In [3]:
# Files are stored according to:
#
# http://modissa.blob.core.windows.net/modis-006/[product]/[htile]/[vtile]/[year][day]/filename

# This is the MODIS surface reflectance product
product = 'MCD43A4'

# Let's look at the tile containing Chicago, IL, on May 15, 2019 (day of year 135)
h,v = lat_lon_to_modis_tile(41.881832,-87.623177)
daynum = '2019135'
folder = product + '/' + '{:0>2d}/{:0>2d}'.format(h,v) + '/' + daynum

# Find all HDF files from this tile on this day
filenames = list_hdf_blobs_in_folder(modis_container_name,folder)
print('Found {} matching file(s):'.format(len(filenames)))
for fn in filenames:
    print(fn)

# Work with the first returned URL
blob_name = filenames[0]

# Download to a temporary file
url = modis_blob_root + blob_name

filename = os.path.join(temp_dir,blob_name.replace('/','_'))
if not os.path.isfile(filename):
    wget.download(url,filename)
Found 1 matching file(s):
MCD43A4/11/04/2019135/MCD43A4.A2019135.h11v04.006.2019149220457.hdf

Open the file as an xarray Dataset

In [4]:
import rioxarray as rxr
ds = rxr.open_rasterio(filename)
list(ds.keys())
Out[4]:
['BRDF_Albedo_Band_Mandatory_Quality_Band1',
 'Nadir_Reflectance_Band3',
 'Nadir_Reflectance_Band4',
 'Nadir_Reflectance_Band5',
 'Nadir_Reflectance_Band6',
 'Nadir_Reflectance_Band7',
 'BRDF_Albedo_Band_Mandatory_Quality_Band2',
 'BRDF_Albedo_Band_Mandatory_Quality_Band3',
 'BRDF_Albedo_Band_Mandatory_Quality_Band4',
 'BRDF_Albedo_Band_Mandatory_Quality_Band5',
 'BRDF_Albedo_Band_Mandatory_Quality_Band6',
 'BRDF_Albedo_Band_Mandatory_Quality_Band7',
 'Nadir_Reflectance_Band1',
 'Nadir_Reflectance_Band2']

Plot a composite image

In [5]:
norm_value = 5000
r = ds['Nadir_Reflectance_Band1'].values.squeeze() / norm_value
g = ds['Nadir_Reflectance_Band4'].values.squeeze() / norm_value
b = ds['Nadir_Reflectance_Band3'].values.squeeze() / norm_value
rgb = np.dstack((r,g,b))

np.clip(rgb,0,1,rgb)
fig = plt.figure(frameon=False); ax = plt.Axes(fig,[0., 0., 1., 1.])
ax.set_axis_off(); fig.add_axes(ax)
plt.imshow(rgb);