Accessing Landsat 8 Collection 2 Level 2 data with the Planetary Computer STAC API¶

The Landsat program has been imaging the Earth since 1972; it provides a comprehensive, continuous archive of the Earth's surface.

This dataset represents the global archive of Level-2 Landsat 8 data from Landsat Collection 2. Images are stored in cloud-optimized GeoTIFF format.

This notebook demonstrates the use of the Planetary Computer STAC API to query for Landsat 8 data.

Environment setup¶

This notebook works with or without an API key, but you will be given more permissive access to the data with an API key. The Planetary Computer Hub is pre-configured to use your API key.

In [34]:

from pystac_client import Client
from pystac.extensions.eo import EOExtension as eo
import planetary_computer as pc

# Set the environment variable PC_SDK_SUBSCRIPTION_KEY, or set it here.
# The Hub sets PC_SDK_SUBSCRIPTION_KEY automatically.
# pc.settings.set_subscription_key(<YOUR API Key>)

Choose a region and time of interest¶

This area is in Redmond, WA, USA, near Microsoft's main campus.

In [35]:

area_of_interest = {
    "type": "Polygon",
    "coordinates": [
        [
            [-122.27508544921875, 47.54687159892238],
            [-121.96128845214844, 47.54687159892238],
            [-121.96128845214844, 47.745787772920934],
            [-122.27508544921875, 47.745787772920934],
            [-122.27508544921875, 47.54687159892238],
        ]
    ],
}

We'll search all of 2020 for the least cloudy image of our area.

In [36]:

time_of_interest = "2020-01-01/2020-12-31"

Search the collection and choose a scene to render¶

Use pystac-client to perform the search over the Landsat 8 Collection 2 Level 2 collection, specifying we want results with less than 10% cloud cover:

In [37]:

catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

search = catalog.search(
    collections=["landsat-8-c2-l2"],
    intersects=area_of_interest,
    datetime=time_of_interest,
    query={"eo:cloud_cover": {"lt": 10}},
)

# Check how many items were returned
items = list(search.get_items())
print(f"Returned {len(items)} Items")

Returned 10 Items

We can now work directly with the PySTAC Items returned by the API. Here we find the least cloudy of the bunch.

In [38]:

selected_item = sorted(items, key=lambda item: eo.ext(item).cloud_cover)[0]

print(
    f"Choosing {selected_item.id} from {selected_item.datetime.date()}"
    + f" with {selected_item.properties['eo:cloud_cover']}% cloud cover"
)

Choosing LC08_L2SP_046027_20200908_02_T1 from 2020-09-08 with 0.19% cloud cover

Choose bands from that scene for composite rendering¶

Here we use the common name of STAC's eo extension to choose the red, green, and blue bands to render.

In [45]:

def find_asset_by_band_common_name(item, common_name):
    for asset in item.assets.values():
        asset_bands = eo.ext(asset).bands
        if asset_bands and asset_bands[0].common_name == common_name:
            return asset
    raise KeyError(f"{common_name} band not found")

In [44]:

asset_hrefs = [
    find_asset_by_band_common_name(selected_item, "red").href,
    find_asset_by_band_common_name(selected_item, "green").href,
    find_asset_by_band_common_name(selected_item, "blue").href,
]

This HREF is a URL is the location of the asset data on Azure Blob Storage. In order to read the data, we'll need to retrieve a Shared Access Signature and supply it as a query parameter. These tokens are generated from the Planetary Computer Data Access API.

We use the planetary-computer package to "sign" our asset HREF with a generated token:

In [46]:

signed_hrefs = [pc.sign(asset_href) for asset_href in asset_hrefs]

Render our AOI¶

We can now use the HREFs to read our data in any tools that can retrieve data from URLs via HTTP GET operations.

Here we use rasterio to render the image data over our area of interest.

In [47]:

import rasterio
from rasterio import windows
from rasterio import features
from rasterio import warp

import numpy as np
from PIL import Image


def read_band(href):
    with rasterio.open(href) as ds:
        aoi_bounds = features.bounds(area_of_interest)
        warped_aoi_bounds = warp.transform_bounds("epsg:4326", ds.crs, *aoi_bounds)
        aoi_window = windows.from_bounds(transform=ds.transform, *warped_aoi_bounds)
        return ds.read(1, window=aoi_window)


bands = [read_band(href) for href in signed_hrefs]

The code above reads the Cloud Optimized GeoTIFF data for each of the red, green, and blue bands. The band data is stored in separate images; we can use numpy's stack method to turn them into the equivalent of a multiband raster:

In [48]:

multiband_data = np.stack(bands)

The data is in uint16, which PIL will not render as-is. This code rescales the image to values between 0-255, changes the data type and renders our image:

In [49]:

rescaled = multiband_data.astype(float)
min_value, max_value = rescaled.min(), rescaled.max()
rescaled = ((rescaled - min_value) * 255) / (max_value - min_value)
byte_data = rescaled.astype("ubyte")
Image.fromarray(np.transpose(byte_data, axes=[1, 2, 0]))

Out[49]:

Render an NDVI image¶

Landsat has several bands, and with them we can go beyond rendering RGB imagery; for example, the following code computes a Normalized Difference Vegetation Index (NDVI) using the near-infrared and red bands:

In [50]:

r = read_band(
    pc.sign(find_asset_by_band_common_name(selected_item, "red").href)
).astype(float)

nir = read_band(
    pc.sign(find_asset_by_band_common_name(selected_item, "nir08").href)
).astype(float)

ndvi = (nir - r) / (nir + r)
w = ndvi.shape[0]
h = ndvi.shape[1]

In [51]:

import matplotlib.pyplot as plt
from matplotlib.pyplot import figure

dpi = 50
fig = figure(figsize=(w / dpi, h / dpi), dpi=dpi, frameon=False)
ax = plt.Axes(fig, [0.0, 0.0, 1.0, 1.0])
ax.set_axis_off()
fig.add_axes(ax)

plt.imshow(ndvi, cmap="viridis");

In [ ]: