Using the Radiant MLHub API

The Radiant MLHub API gives access to open Earth imagery training data for machine learning applications. You can learn more about the repository at the Radiant MLHub site and about the organization behind it at the Radiant Earth Foundation site.

This Jupyter notebook, which you may copy and adapt for any use, shows basic examples of how to use the API. Full documentation for the API is available at docs.mlhub.earth.

We'll show you how to set up your authentication, see the list of available collections and datasets, and retrieve the items (the data contained within them) from those collections.

All collections in the Radiant MLHub repository are cataloged using STAC. Collections that include labels/annotations are additionally described using the Label Extension.

Authentication

Access to the Radiant MLHub API requires an API key. To get your API key, go to mlhub.earth and click the "Sign in / Register" button in the top right to log in. If you have not used Radiant MLHub before, you will need to sign up and create a new account; otherwise, just sign in. Once you have signed in, click on your user avatar in the top right and select "Settings & API keys" from the dropdown menu.

In the API Keys section of this page, you will be able to create new API key(s). Do not share your API key with others as this may pose a security risk.

Next, we will create an MLHUB_API_KEY variable that pystac-client will later use to add our API key to all requests:

In [1]:

import getpass

MLHUB_API_KEY = getpass.getpass(prompt="MLHub API Key: ")
MLHUB_ROOT_URL = "https://api.radiant.earth/mlhub/v1"

MLHub API Key:  ································································
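An interactive prompt is not available in scheduled jobs or other non-interactive runs. As a minimal sketch, assuming the key is stored in an environment variable (the variable name MLHUB_API_KEY and the load_api_key helper are our own choices for illustration, not part of the API), we could fall back to the prompt only when the variable is unset:

```python
import getpass
import os


def load_api_key(env_var="MLHUB_API_KEY"):
    """Return the API key from the environment, prompting only if it is unset.

    The environment variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fall back to the same interactive prompt used in the cell above
        key = getpass.getpass(prompt="MLHub API Key: ").strip()
    if not key:
        raise ValueError("An API key is required to use the Radiant MLHub API")
    return key
```

With this helper, the assignment above could be written as MLHUB_API_KEY = load_api_key(), which keeps the key out of the notebook in both cases.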

Finally, we connect to the Radiant MLHub API using our API key:

In [2]:

import itertools as it
import requests
import shutil
import tempfile
import os.path
from pprint import pprint
from urllib.parse import urljoin

from pystac_client import Client
from pystac import ExtensionNotImplemented
from pystac.extensions.scientific import ScientificExtension

client = Client.open(
    MLHUB_ROOT_URL, parameters={"key": MLHUB_API_KEY}, ignore_conformance=True
)

List datasets

A dataset in the Radiant MLHub API is a JSON object that represents a group of STAC Collections that belong together. A typical dataset includes one Collection of source imagery and one Collection of labels, but this is not always the case. Some datasets consist of a single Collection with both labels and source imagery, others contain multiple source imagery or label Collections, and others contain only labels.

Datasets are not a STAC entity and therefore we must work with them by making direct requests to the API rather than using pystac-client.

We start by creating a requests.Session instance so that we can include the API key in all of our requests.

In [3]:

class MLHubSession(requests.Session):
    def __init__(self, *args, api_key=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.params.update({"key": api_key})

    def request(self, method, url, *args, **kwargs):
        url_prefix = MLHUB_ROOT_URL.rstrip("/") + "/"
        # Strip any leading "/" so that urljoin keeps the "/mlhub/v1" path of the root URL
        url = urljoin(url_prefix, url.lstrip("/"))
        return super().request(method, url, *args, **kwargs)


session = MLHubSession(api_key=MLHUB_API_KEY)
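One detail is worth knowing when joining endpoint paths onto a root URL with urljoin: a path that starts with "/" is resolved against the host, so the "/mlhub/v1" part of the root is dropped. The sketch below illustrates this with the standard library only; build_url is a hypothetical helper written for this example, and ROOT_URL repeats the MLHUB_ROOT_URL value defined earlier.

```python
from urllib.parse import urljoin

ROOT_URL = "https://api.radiant.earth/mlhub/v1"  # same value as MLHUB_ROOT_URL


def build_url(path, root_url=ROOT_URL):
    """Join an endpoint path onto the API root, keeping the root's own path."""
    prefix = root_url.rstrip("/") + "/"
    # A leading "/" would make urljoin discard "/mlhub/v1", so strip it first
    return urljoin(prefix, path.lstrip("/"))


# Joining an absolute path replaces the path of the root URL
print(urljoin(ROOT_URL + "/", "/datasets"))  # https://api.radiant.earth/datasets

# Stripping the leading slash keeps the versioned API prefix
print(build_url("/datasets"))  # https://api.radiant.earth/mlhub/v1/datasets
```

Fully qualified URLs pass through build_url unchanged, so the same helper can also be used with absolute links returned by the API.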

Next, we list the available datasets using the /datasets endpoint:

In [4]:

response = session.get("/datasets")
datasets = response.json()

dataset_limit = 30

print(f"Total Datasets: {len(datasets)}")
print("-----")
for dataset in it.islice(datasets, dataset_limit):
    dataset_id = dataset["id"]
    dataset_title = dataset["title"] or "No Title"
    print(f"{dataset_id}: {dataset_title}")
if len(datasets) > dataset_limit:
    print("...")

Total Datasets: 27
-----
idiv_asia_crop_type: A crop type dataset for consistent land cover classification in Central Asia
dlr_fusion_competition_germany: A Fusion Dataset for Crop Type Classification in Germany
ref_fusion_competition_south_africa: A Fusion Dataset for Crop Type Classification in Western Cape, South Africa
bigearthnet_v1: BigEarthNet
microsoft_chesapeake: Chesapeake Land Cover
ref_african_crops_kenya_02: CV4A Kenya Crop Type Competition
ref_african_crops_uganda_01: Dalberg Data Insights Crop Type Uganda
rti_rwanda_crop_type: Drone Imagery Classification Training Dataset for Crop Types in Rwanda
ref_african_crops_tanzania_01: Great African Food Company Crop Type Tanzania
landcovernet_v1: LandCoverNet
nasa_marine_debris: Marine Debris Dataset for Object Detection in Planetscope Imagery
open_cities_ai_challenge: Open Cities AI Challenge Dataset
ref_african_crops_kenya_01: PlantVillage Crop Type Kenya
su_african_crops_ghana: Semantic Segmentation of Crop Type in Ghana
su_african_crops_south_sudan: Semantic Segmentation of Crop Type in South Sudan
sen12floods: SEN12-FLOOD : A SAR and Multispectral Dataset for Flood Detection
ts_cashew_benin: Smallholder Cashew Plantations in Benin
ref_south_africa_crops_competition_v1: South Africa Crop Type Competition
spacenet1: SpaceNet 1
spacenet2: SpaceNet 2
spacenet3: SpaceNet 3
spacenet4: SpaceNet 4
spacenet5: SpaceNet 5
spacenet6: SpaceNet 6
spacenet7: SpaceNet 7
nasa_tropical_storm_competition: Tropical Cyclone Wind Estimation Competition
su_sar_moisture_content_main: Western USA Live Fuel Moisture

Let's take a look at the CV4A Kenya Crop Type Competition dataset (ref_african_crops_kenya_02).

In [5]:

crop_dataset = next(
    dataset for dataset in datasets if dataset["id"] == "ref_african_crops_kenya_02"
)
pprint(crop_dataset)

{'bbox': {'coordinates': [[[[34.203204542, 0.16702187],
                            [34.203204542, 0.167033875],
                            [34.022068532, 0.167033875],
                            [34.022068532, 0.441545516],
                            [34.022094367, 0.441545516],
                            [34.022094367, 0.716046625],
                            [34.203292802, 0.716046625],
                            [34.203292802, 0.716002363],
                            [34.384429981, 0.716002363],
                            [34.384429981, 0.441486489],
                            [34.38436343, 0.441486489],
                            [34.38436343, 0.16702187],
                            [34.203204542, 0.16702187]]]],
          'type': 'MultiPolygon'},
 'citation': 'Radiant Earth Foundation (2020) "CV4A Competition Kenya Crop '
             'Type Dataset", Version 1.0, Radiant MLHub. [Date Accessed] '
             'https://doi.org/10.34911/RDNT.DW605X',
 'collections': [{'id': 'ref_african_crops_kenya_02_labels',
                  'license': '[CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/)',
                  'types': ['labels']},
                 {'id': 'ref_african_crops_kenya_02_source',
                  'license': '[CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/)',
                  'types': ['source_imagery']}],
 'creator_contact': {'contact': 'ml@radiant.earth',
                     'creator': '[Radiant Earth '
                                'Foundation](https://www.radiant.earth/), '
                                '[PlantVillage](https://plantvillage.psu.edu/)'},
 'date_added': '2020-06-17T00:00:00+00:00',
 'date_modified': '2020-06-17T00:00:00+00:00',
 'description': 'This dataset was produced as part of the [Crop Type Detection '
                'competition](https://zindi.africa/competitions/iclr-workshop-challenge-2-radiant-earth-computer-vision-for-crop-recognition) '
                'at the [Computer Vision for Agriculture (CV4A) '
                'Workshop](https://www.cv4gc.org/cv4a2020/) at the ICLR 2020 '
                'conference. The objective of the competition was to create a '
                'machine learning model to classify fields by crop type from '
                'images collected during the growing season by the Sentinel-2 '
                'satellites.\n'
                '<br><br>\n'
                'The ground reference data were collected by the PlantVillage '
                'team, and Radiant Earth Foundation curated the training '
                'dataset after inspecting and selecting more than 4,000 fields '
                'from the original ground reference data. The dataset has been '
                'split into training and test sets (3,286 in the train and '
                '1,402 in the test).\n'
                '<br><br>\n'
                'The dataset is cataloged in four tiles. These tiles are '
                'smaller than the original Sentinel-2 tile that has been '
                'clipped and chipped to the geographical area that labels have '
                'been collected.\n'
                '<br><br>\n'
                'Each tile has a) 13 multi-band observations throughout the '
                'growing season. Each observation includes 12 bands from '
                'Sentinel-2 L2A product, and a cloud probability layer. The '
                'twelve bands are [B01, B02, B03, B04, B05, B06, B07, B08, '
                'B8A, B09, B11, B12]. The cloud probability layer is a product '
                'of the Sentinel-2 atmospheric correction algorithm (Sen2Cor) '
                'and provides an estimated cloud probability (0-100%) per '
                'pixel. All of the bands are mapped to a common 10 m spatial '
                'resolution grid.; b) A raster layer indicating the crop ID '
                'for the fields in the training set; and c) A raster layer '
                'indicating field IDs for the fields (both training and test '
                'sets). Fields with a crop ID of 0 are the test fields.',
 'documentation_link': 'https://radiantearth.blob.core.windows.net/mlhub/kenya-crop-challenge/Documentation.pdf',
 'doi': '10.34911/rdnt.dw605x',
 'id': 'ref_african_crops_kenya_02',
 'publications': [{'author_name': 'Hannah Kerner, Catherine Nakalembe and '
                                  'Inbal Becker-Reshef',
                   'author_url': None,
                   'title': 'Field-Level Crop Type Classification with k '
                            'Nearest Neighbors: A Baseline for a New Kenya '
                            'Smallholder Dataset',
                   'url': 'https://arxiv.org/abs/2004.03023'}],
 'registry': 'https://mlhub.earth/ref_african_crops_kenya_02',
 'status': 'ready',
 'tags': ['crop type', 'segmentation', 'sentinel-2', 'agriculture'],
 'title': 'CV4A Kenya Crop Type Competition',
 'tools_applications': None,
 'tutorials': [{'author_name': 'Hamed Alemohammad',
                'author_url': 'https://www.linkedin.com/in/hamedalemohammad/',
                'title': 'A Guide to Access the data on Radiant MLHub',
                'url': 'https://github.com/radiantearth/mlhub-tutorials/blob/main/notebooks/2020%20CV4A%20Crop%20Type%20Challenge/cv4a-crop-challenge-download-data.ipynb'},
               {'author_name': 'Hamed Alemohammad',
                'author_url': 'https://www.linkedin.com/in/hamedalemohammad/',
                'title': 'A Guide to load and visualize the data in Python',
                'url': 'https://github.com/radiantearth/mlhub-tutorials/blob/main/notebooks/2020%20CV4A%20Crop%20Type%20Challenge/cv4a-crop-challenge-load-data.ipynb'},
               {'author_name': 'Devis Peressutti',
                'author_url': 'https://sites.google.com/site/devisperessutti/home',
                'title': 'CV4A ICLR 2020 Starter Notebooks',
                'url': 'https://github.com/sentinel-hub/cv4a-iclr-2020-starter-notebooks'}]}

We can see that the metadata includes an ID and title, citation information, a bounding box for the dataset, and a list of the collections included in the dataset. If we take a closer look at the collections list, we can see that each collection has an id, a license, and a list of types. We can use the types to figure out whether a collection contains labels, source imagery, or both, and we can use the id to fetch that collection (see below).

In [6]:

pprint(crop_dataset["collections"])

[{'id': 'ref_african_crops_kenya_02_labels',
  'license': '[CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/)',
  'types': ['labels']},
 {'id': 'ref_african_crops_kenya_02_source',
  'license': '[CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/)',
  'types': ['source_imagery']}]
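The types field makes it easy to pick out only the label or only the source imagery collections of a dataset. The following is a small sketch of that idea; collection_ids_by_type is a hypothetical helper, and sample_dataset is a trimmed copy of the record printed above so that the example runs without an API call.

```python
def collection_ids_by_type(dataset, collection_type):
    """Return the IDs of the dataset's collections whose 'types' include collection_type."""
    return [
        c["id"]
        for c in dataset.get("collections", [])
        if collection_type in c.get("types", [])
    ]


# Trimmed copy of the dataset record shown above (only the fields used here)
sample_dataset = {
    "id": "ref_african_crops_kenya_02",
    "collections": [
        {"id": "ref_african_crops_kenya_02_labels", "types": ["labels"]},
        {"id": "ref_african_crops_kenya_02_source", "types": ["source_imagery"]},
    ],
}

print(collection_ids_by_type(sample_dataset, "labels"))
print(collection_ids_by_type(sample_dataset, "source_imagery"))
```

Because types is a list, a collection that holds both labels and source imagery would be returned for either query.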

List data collections

A collection in the Radiant MLHub API is a STAC Collection representing a group of resources (represented as STAC Items and their associated assets) covering a given spatial and temporal extent. A Radiant MLHub collection may contain resources representing training labels, source imagery, or (rarely) both.

Use the client.get_collections function to list all available collections and view their properties. The following cell uses the client.get_collections function to print the ID, license (if available), and citation (if available) for the first 20 available collections.

In [7]:

# Materialize the results so that the collections can be iterated more than once
collections = list(client.get_collections())
for c in it.islice(collections, 20):
    collection_id = c.id
    license = c.license or "N/A"
    try:
        sci = ScientificExtension.ext(c)
        citation = sci.citation or "N/A"
    except ExtensionNotImplemented:
        citation = "N/A"

    print(f"ID:       {collection_id}\nLicense:  {license}\nCitation: {citation}\n")

ID:       ref_african_crops_uganda_01_source
License:  CC-BY-SA-4.0
Citation: Bocquet, C., & Dalberg Data Insights. (2019) "Dalberg Data Insights Uganda Crop Classification", Version 1.0, Radiant MLHub. [Date Accessed] https://doi.org/10.34911/RDNT.EII04X

ID:       microsoft_chesapeake_landsat_leaf_on
License:  CC-PDDC
Citation: Robinson C, Hou L, Malkin K, Soobitsky R, Czawlytko J, Dilkina B, Jojic N. Large Scale High-Resolution Land Cover Mapping with Multi-Resolution Data. Proceedings of the 2019 Conference on Computer Vision and Pattern Recognition (CVPR 2019).

ID:       sen12floods_s2_source
License:  CC-BY-4.0
Citation: Clément Rambour, Nicolas Audebert, Elise Koeniguer, Bertrand Le Saux, Michel Crucianu, Mihai Datcu, September 14, 2020, "SEN12-FLOOD : a SAR and Multispectral Dataset for Flood Detection ", IEEE Dataport, doi: https://dx.doi.org/10.21227/w6xz-s898.

ID:       sn2_AOI_2_Vegas
License:  CC-BY-SA-4.0
Citation: N/A

ID:       sn4_AOI_6_Atlanta
License:  CC-BY-SA-4.0
Citation: N/A

ID:       su_sar_moisture_content
License:  CC-BY-NC-ND-4.0
Citation: Rao, K., Williams, A.P., Fortin, J. & Konings, A.G. (2020). SAR-enhanced mapping of live fuel moisture content. Remote Sens. Environ., 245.

ID:       ref_african_crops_tanzania_01_source
License:  CC-BY-SA-4.0
Citation: Great African Food Company (2019) "Great African Food Company Tanzania Ground Reference Crop Type Dataset", Version 1.0, Radiant MLHub. [Date Accessed] https://doi.org/10.34911/RDNT.5VX40R

ID:       ref_south_africa_crops_competition_v1_train_source_s2
License:  CC-BY-4.0
Citation: Western Cape Department of Agriculture, Radiant Earth Foundation (2021) "Crop Type Classification Dataset for Western Cape, South Africa", Version 1.0, Radiant MLHub, [Date Accessed] https://doi.org/10.34911/rdnt.j0co8q

ID:       sn2_AOI_4_Shanghai
License:  CC-BY-SA-4.0
Citation: N/A

ID:       sn3_AOI_3_Paris
License:  CC-BY-SA-4.0
Citation: N/A

ID:       sn3_AOI_4_Shanghai
License:  CC-BY-SA-4.0
Citation: N/A

ID:       sn5_AOI_8_Mumbai
License:  CC-BY-SA-4.0
Citation: N/A

ID:       ref_african_crops_uganda_01_labels
License:  CC-BY-SA-4.0
Citation: Bocquet, C., & Dalberg Data Insights. (2019) "Dalberg Data Insights Uganda Crop Classification", Version 1.0, Radiant MLHub. [Date Accessed] https://doi.org/10.34911/RDNT.EII04X

ID:       nasa_tropical_storm_competition_train_labels
License:  CC-BY-4.0
Citation: M. Maskey, R. Ramachandran, I. Gurung, B. Freitag, M. Ramasubramanian, J. Miller"Tropical Cyclone Wind Estimation Competition Dataset", Version 1.0, Radiant MLHub. [Date Accessed] https://doi.org/10.34911/rdnt.xs53up

ID:       sen12floods_s1_labels
License:  CC-BY-4.0
Citation: Clément Rambour, Nicolas Audebert, Elise Koeniguer, Bertrand Le Saux, Michel Crucianu, Mihai Datcu, September 14, 2020, "SEN12-FLOOD : a SAR and Multispectral Dataset for Flood Detection ", IEEE Dataport, doi: https://dx.doi.org/10.21227/w6xz-s898.

ID:       microsoft_chesapeake_landsat_leaf_off
License:  CC-PDDC
Citation: Robinson C, Hou L, Malkin K, Soobitsky R, Czawlytko J, Dilkina B, Jojic N. Large Scale High-Resolution Land Cover Mapping with Multi-Resolution Data. Proceedings of the 2019 Conference on Computer Vision and Pattern Recognition (CVPR 2019).

ID:       sn3_AOI_2_Vegas
License:  CC-BY-SA-4.0
Citation: N/A

ID:       sn7_test_source
License:  CC-BY-SA-4.0
Citation: N/A

ID:       su_african_crops_ghana_source_s1
License:  CC-BY-SA-4.0
Citation: Rustowicz R., Cheong R., Wang L., Ermon S., Burke M., Lobell D. (2020) "Semantic Segmentation of Crop Type in Ghana Dataset", Version 1.0, Radiant MLHub. [Date Accessed] https://doi.org/10.34911/rdnt.ry138p

ID:       su_african_crops_south_sudan_source_planet
License:  CC-BY-SA-4.0
Citation: Rustowicz R., Cheong R., Wang L., Ermon S., Burke M., Lobell D. (2020) "Semantic Segmentation of Crop Type in South Sudan Dataset", Version 1.0, Radiant MLHub. [Date Accessed] https://doi.org/10.34911/rdnt.v6kx6n
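Licenses differ between collections (for example, some are non-commercial), so it can be useful to group collection IDs by license before choosing data. The sketch below assumes only that each object has the id and license attributes used in the cell above; group_ids_by_license is a hypothetical helper, and the SimpleNamespace objects are stand-ins that reuse a few ID and license pairs from the output above so the example runs offline.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_ids_by_license(collections):
    """Map each license string to the list of collection IDs that use it."""
    groups = defaultdict(list)
    for c in collections:
        groups[c.license or "N/A"].append(c.id)
    return dict(groups)


# Stand-ins for pystac Collection objects, using values printed above
sample_collections = [
    SimpleNamespace(id="ref_african_crops_uganda_01_source", license="CC-BY-SA-4.0"),
    SimpleNamespace(id="microsoft_chesapeake_landsat_leaf_on", license="CC-PDDC"),
    SimpleNamespace(id="sen12floods_s2_source", license="CC-BY-4.0"),
    SimpleNamespace(id="su_sar_moisture_content", license="CC-BY-NC-ND-4.0"),
    SimpleNamespace(id="sn2_AOI_2_Vegas", license="CC-BY-SA-4.0"),
]

for license_name, ids in group_ids_by_license(sample_collections).items():
    print(f"{license_name}: {ids}")
```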

Collection objects have many other properties besides the ones shown above. The cell below prints the ref_african_crops_kenya_01_labels collection object in its entirety.

In [8]:

kenya_crops_labels = next(
    c for c in collections if c.id == "ref_african_crops_kenya_01_labels"
)
kenya_crops_labels.to_dict()

Out[8]:

{'type': 'Collection',
 'id': 'ref_african_crops_kenya_01_labels',
 'stac_version': '1.0.0',
 'description': 'African Crops Kenya',
 'links': [{'rel': 'items',
   'href': 'http://api.radiant.earth/mlhub/v1/collections/ref_african_crops_kenya_01_labels/items',
   'type': 'application/geo+json'},
  {'rel': 'parent',
   'href': 'http://api.radiant.earth/mlhub/v1/',
   'type': 'application/json'},
  {'rel': <RelType.ROOT: 'root'>,
   'href': 'https://api.radiant.earth/mlhub/v1',
   'type': <MediaType.JSON: 'application/json'>,
   'title': 'Radiant MLHub API'},
  {'rel': 'self',
   'href': 'http://api.radiant.earth/mlhub/v1/collections/ref_african_crops_kenya_01_labels',
   'type': 'application/json'}],
 'stac_extensions': ['https://stac-extensions.github.io/scientific/v1.0.0/schema.json'],
 'sci:doi': '10.34911/rdnt.u41j87',
 'providers': [{'name': 'Radiant Earth Foundation',
   'roles': ['licensor', 'host', 'processor'],
   'url': 'https://radiant.earth'}],
 'sci:citation': 'PlantVillage (2019) "PlantVillage Kenya Ground Reference Crop Type Dataset", Version 1.0, Radiant MLHub. [Date Accessed] https://doi.org/10.34911/RDNT.U41J87',
 'extent': {'spatial': {'bbox': [[34.18191992149459,
     0.4724181558451209,
     34.3714943155646,
     0.7144217206851109]]},
  'temporal': {'interval': [['2018-04-10T00:00:00Z',
     '2020-03-13T00:00:00Z']]}},
 'license': 'CC-BY-SA-4.0'}
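The dictionary form is convenient when we only need a few fields, such as the URL of the items endpoint or the temporal extent. The sketch below works on a trimmed copy of the output above; link_href and temporal_interval are hypothetical helpers written for this example, not part of pystac.

```python
from datetime import datetime


def link_href(collection_dict, rel):
    """Return the href of the first link with the given rel, or None if absent."""
    for link in collection_dict.get("links", []):
        if link.get("rel") == rel:
            return link.get("href")
    return None


def temporal_interval(collection_dict):
    """Return the first temporal interval as (start, end) datetimes (None if open-ended)."""
    start, end = collection_dict["extent"]["temporal"]["interval"][0]

    def parse(value):
        # Replace the "Z" suffix so that older Python versions can parse the timestamp
        return None if value is None else datetime.fromisoformat(value.replace("Z", "+00:00"))

    return parse(start), parse(end)


# Trimmed copy of the collection dictionary shown above
sample_collection = {
    "id": "ref_african_crops_kenya_01_labels",
    "links": [
        {
            "rel": "items",
            "href": "http://api.radiant.earth/mlhub/v1/collections/ref_african_crops_kenya_01_labels/items",
        },
        {"rel": "parent", "href": "http://api.radiant.earth/mlhub/v1/"},
    ],
    "extent": {
        "temporal": {"interval": [["2018-04-10T00:00:00Z", "2020-03-13T00:00:00Z"]]}
    },
}

print(link_href(sample_collection, "items"))
start, end = temporal_interval(sample_collection)
print(start.date(), end.date())
```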

Download Data Archives

A typical workflow for downloading assets from a STAC Catalog would involve looping through all Items and downloading the associated assets. However, the ML training datasets published through Radiant MLHub can sometimes have thousands or hundreds of thousands of Items, which makes this workflow very time-consuming for larger datasets. For faster access to the assets for an entire dataset, MLHub provides TAR archives of all collections, which can be downloaded using the /archive/{collection_id} endpoint.

We will use the MLHubSession instance we created above to ensure that our API key is sent with each request.

In [9]:

# Create a temporary directory
tmp_dir = tempfile.mkdtemp()
archive_path = os.path.join(tmp_dir, "ref_african_crops_kenya_01_labels.tar.gz")

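# Fetch the archive and stream it to disk in chunks (archives can be large)
with session.get(
    "/archive/ref_african_crops_kenya_01_labels", allow_redirects=True, stream=True
) as response:
    # Fail early on an HTTP error instead of saving an error page as the archive
    response.raise_for_status()
    with open(archive_path, "wb") as dst:
        for chunk in response.iter_content(chunk_size=1024 * 1024):
            dst.write(chunk)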
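Once the archive is on disk, it can be unpacked with the standard library tarfile module. The following is a minimal sketch rather than a complete solution: extract_archive is a hypothetical helper, and its safety check only looks at member names, so it should not be treated as a full defence against untrusted archives.

```python
import os
import tarfile


def extract_archive(archive_path, dest_dir):
    """Extract a TAR archive into dest_dir and return the member names.

    Members whose paths would land outside dest_dir are rejected.
    """
    os.makedirs(dest_dir, exist_ok=True)
    dest_root = os.path.realpath(dest_dir)
    # "r:*" lets tarfile detect the compression (gzip, bz2, none) automatically
    with tarfile.open(archive_path, "r:*") as tar:
        members = tar.getmembers()
        for member in members:
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"Unsafe path in archive: {member.name}")
        tar.extractall(dest_root, members=members)
    return [m.name for m in members]
```

In this notebook, the call would look like extract_archive(archive_path, tmp_dir), which places the extracted files next to the downloaded archive in the temporary directory.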

Finally, we clean up the temporary directory:

In [10]:

shutil.rmtree(tmp_dir)

Next Steps

This tutorial was a quick introduction to working with the Radiant MLHub API in a notebook. For more, see: