The Radiant MLHub API gives access to open Earth imagery training data for machine learning applications. You can learn more about the repository at the Radiant MLHub site and about the organization behind it at the Radiant Earth Foundation site.
This Jupyter notebook, which you may copy and adapt for any use, shows basic examples of how to use the API. Full documentation for the API is available at docs.mlhub.earth.
We'll show you how to set up your authentication, see the list of available collections and datasets, and retrieve the items (the data contained within them) from those collections.
All collections in the Radiant MLHub repository are cataloged using STAC. Collections that include labels/annotations are additionally described using the Label Extension.
Access to the Radiant MLHub API requires an API key. To get your API key, go to mlhub.earth and click the "Sign in / Register" button in the top right to log in. If you have not used Radiant MLHub before, you will need to sign up and create a new account; otherwise, just sign in. Once you have signed in, click on your user avatar in the top right and select "Settings & API keys" from the dropdown menu.
In the API Keys section of this page, you will be able to create new API key(s). Do not share your API key with others as this may pose a security risk.
Next, we will create an MLHUB_API_KEY variable that pystac-client will later use to add our API key to all requests:
import getpass
MLHUB_API_KEY = getpass.getpass(prompt="MLHub API Key: ")
MLHUB_ROOT_URL = "https://api.radiant.earth/mlhub/v1"
MLHub API Key: ································································
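When the notebook is run non-interactively (for example, from a scheduled job), an interactive prompt is not practical. A minimal sketch of one alternative, assuming you have chosen to store the key in an environment variable. The variable name and the load_api_key helper are our own choices for this illustration, not something the API requires:

```python
import getpass
import os


def load_api_key(env_var="MLHUB_API_KEY"):
    """Return the API key from the environment, prompting only if it is unset."""
    # env_var is a hypothetical name chosen for this sketch.
    key = os.environ.get(env_var)
    if key:
        return key
    # Fall back to the same interactive prompt used above.
    return getpass.getpass(prompt="MLHub API Key: ")
```

With this helper, the assignment above could be written as MLHUB_API_KEY = load_api_key().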
Finally, we connect to the Radiant MLHub API using our API key:
import itertools as it
import requests
import shutil
import tempfile
import os.path
from pprint import pprint
from urllib.parse import urljoin
from pystac_client import Client
from pystac import ExtensionNotImplemented
from pystac.extensions.scientific import ScientificExtension
client = Client.open(
    MLHUB_ROOT_URL, parameters={"key": MLHUB_API_KEY}, ignore_conformance=True
)
A dataset in the Radiant MLHub API is a JSON object that represents a group of STAC Collections that belong together. A typical dataset will include 1 Collection of source imagery and 1 Collection of labels, but this is not always the case. Some datasets consist of a single Collection with both labels and source imagery, others may contain multiple source imagery or label Collections, and others may contain only labels.
Datasets are not a STAC entity, so we must work with them by making direct requests to the API rather than using pystac-client.
We start by creating a requests.Session instance so that we can include the API key in all of our requests.
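class MLHubSession(requests.Session):
    """A requests.Session that adds the API key to every request."""

    def __init__(self, *args, api_key=None, **kwargs):
        super().__init__(*args, **kwargs)
        # Query parameters set here are sent with every request.
        self.params.update({"key": api_key})

    def request(self, method, url, *args, **kwargs):
        # Resolve relative paths against the API root. The leading slash is
        # stripped because urljoin would otherwise discard the "/mlhub/v1" path.
        url_prefix = MLHUB_ROOT_URL.rstrip("/") + "/"
        url = urljoin(url_prefix, url.lstrip("/"))
        return super().request(method, url, *args, **kwargs)


session = MLHubSession(api_key=MLHUB_API_KEY)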
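One detail worth keeping in mind when building endpoint URLs is how urljoin treats a leading slash. The short sketch below uses only the standard library; the resolve helper and the ROOT_URL name are ours, introduced for illustration:

```python
from urllib.parse import urljoin

ROOT_URL = "https://api.radiant.earth/mlhub/v1"


def resolve(root_url, path):
    """Join an endpoint path onto the API root, keeping the root's own path."""
    prefix = root_url.rstrip("/") + "/"
    # Strip leading slashes so the path is treated as relative to the prefix.
    return urljoin(prefix, path.lstrip("/"))


# A leading slash makes urljoin replace the whole path of the base URL.
print(urljoin(ROOT_URL + "/", "/datasets"))
# https://api.radiant.earth/datasets

# A relative path is appended to the base path instead.
print(resolve(ROOT_URL, "/datasets"))
# https://api.radiant.earth/mlhub/v1/datasets
```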
Next, we list the available datasets using the /datasets endpoint:
response = session.get("/datasets")
datasets = response.json()

# Print at most this many datasets
dataset_limit = 30

print(f"Total Datasets: {len(datasets)}")
print("-----")
for dataset in it.islice(datasets, dataset_limit):
    dataset_id = dataset["id"]
    dataset_title = dataset["title"] or "No Title"
    print(f"{dataset_id}: {dataset_title}")
if len(datasets) > dataset_limit:
    print("...")
Total Datasets: 27
-----
idiv_asia_crop_type: A crop type dataset for consistent land cover classification in Central Asia
dlr_fusion_competition_germany: A Fusion Dataset for Crop Type Classification in Germany
ref_fusion_competition_south_africa: A Fusion Dataset for Crop Type Classification in Western Cape, South Africa
bigearthnet_v1: BigEarthNet
microsoft_chesapeake: Chesapeake Land Cover
ref_african_crops_kenya_02: CV4A Kenya Crop Type Competition
ref_african_crops_uganda_01: Dalberg Data Insights Crop Type Uganda
rti_rwanda_crop_type: Drone Imagery Classification Training Dataset for Crop Types in Rwanda
ref_african_crops_tanzania_01: Great African Food Company Crop Type Tanzania
landcovernet_v1: LandCoverNet
nasa_marine_debris: Marine Debris Dataset for Object Detection in Planetscope Imagery
open_cities_ai_challenge: Open Cities AI Challenge Dataset
ref_african_crops_kenya_01: PlantVillage Crop Type Kenya
su_african_crops_ghana: Semantic Segmentation of Crop Type in Ghana
su_african_crops_south_sudan: Semantic Segmentation of Crop Type in South Sudan
sen12floods: SEN12-FLOOD : A SAR and Multispectral Dataset for Flood Detection
ts_cashew_benin: Smallholder Cashew Plantations in Benin
ref_south_africa_crops_competition_v1: South Africa Crop Type Competition
spacenet1: SpaceNet 1
spacenet2: SpaceNet 2
spacenet3: SpaceNet 3
spacenet4: SpaceNet 4
spacenet5: SpaceNet 5
spacenet6: SpaceNet 6
spacenet7: SpaceNet 7
nasa_tropical_storm_competition: Tropical Cyclone Wind Estimation Competition
su_sar_moisture_content_main: Western USA Live Fuel Moisture
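Because the response is a plain list of dictionaries, it can be searched with ordinary Python. A minimal sketch, assuming each entry has "id" and "title" keys as in the listing above; the find_datasets helper is hypothetical, and the sample entries are copied from that listing so the example runs without a network connection:

```python
# A few entries shaped like the /datasets response; IDs and titles are
# copied from the listing above.
sample_datasets = [
    {"id": "bigearthnet_v1", "title": "BigEarthNet"},
    {"id": "ref_african_crops_kenya_02", "title": "CV4A Kenya Crop Type Competition"},
    {"id": "landcovernet_v1", "title": "LandCoverNet"},
    {"id": "ref_african_crops_kenya_01", "title": "PlantVillage Crop Type Kenya"},
]


def find_datasets(datasets, keyword):
    """Return the IDs of datasets whose title contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [
        dataset["id"]
        for dataset in datasets
        # A missing or empty title is treated as an empty string.
        if keyword in (dataset.get("title") or "").lower()
    ]


print(find_datasets(sample_datasets, "kenya"))
# ['ref_african_crops_kenya_02', 'ref_african_crops_kenya_01']
```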
Let's take a look at the Kenya Crop Type dataset.
crop_dataset = next(
    dataset for dataset in datasets if dataset["id"] == "ref_african_crops_kenya_02"
)
pprint(crop_dataset)
{'bbox': {'coordinates': [[[[34.203204542, 0.16702187], [34.203204542, 0.167033875], [34.022068532, 0.167033875], [34.022068532, 0.441545516], [34.022094367, 0.441545516], [34.022094367, 0.716046625], [34.203292802, 0.716046625], [34.203292802, 0.716002363], [34.384429981, 0.716002363], [34.384429981, 0.441486489], [34.38436343, 0.441486489], [34.38436343, 0.16702187], [34.203204542, 0.16702187]]]], 'type': 'MultiPolygon'}, 'citation': 'Radiant Earth Foundation (2020) "CV4A Competition Kenya Crop ' 'Type Dataset", Version 1.0, Radiant MLHub. [Date Accessed] ' 'https://doi.org/10.34911/RDNT.DW605X', 'collections': [{'id': 'ref_african_crops_kenya_02_labels', 'license': '[CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/)', 'types': ['labels']}, {'id': 'ref_african_crops_kenya_02_source', 'license': '[CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/)', 'types': ['source_imagery']}], 'creator_contact': {'contact': 'ml@radiant.earth', 'creator': '[Radiant Earth ' 'Foundation](https://www.radiant.earth/), ' '[PlantVillage](https://plantvillage.psu.edu/)'}, 'date_added': '2020-06-17T00:00:00+00:00', 'date_modified': '2020-06-17T00:00:00+00:00', 'description': 'This dataset was produced as part of the [Crop Type Detection ' 'competition](https://zindi.africa/competitions/iclr-workshop-challenge-2-radiant-earth-computer-vision-for-crop-recognition) ' 'at the [Computer Vision for Agriculture (CV4A) ' 'Workshop](https://www.cv4gc.org/cv4a2020/) at the ICLR 2020 ' 'conference. The objective of the competition was to create a ' 'machine learning model to classify fields by crop type from ' 'images collected during the growing season by the Sentinel-2 ' 'satellites.\n' '<br><br>\n' 'The ground reference data were collected by the PlantVillage ' 'team, and Radiant Earth Foundation curated the training ' 'dataset after inspecting and selecting more than 4,000 fields ' 'from the original ground reference data. 
The dataset has been ' 'split into training and test sets (3,286 in the train and ' '1,402 in the test).\n' '<br><br>\n' 'The dataset is cataloged in four tiles. These tiles are ' 'smaller than the original Sentinel-2 tile that has been ' 'clipped and chipped to the geographical area that labels have ' 'been collected.\n' '<br><br>\n' 'Each tile has a) 13 multi-band observations throughout the ' 'growing season. Each observation includes 12 bands from ' 'Sentinel-2 L2A product, and a cloud probability layer. The ' 'twelve bands are [B01, B02, B03, B04, B05, B06, B07, B08, ' 'B8A, B09, B11, B12]. The cloud probability layer is a product ' 'of the Sentinel-2 atmospheric correction algorithm (Sen2Cor) ' 'and provides an estimated cloud probability (0-100%) per ' 'pixel. All of the bands are mapped to a common 10 m spatial ' 'resolution grid.; b) A raster layer indicating the crop ID ' 'for the fields in the training set; and c) A raster layer ' 'indicating field IDs for the fields (both training and test ' 'sets). 
Fields with a crop ID of 0 are the test fields.', 'documentation_link': 'https://radiantearth.blob.core.windows.net/mlhub/kenya-crop-challenge/Documentation.pdf', 'doi': '10.34911/rdnt.dw605x', 'id': 'ref_african_crops_kenya_02', 'publications': [{'author_name': 'Hannah Kerner, Catherine Nakalembe and ' 'Inbal Becker-Reshef', 'author_url': None, 'title': 'Field-Level Crop Type Classification with k ' 'Nearest Neighbors: A Baseline for a New Kenya ' 'Smallholder Dataset', 'url': 'https://arxiv.org/abs/2004.03023'}], 'registry': 'https://mlhub.earth/ref_african_crops_kenya_02', 'status': 'ready', 'tags': ['crop type', 'segmentation', 'sentinel-2', 'agriculture'], 'title': 'CV4A Kenya Crop Type Competition', 'tools_applications': None, 'tutorials': [{'author_name': 'Hamed Alemohammad', 'author_url': 'https://www.linkedin.com/in/hamedalemohammad/', 'title': 'A Guide to Access the data on Radiant MLHub', 'url': 'https://github.com/radiantearth/mlhub-tutorials/blob/main/notebooks/2020%20CV4A%20Crop%20Type%20Challenge/cv4a-crop-challenge-download-data.ipynb'}, {'author_name': 'Hamed Alemohammad', 'author_url': 'https://www.linkedin.com/in/hamedalemohammad/', 'title': 'A Guide to load and visualize the data in Python', 'url': 'https://github.com/radiantearth/mlhub-tutorials/blob/main/notebooks/2020%20CV4A%20Crop%20Type%20Challenge/cv4a-crop-challenge-load-data.ipynb'}, {'author_name': 'Devis Peressutti', 'author_url': 'https://sites.google.com/site/devisperessutti/home', 'title': 'CV4A ICLR 2020 Starter Notebooks', 'url': 'https://github.com/sentinel-hub/cv4a-iclr-2020-starter-notebooks'}]}
We can see that the metadata includes an ID and title, citation information, a bounding box for the dataset, and a list of collections included in the dataset. If we take a closer look at the collections list, we can see that each collection has an id and a list of types. We can use the types to figure out whether a collection contains labels, source imagery, or both, and we can use the ID to fetch that collection (see below).
pprint(crop_dataset["collections"])
[{'id': 'ref_african_crops_kenya_02_labels',
  'license': '[CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/)',
  'types': ['labels']},
 {'id': 'ref_african_crops_kenya_02_source',
  'license': '[CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/)',
  'types': ['source_imagery']}]
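To pick out the label and source imagery collections programmatically, we can group the collection IDs by type. This is a small sketch, assuming the "collections" list has the shape printed above; the collection_ids_by_type helper is a name introduced here for illustration, and the sample data is trimmed from that output:

```python
from collections import defaultdict

# Sample shaped like the "collections" list printed above (license omitted).
sample_collections = [
    {"id": "ref_african_crops_kenya_02_labels", "types": ["labels"]},
    {"id": "ref_african_crops_kenya_02_source", "types": ["source_imagery"]},
]


def collection_ids_by_type(collections):
    """Map each type (e.g. "labels") to the IDs of the collections that have it."""
    grouped = defaultdict(list)
    for collection in collections:
        # A collection can list more than one type, so it may appear twice.
        for type_name in collection.get("types", []):
            grouped[type_name].append(collection["id"])
    return dict(grouped)


ids_by_type = collection_ids_by_type(sample_collections)
print(ids_by_type["labels"])
# ['ref_african_crops_kenya_02_labels']
```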
A collection in the Radiant MLHub API is a STAC Collection representing a group of resources (represented as STAC Items and their associated assets) covering a given spatial and temporal extent. A Radiant MLHub collection may contain resources representing training labels, source imagery, or (rarely) both.
Use the client.get_collections method to list all available collections and view their properties. The following cell uses it to print the ID, license (if available), and citation (if available) for the first 20 available collections.
# Materialize the results so the list can be searched again in later cells.
collections = list(client.get_collections())
for c in it.islice(collections, 20):
    collection_id = c.id
    license = c.license or "N/A"
    try:
        # The citation is provided through the STAC Scientific Extension.
        sci = ScientificExtension.ext(c)
        citation = sci.citation or "N/A"
    except ExtensionNotImplemented:
        citation = "N/A"
    print(f"ID: {collection_id}\nLicense: {license}\nCitation: {citation}\n")
ID: ref_african_crops_uganda_01_source
License: CC-BY-SA-4.0
Citation: Bocquet, C., & Dalberg Data Insights. (2019) "Dalberg Data Insights Uganda Crop Classification", Version 1.0, Radiant MLHub. [Date Accessed] https://doi.org/10.34911/RDNT.EII04X

ID: microsoft_chesapeake_landsat_leaf_on
License: CC-PDDC
Citation: Robinson C, Hou L, Malkin K, Soobitsky R, Czawlytko J, Dilkina B, Jojic N. Large Scale High-Resolution Land Cover Mapping with Multi-Resolution Data. Proceedings of the 2019 Conference on Computer Vision and Pattern Recognition (CVPR 2019).

ID: sen12floods_s2_source
License: CC-BY-4.0
Citation: Clément Rambour, Nicolas Audebert, Elise Koeniguer, Bertrand Le Saux, Michel Crucianu, Mihai Datcu, September 14, 2020, "SEN12-FLOOD : a SAR and Multispectral Dataset for Flood Detection ", IEEE Dataport, doi: https://dx.doi.org/10.21227/w6xz-s898.

ID: sn2_AOI_2_Vegas
License: CC-BY-SA-4.0
Citation: N/A

ID: sn4_AOI_6_Atlanta
License: CC-BY-SA-4.0
Citation: N/A

ID: su_sar_moisture_content
License: CC-BY-NC-ND-4.0
Citation: Rao, K., Williams, A.P., Fortin, J. & Konings, A.G. (2020). SAR-enhanced mapping of live fuel moisture content. Remote Sens. Environ., 245.

ID: ref_african_crops_tanzania_01_source
License: CC-BY-SA-4.0
Citation: Great African Food Company (2019) "Great African Food Company Tanzania Ground Reference Crop Type Dataset", Version 1.0, Radiant MLHub. [Date Accessed] https://doi.org/10.34911/RDNT.5VX40R

ID: ref_south_africa_crops_competition_v1_train_source_s2
License: CC-BY-4.0
Citation: Western Cape Department of Agriculture, Radiant Earth Foundation (2021) "Crop Type Classification Dataset for Western Cape, South Africa", Version 1.0, Radiant MLHub, [Date Accessed] https://doi.org/10.34911/rdnt.j0co8q

ID: sn2_AOI_4_Shanghai
License: CC-BY-SA-4.0
Citation: N/A

ID: sn3_AOI_3_Paris
License: CC-BY-SA-4.0
Citation: N/A

ID: sn3_AOI_4_Shanghai
License: CC-BY-SA-4.0
Citation: N/A

ID: sn5_AOI_8_Mumbai
License: CC-BY-SA-4.0
Citation: N/A

ID: ref_african_crops_uganda_01_labels
License: CC-BY-SA-4.0
Citation: Bocquet, C., & Dalberg Data Insights. (2019) "Dalberg Data Insights Uganda Crop Classification", Version 1.0, Radiant MLHub. [Date Accessed] https://doi.org/10.34911/RDNT.EII04X

ID: nasa_tropical_storm_competition_train_labels
License: CC-BY-4.0
Citation: M. Maskey, R. Ramachandran, I. Gurung, B. Freitag, M. Ramasubramanian, J. Miller"Tropical Cyclone Wind Estimation Competition Dataset", Version 1.0, Radiant MLHub. [Date Accessed] https://doi.org/10.34911/rdnt.xs53up

ID: sen12floods_s1_labels
License: CC-BY-4.0
Citation: Clément Rambour, Nicolas Audebert, Elise Koeniguer, Bertrand Le Saux, Michel Crucianu, Mihai Datcu, September 14, 2020, "SEN12-FLOOD : a SAR and Multispectral Dataset for Flood Detection ", IEEE Dataport, doi: https://dx.doi.org/10.21227/w6xz-s898.

ID: microsoft_chesapeake_landsat_leaf_off
License: CC-PDDC
Citation: Robinson C, Hou L, Malkin K, Soobitsky R, Czawlytko J, Dilkina B, Jojic N. Large Scale High-Resolution Land Cover Mapping with Multi-Resolution Data. Proceedings of the 2019 Conference on Computer Vision and Pattern Recognition (CVPR 2019).

ID: sn3_AOI_2_Vegas
License: CC-BY-SA-4.0
Citation: N/A

ID: sn7_test_source
License: CC-BY-SA-4.0
Citation: N/A

ID: su_african_crops_ghana_source_s1
License: CC-BY-SA-4.0
Citation: Rustowicz R., Cheong R., Wang L., Ermon S., Burke M., Lobell D. (2020) "Semantic Segmentation of Crop Type in Ghana Dataset", Version 1.0, Radiant MLHub. [Date Accessed] https://doi.org/10.34911/rdnt.ry138p

ID: su_african_crops_south_sudan_source_planet
License: CC-BY-SA-4.0
Citation: Rustowicz R., Cheong R., Wang L., Ermon S., Burke M., Lobell D. (2020) "Semantic Segmentation of Crop Type in South Sudan Dataset", Version 1.0, Radiant MLHub. [Date Accessed] https://doi.org/10.34911/rdnt.v6kx6n
Collection objects have many other properties besides the ones shown above. The cell below prints the ref_african_crops_kenya_01_labels collection object in its entirety.
kenya_crops_labels = next(
    c for c in collections if c.id == "ref_african_crops_kenya_01_labels"
)
kenya_crops_labels.to_dict()
{'type': 'Collection', 'id': 'ref_african_crops_kenya_01_labels', 'stac_version': '1.0.0', 'description': 'African Crops Kenya', 'links': [{'rel': 'items', 'href': 'http://api.radiant.earth/mlhub/v1/collections/ref_african_crops_kenya_01_labels/items', 'type': 'application/geo+json'}, {'rel': 'parent', 'href': 'http://api.radiant.earth/mlhub/v1/', 'type': 'application/json'}, {'rel': <RelType.ROOT: 'root'>, 'href': 'https://api.radiant.earth/mlhub/v1', 'type': <MediaType.JSON: 'application/json'>, 'title': 'Radiant MLHub API'}, {'rel': 'self', 'href': 'http://api.radiant.earth/mlhub/v1/collections/ref_african_crops_kenya_01_labels', 'type': 'application/json'}], 'stac_extensions': ['https://stac-extensions.github.io/scientific/v1.0.0/schema.json'], 'sci:doi': '10.34911/rdnt.u41j87', 'providers': [{'name': 'Radiant Earth Foundation', 'roles': ['licensor', 'host', 'processor'], 'url': 'https://radiant.earth'}], 'sci:citation': 'PlantVillage (2019) "PlantVillage Kenya Ground Reference Crop Type Dataset", Version 1.0, Radiant MLHub. [Date Accessed] https://doi.org/10.34911/RDNT.U41J87', 'extent': {'spatial': {'bbox': [[34.18191992149459, 0.4724181558451209, 34.3714943155646, 0.7144217206851109]]}, 'temporal': {'interval': [['2018-04-10T00:00:00Z', '2020-03-13T00:00:00Z']]}}, 'license': 'CC-BY-SA-4.0'}
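The extent entry is often the most useful part of this dictionary, since it tells us where and when the data was collected. The sketch below reads it using only the standard library. It assumes the dictionary shape printed above, with a STAC bounding box ordered as [west, south, east, north]; the parse_utc helper is a name of our own, and the values are copied from that output:

```python
from datetime import datetime

# Extent values copied from the collection printed above.
collection_dict = {
    "id": "ref_african_crops_kenya_01_labels",
    "extent": {
        "spatial": {
            "bbox": [
                [34.18191992149459, 0.4724181558451209,
                 34.3714943155646, 0.7144217206851109]
            ]
        },
        "temporal": {"interval": [["2018-04-10T00:00:00Z", "2020-03-13T00:00:00Z"]]},
    },
}


def parse_utc(timestamp):
    """Parse a timestamp such as "2018-04-10T00:00:00Z"; None marks an open end."""
    if timestamp is None:
        return None
    # Replace the "Z" suffix so that datetime.fromisoformat accepts the string.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


# The first bbox and the first interval describe the overall extent.
west, south, east, north = collection_dict["extent"]["spatial"]["bbox"][0]
start, end = (
    parse_utc(value) for value in collection_dict["extent"]["temporal"]["interval"][0]
)

print(f"Longitude: {west:.3f} to {east:.3f}")
print(f"Latitude: {south:.3f} to {north:.3f}")
print(f"Time range: {start.date()} to {end.date()}")
```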
A typical workflow for downloading assets from a STAC Catalog would involve looping through all Items and downloading the associated assets. However, the ML training datasets published through Radiant MLHub can sometimes have thousands or hundreds of thousands of Items, which makes this workflow very time-consuming for larger datasets. For faster access to the assets for an entire dataset, MLHub provides TAR archives of all collections that can be downloaded using the /archive/{collection_id} endpoint.
We will use the MLHubSession instance we created above to ensure that our API key is sent with each request.
# Create a temporary directory
tmp_dir = tempfile.mkdtemp()
archive_path = os.path.join(tmp_dir, "ref_african_crops_kenya_01_labels.tar.gz")
# Fetch the archive and save to disk
response = session.get(
    "/archive/ref_african_crops_kenya_01_labels", allow_redirects=True
)
with open(archive_path, "wb") as dst:
    dst.write(response.content)
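Archives for larger collections may not fit comfortably in memory, and once downloaded they still need to be unpacked. The following is a minimal sketch of writing an archive to disk chunk by chunk and then extracting it. The save_chunks and extract_archive helpers are hypothetical names, and a tiny archive built in memory stands in for the real download so that the example runs offline:

```python
import io
import os
import tarfile
import tempfile


def save_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return the bytes written."""
    written = 0
    with open(path, "wb") as dst:
        for chunk in chunks:
            dst.write(chunk)
            written += len(chunk)
    return written


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the member names."""
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions can reject unsafe members during extraction.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return names


# Stand-in for a download: build a tiny archive in memory and split it into chunks.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    payload = b'{"type": "Feature"}'
    info = tarfile.TarInfo(name="demo/item.json")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
data = buffer.getvalue()
fake_chunks = [data[i:i + 64] for i in range(0, len(data), 64)]

with tempfile.TemporaryDirectory() as demo_dir:
    demo_archive = os.path.join(demo_dir, "demo.tar.gz")
    bytes_written = save_chunks(fake_chunks, demo_archive)
    extracted_names = extract_archive(demo_archive, demo_dir)
    item_exists = os.path.isfile(os.path.join(demo_dir, "demo", "item.json"))

print(extracted_names)
```

With a real request, the chunks could come from response.iter_content(chunk_size=...) on a response obtained with stream=True, so that the archive is never held in memory all at once.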
Finally, we clean up the temporary directory:
shutil.rmtree(tmp_dir)
This tutorial was a quick introduction to working with the Radiant MLHub API in a notebook. For more, see: