This example illustrates the how to read and export data for a specific region / dates using the HydroMT DataCatalog and the export_data
method.
# import hydromt and setup logging
import os
import hydromt
from hydromt.log import setuplog
logger = setuplog("export data", log_level=10)
For this exercise, we will use the pre-defined catalog artifact_data
which contains a global data extracts for the Piave basin in Northern Italy. This data catalog and the actual data linked to it are for a small geographic extent as it is intended for documentation and testing purposes only. If you have another data catalog available (and the linked data), you can use it instead.
To read your own data catalog (as well as a predefined catalog), you can use the data_libs argument of the DataCatalog
which accepts either a absolute/relative path to a data catalog yaml file or a name of a pre-defined catalog.
First let's read the pre-defined artifact data catalog:
# Download and read artifacts for the Piave basin to `~/.hydromt_data/`.
data_catalog = hydromt.DataCatalog(
logger=logger,
data_libs=["artifact_data"],
)
The artifact_data
catalog is one of the pre-defined available DataCatalog of HydroMT. You can find an overview of pre-defined data catalogs in the online user guide. You can also get an overview of the pre-defined catalogs with their version number from HydroMT.
from pprint import pprint
pprint(data_catalog.predefined_catalogs)
Let's now check which data sources are available in the catalog:
# For a list of sources including attributes
data_catalog.sources.keys()
And let's now open a plot one of the available datasets to check extent and available dates:
ds = data_catalog.get_rasterdataset("era5", time_tuple=("2010-02-02", "2010-02-15"))
print("")
print(f"Available extent: {ds.raster.bounds}")
print(f"Available dates: {ds.time.values[0]} to {ds.time.values[-1]}")
ds
Now we will export a subset of the data in our artifact_data
catalog using the DataCatalog.export_data method. Let's check the method's docstring:
?data_catalog.export_data
Let's select which data source and the extent we want (based on the exploration above):
# List of data sources to export
# NOTE that for ERA5 we only export the precip variable and for merit_hydro we only export the elevtn variable
source_list = ["merit_hydro[elevtn,flwdir]", "era5[precip]", "vito"]
# Geographic extent
bbox = [12.0, 46.0, 13.0, 46.5]
# Time extent
time_tuple = ("2010-02-10", "2010-02-15")
And let's export the tmp_data_export folder:
folder_name = "tmp_data_export"
data_catalog.export_data(
data_root=folder_name,
bbox=bbox,
time_tuple=time_tuple,
source_names=source_list,
meta={"version": "1"},
)
Now we have our new extracted data and HydroMT saved as well a new data catalog file that goes with it:
import os
for path, _, files in os.walk(folder_name):
print(path)
for name in files:
print(f" - {name}")
with open(os.path.join(folder_name, "data_catalog.yml"), "r") as f:
print(f.read())
Let's open the extracted data catalog:
data_catalog_extract = hydromt.DataCatalog(
data_libs=os.path.join(folder_name, "data_catalog.yml")
)
data_catalog_extract.sources.keys()
And now let's open the extracted data again and do a nice plot.
# Get both the extracted and original merit_hydro_1k DEM
dem = data_catalog.get_rasterdataset(
"merit_hydro", variables=["elevtn"], bbox=[11.6, 45.2, 13.0, 46.8]
)
dem_extract = data_catalog_extract.get_rasterdataset(
"merit_hydro", variables=["elevtn"]
)
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import box
import cartopy.crs as ccrs
proj = ccrs.PlateCarree() # plot projection
# get bounding box of each data catalog using merit_hydro_1k
bbox = gpd.GeoDataFrame(geometry=[box(*dem.raster.bounds)], crs=4326)
bbox_extract = gpd.GeoDataFrame(geometry=[box(*dem_extract.raster.bounds)], crs=4326)
# Initialise plot
fig = plt.figure(figsize=(7, 5))
ax = fig.add_subplot(projection=proj)
# Plot the bounding box
bbox.boundary.plot(ax=ax, color="k", linewidth=0.8)
bbox_extract.boundary.plot(ax=ax, color="red", linewidth=0.8)
# Plot elevation
dem.raster.mask_nodata().plot(ax=ax, cmap="gray")
dem_extract.raster.mask_nodata().plot(ax=ax, cmap="terrain")
ax.set_title("exported and original DEMs")