Hi all,
I'm looking for guidance / best practices on writing an xarray object to (a collection of) COGs. Let's start with a common case of a DataArray that's indexed by (time, band, y, x)
. Let's also assume that it's a chunked DataArray, with a chunksize of 1 for time
and band
, and it might be chunked along y
and x
as well.
My high-level questions:
.rio.to_raster(path, driver="COG")
have the right defaults? Anything special we should do to make sure we write "good" COGs for a single chunk?I'm particularly interested in item 2, since I'd like to easily generate STAC Items for this collection of COGs. With a well-structured directory convention, we can easily group multiple STAC Assets (one per COG) that cover the same area in space and time into a STAC item. My proposed naming convention is
<prefix>/time=<time>/band=<band>-y=<y-coord>-x=<x-coord>.tif
This works well for xarray: we have coordinate information available when writing the chunk, so we can safely generate a unique name for a chunk using the (time, band, y, x)
coordinates of, say, the top-left value in the chunk.
To make things concrete, here's a small example. I've dumped some code in https://github.com/TomAugspurger/xcog to make this a bit easier to read.
!pip install -q -U --no-deps git+https://github.com/TomAugspurger/xcog
We'll mock up some data that has the right structure for pystac / rioxarray to do their thing.
import xarray as xr
import numpy as np
import dask.array as da
import stackstac
import rioxarray
import pystac
import pandas as pd
values = da.random.uniform(size=(2, 3, 10980, 10980), chunks=(1, 1, 5490, 5490))
x = np.arange(399960, 509751, step=10.)
y = np.arange(4800000, 4690210 - 1, step=-10.)
band = np.array(["B02", "B03", "B04"])
time = pd.to_datetime(["2021-01-01T17:07:19.024000000", "2021-01-04T17:17:19.024000000"])
data = xr.DataArray(
values,
dims=("time", "band", "y", "x"),
coords={
"time": xr.DataArray(time, name="time", dims="time"),
"band": xr.DataArray(band, name="band", dims="band"),
"y": xr.DataArray(y, name="y", dims="y"),
"x": xr.DataArray(x, name="x", dims="x"),
"common_name": xr.DataArray(['blue', 'green', 'red'], dims="band", name="common_name"),
"center_wavelength":xr.DataArray([0.49 , 0.56 , 0.665], dims="band", name="center_wavelength"),
"full_width_half_max": xr.DataArray([0.098, 0.045, 0.038], dims="band", name="full_width_half_max"),
},
attrs={
"crs": "epsg:32615",
},
)
data
<xarray.DataArray 'uniform-872193f5d2193eb685484694f7ebc695' (time: 2, band: 3, y: 10980, x: 10980)> dask.array<uniform, shape=(2, 3, 10980, 10980), dtype=float64, chunksize=(1, 1, 5490, 5490), chunktype=numpy.ndarray> Coordinates: * time (time) datetime64[ns] 2021-01-01T17:07:19.024000 202... * band (band) <U3 'B02' 'B03' 'B04' * y (y) float64 4.8e+06 4.8e+06 ... 4.69e+06 4.69e+06 * x (x) float64 4e+05 4e+05 4e+05 ... 5.097e+05 5.098e+05 common_name (band) <U5 'blue' 'green' 'red' center_wavelength (band) float64 0.49 0.56 0.665 full_width_half_max (band) float64 0.098 0.045 0.038 Attributes: crs: epsg:32615
|
array(['2021-01-01T17:07:19.024000000', '2021-01-04T17:17:19.024000000'], dtype='datetime64[ns]')
array(['B02', 'B03', 'B04'], dtype='<U3')
array([4800000., 4799990., 4799980., ..., 4690230., 4690220., 4690210.])
array([399960., 399970., 399980., ..., 509730., 509740., 509750.])
array(['blue', 'green', 'red'], dtype='<U5')
array([0.49 , 0.56 , 0.665])
array([0.098, 0.045, 0.038])
We're using xcog
here, a simple little library with some utilities for writing out chunks. We'll write to local disk, but we should be able to use any fsspec-compatible file-system.
from pathlib import Path
import xcog
dst = Path("/tmp/cogs/")
dst.mkdir(parents=True, exist_ok=True)
items = xcog.to_cog_and_stac(data, prefix=str(dst), storage_options=dict(auto_mkdir=True))
ERROR 4: `/vsimem/fc548ba3-e3e8-44b2-aeae-5360710ecb04/fc548ba3-e3e8-44b2-aeae-5360710ecb04.tif' not recognized as a supported file format. ERROR 4: `/vsimem/0819c8a4-98ba-4a31-a0b6-eb725a963e75/0819c8a4-98ba-4a31-a0b6-eb725a963e75.tif' not recognized as a supported file format. ERROR 4: `/vsimem/3109c1b8-771b-4860-8399-0e8c8b90ebd8/3109c1b8-771b-4860-8399-0e8c8b90ebd8.tif' not recognized as a supported file format. ERROR 4: `/vsimem/a079620a-7a01-4caf-8c34-31ec62fadb0d/a079620a-7a01-4caf-8c34-31ec62fadb0d.tif' not recognized as a supported file format. ERROR 4: `/vsimem/21421136-c716-441d-8942-0a439b37596e/21421136-c716-441d-8942-0a439b37596e.tif' not recognized as a supported file format. ERROR 4: `/vsimem/827a2d04-4d7d-4c14-9c5a-f4b6c6f9e859/827a2d04-4d7d-4c14-9c5a-f4b6c6f9e859.tif' not recognized as a supported file format. ERROR 4: `/vsimem/0f2a19c9-ac69-4527-ac95-e381887faefd/0f2a19c9-ac69-4527-ac95-e381887faefd.tif' not recognized as a supported file format. ERROR 4: `/vsimem/e36c69b0-2f07-4d7b-a715-89c8242e8e64/e36c69b0-2f07-4d7b-a715-89c8242e8e64.tif' not recognized as a supported file format. ERROR 4: `/vsimem/86166231-faa0-4c15-9d74-8c76da4ad1aa/86166231-faa0-4c15-9d74-8c76da4ad1aa.tif' not recognized as a supported file format. ERROR 4: `/vsimem/3b7c7a76-f683-4478-b916-11b8ba96220d/3b7c7a76-f683-4478-b916-11b8ba96220d.tif' not recognized as a supported file format. ERROR 4: `/vsimem/64c99d93-b882-4c35-a0b1-e3b32df59428/64c99d93-b882-4c35-a0b1-e3b32df59428.tif' not recognized as a supported file format. ERROR 4: `/vsimem/b21d29cd-5134-43e8-af92-8523fd6d91d6/b21d29cd-5134-43e8-af92-8523fd6d91d6.tif' not recognized as a supported file format. ERROR 4: `/vsimem/737da978-6649-4150-91f4-c1de38537165/737da978-6649-4150-91f4-c1de38537165.tif' not recognized as a supported file format. ERROR 4: `/vsimem/af35e916-fa26-454f-bc3c-05029d274d63/af35e916-fa26-454f-bc3c-05029d274d63.tif' not recognized as a supported file format. ERROR 4: `/vsimem/8032fb15-9cf2-4fc7-b4d6-209e9ce973c5/8032fb15-9cf2-4fc7-b4d6-209e9ce973c5.tif' not recognized as a supported file format. ERROR 4: `/vsimem/2ba7f5fd-21e4-4e1b-b13c-e709b2ee93c1/2ba7f5fd-21e4-4e1b-b13c-e709b2ee93c1.tif' not recognized as a supported file format. ERROR 4: `/vsimem/df983957-d325-4896-9cfd-122553d5eb71/df983957-d325-4896-9cfd-122553d5eb71.tif' not recognized as a supported file format. ERROR 4: `/vsimem/9248e932-a01c-4756-be3f-6edeb6b35607/9248e932-a01c-4756-be3f-6edeb6b35607.tif' not recognized as a supported file format. ERROR 4: `/vsimem/83e1621a-0c4f-42c8-8e4f-a24b87d0b85d/83e1621a-0c4f-42c8-8e4f-a24b87d0b85d.tif' not recognized as a supported file format. ERROR 4: `/vsimem/9dd138d0-7709-4499-bbb2-c2f28579f9dd/9dd138d0-7709-4499-bbb2-c2f28579f9dd.tif' not recognized as a supported file format. ERROR 4: `/vsimem/fecafa99-14c8-4cf4-9ba7-68ebc5e3ccfb/fecafa99-14c8-4cf4-9ba7-68ebc5e3ccfb.tif' not recognized as a supported file format. ERROR 4: `/vsimem/5aee567d-e3d0-4fd3-9e80-9d56a4269fb9/5aee567d-e3d0-4fd3-9e80-9d56a4269fb9.tif' not recognized as a supported file format. ERROR 4: `/vsimem/5d5d5910-46c2-4368-9a84-a8de12e74104/5d5d5910-46c2-4368-9a84-a8de12e74104.tif' not recognized as a supported file format. ERROR 4: `/vsimem/648504f2-4f1b-4627-9d34-aabe4bc603d5/648504f2-4f1b-4627-9d34-aabe4bc603d5.tif' not recognized as a supported file format.
Here's the COGs we wrote out, including their directory structure:
!tree /tmp/cogs/
/tmp/cogs/ ├── time=2021-01-01T17:07:19.024000 │ ├── band=B02-y=4745100.0-x=399960.0.tif │ ├── band=B02-y=4745100.0-x=454860.0.tif │ ├── band=B02-y=4800000.0-x=399960.0.tif │ ├── band=B02-y=4800000.0-x=454860.0.tif │ ├── band=B03-y=4745100.0-x=399960.0.tif │ ├── band=B03-y=4745100.0-x=454860.0.tif │ ├── band=B03-y=4800000.0-x=399960.0.tif │ ├── band=B03-y=4800000.0-x=454860.0.tif │ ├── band=B04-y=4745100.0-x=399960.0.tif │ ├── band=B04-y=4745100.0-x=454860.0.tif │ ├── band=B04-y=4800000.0-x=399960.0.tif │ └── band=B04-y=4800000.0-x=454860.0.tif └── time=2021-01-04T17:17:19.024000 ├── band=B02-y=4745100.0-x=399960.0.tif ├── band=B02-y=4745100.0-x=454860.0.tif ├── band=B02-y=4800000.0-x=399960.0.tif ├── band=B02-y=4800000.0-x=454860.0.tif ├── band=B03-y=4745100.0-x=399960.0.tif ├── band=B03-y=4745100.0-x=454860.0.tif ├── band=B03-y=4800000.0-x=399960.0.tif ├── band=B03-y=4800000.0-x=454860.0.tif ├── band=B04-y=4745100.0-x=399960.0.tif ├── band=B04-y=4745100.0-x=454860.0.tif ├── band=B04-y=4800000.0-x=399960.0.tif └── band=B04-y=4800000.0-x=454860.0.tif 2 directories, 24 files
to_cog_and_stac
returned a list of STAC items with one STAC item per (time, y, x)
chunk. Notably, the separate bands have been merged into a single STAC item.
items[0].assets
{'B02': <Asset href=/tmp/cogs/time=2021-01-01T17:07:19.024000/band=B02-y=4745100.0-x=399960.0.tif>, 'B03': <Asset href=/tmp/cogs/time=2021-01-01T17:07:19.024000/band=B03-y=4745100.0-x=399960.0.tif>, 'B04': <Asset href=/tmp/cogs/time=2021-01-01T17:07:19.024000/band=B04-y=4745100.0-x=399960.0.tif>}
And those can be fed back to stackstac, so kind of a round-trip from DataArray -> {STAC + COG} -> DataArray
stackstac.stack([x.to_dict() for x in items], chunksize=5490).groupby("time").apply(stackstac.mosaic)
<xarray.DataArray 'stackstac-84d23d14f1f12d1d04d3683dabc9a86f' (time: 2, band: 3, y: 10981, x: 10981)> dask.array<concatenate, shape=(2, 3, 10981, 10981), dtype=float64, chunksize=(1, 1, 5490, 5490), chunktype=numpy.ndarray> Coordinates: * band (band) <U3 'B02' 'B03' 'B04' * x (x) float64 4e+05 4e+05 4e+05 ... 5.097e+05 5.097e+05 5.098e+05 * y (y) float64 4.8e+06 4.8e+06 4.8e+06 ... 4.69e+06 4.69e+06 proj:epsg int64 32615 epsg int64 32615 * time (time) datetime64[ns] 2021-01-01T17:07:19.024000 2021-01-04T17... Attributes: spec: RasterSpec(epsg=32615, bounds=(399950.0, 4690200.0, 509760.0... crs: epsg:32615 transform: | 10.00, 0.00, 399950.00|\n| 0.00,-10.00, 4800010.00|\n| 0.0... resolution: 10.0
|
array(['B02', 'B03', 'B04'], dtype='<U3')
array([399950., 399960., 399970., ..., 509730., 509740., 509750.])
array([4800010., 4800000., 4799990., ..., 4690230., 4690220., 4690210.])
array(32615)
array(32615)
array(['2021-01-01T17:07:19.024000000', '2021-01-04T17:17:19.024000000'], dtype='datetime64[ns]')
(We'll ignore the fact that we're off by one on the size).