Joseph Hamman (jhamman@ucar.edu) and Julia Kent (jkent@ucar.edu)
NCAR, Boulder, CO, USA
This notebook was developed for the 2020 EarthCube All Hands Meeting. The development of Scikit-downscale was done in conjunction with the development of the Pangeo Project and was supported by the following awards:
ECAHM 2020 ID: 143
Climate data from Earth System Models (ESMs) are increasingly being used to study the impacts of climate change on a broad range of biogeophysical systems (forest fire, flood, fisheries, etc.) and human systems (water resources, power grids, etc.). Before this data can be used to study many of these systems, post-processing steps commonly referred to as bias correction and statistical downscaling must be performed. “Bias correction” is used to correct persistent biases in climate model output and “statistical downscaling” is used to increase the spatiotemporal resolution of the model output (i.e. from 1 deg to 1/16th deg grid boxes). For our purposes, we’ll refer to both parts as “downscaling”.
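The simplest form of bias correction is a mean shift. The following sketch illustrates the idea on synthetic data (the numbers here are illustrative assumptions, not output from any real model):

```python
import numpy as np

# Synthetic "observations" and a model with a constant +2.5 C warm bias
rng = np.random.default_rng(0)
obs = rng.normal(10.0, 3.0, size=1000)   # observed daily tmax [C]
model = obs + 2.5                        # biased model output

# Mean-shift bias correction: remove the average model-minus-obs difference
bias = model.mean() - obs.mean()
corrected = model - bias

print(round(bias, 2))   # 2.5
```

Real downscaling methods go well beyond this, correcting the full distribution and mapping between spatial resolutions, but the fit-on-history/apply-to-model pattern is the same.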
In the past few decades, the applications community has developed a plethora of downscaling methods. Many of these methods are ad-hoc collections of processing routines while others target very specific applications. The proliferation of downscaling methods has left the climate applications community with an overwhelming body of research to sort through without much in the form of synthesis guiding method selection or applicability.
Motivated by the pressing socio-environmental challenges of climate change – and with the learnings from previous downscaling efforts (e.g. Gutmann et al. 2014, Lanzante et al. 2018) in mind – we have begun working on a community-centered open framework for climate downscaling: scikit-downscale. We believe that the community will benefit from the presence of a well-designed open source downscaling toolbox with standard interfaces alongside a repository of benchmark data to test and evaluate new and existing downscaling methods.
In this notebook, we provide an overview of the scikit-downscale project, detailing how it can be used to downscale a range of surface climate variables such as surface air temperature and precipitation. We also highlight how the scikit-downscale framework is being used to compare existing methods and how it can be extended to support the development of new downscaling methods.
Scikit-downscale is a new open source Python project. Within Scikit-downscale, we have been curating a collection of new and existing downscaling methods within a common framework. A key feature of Scikit-downscale is the `fit`/`predict` pattern found in many machine learning packages (Scikit-learn, Tensorflow, etc.). Scikit-downscale's source code is available on GitHub.
We define pointwise methods as those that only use local information during the downscaling process. They can often be represented as a general linear model and fit independently across each point in the study domain. Examples of existing pointwise methods include BCSD and GARD, both of which are demonstrated later in this notebook.
Because pointwise methods can be written as a stand-alone linear model, Scikit-downscale implements these models as a Scikit-learn LinearModel or Pipeline. By building directly on Scikit-learn, we inherit a well defined model API and the ability to interoperate with a robust ecosystem of utilities for model evaluation and optimization (e.g. grid-search). Perhaps more importantly, this structure also allows us to compare methods at a high level of granularity (single spatial point) before deploying them on large domain problems.
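As a sketch of what this structure buys us, a pointwise downscaler can be expressed as an ordinary Scikit-learn `Pipeline` and evaluated with the standard API. The data below are synthetic placeholders for a single grid point, not real model output:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic single-point data: one predictor column, one target series
rng = np.random.default_rng(42)
X = rng.normal(size=(365, 1))                         # "model" tmax
y = 1.5 * X[:, 0] + rng.normal(scale=0.1, size=365)   # "observed" tmax

# A pointwise model expressed as a standard sklearn Pipeline
pipe = Pipeline([('scale', StandardScaler()),
                 ('regress', LinearRegression())])
pipe.fit(X, y)
print(round(pipe.score(X, y), 3))
```

Any estimator following this interface can be dropped into sklearn's cross-validation and grid-search machinery unchanged.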
*Begin interactive demo*
From here forward in this notebook, we'll jump back and forth between Python and text cells to describe how scikit-downscale works.
This first cell just imports some libraries and gets things set up for our analysis to come.
%load_ext autoreload
%autoreload 2
%matplotlib inline
import warnings
warnings.filterwarnings("ignore") # sklearn
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from utils import get_sample_data
sns.set(style='darkgrid')
Now that we've imported a few libraries, let's open a sample dataset from a single point in North America. We'll use this data to explore Scikit-downscale and its existing functionality. You'll notice there are two groups of data, `training` and `targets`. The `training` data is meant to represent data from a typical climate model and the `targets` data is meant to represent our "observations". For the purpose of this demonstration, we've chosen training data sampled from a regional climate model (WRF) run at 50km resolution over North America. The observations are sampled from the nearest 1/16th degree grid cell in Livneh et al., 2013.
We have chosen to use the `tmax` variable (daily maximum surface air temperature) for demonstration purposes. With a small amount of effort, an interested reader could swap `tmax` for `pcp` and test these methods on precipitation.
# load sample data
training = get_sample_data('training')
targets = get_sample_data('targets')
# print a table of the training/targets data
display(pd.concat({'training': training, 'targets': targets}, axis=1))
# make a plot of the temperature and precipitation data
fig, axes = plt.subplots(ncols=1, nrows=2, figsize=(8, 6), sharex=True)
time_slice = slice('1990-01-01', '1990-12-31')
# plot-temperature
training[time_slice]['tmax'].plot(ax=axes[0], label='training')
targets[time_slice]['tmax'].plot(ax=axes[0], label='targets')
axes[0].legend()
axes[0].set_ylabel('Temperature [C]')
# plot-precipitation
training[time_slice]['pcp'].plot(ax=axes[1])
targets[time_slice]['pcp'].plot(ax=axes[1])
_ = axes[1].set_ylabel('Precipitation [mm/day]')
| time | training tmax | training pcp | targets tmax | targets pcp |
|---|---|---|---|---|
| 1950-01-01 | NaN | NaN | -0.22 | 5.608394 |
| 1950-01-02 | NaN | NaN | -4.54 | 2.919726 |
| 1950-01-03 | NaN | NaN | -7.87 | 3.066762 |
| 1950-01-04 | NaN | NaN | -5.08 | 4.684164 |
| 1950-01-05 | NaN | NaN | -0.79 | 4.295568 |
| ... | ... | ... | ... | ... |
| 2015-11-26 | 7.657013 | 0.000000e+00 | NaN | NaN |
| 2015-11-27 | 7.687256 | 0.000000e+00 | NaN | NaN |
| 2015-11-28 | 10.480835 | 0.000000e+00 | NaN | NaN |
| 2015-11-29 | 11.728516 | 0.000000e+00 | NaN | NaN |
| 2015-11-30 | 10.285431 | 3.152419e-13 | NaN | NaN |
24075 rows × 4 columns
# exploratory data analysis for arrm model
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import QuantileTransformer
from sklearn.preprocessing import KBinsDiscretizer
from mlinsights.mlmodel import PiecewiseRegressor
def ARRM(n_bins=7):
    # ARRM-style model: independent linear fits within quantile bins
    return PiecewiseRegressor(binner=KBinsDiscretizer(n_bins=n_bins, strategy='quantile'))
sns.set(style='whitegrid')
c = {'train': 'black', 'predict': 'blue', 'test': 'grey'}
qqwargs = {'n_quantiles': int(1e6), 'copy': True, 'subsample': int(1e6)}  # sklearn requires ints here
n_bins = 7
X = training[['tmax']]['1980': '2000'].values
y = targets[['tmax']]['1980': '2000'].values
X_train, X_test, y_train, y_test = train_test_split(X, y)
xqt = QuantileTransformer(**qqwargs).fit(X_train)
Xq_train = xqt.transform(X_train)
Xq_test = xqt.transform(X_test)
yqt = QuantileTransformer(**qqwargs).fit(y_train)
yq_train = yqt.transform(y_train)[:, 0]
yq_test = yqt.transform(y_test)[:, 0]
print(X.shape, y.shape, X_train.shape, X_test.shape, y_train.shape, y_test.shape)
# model = PiecewiseRegressor(binner=KBinsDiscretizer(n_bins=n_bins, strategy='quantile'))
# model.fit(Xq_train, yq_train)
# predq = model.predict(Xq_test)
# pred = yqt.inverse_transform(predq.reshape(-1, 1))
y_train = y_train[:, 0]
for strat in ['kmeans', 'uniform', 'quantile']:
model = PiecewiseRegressor(binner=KBinsDiscretizer(n_bins=n_bins, strategy=strat))
model.fit(X_train, y_train)
pred = model.predict(X_test)
print(strat, model.score(X_test, y_test))
model = PiecewiseRegressor(binner=KBinsDiscretizer(n_bins=n_bins, strategy='kmeans'))
model.fit(X_train, y_train)
pred = model.predict(X_test)
fig, ax = plt.subplots(1, 1, figsize=(8, 8))
plt.scatter(X_train, y_train, c=c['train'], s=5, label='train')
plt.scatter(X_test, y_test, c=c['test'], s=5, label='test')
ax.legend()
fig, ax = plt.subplots(1, 1, figsize=(8, 8))
plt.scatter(np.sort(X_train, axis=0), np.sort(y_train, axis=0), c=c['train'], s=5, label='train')
plt.scatter(np.sort(X_test, axis=0), np.sort(y_test, axis=0), c=c['test'], s=5, label='test')
plt.plot(np.sort(X_test, axis=0), np.sort(pred, axis=0), c=c['predict'], lw=2, label='predictions')
ax.legend()
# fig, ax = plt.subplots(1, 1)
# ax.plot(Xq_test[:, 0], yq_test, ".", label='data', c=c['test'])
# ax.plot(Xq_test[:, 0], predq, ".", label="predictions", c=c['predict'])
# ax.set_title(f"Piecewise Linear Regression\n{n_bins} buckets")
# ax.legend()
fig, ax = plt.subplots(1, 1, figsize=(8, 8))
ax.plot(X_test[:, 0], y_test, ".", label='data', c=c['test'])
ax.plot(X_test[:, 0], pred, ".", label="predictions", c=c['predict'])
ax.set_title(f"Piecewise Linear Regression\n{n_bins} buckets")
ax.legend()
(7671, 1) (7671, 1) (5753, 1) (1918, 1) (5753, 1) (1918, 1)
kmeans 0.8997464212628336
uniform 0.8993860215386027
quantile 0.8993789796792236
<matplotlib.legend.Legend at 0x7fa17e445790>
As we mentioned above, Scikit-downscale utilizes a similar API to that of Scikit-learn for its pointwise models. This means we can build collections of models that may be quite different internally, but operate the same at the API level. Importantly, this means that all downscaling methods have two common API methods: `fit`, which trains the model given training and targets data, and `predict`, which uses the fit model to perform the downscaling operation. This is perhaps the most important feature of Scikit-downscale: the ability to test and compare arbitrary combinations of models under a common interface. This allows us to try many combinations of models and parameters, choosing only the best combinations. The following pseudo-code block describes the workflow common to all scikit-downscale models:
from skdownscale.pointwise_models import MyModel
...
# load and pre-process input data (X and y)
...
model = MyModel(**parameters)
model.fit(X_train, y)
predictions = model.predict(X_predict)
...
# evaluate and/or save predictions
...
In the cell below, we'll create nine different downscaling models, some from Scikit-downscale and some from Scikit-learn.
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from skdownscale.pointwise_models import PureAnalog, AnalogRegression
from skdownscale.pointwise_models import BcsdTemperature, BcsdPrecipitation
models = {
'GARD: PureAnalog-best-1': PureAnalog(kind='best_analog', n_analogs=1),
'GARD: PureAnalog-sample-10': PureAnalog(kind='sample_analogs', n_analogs=10),
'GARD: PureAnalog-weight-10': PureAnalog(kind='weight_analogs', n_analogs=10),
'GARD: PureAnalog-weight-100': PureAnalog(kind='weight_analogs', n_analogs=100),
'GARD: PureAnalog-mean-10': PureAnalog(kind='mean_analogs', n_analogs=10),
'GARD: AnalogRegression-100': AnalogRegression(n_analogs=100),
'GARD: LinearRegression': LinearRegression(),
'BCSD: BcsdTemperature': BcsdTemperature(return_anoms=False),
'Sklearn: RandomForestRegressor': RandomForestRegressor(random_state=0)
}
train_slice = slice('1980-01-01', '1989-12-31')
predict_slice = slice('1990-01-01', '1999-12-31')
Now that we've created a collection of models, we want to train the models on the same input data. We do this by looping through our dictionary of models and calling the `fit` method:
# extract training / prediction data
X_train = training[['tmax']][train_slice]
y_train = targets[['tmax']][train_slice]
X_predict = training[['tmax']][predict_slice]
# Fit all models
for key, model in models.items():
model.fit(X_train, y_train)
Just like that, we fit nine downscaling models. Now we want to use those models to downscale/bias-correct our data. For the sake of easy comparison, we'll use a different part of the training data:
# store predicted results in this dataframe
predict_df = pd.DataFrame(index = X_predict.index)
for key, model in models.items():
predict_df[key] = model.predict(X_predict)
# show a table of the predicted data
display(predict_df.head())
GARD: PureAnalog-best-1 | GARD: PureAnalog-sample-10 | GARD: PureAnalog-weight-10 | GARD: PureAnalog-weight-100 | GARD: PureAnalog-mean-10 | GARD: AnalogRegression-100 | GARD: LinearRegression | BCSD: BcsdTemperature | Sklearn: RandomForestRegressor | |
---|---|---|---|---|---|---|---|---|---|
time | |||||||||
1990-01-01 | 4.50 | 5.67 | 5.375299 | 5.697786 | 5.895 | 5.931445 | 5.781472 | 4.528703 | 5.2024 |
1990-01-02 | 6.13 | 3.55 | 3.543398 | 3.264698 | 2.561 | 2.515919 | 2.524322 | -1.584749 | 4.6935 |
1990-01-03 | 5.46 | 3.04 | 4.963575 | 4.933534 | 4.692 | 4.862730 | 4.944167 | 2.848937 | 4.6736 |
1990-01-04 | 8.57 | 5.90 | 8.369125 | 8.239455 | 7.340 | 7.255379 | 7.107427 | 6.687826 | 8.4134 |
1990-01-05 | 5.67 | 7.03 | 7.424970 | 7.583703 | 7.705 | 7.711861 | 7.878299 | 8.296425 | 6.5789 |
Now, let's do some sample analysis on our predicted data. First, we'll look at all of the downscaled timeseries for the first year of the prediction period. In the figure below, the `target` (truth) data is shown in black, the original (pre-correction) data is shown in grey, and each of the downscaled timeseries is shown in a different color.
fig, ax = plt.subplots(figsize=(8, 3.5))
targets['tmax'][time_slice].plot(ax=ax, label='target', c='k', lw=1, alpha=0.75, legend=True, zorder=10)
X_predict['tmax'][time_slice].plot(label='original', c='grey', ax=ax, alpha=0.75, legend=True)
predict_df[time_slice].plot(ax=ax, lw=0.75)
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
_ = ax.set_ylabel('Temperature [C]')
Of course, it's difficult to tell which of the nine downscaling methods performed best from the plot above. We may want to evaluate our predictions using a standard statistical score, such as $r^2$. Those results are easily computed below:
# calculate r2
score = (predict_df.corrwith(targets.tmax[predict_slice]) **2).sort_values().to_frame('r2_score')
display(score)
r2_score | |
---|---|
GARD: PureAnalog-best-1 | 0.820281 |
GARD: PureAnalog-sample-10 | 0.820977 |
BCSD: BcsdTemperature | 0.858258 |
Sklearn: RandomForestRegressor | 0.864160 |
GARD: PureAnalog-weight-10 | 0.881287 |
GARD: PureAnalog-weight-100 | 0.892049 |
GARD: PureAnalog-mean-10 | 0.899297 |
GARD: AnalogRegression-100 | 0.906217 |
GARD: LinearRegression | 0.906316 |
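Note that the score above is the squared Pearson correlation, which ignores additive bias; scikit-learn's `r2_score` (coefficient of determination) penalizes it. A small sketch of the difference, on made-up numbers:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = y_true + 1.0   # perfectly correlated, but biased by +1

corr_sq = np.corrcoef(y_true, y_pred)[0, 1] ** 2
print(corr_sq)                   # ~1.0: correlation ignores the offset
print(r2_score(y_true, y_pred))  # 0.2: the bias is penalized
```

Either score is easy to compute under the common interface; which one is appropriate depends on whether residual bias matters for the application.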
All of our downscaling methods seem to be doing fairly well. The timeseries and statistics above show that all our methods are producing generally reasonable results. However, we are often interested in how our models do at predicting extreme events. We can quickly look into those aspects of our results using the quantile-quantile (QQ) plots below. There you'll see that the models diverge in some interesting ways. For example, while the `LinearRegression` method has the highest $r^2$ score, it seems to have trouble capturing extreme heat events, whereas many of the analog methods, as well as the `RandomForestRegressor`, perform much better on the tails of the distributions.
from utils import prob_plots
fig = prob_plots(X_predict, targets['tmax'], predict_df[score.index.values], shape=(3, 3), figsize=(12, 12))
In this section, we've shown how easy it is to fit, predict, and evaluate scikit-downscale models. The seamless interoperability of these models facilitates a workflow that enables a deeper level of model evaluation than is otherwise possible in the downscaling world.
In the section above, we showed how it is possible to use scikit-downscale to bias-correct a timeseries of daily maximum air temperature using an arbitrary collection of linear models. Some of those models were general machine learning methods (e.g. `LinearRegression` or `RandomForestRegressor`) while others were tailor-made methods developed specifically for downscaling (e.g. `BcsdTemperature`). In this section, we walk through how new pointwise methods can be added to the scikit-downscale framework, highlighting the Z-Score method along the way.
Z-Score bias correction is a good technique for target variables with Gaussian probability distributions, such as zonal wind speed.
In essence, the technique:

1. Finds the mean $$\overline{x} = \frac{\sum_{i=0}^N x_i}{N}$$ and standard deviation $$\sigma = \sqrt{\frac{\sum_{i=0}^N |x_i - \overline{x}|^2}{N-1}}$$ of target (measured) data and training (historical modeled) data.
2. Calculates the shift parameter $$shift = \overline{x}_{target} - \overline{x}_{training}$$ and scale parameter $$scale = \sigma_{target} \div \sigma_{training}$$
3. Computes the future model's corrected mean $$\overline{x}_{corrected} = \overline{x}_{future} + shift$$ and new standard deviation $$\sigma_{corrected} = \sigma_{future} \times scale$$
4. Reconstructs the corrected values from the future model's z-score values $$z_i = \frac{x_i-\overline{x}}{\sigma}$$

In practice, if the wind was on average 3 m/s faster on the first of July in the models compared to the measurements, we would adjust the modeled data for all July 1sts in the future modeled dataset to be 3 m/s slower. And similarly for scaling the standard deviation.
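The steps above can be sketched numerically on synthetic wind speeds (all values here are made up for illustration):

```python
import numpy as np

# Synthetic wind speeds: the historical model is biased fast relative to obs
rng = np.random.default_rng(1)
target = rng.normal(8.0, 2.0, size=5000)     # measured
training = rng.normal(11.0, 3.0, size=5000)  # historical modeled
future = rng.normal(12.0, 3.0, size=5000)    # future modeled

# shift and scale parameters from the training period
shift = target.mean() - training.mean()
scale = target.std() / training.std()

# correct the future series via its z-scores
z = (future - future.mean()) / future.std()
corrected = z * (future.std() * scale) + (future.mean() + shift)

print(round(shift, 2), round(scale, 2))
```

By construction, the corrected series has mean `future.mean() + shift` and standard deviation `future.std() * scale`, matching steps 3 and 4 above. (The real method computes these statistics per day-of-year window, not over the whole series.)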
Scikit-downscale's pointwise models all implement Scikit-learn's `fit`/`predict` API. Each new downscaler must implement a minimum of three class methods: `__init__`, `fit`, and `predict`.
class AbstractDownscaler(object):
def __init__(self):
...
def fit(self, X, y):
...
return self
def predict(self, X):
...
return y_hat
Omitting some of the complexity of the full implementation (which can be found on GitHub), we demonstrate how the `ZScoreRegressor` was built.
First, we define our `__init__` method, allowing users to specify model options (in this case `window_width`):
class ZScoreRegressor(object):
def __init__(self, window_width=31):
self.window_width = window_width
Next, we define our `fit` method:
def fit(self, X, y):
X_mean, X_std = _calc_stats(X.squeeze(), self.window_width)
y_mean, y_std = _calc_stats(y.squeeze(), self.window_width)
self.stats_dict_ = {
"X_mean": X_mean,
"X_std": X_std,
"y_mean": y_mean,
"y_std": y_std,
}
shift, scale = _get_params(X_mean, X_std, y_mean, y_std)
self.shift_ = shift
self.scale_ = scale
return self
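The `_calc_stats` helper is internal to scikit-downscale; conceptually, it computes day-of-year climatological statistics over a centered window. A hypothetical pandas sketch of that idea (an assumption about the approach, not the library's actual code):

```python
import numpy as np
import pandas as pd

# A daily series with an annual cycle (synthetic stand-in for wind or tmax)
idx = pd.date_range('2000-01-01', '2004-12-31', freq='D')
s = pd.Series(10 * np.sin(2 * np.pi * idx.dayofyear / 365) + 15, index=idx)

# centered 31-day rolling mean, then averaged across years by day of year
roll_mean = s.rolling(31, center=True, min_periods=1).mean()
doy_mean = roll_mean.groupby(roll_mean.index.dayofyear).mean()

print(len(doy_mean))   # 366 (leap years included)
```

Computing shift and scale per day-of-year window, rather than over the whole record, lets the correction follow the seasonal cycle.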
Finally, we define our `predict` method:
def predict(self, X):
fut_mean, fut_std, fut_zscore = _get_fut_stats(X.squeeze(), self.window_width)
shift_expanded, scale_expanded = _expand_params(X.squeeze(), self.shift_, self.scale_)
fut_mean_corrected, fut_std_corrected = _correct_fut_stats(
fut_mean, fut_std, shift_expanded, scale_expanded
)
self.fut_stats_dict_ = {
"meani": fut_mean,
"stdi": fut_std,
"meanf": fut_mean_corrected,
"stdf": fut_std_corrected,
}
fut_corrected = (fut_zscore * fut_std_corrected) + fut_mean_corrected
return fut_corrected.to_frame(name)  # `name` is set in the full implementation
from skdownscale.pointwise_models import ZScoreRegressor
# open a small dataset
training = get_sample_data('wind-hist')
target = get_sample_data('wind-obs')
future = get_sample_data('wind-rcp')
# bias correction using ZScoreRegressor
zscore = ZScoreRegressor()
zscore.fit(training, target)
fit_stats = zscore.stats_dict_
out = zscore.predict(future)
predict_stats = zscore.fut_stats_dict_
# visualize the datasets
from utils import zscore_ds_plot
zscore_ds_plot(training, target, future, out)
from utils import zscore_correction_plot
zscore_correction_plot(zscore)
In the examples above, we have performed downscaling on sample data sourced from individual points. In many downscaling workflows, however, users will want to apply pointwise methods at all points in their study domain. For this use case, scikit-downscale provides a high-level wrapper class: `PointWiseDownscaler`.
In the example below, we'll use the `BcsdTemperature` model, along with the `PointWiseDownscaler` wrapper, to downscale daily maximum surface air temperature from CMIP6 for all points in a subset of the Pacific Northwest. We'll use a local Dask cluster to distribute the computation among our available processors. Though not the point of this example, we also use intake-esm to access CMIP6 data stored on Google Cloud Storage.
Data:
# parameters
train_slice = slice('1980', '1982') # train time range
holdout_slice = slice('1990', '1991') # prediction time range
# bounding box of downscaling region
lon_slice = slice(-124.8, -120.0)
lat_slice = slice(50, 45)
# chunk shape for dask execution (time must be contiguous, i.e. chunk size -1)
chunks = {'lat': 10, 'lon': 10, 'time': -1}
Step 1: Start a Dask cluster. Xarray and the `PointWiseDownscaler` will make use of this cluster when it comes time to load input data and train/predict downscaling models.
from dask.distributed import Client
client = Client()
client
Step 2. Load our target data.
Here we use xarray directly to load a collection of OpenDAP endpoints.
import xarray as xr
fnames = [f'http://thredds.northwestknowledge.net:8080/thredds/dodsC/MET/tmmx/tmmx_{year}.nc'
for year in range(int(train_slice.start), int(train_slice.stop) + 1)]
# open the data and cleanup a bit of metadata
obs = xr.open_mfdataset(fnames, engine='pydap', combine='nested', concat_dim='day').rename({'day': 'time'}).drop('crs')
obs_subset = obs['air_temperature'].sel(time=train_slice, lon=lon_slice, lat=lat_slice).resample(time='1d').mean().load(scheduler='threads').chunk(chunks)
# display
display(obs_subset)
obs_subset.isel(time=0).plot()
<xarray.DataArray 'air_temperature' (time: 1096, lat: 106, lon: 115)> dask.array<xarray-<this-array>, shape=(1096, 106, 115), dtype=float32, chunksize=(1096, 10, 10), chunktype=numpy.ndarray> Coordinates: * time (time) datetime64[ns] 1980-01-01 1980-01-02 ... 1982-12-31 * lat (lat) float64 49.4 49.36 49.32 49.28 ... 45.15 45.11 45.07 45.03 * lon (lon) float64 -124.8 -124.7 -124.7 -124.6 ... -120.1 -120.1 -120.0
<matplotlib.collections.QuadMesh at 0x7fbffec07610>
Step 3: Load our training/prediction data.
Here we use `intake-esm` to access a single Xarray dataset from Pangeo's Google Cloud CMIP6 data catalog.
import intake_esm
intake_esm.__version__
'2020.6.11'
import intake
# search the cmip6 catalog
col = intake.open_esm_datastore("https://storage.googleapis.com/cmip6/pangeo-cmip6.json")
cat = col.search(experiment_id=['historical', 'ssp585'], table_id='day', variable_id='tasmax',
grid_label='gn')
# access the data and do some cleanup
ds_model = cat['CMIP.NASA-GISS.GISS-E2-1-G.historical.day.gn'].to_dask().squeeze(drop=True).drop(['height', 'lat_bnds', 'lon_bnds', 'time_bnds'])
ds_model.lon.values[ds_model.lon.values > 180] -= 360
ds_model = ds_model.roll(lon=72, roll_coords=True)
# regional subsets, ready for downscaling
train_subset = ds_model['tasmax'].sel(time=train_slice).interp_like(obs_subset.isel(time=0, drop=True), method='linear')
train_subset['time'] = train_subset.indexes['time'].to_datetimeindex()
train_subset = train_subset.resample(time='1d').mean().load(scheduler='threads').chunk(chunks)
holdout_subset = ds_model['tasmax'].sel(time=holdout_slice).interp_like(obs_subset.isel(time=0, drop=True), method='linear')
holdout_subset['time'] = holdout_subset.indexes['time'].to_datetimeindex()
holdout_subset = holdout_subset.resample(time='1d').mean().load(scheduler='threads').chunk(chunks)
# display
display(train_subset)
train_subset.isel(time=0).plot()
<xarray.DataArray 'tasmax' (time: 1096, lat: 106, lon: 115)> dask.array<xarray-<this-array>, shape=(1096, 106, 115), dtype=float32, chunksize=(1096, 10, 10), chunktype=numpy.ndarray> Coordinates: * time (time) datetime64[ns] 1980-01-01 1980-01-02 ... 1982-12-31 * lat (lat) float64 49.4 49.36 49.32 49.28 ... 45.15 45.11 45.07 45.03 * lon (lon) float64 -124.8 -124.7 -124.7 -124.6 ... -120.1 -120.1 -120.0
<matplotlib.collections.QuadMesh at 0x7fbffe6eff70>
Step 4. Now that we have loaded our training and target data, we can fit our BcsdTemperature model at each x/y point in our domain. This is where the PointWiseDownscaler comes in:
from skdownscale.pointwise_models import PointWiseDownscaler
from dask.diagnostics import ProgressBar
model = PointWiseDownscaler(BcsdTemperature(return_anoms=False))
model
<skdownscale.PointWiseDownscaler> Fit Status: False Model: BcsdTemperature(return_anoms=False)
Step 5. We fit the PointWiseDownscaler, passing it data in Xarray data structures (our regional subsets from above). This operation is lazy and returns immediately. Under the hood, we can see that PointWiseDownscaler._models is an Xarray.DataArray of BcsdTemperature models.
model.fit(train_subset, obs_subset)
display(model, model._models)
<skdownscale.PointWiseDownscaler> Fit Status: True Model: BcsdTemperature(return_anoms=False)
<xarray.DataArray 'tasmax' (lat: 106, lon: 115)> dask.array<_fit_wrapper, shape=(106, 115), dtype=object, chunksize=(10, 10), chunktype=numpy.ndarray>
Coordinates:
  * lat  (lat) float64 49.4 49.36 49.32 49.28 ... 45.15 45.11 45.07 45.03
  * lon  (lon) float64 -124.8 -124.7 -124.7 -124.6 ... -120.1 -120.1 -120.0
Step 6. Finally, we use our model to complete the downscaling workflow, calling the predict method with our holdout_subset of CMIP6 data. We call the .load() method to eagerly compute the data. We end by plotting a map of downscaled data over our study area.
predicted = model.predict(holdout_subset).load()
display(predicted)
predicted.isel(time=0).plot()
<xarray.DataArray (time: 730, lat: 106, lon: 115)> dtype=float32
Coordinates:
  * lat   (lat) float64 49.4 49.36 49.32 49.28 ... 45.15 45.11 45.07 45.03
  * time  (time) datetime64[ns] 1990-01-01 1990-01-02 ... 1991-12-31
  * lon   (lon) float64 -124.8 -124.7 -124.7 -124.6 ... -120.1 -120.1 -120.0
The values are daily downscaled temperatures (in Kelvin), with NaN where observations are masked (e.g. over the ocean).
[Output: map of the downscaled field at the first time step]
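At each grid cell, the PointWiseDownscaler fits its own independent model. The sketch below illustrates that pointwise pattern with a toy linear model on synthetic NumPy data; it is an illustration of the concept only, not scikit-downscale's actual implementation, and all array names in it are made up.

```python
import numpy as np

# Toy domain: 3 x 4 grid, 100 time steps of "model" and "obs" data.
rng = np.random.default_rng(0)
ny, nx, nt = 3, 4, 100
train = rng.normal(280.0, 5.0, size=(nt, ny, nx))            # coarse model output
obs = 1.1 * train + 2.0 + rng.normal(0, 0.1, (nt, ny, nx))   # synthetic observations

# Fit an independent linear model (slope, intercept) at every grid cell --
# conceptually what PointWiseDownscaler does with a BcsdTemperature model
# at each x/y point.
slopes = np.empty((ny, nx))
intercepts = np.empty((ny, nx))
for j in range(ny):
    for i in range(nx):
        slopes[j, i], intercepts[j, i] = np.polyfit(train[:, j, i], obs[:, j, i], 1)

# "Predict" for a held-out period by applying each cell's own fitted model.
holdout = rng.normal(281.0, 5.0, size=(10, ny, nx))
downscaled = slopes * holdout + intercepts
print(downscaled.shape)  # (10, 3, 4)
```

The real PointWiseDownscaler does the same kind of per-cell looping for you, lazily, over dask-chunked Xarray data.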
Spatial models are a second class of downscaling methods that use information from the full study domain to form relationships between observations and ESM data. Scikit-downscale implements these models as a SpatialDownscaler. Beyond providing fit and predict methods that accept Xarray objects, the internal layout of these methods is intentionally unspecified. We are currently working on wrapping several popular spatial downscaling models.
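To make the contrast with the pointwise approach concrete, here is a hypothetical sketch of what a spatial model's interface could look like: fit and predict see the whole (time, lat, lon) domain at once, so the learned correction can draw on full-domain information. The ToySpatialDownscaler class and its simple mean-bias correction are illustrative assumptions, not part of scikit-downscale.

```python
import numpy as np

class ToySpatialDownscaler:
    """Hypothetical sketch of a spatial downscaler interface. Unlike the
    pointwise models above, fit/predict operate on the full (time, lat, lon)
    domain at once, so a real model could exploit spatial structure."""

    def fit(self, X, y):
        # Learn a single spatially varying additive correction field:
        # the time-mean difference between observations and model output.
        self.correction_ = y.mean(axis=0) - X.mean(axis=0)
        return self

    def predict(self, X):
        # Apply the learned correction field to new model output.
        return X + self.correction_

rng = np.random.default_rng(1)
train = rng.normal(280.0, 2.0, size=(50, 4, 5))  # synthetic model output
obs = train + 3.0                                # observations warmer by 3 K
model = ToySpatialDownscaler().fit(train, obs)
out = model.predict(train)
print(out.shape)  # (50, 4, 5)
```

A production spatial model would of course learn something far richer than a mean offset, but the fit/predict contract on whole-domain Xarray objects is the part scikit-downscale standardizes.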
It's likely that one of the reasons we haven't seen strong consensus develop around particular downscaling methodologies is the absence of widely available benchmark applications to test methods against each other. While Scikit-downscale will not solve this problem on its own, we hope that the ability to implement downscaling applications within a common framework will enable a more robust benchmarking initiative than was previously possible.
The Scikit-downscale effort is just getting started. With the recent release of CMIP6, we expect a surge of interest in downscaled climate data. There are clear opportunities for involvement from climate impacts practitioners, computer scientists with an interest in machine learning for climate applications, and climate scientists alike. Please reach out if you are interested in participating in any way.
This notebook is licensed under CC-BY. Scikit-downscale is licensed under Apache License 2.0.