宜昌市气象局 longtsing 2022,songofsongs@vip.qq.com
MICAPS分布式数据环境(BDIPS)提供了两种访问方式:一种是直接连接读取Cassandra数据库,另外一种是基于Cassandra数据库上搭建的WEBService API方式检索读取。
nmc_met_io程序库的retrieve_cassandraDB模块, 基于Python Cassandra数据库接口包实现了Python语言对BDIPS数据的检索和读取。 (注:由于WEBService的使用泛滥,各省已经限制该接口只能下载三天内的数据,为此启用Cassandra读取方式能延长读取数据的时效;但各省在Cassandra数据库存储数据时长也有不同,一般存储一周以上的数据。)
[Cassandra]
ClusterIPAddresses=Cassandra集群IP地址以“,”分隔,可以参考MICAPS4的配置文件配置
ClusterPort=Cassandra集群服务端口
KeySpace=Cassandra上数据存储的主键名
# set up things
%matplotlib inline
%load_ext autoreload
%autoreload 2
D:\ProgramData\Anaconda3\envs\gpu1\lib\site-packages\numpy\_distributor_init.py:30: UserWarning: loaded more than 1 DLL from .libs: D:\ProgramData\Anaconda3\envs\gpu1\lib\site-packages\numpy\.libs\libopenblas.EL2C6PLE4ZYW3ECEVIV3OXXGRN2NRFM2.gfortran-win_amd64.dll D:\ProgramData\Anaconda3\envs\gpu1\lib\site-packages\numpy\.libs\libopenblas.GK7GX5KEQ4F6UYO3P26ULGBQYHGQO7J4.gfortran-win_amd64.dll warnings.warn("loaded more than 1 DLL from .libs:"
# load necessary libraries
# you should install cartopy with 'conda install -c conda-forge cartopy'
import xarray as xr
import numpy as np
# load nmc_met_io for retrieving micaps server data
import sys
sys.path.append("../nmc_met_io")
from retrieve_cassandraDB import *
xr.set_options(display_style="text")
<xarray.core.options.set_options at 0x1b981f8d820>
模块retrieve_cassandraDB
提供读取数值模式网格预报数据的函数:
get_model_grid
: 读取单个时次标量, 矢量或集合成员的2D平面预报数据;get_model_grids
: 读取多个时次标量, 矢量或集合成员的2D平面预报数据;get_model_points
: 获取指定经纬度点的模式预报数据;get_model_3D_grid
: 获得单个时次标量, 矢量或集合成员的[lev, lat, lon]3D预报数据;get_model_3D_grids
: 获得多个时次标量, 矢量或集合成员的[lev, lat, lon]3D预报数据;get_model_profiles
: 获得制定经纬度单点的模块廓线预报数据.每个函数都有固定的参数directory
和filename
(或filenames
), 例如
# MICAPS分布式服务器上的数据地址, 可通过MICAPS4的数据源检索界面查找,
# 如下图, 找到数据存放的目录, 鼠标右键点击"保存路径到剪切板", 粘贴去掉"mdfs:///"
directory = 'ECMWF_HR/TMP/850'
# 指定具体的数据文件, 一般格式为"起报时间.预报时效", 若不指定, 则自动获得目录下最新数据的文件名
filename = '22022408.024'
# 调用函数读取数据
data = get_model_grid(directory, filename=filename)
directory = 'ECMWF_HR/TMP/850'
filename = '22022408.024'
data = get_model_grid(directory, filename=filename, cache=False)
if data is not None:
print(data)
else:
print("Retrieve failed.")
<xarray.Dataset> Dimensions: (time: 1, level: 1, lat: 241, lon: 361) Coordinates: * time (time) datetime64[ns] 2022-02-25T08:00:00 * level (level) float32 850.0 * lat (lat) float64 3.815e-06 0.25 0.5 ... 59.75 60.0 * lon (lon) float64 60.0 60.25 60.5 ... 149.5 149.8 150.0 forecast_reference_time datetime64[ns] 2022-02-24T08:00:00 forecast_period (time) float64 24.0 Data variables: data (time, level, lat, lon) float32 19.45 ... -14.65 Attributes: Conventions: CF-1.6 Origin: MICAPS Cassandra DB
# 使用?可以获得函数的帮助信息.
?get_model_grid
Signature: get_model_grid( directory, filename=None, suffix='*.024', varname='data', varattrs={'units': ''}, scale_off=None, levattrs={'long_name': 'pressure_level', 'units': 'hPa', '_CoordinateAxisType': 'Pressure'}, cache=True, cache_clear=True, ) Docstring: Retrieve numeric model grid forecast from MICAPS cassandra service. Support ensemble member forecast. :param directory: the data directory on the service :param filename: the data filename, if none, will be the latest file. :param suffix: the filename filter pattern which will be used to find the specified file. :param varname: set variable name. :param varattrs: set variable attributes, dictionary type. :param scale_off: [scale, offset], return values = values*scale + offset. :param levattrs: set level coordinate attributes, diectionary type. :param cache: cache retrieved data to local directory, default is True. :return: data, xarray type :Examples: >>> data = get_model_grid("ECMWF_HR/TMP/850") >>> data_ens = get_model_grid("ECMWF_ENSEMBLE/RAW/HGT/500", filename='18021708.024') >>> data_ens = get_model_grid('ECMWF_ENSEMBLE/RAW/TMP_2M', '19083008.024') File: h:\机器学习在空气污染预报应用的研究\nmc_met_io\nmc_met_io\retrieve_cassandradb.py Type: function
# 可以指定数据的变量名称, 变量属性等信息
data = get_model_grid(directory, filename=filename, varname='TEM', varattrs={'long_name':'temperature', 'units':'degree'}, cache=False)
if data is not None:
print(data.TEM)
else:
print("Retrieve failed.")
<xarray.DataArray 'TEM' (time: 1, level: 1, lat: 241, lon: 361)> array([[[[ 19.44553 , 19.461155, 19.41428 , ..., 19.773655, 19.992405, 19.992405], [ 19.00803 , 18.85178 , 18.836155, ..., 19.91428 , 20.03928 , 19.75803 ], [ 18.367405, 18.367405, 18.44553 , ..., 20.086155, 19.88303 , 19.711155], ..., [-11.413845, -11.74197 , -11.945095, ..., -14.80447 , -14.788845, -15.038845], [-11.49197 , -11.80447 , -12.02322 , ..., -13.86697 , -13.882595, -14.39822 ], [-11.507595, -11.71072 , -11.882595, ..., -12.71072 , -13.663845, -14.64822 ]]]], dtype=float32) Coordinates: * time (time) datetime64[ns] 2022-02-25T08:00:00 * level (level) float32 850.0 * lat (lat) float64 3.815e-06 0.25 0.5 ... 59.75 60.0 * lon (lon) float64 60.0 60.25 60.5 ... 149.5 149.8 150.0 forecast_reference_time datetime64[ns] 2022-02-24T08:00:00 forecast_period (time) float64 24.0 Attributes: long_name: temperature units: degree
import matplotlib as mpl
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
# 绘制图像
fig = plt.figure(figsize=(12,8))
ax = plt.axes(projection=ccrs.LambertConformal(central_longitude=120))
data.TEM.plot(ax=ax, transform=ccrs.PlateCarree(), cbar_kwargs={'shrink': 0.8})
ax.coastlines()
ax.gridlines()
ax.set_extent([80,130,15,54], crs=ccrs.PlateCarree())
%time
directory = "ECMWF_HR/TMP/500"
fhours = np.arange(0, 132, 12)
filenames = ['22022208.'+'%03d'%(fhour) for fhour in fhours]
data = get_model_grids(directory, filenames, varname='HGT', varattrs={'long_name':'geopotential height', 'units':'dagpm'}, cache=False)
Wall time: 0 ns
data
<xarray.Dataset> Dimensions: (lat: 281, time: 11, level: 1, lon: 361) Coordinates: * lat (lat) float64 -10.0 -9.75 -9.5 ... 59.5 59.75 60.0 * time (time) datetime64[ns] 2022-02-22T08:00:00 ... 20... * level (level) float32 500.0 * lon (lon) float64 60.0 60.25 60.5 ... 149.5 149.8 150.0 forecast_reference_time datetime64[ns] 2022-02-22T08:00:00 forecast_period (time) float64 0.0 12.0 24.0 ... 96.0 108.0 120.0 Data variables: HGT (time, level, lat, lon) float32 nan nan ... -35.82 Attributes: Conventions: CF-1.6 Origin: MICAPS Cassandra DB
# 绘制图像
data.HGT.isel(level=0).plot.contour(col='time', col_wrap=4, levels=20)
<xarray.plot.facetgrid.FacetGrid at 0x1b9b518f490>
%time
directory = "ECMWF_ENSEMBLE/RAW/RAIN24"
filenames = get_file_list(directory)[:3]
data = get_model_grids(directory, filenames=filenames, varname='precipitation',
varattrs={'long_name':'accumulated precipitation', 'units':'mm'},
cache=False)
data
Wall time: 0 ns
<xarray.Dataset> Dimensions: (number: 51, time: 3, lat: 121, lon: 261) Coordinates: * number (number) int32 0 1 2 3 4 5 6 ... 45 46 47 48 49 50 * time (time) datetime64[ns] 2022-03-06T08:00:00 ... 20... * lat (lat) float64 0.0 0.5 1.0 1.5 ... 59.0 59.5 60.0 * lon (lon) float64 50.0 50.5 51.0 ... 179.0 179.5 180.0 forecast_reference_time datetime64[ns] 2022-02-23T08:00:00 forecast_period (time) float64 264.0 288.0 300.0 Data variables: precipitation (number, time, lat, lon) float64 0.0 0.0 ... 1.228 Attributes: Conventions: CF-1.6 Origin: MICAPS Cassandra DB
# set colors and levels
clevs = [0.01, 2.5, 5, 10, 20, 40, 60]
colors = ['#b1f7a3', '#76d870', '#39bb3e', '#318c2f', '#5cb8fc', '#0202fd', '#ef08e9']
cmap, norm = mpl.colors.from_levels_and_colors(clevs, colors, extend='max')
plt.rcParams['font.size'] = '20'
subdata = data.sel(lon=slice(105, 110), lat=slice(29, 34))
fg = subdata.precipitation.isel(time=0).plot(
col='number', col_wrap=8, cmap=cmap, norm=norm, extend="max", \
sharex=True, sharey=True,
cbar_kwargs={"aspect":40, "shrink":0.6, "pad":0.01})
fg.set_xlabels("Lon")
fg.set_ylabels("lat")
<xarray.plot.facetgrid.FacetGrid at 0x1b9b938ae50>
# 获得风云4A中国区域4通道产品
directory = "SATELLITE/FY4A/L1/CHINA/C008"
data = get_fy_awx(directory)
data
<xarray.Dataset> Dimensions: (time: 1, channel: 1, lat: 1001, lon: 1751) Coordinates: * time (time) datetime64[ns] 2022-02-24T07:15:00 * channel (channel) int16 5 * lat (lat) float64 15.0 15.04 15.08 15.12 ... 54.88 54.92 54.96 55.0 * lon (lon) float64 70.0 70.04 70.08 70.12 ... 139.9 139.9 140.0 140.0 Data variables: image (time, channel, lat, lon) float64 296.8 296.3 296.3 ... 258.7 260.6 Attributes: Conventions: CF-1.6 Origin: MICAPS Cassandra DB
# 绘制图像
fig = plt.figure(figsize=(12,8))
ax = plt.axes(projection=ccrs.LambertConformal(central_longitude=100))
data.image[0,0,:,:].plot(ax=ax, transform=ccrs.PlateCarree(), cbar_kwargs={'shrink': 0.8})
ax.coastlines()
ax.gridlines()
ax.set_extent([80,130,15,54], crs=ccrs.PlateCarree())
# 风云2静止卫星图像
directory = "SATELLITE/FY2/L1/IR1/EQUAL"
data = get_fy_awx(directory)
# 绘制图像
fig = plt.figure(figsize=(12,8))
ax = plt.axes(projection=ccrs.LambertConformal(central_longitude=100))
data.image[0,0,:,:].plot(ax=ax, transform=ccrs.PlateCarree(), cbar_kwargs={'shrink': 0.8}, cmap="gist_ncar", vmin=310, vmax=327)
ax.coastlines()
ax.gridlines()
ax.set_extent([80,130,15,54], crs=ccrs.PlateCarree())