How to pull in SNOTEL and CCSS data from this repository (and some use cases)¶

Eric Gagliano (egagli@uw.edu)
Updated: February 20th, 2024

Thanks for checking out this notebook! My hope is that this repository makes it easier to retrieve daily SNOTEL and CCSS data without having to do clunky downloads and conversions. Snow depth / SWE / PRCPSA are in meters, temperatures are in celsius. The only required packages you'll need are geopandas and pandas, the rest of the imports are for applications :)

In [1]:

import pandas as pd
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
import datetime
import tqdm
import contextily as ctx

View all SNOTEL & CCSS stations¶

the SNOwpack TELemetry (SNOTEL) network includes over 800 automated weather stations in the Western U.S. for mountain snowpack observation
the CCSS program manages a network of 130 automated snow sensors located in the Szierra Nevada and Shasta-Trinity Mountains

Read the geojson stored at https://raw.githubusercontent.com/egagli/snotel_ccss_stations/main/all_stations.geojson ¶

the daily recurring github action should regularly update the endDate column
set the index of the geodataframe to the code column
let's only look at sites for which we have data

In [2]:

all_stations_gdf = gpd.read_file('https://raw.githubusercontent.com/egagli/snotel_ccss_stations/main/all_stations.geojson').set_index('code')
all_stations_gdf = all_stations_gdf[all_stations_gdf['csvData']==True]

In [3]:

all_stations_gdf

Out[3]:

	name	network	elevation_m	latitude	longitude	state	HUC	mgrs	mountainRange	beginDate	endDate	csvData	geometry
code
301_CA_SNTL	Adin Mtn	SNOTEL	1886.712036	41.235828	-120.791924	California	180200021403	10TFL	Great Basin Ranges	1983-10-01	2100-01-01	True	POINT (-120.79192 41.23583)
907_UT_SNTL	Agua Canyon	SNOTEL	2712.719971	37.522171	-112.271179	Utah	160300020301	12SUG	Colorado Plateau	1994-10-01	2100-01-01	True	POINT (-112.27118 37.52217)
916_MT_SNTL	Albro Lake	SNOTEL	2529.840088	45.597229	-111.959023	Montana	100200050701	12TVR	Central Montana Rocky Mountains	1996-09-01	2100-01-01	True	POINT (-111.95902 45.59723)
1267_AK_SNTL	Alexander Lake	SNOTEL	48.768002	61.749668	-150.889664	Alaska	190205051106	05VPJ	NaN	2014-08-28	2100-01-01	True	POINT (-150.88966 61.74967)
908_WA_SNTL	Alpine Meadows	SNOTEL	1066.800049	47.779572	-121.698471	Washington	171100100501	10TET	Cascade Range	1994-09-01	2100-01-01	True	POINT (-121.69847 47.77957)
...	...	...	...	...	...	...	...	...	...	...	...	...	...
SLT	Slate Creek	CCSS	1737.360000	41.043980	-122.480103	California	180200050304	10TEL	Klamath Mountains	2004-10-01	2024-02-12	True	POINT (-122.48010 41.04398)
SLI	Slide Canyon	CCSS	2804.160000	38.091234	-119.431881	California	180400090501	11SKC	Sierra Nevada	2005-10-01	2024-02-20	True	POINT (-119.43188 38.09123)
SLK	South Lake	CCSS	2926.080000	37.175903	-118.562660	California	180901020601	11SLB	Sierra Nevada	2004-10-01	2024-02-20	True	POINT (-118.56266 37.17590)
STL	State Lakes	CCSS	3169.920000	36.926483	-118.573250	California	180300100305	11SLA	Sierra Nevada	2018-11-01	2024-02-20	True	POINT (-118.57325 36.92648)
TMR	Tamarack Summit	CCSS	2301.240000	37.163750	-119.200531	California	180400060903	11SLB	Sierra Nevada	2011-01-01	2024-02-20	True	POINT (-119.20053 37.16375)

969 rows × 13 columns

Use geopandas `GeoDataFrame.explore()` on the `all_stations_gdf` geodataframe to interactively view the stations¶

color by network: red is SNOTEL, blue is CCSS.

In [4]:

all_stations_gdf.astype(dict(beginDate=str, endDate=str)).explore(column='network',cmap='bwr')

Out[4]:

Make this Notebook Trusted to load map: File -> Trust Notebook

Read a singular CSV: Is our winter on track with the historical record at the Paradise, WA SNOTEL station?¶

check out information about the SNOTEL station near Mt. Rainier at Paradise, WA
cool plots available at the Northwest River Forecast Center website

Place a station code (which you can find in this interactive plot, or by other means) in the url: https://raw.githubusercontent.com/egagli/snotel_ccss_stations/main/data/%7Bstation_id%7D.csv ¶

for SNOTEL stations, this will be of the form {unique number}_{two letter state abbreviation}_SNTL (e.g. 679_WA_SNTL).
for CCSS stations, this will be a three letter code (e.g. BLK).
use pd.read_csv() with index_col='datetime' and parse_dates=True so we interpret the datetime column as pandas datetime objects

In [5]:

station_id = '679_WA_SNTL'
paradise_snotel = pd.read_csv(f'https://raw.githubusercontent.com/egagli/snotel_ccss_stations/main/data/{station_id}.csv',index_col='datetime', parse_dates=True)

In [6]:

paradise_snotel

Out[6]:

	TAVG	TMIN	TMAX	SNWD	WTEQ	PRCPSA
datetime
1980-10-01	NaN	NaN	NaN	NaN	0.0000	0.0000
1980-10-02	NaN	NaN	NaN	NaN	0.0000	0.0000
1980-10-03	NaN	NaN	NaN	NaN	0.0000	0.0000
1980-10-04	NaN	NaN	NaN	NaN	0.0000	0.0000
1980-10-05	NaN	NaN	NaN	NaN	0.0000	0.0000
...	...	...	...	...	...	...
2024-02-16	-5.1	-7.5	-2.5	2.2860	1.0033	0.0076
2024-02-17	-2.6	-4.8	-0.4	2.3368	1.0109	0.0025
2024-02-18	0.6	-2.1	2.6	2.2860	1.0135	0.0152
2024-02-19	1.3	-0.2	2.8	2.3368	1.0287	0.0025
2024-02-20	NaN	NaN	NaN	2.3114	1.0312	NaN

15848 rows × 6 columns

Try a simple plot of snow depth and SWE¶

select the column of interest and use pandas built in Series.plot()

In [7]:

f,ax=plt.subplots(figsize=(12,5))

paradise_snotel['SNWD'].plot(ax=ax,label='snow depth')
paradise_snotel['WTEQ'].plot(ax=ax,label='snow water equivalent')

ax.set_xlim(pd.to_datetime(['2017-10-01','2018-09-30']))

ax.grid()
ax.legend()

ax.set_xlabel('time')
ax.set_ylabel('snow depth / SWE [meters]')
ax.set_title('Snow depth and SWE at Paradise, WA \n(water year 2018)')

f.tight_layout()

Try a more complex plot that shows current snow depth against statistics calculated from the entire time series for each day of water year¶

water year is conceptual 12 month period used to describe when the bulk of precipitation falls, mostly used for hydrology attribution
- in the northern hemisphere, we usually define the water year to start October 1st and go until September 30th (e.g. water year 2017: October 1st, 2016 - September 30th, 2017)
- so October 1st is DOWY 1
try a function like datetime_to_DOWY() shown below to convert datetimes to day of water year and add a dedicated DOWY column
- this function should account for leap years
then use pandas groupby functionality to calculate stats per DOWY
plot these stats
- thanks David Shean for the plot inspiration!

In [8]:

def datetime_to_DOWY(date):
    try:
        if date.month < 10 or (date.month == 10 and date.day < 1):
            start_of_water_year = pd.Timestamp(year=date.year-1, month=10, day=1)
        else:
            start_of_water_year = pd.Timestamp(year=date.year, month=10, day=1)
        return (date - start_of_water_year).days + 1
    except:
        return np.nan

def datetime_to_WY(date):
    if date.month < 10 or (date.month == 10 and date.day < 1):
        return date.year
    else:
        return date.year + 1

In [9]:

paradise_snotel['DOWY'] = paradise_snotel.index.map(datetime_to_DOWY)

In [10]:

stat_list = ['min','max','mean','std','median']
paradise_snotel_DOWY_snwd_stats = paradise_snotel.groupby('DOWY').agg(stat_list)['SNWD']
paradise_snotel_DOWY_snwd_stats

Out[10]:

	min	max	mean	std	median
DOWY
1	0.0	0.2286	0.014111	0.053862	0.0
2	0.0	0.2032	0.011289	0.047895	0.0
3	0.0	0.2286	0.014111	0.053862	0.0
4	0.0	0.1270	0.012700	0.032888	0.0
5	0.0	0.1270	0.012700	0.034022	0.0
...	...	...	...	...	...
362	0.0	0.0254	0.001411	0.005987	0.0
363	0.0	0.0254	0.001411	0.005987	0.0
364	0.0	0.0254	0.002822	0.008214	0.0
365	0.0	0.0762	0.005644	0.018595	0.0
366	0.0	0.0000	0.000000	0.000000	0.0

366 rows × 5 columns

In [11]:

today = datetime.datetime.today().strftime('%Y-%m-%d')
current_WY = slice(f'{int(today[0:4])-1}-10-01',f'{today}')
current_WY_paradise_snotel = paradise_snotel[current_WY.start:current_WY.stop]

In [12]:

f,ax=plt.subplots(figsize=(12,7))

for stat,stat_color in zip(['min','max','mean','median'],['red','blue','mediumpurple','mediumseagreen']):
    ax.plot(paradise_snotel_DOWY_snwd_stats.index, paradise_snotel_DOWY_snwd_stats[stat], label=stat, color=stat_color, linewidth=3)
    
ax.fill_between(paradise_snotel_DOWY_snwd_stats.index, paradise_snotel_DOWY_snwd_stats['mean'] - paradise_snotel_DOWY_snwd_stats['std'], paradise_snotel_DOWY_snwd_stats['mean'] + paradise_snotel_DOWY_snwd_stats['std'], color='mediumpurple', alpha=0.3, label='mean +/- 1 std')

ax.scatter(current_WY_paradise_snotel.DOWY,current_WY_paradise_snotel.SNWD, marker='o', color= 'black', label='Current WY')

ax.set_xlim([0,366])
ax.set_ylim([0,6])

ax.grid()
ax.legend()

ax.set_title(f'Current snow depth against historical snow depth stats by DOWY at Paradise, WA\n({paradise_snotel.index.min().date()} - {paradise_snotel.index.max().date()})')
ax.set_xlabel('Day of Water Year [Oct 1 - Sept 30]')
ax.set_ylabel('Snow depth [meters]')
f.tight_layout()

Looks like we're below the mean for snow depth for today's DOWY.

Read a variable from multiple CSVs by looping over a list of stations: Does the SNOTEL network and CCSS network list the same station?¶

no controversy here, but i noticed that a station name was repeated
perhaps both networks have data from the same station accessible
might make sense if they are co-managed in some way?

Create a list of the stations we are interested in, loop through and add data to a dictionary with the station code as the key, then read into pandas using `pd.DataFrame.from_dict()`¶

initialize with an empty dict and append to it in the loop

In [13]:

station_list = ['356_CA_SNTL','BLK']

station_dict = {}

for station in station_list:
    tmp = pd.read_csv(f'https://raw.githubusercontent.com/egagli/snotel_ccss_stations/main/data/{station}.csv',index_col='datetime',parse_dates=True)['WTEQ']
    station_dict[station] = tmp

stations_swe_df = pd.DataFrame.from_dict(station_dict)#.dropna()

In [14]:

stations_swe_df

Out[14]:

	356_CA_SNTL	BLK
datetime
1980-07-09	NaN	NaN
1980-07-10	NaN	NaN
1980-07-11	NaN	NaN
1980-07-12	NaN	NaN
1980-07-13	NaN	NaN
...	...	...
2024-02-16	0.4064	0.4064
2024-02-17	0.4089	0.4089
2024-02-18	0.4166	0.4166
2024-02-19	0.4267	0.4267
2024-02-20	0.4902	0.4902

15932 rows × 2 columns

Plot the two stations on the same axis¶

if we properly set the index as the datetime, and we used parse_dates=True, these should line up!

In [15]:

f,ax=plt.subplots(figsize=(20,7))

stations_swe_df.plot(ax=ax,color=['red','blue'])

ax.legend()

ax.set_xlabel('time')
ax.set_ylabel('snow water equivalent [meters]')
ax.set_title('SNOTEL and CCSS SWE at Blue Lakes, CA \nmaybe these are the same? :)')

f.tight_layout()

#ax.set_xlim(['2018-08-01','2020-01-01'])

These look oddly similar... let's check out their correlation¶

convenient built in DataFrame.corr()

In [16]:

stations_swe_df.corr()

Out[16]:

	356_CA_SNTL	BLK
356_CA_SNTL	1.000000	0.999053
BLK	0.999053	1.000000

These correlation values, along with the time series above, makes me think these are way too similar... no way these would agree this much even if the stations were right next to each other!

Let's see where they exist spatially¶

select the stations by index and reproject to UTM 11N
use contextily for a basemap

In [17]:

f,ax=plt.subplots(figsize=(7,7))

all_stations_gdf[all_stations_gdf.index=='356_CA_SNTL'].to_crs('EPSG:32611').plot(ax=ax, color='red',label='356_CA_SNTL')
all_stations_gdf[all_stations_gdf.index=='BLK'].to_crs('EPSG:32611').plot(ax=ax, color='blue',label='BLK')

ax.set_xlim([244200,245700])
ax.set_ylim([4276900,4278700])

ctx.add_basemap(ax, crs='EPSG:32611', source=ctx.providers.Esri.WorldImagery)

ax.legend()

f.tight_layout()

Interesting locations :) Based on correlation and location, I'm going to say these are the same! Wonder what those tiny differences are about...

Look's like some of the CCSS stations have "Natural Resources Conservation Service" as their operator¶

Let's check which stations these are!
Let's plot these shared stations in red, and not shared in blue

In [18]:

!pip install -q lxml
import requests 
from io import StringIO

csv = 'https://cdec.water.ca.gov/misc/SnowSensors.html'
response = requests.get(csv)
same_stations_df = pd.read_html(StringIO(response.content.decode('utf-8')))[0].set_index('ID').sort_index()
same_stations_df = same_stations_df[same_stations_df.nunique(axis=1) > 1]
same_stations_gdf = gpd.GeoDataFrame(same_stations_df, geometry=gpd.points_from_xy(same_stations_df['Longitude'], same_stations_df['Latitude']))
same_stations_gdf.crs = "EPSG:4326"
same_stations_gdf = same_stations_gdf[same_stations_gdf['Operator Agency']=='Natural Resources Conservation Service']
same_stations_gdf

Out[18]:

	Name	Elev (feet)	Latitude	Longitude	April 1 Avg (inches)	Operator Agency	geometry
ID
ADM	ADIN MOUNTAIN	6200	41.237	-120.792	13.6	Natural Resources Conservation Service	POINT (-120.79200 41.23700)
BLK	BLUE LAKES	8000	38.613	-119.931	33.1	Natural Resources Conservation Service	POINT (-119.93100 38.61300)
BMW	BIG MEADOWS (SCS)	8700	39.458	-119.946	25.7	Natural Resources Conservation Service	POINT (-119.94600 39.45800)
BSK	BURNSIDE LAKE	8129	38.719	-119.894	-9999.0	Natural Resources Conservation Service	POINT (-119.89400 38.71900)
CDP	CEDAR PASS	7100	41.583	-120.303	18.1	Natural Resources Conservation Service	POINT (-120.30300 41.58300)
CSL	CENT SIERRA SNOW LAB	6900	39.325	-120.367	33.6	Natural Resources Conservation Service	POINT (-120.36700 39.32500)
CWF	CROWDER FLAT	5100	41.893	-120.752	-9999.0	Natural Resources Conservation Service	POINT (-120.75200 41.89300)
CXS	CARSON PASS	8353	38.692	-120.002	-9999.0	Natural Resources Conservation Service	POINT (-120.00200 38.69200)
DSS	DISMAL SWAMP	7050	41.993	-120.165	29.2	Natural Resources Conservation Service	POINT (-120.16500 41.99300)
EBB	EBBETTS PASS	8700	38.561	-119.808	38.8	Natural Resources Conservation Service	POINT (-119.80800 38.56100)
EP5	ECHO PEAK 5	7800	38.849	-120.079	39.5	Natural Resources Conservation Service	POINT (-120.07900 38.84900)
FLL	FALLEN LEAF LAKE	6250	38.932	-120.056	7.0	Natural Resources Conservation Service	POINT (-120.05600 38.93200)
HGM	HAGANS MEADOW	8000	38.853	-119.940	16.5	Natural Resources Conservation Service	POINT (-119.94000 38.85300)
HVN	HEAVENLY VALLEY	8800	38.929	-119.917	28.1	Natural Resources Conservation Service	POINT (-119.91700 38.92900)
IDC	INDEPENDENCE CAMP	7000	39.453	-120.299	21.8	Natural Resources Conservation Service	POINT (-120.29900 39.45300)
IDP	INDEPENDENCE LAKE (SCS)	8450	39.435	-120.322	41.4	Natural Resources Conservation Service	POINT (-120.32200 39.43500)
INN	INDEPENDENCE CREEK	6500	39.494	-120.293	12.7	Natural Resources Conservation Service	POINT (-120.29300 39.49400)
LBD	LOBDELL LAKE	9200	38.440	-119.377	17.3	Natural Resources Conservation Service	POINT (-119.37700 38.44000)
LVM	LEAVITT MEADOWS	7200	38.305	-119.552	8.0	Natural Resources Conservation Service	POINT (-119.55200 38.30500)
LVT	LEAVITT LAKE	9600	38.282	-119.621	-9999.0	Natural Resources Conservation Service	POINT (-119.62100 38.28200)
MNT	MONITOR PASS	8350	38.670	-119.615	-9999.0	Natural Resources Conservation Service	POINT (-119.61500 38.67000)
MRL	MARLETTE LAKE	8000	39.173	-119.905	21.1	Natural Resources Conservation Service	POINT (-119.90500 39.17300)
MSK	MOUNT ROSE SKI AREA	8900	39.326	-119.902	38.5	Natural Resources Conservation Service	POINT (-119.90200 39.32600)
PSN	POISON FLAT	7900	38.501	-119.631	16.2	Natural Resources Conservation Service	POINT (-119.63100 38.50100)
RP2	RUBICON PEAK 2	7500	39.001	-120.140	29.1	Natural Resources Conservation Service	POINT (-120.14000 39.00100)
SDW	SUMMIT MEADOW	9313	38.398	-119.536	-9999.0	Natural Resources Conservation Service	POINT (-119.53600 38.39800)
SPS	SONORA PASS BRIDGE	8750	38.318	-119.601	26.0	Natural Resources Conservation Service	POINT (-119.60100 38.31800)
SPT	SPRATT CREEK	6150	38.666	-119.817	4.5	Natural Resources Conservation Service	POINT (-119.81700 38.66600)
SQV	SQUAW VALLEY GOLD COAST	8200	39.194	-120.276	46.5	Natural Resources Conservation Service	POINT (-120.27600 39.19400)
TCC	TAHOE CITY CROSS	6750	39.171	-120.155	16.0	Natural Resources Conservation Service	POINT (-120.15500 39.17100)
TK2	TRUCKEE 2	6400	39.300	-120.194	14.3	Natural Resources Conservation Service	POINT (-120.19400 39.30000)
VRG	VIRGINIA LAKES RIDGE	9300	38.077	-119.234	20.3	Natural Resources Conservation Service	POINT (-119.23400 38.07700)
WC3	WARD CREEK 3	6750	39.136	-120.219	39.4	Natural Resources Conservation Service	POINT (-120.21900 39.13600)

In [19]:

f,ax=plt.subplots(figsize=(10,10))

same_stations_gdf.to_crs("EPSG:32611").plot(ax=ax,color='red',label='CCSS station operated by NRCS')
ccss_stations_gdf = all_stations_gdf[all_stations_gdf['network']=='CCSS'].to_crs("EPSG:32611")
ccss_stations_gdf[~ccss_stations_gdf.index.isin(same_stations_gdf.index)].plot(ax=ax,color='blue',label='CCSS station not operated by NRCS')

ctx.add_basemap(ax, crs='EPSG:32611', source=ctx.providers.Esri.WorldImagery)

ax.legend()

f.tight_layout()

There are 33 of these! These are likely also listed as SNOTEL stations...

Read a variable from multiple CSVs by looping over a subset of the geodataframe: How extraordinary was the California 2023 snowpack?¶

the Sierra Nevada received a historic amount of snow in 2023
let's explore the magnitude of this season by comparing to the median snow pack

As before, create a list of the stations we are interested in, loop through and add data to a dictionary with the station code as the key, then read into pandas using `pd.DataFrame.from_dict()`¶

create a geodataframe ccss_stations_gdf of only CCSS stations from all_stations_gdf by creating an index where network equals CCSS
loop through the CCSS stations and create a dataframe ccss_stations_snwd_df

In [20]:

ccss_stations_gdf = all_stations_gdf[all_stations_gdf['network']=='CCSS']

In [21]:

ccss_stations_gdf

Out[21]:

	name	network	elevation_m	latitude	longitude	state	HUC	mgrs	mountainRange	beginDate	endDate	csvData	geometry
code
FRW	Farewell Gap	CCSS	2895.60	36.415211	-118.578979	California	180300070202	11SLA	Sierra Nevada	2004-10-01	2024-02-20	True	POINT (-118.57898 36.41521)
BNK	Bonanza King	CCSS	1965.96	41.083118	-122.631271	California	180102110304	10TEL	Klamath Mountains	2011-01-03	2024-02-20	True	POINT (-122.63127 41.08312)
CDP	Cedar Pass	CCSS	2164.08	41.583000	-120.303000	California	180200020603	10TGM	Great Basin Ranges	NaT	NaT	True	POINT (-120.30300 41.58300)
CRL	Charlotte Lake	CCSS	3169.92	36.777523	-118.426010	California	180300100206	11SLA	Sierra Nevada	2004-10-01	2023-12-11	True	POINT (-118.42601 36.77752)
CHM	Chilkoot Meadow	CCSS	2179.32	37.408390	-119.492188	California	180400061101	11SKB	Sierra Nevada	2006-10-01	2024-02-20	True	POINT (-119.49219 37.40839)
...	...	...	...	...	...	...	...	...	...	...	...	...	...
SLT	Slate Creek	CCSS	1737.36	41.043980	-122.480103	California	180200050304	10TEL	Klamath Mountains	2004-10-01	2024-02-12	True	POINT (-122.48010 41.04398)
SLI	Slide Canyon	CCSS	2804.16	38.091234	-119.431881	California	180400090501	11SKC	Sierra Nevada	2005-10-01	2024-02-20	True	POINT (-119.43188 38.09123)
SLK	South Lake	CCSS	2926.08	37.175903	-118.562660	California	180901020601	11SLB	Sierra Nevada	2004-10-01	2024-02-20	True	POINT (-118.56266 37.17590)
STL	State Lakes	CCSS	3169.92	36.926483	-118.573250	California	180300100305	11SLA	Sierra Nevada	2018-11-01	2024-02-20	True	POINT (-118.57325 36.92648)
TMR	Tamarack Summit	CCSS	2301.24	37.163750	-119.200531	California	180400060903	11SLB	Sierra Nevada	2011-01-01	2024-02-20	True	POINT (-119.20053 37.16375)

130 rows × 13 columns

In [22]:

%%time 
station_dict = {}

for station in tqdm.tqdm(ccss_stations_gdf.index):
    try:
        tmp = pd.read_csv(f'https://raw.githubusercontent.com/egagli/snotel_ccss_stations/main/data/{station}.csv',index_col='datetime',parse_dates=True)['SNWD']
        station_dict[station] = tmp
    except:
        print(f'failed to retrieve {station}')

ccss_stations_snwd_df = pd.DataFrame.from_dict(station_dict).dropna(how='all')

100%|██████████| 130/130 [00:36<00:00,  3.60it/s]

CPU times: user 8.12 s, sys: 593 ms, total: 8.72 s
Wall time: 37.2 s

In [23]:

ccss_stations_snwd_df

Out[23]:

	FRW	BNK	CDP	CRL	CHM	HHM	HNT	MUM	SDF	RTL	...	QUA	LLP	FOR	GEM	WTM	SLT	SLI	SLK	STL	TMR
datetime
2004-10-01	0.3048	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.0000	NaN	...	NaN	0.2032	NaN	NaN	NaN	-0.0254	NaN	-0.0254	NaN	NaN
2004-10-02	0.3556	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.0000	NaN	...	NaN	0.2540	NaN	NaN	NaN	0.0254	NaN	-0.0254	NaN	NaN
2004-10-03	0.2540	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.0000	NaN	...	NaN	0.2540	NaN	NaN	NaN	0.0254	NaN	-0.0254	NaN	NaN
2004-10-04	0.3302	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.0000	NaN	...	NaN	0.0254	NaN	NaN	NaN	0.0254	NaN	-0.0508	NaN	NaN
2004-10-05	0.3302	NaN	NaN	NaN	NaN	NaN	NaN	NaN	-0.0000	NaN	...	NaN	0.2286	NaN	NaN	NaN	0.0254	NaN	-0.0508	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2024-02-16	1.3208	1.7780	NaN	NaN	1.6764	NaN	1.2700	NaN	2.3114	NaN	...	NaN	NaN	NaN	0.0	1.0922	NaN	1.9050	0.8128	1.2192	1.1684
2024-02-17	1.2954	1.7272	NaN	NaN	1.6510	NaN	1.2192	NaN	2.2606	NaN	...	NaN	3.4290	NaN	0.0	1.0668	NaN	1.8288	0.7366	1.1684	1.1430
2024-02-18	1.3208	1.8542	NaN	NaN	1.7018	NaN	4.4958	NaN	2.4384	NaN	...	NaN	3.6830	NaN	0.0	1.0922	NaN	1.9558	0.7366	1.1938	1.1684
2024-02-19	1.4224	2.0574	NaN	NaN	1.7018	NaN	1.2192	NaN	2.6162	NaN	...	NaN	4.0132	NaN	0.0	1.1176	NaN	2.1336	3.7846	1.2954	1.1176
2024-02-20	1.7272	2.4130	NaN	NaN	1.7272	NaN	1.2446	NaN	NaN	NaN	...	NaN	4.3180	NaN	0.0	1.4224	NaN	2.1082	1.0414	1.6510	1.2192

7075 rows × 130 columns

Let's check out the percent of normal snow depth for April 1st, 2023¶

let's add a DOWY column to ccss_stations_snwd_df like before, groupby DOWY, apply a median, and divide the observation on April 1st, 2023 by the DOWY 183 (April 1) median
- add these percent normal values back to ccss_stations_gdf
will need to do some slight cleaning to get rid of NaNs, Infs, physically impossible values...

In [24]:

ccss_stations_snwd_df['DOWY'] = ccss_stations_snwd_df.index.map(datetime_to_DOWY)

In [25]:

ccss_stations_gdf.loc[:,'april2023_percent_norm'] = 100*(ccss_stations_snwd_df['2023-04-01':'2023-04-01'].squeeze() / ccss_stations_snwd_df.groupby('DOWY').median().loc[183])

/srv/conda/envs/notebook/lib/python3.11/site-packages/geopandas/geodataframe.py:1538: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super().__setitem__(key, value)

In [26]:

ccss_stations_gdf = ccss_stations_gdf.dropna(subset='april2023_percent_norm')
ccss_stations_gdf = ccss_stations_gdf[ccss_stations_gdf['april2023_percent_norm']<7500]

View the values in order...¶

use .sort_values() function to get an idea of percent normal snow depth values
DataFrame.head() and DataFrame.tail() to see highest and lowest percent normal snow depth values

In [27]:

ccss_stations_gdf.sort_values('april2023_percent_norm',ascending=False).head()

Out[27]:

	name	network	elevation_m	latitude	longitude	state	HUC	mgrs	mountainRange	beginDate	endDate	csvData	geometry	april2023_percent_norm
code
FLL	Fallen Leaf Lake	CCSS	1905.0000	38.932000	-120.056000	California	160501010401	10SGJ	NaN	2004-10-01	2024-02-19	True	POINT (-120.05600 38.93200)	1000.000000
DPO	Devils Postpile	CCSS	2307.0312	37.629410	-119.084671	California	180400060402	11SLB	Sierra Nevada	2007-10-01	2024-02-20	True	POINT (-119.08467 37.62941)	647.272727
RCK	Rock Creek Lakes	CCSS	2956.5600	37.457275	-118.735023	California	180901020301	11SLB	Sierra Nevada	2004-10-01	2024-02-20	True	POINT (-118.73502 37.45728)	566.666667
TMR	Tamarack Summit	CCSS	2301.2400	37.163750	-119.200531	California	180400060903	11SLB	Sierra Nevada	2011-01-01	2024-02-20	True	POINT (-119.20053 37.16375)	546.428571
CWD	Cottonwood Lakes	CCSS	3093.7200	36.483829	-118.177559	California	180901030404	11SLA	Sierra Nevada	2004-10-01	2024-02-20	True	POINT (-118.17756 36.48383)	473.076923

In [28]:

ccss_stations_gdf.sort_values('april2023_percent_norm',ascending=False).tail()

Out[28]:

	name	network	elevation_m	latitude	longitude	state	HUC	mgrs	mountainRange	beginDate	endDate	csvData	geometry	april2023_percent_norm
code
LLP	Lower Lassen Peak	CCSS	2541.4224	40.466602	-121.508110	California	180201560301	10TFK	Cascade Range	2004-10-01	2024-02-20	True	POINT (-121.50811 40.46660)	162.541806
GIN	Gin Flat	CCSS	2148.8400	37.766887	-119.774907	California	180400080306	11SKB	Sierra Nevada	2004-10-01	2024-02-20	True	POINT (-119.77491 37.76689)	157.575758
TNY	Tenaya Lake	CCSS	2484.1200	37.837581	-119.449875	California	180400080107	11SKB	Sierra Nevada	2004-10-01	2024-02-20	True	POINT (-119.44988 37.83758)	154.867257
TUN	Tunnel Guard Station	CCSS	2712.7200	36.366349	-118.289116	California	180300020101	11SLA	Sierra Nevada	2022-12-01	2024-02-20	True	POINT (-118.28912 36.36635)	100.000000
BGP	Big Pine Creek	CCSS	2987.0400	37.127815	-118.476967	California	180901020709	11SLB	Sierra Nevada	2004-10-01	2024-02-20	True	POINT (-118.47697 37.12781)	0.000000

Plot the percent of normal snow depths for April 1st, 2023¶

add elevation plot to the right with horizontal dashed line at 100% (normal)

In [29]:

f,ax=plt.subplots(1,2,figsize=(12,9),gridspec_kw={'width_ratios': [2, 1]})

ccss_stations_gdf.to_crs('EPSG:32611').plot(ax=ax[0], column='april2023_percent_norm',legend=True,vmin=0,vmax=500,cmap='gnuplot',edgecolor='black',s=100)

ctx.add_basemap(ax[0], crs='EPSG:32611', source=ctx.providers.Esri.WorldImagery, attribution='')

ax[0].set_title('station locations')


ax[1].scatter(ccss_stations_gdf.elevation_m,ccss_stations_gdf.april2023_percent_norm,c=ccss_stations_gdf.april2023_percent_norm,cmap='gnuplot',vmin=0,vmax=500,edgecolors='black',s=100)
ax[1].axhline(y=100,linestyle='--',color='black')

ax[1].set_title('station percent normal snow depth vs elevation')
ax[1].set_xlabel('elevation [m]')
ax[1].set_ylabel('percent of normal snow depth [%]')

f.suptitle(f'California percent normal snow depth for April 1st, 2023')
f.tight_layout()

Looks like a lot more snow than usual to me!

Read a variable from all CSVs by looping over the entire geodataframe: Has the date of maximum SWE changed in the Western US?¶

Snowmelt timing can be an important indicator of regional climate change, and the snowmelt timing of the Western U.S. is projected to shift earlier in the year by up to 1 month by 2050 (Barnett et al., 2005; Stewart, 2009), with a corresponding snowpack loss equivalent to a 25% decrease in streamflow from snowmelt (Siirila-Woodburn et al., 2021).
Let's explore trends in maximum SWE timing using SNOTEL and CCSS stations

This time loop through all stations and and add data to a dictionary with the station code as the key, then read into pandas using `pd.DataFrame.from_dict()`¶

might take a minute to load in almost 1000 CSVs...
store SWE for all stations in all_stations_swe_df

In [30]:

%%time 
station_dict = {}

for station in tqdm.tqdm(all_stations_gdf.index):
    try:
        tmp = pd.read_csv(f'https://raw.githubusercontent.com/egagli/snotel_ccss_stations/main/data/{station}.csv',index_col='datetime',parse_dates=True)['WTEQ']
        station_dict[station] = tmp
    except:
        print(f'failed to retrieve {station}')

100%|██████████| 969/969 [04:28<00:00,  3.61it/s]

CPU times: user 56 s, sys: 3.71 s, total: 59.7 s
Wall time: 4min 28s

In [31]:

all_stations_swe_df = pd.DataFrame.from_dict(station_dict)
all_stations_swe_df

Out[31]:

	301_CA_SNTL	907_UT_SNTL	916_MT_SNTL	1267_AK_SNTL	908_WA_SNTL	1189_AK_SNTL	1062_AK_SNTL	1070_AK_SNTL	302_OR_SNTL	1000_OR_SNTL	...	QUA	LLP	FOR	GEM	WTM	SLT	SLI	SLK	STL	TMR
datetime
1909-04-13	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1954-12-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.0635	NaN
1954-12-02	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.0635	NaN
1954-12-03	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.0635	NaN
1954-12-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.1143	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2024-02-16	0.2235	0.1829	0.2184	0.1981	0.4724	0.127	0.2642	0.2946	0.2388	0.7214	...	NaN	1.4554	0.2438	NaN	0.3152	0.4968	0.4696	0.1478	0.3183	0.3546
2024-02-17	0.2438	0.1829	0.2184	0.1981	0.4775	0.127	0.2642	0.2946	0.2413	0.7290	...	NaN	1.4783	0.2438	NaN	0.3152	0.4999	0.4702	0.1478	0.3193	0.3546
2024-02-18	0.2565	0.1829	0.2210	0.2032	0.4801	0.127	0.2642	0.2972	0.2438	0.7417	...	NaN	1.5255	NaN	NaN	0.3152	0.5547	0.4867	0.1478	0.3228	0.3668
2024-02-19	0.2642	0.1829	0.2210	0.2032	0.4826	0.127	0.2642	0.2972	0.2413	0.7544	...	NaN	1.5926	NaN	NaN	0.3150	0.6157	0.5121	0.1600	0.3358	0.3912
2024-02-20	0.2692	0.1829	0.2210	0.2134	0.4826	0.127	0.2667	0.3023	0.2413	0.7645	...	NaN	1.6939	NaN	NaN	0.3569	0.7071	0.5283	0.1875	0.3927	0.4277

23800 rows × 969 columns

Prepare the data¶

prepare and clean all_stations_swe_df:
- filter to start in WY 1967 (the first year with more than one station) and end with WY 2023
- add water year column
- remove any negative SWE measurements
- for consistency with similar analyses, following the methodology of Evan 2019 and US EPA 2021:
  - set any change of greater magnitude than 20cm to NaN
  - if there are more than 30 days of missing data during November-April, don't use that water year
  - if SWE is zero during every day of Jan/Feb/March, don't use that water year
  - only use stations with continuous data from WY 1982
    - i do this only for the all data bulk calculation so we have a common reference frame, but for station level analysis i instead impose a 30 year or longer record rule

In [32]:

all_stations_swe_df = all_stations_swe_df.loc[slice('1966-10-01','2023-09-30')]

In [33]:

all_stations_swe_df['WY'] = all_stations_swe_df.index.map(datetime_to_WY)

/tmp/ipykernel_5958/4163226604.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_stations_swe_df['WY'] = all_stations_swe_df.index.map(datetime_to_WY)

In [34]:

all_stations_swe_df = all_stations_swe_df[all_stations_swe_df>=0] 

In [35]:

all_stations_swe_diff_df = all_stations_swe_df.diff().abs()
all_stations_swe_df[all_stations_swe_diff_df>0.20] = np.nan

In [36]:

def check_missing_data(group):
    nov_to_apr_mask = group.index.month.isin([11, 12, 1, 2, 3, 4])
    filtered_group = group[nov_to_apr_mask]
    missing_data_counts = filtered_group.isnull().sum()
    columns_to_nan = missing_data_counts[missing_data_counts > 30].index
    group[columns_to_nan] = np.nan
    return group

def check_zero_swe(group):
    for month in [1, 2, 3]:
        month_mask = group.index.month == month
        zero_swe_columns = group[month_mask].eq(0).all()
        columns_to_nan = zero_swe_columns[zero_swe_columns].index
        group[columns_to_nan] = np.nan
    return group

In [37]:

all_stations_swe_df = all_stations_swe_df.groupby('WY').apply(check_missing_data).droplevel(0)
all_stations_swe_df = all_stations_swe_df.groupby('WY').apply(check_zero_swe).droplevel(0)
all_stations_swe_df

Out[37]:

	301_CA_SNTL	907_UT_SNTL	916_MT_SNTL	1267_AK_SNTL	908_WA_SNTL	1189_AK_SNTL	1062_AK_SNTL	1070_AK_SNTL	302_OR_SNTL	1000_OR_SNTL	...	LLP	FOR	GEM	WTM	SLT	SLI	SLK	STL	TMR	WY
datetime
1966-10-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	1967.0
1966-10-02	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	1967.0
1966-10-03	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	1967.0
1966-10-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	1967.0
1966-10-05	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	1967.0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2023-09-26	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0051	0.0	...	0.0	0.0	NaN	0.0	0.0	NaN	0.0	NaN	0.0	2023.0
2023-09-27	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0051	0.0	...	0.0	0.0	NaN	0.0	0.0	NaN	0.0	NaN	0.0	2023.0
2023-09-28	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0051	0.0	...	0.0	0.0	NaN	0.0	0.0	NaN	0.0	NaN	0.0	2023.0
2023-09-29	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0051	0.0	...	0.0	0.0	NaN	0.0	0.0	NaN	0.0	NaN	0.0	2023.0
2023-09-30	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0051	0.0	...	0.0	0.0	NaN	0.0	0.0	NaN	0.0	NaN	0.0	2023.0

20763 rows × 970 columns

Calculate and plot trend in DOWY of max SWE for all data¶

use DataFrame.groupby() to find the date of max SWE per year and convert to a DOWY value, store in all_stations_dowy_max_swe_df
calculate slope and intercept of linear regression using np.polyfit() on a melted version of
obtain statistics for each year using DataFrame.describe()

In [38]:

all_stations_dowy_max_swe_df = all_stations_swe_df.groupby('WY').idxmax().applymap(datetime_to_DOWY)

In [39]:

stations_before_WY1982 = all_stations_gdf[all_stations_gdf.beginDate<'1981-10-01']
dowy_max_swe_melted = pd.melt(all_stations_dowy_max_swe_df.reset_index(),id_vars='WY').dropna()
dowy_max_swe_melted_before_WY1982 = dowy_max_swe_melted[dowy_max_swe_melted['variable'].isin(stations_before_WY1982.index)]
slope, intercept = np.polyfit(dowy_max_swe_melted_before_WY1982.WY,dowy_max_swe_melted_before_WY1982.value,1)
lr_years = np.unique(dowy_max_swe_melted.WY)

In [40]:

describe = all_stations_dowy_max_swe_df.T.describe()
describe

Out[40]:

WY	1967.0	1968.0	1969.0	1970.0	1971.0	1972.0	1973.0	1974.0	1975.0	1976.0	...	2014.0	2015.0	2016.0	2017.0	2018.0	2019.0	2020.0	2021.0	2022.0	2023.0
count	15.000000	23.00000	31.000000	39.000000	40.000000	40.000000	48.000000	50.000000	53.000000	55.000000	...	891.000000	870.000000	896.000000	926.000000	922.000000	929.000000	932.000000	928.000000	928.000000	920.000000
mean	211.600000	196.73913	185.741935	197.641026	188.050000	186.375000	195.729167	195.720000	213.000000	194.800000	...	179.783389	156.452874	175.334821	179.018359	183.367679	185.812702	182.918455	180.346983	176.734914	192.546739
std	13.113788	19.17199	14.906302	25.243537	29.203047	33.502918	18.525129	17.996644	11.560876	18.673015	...	28.288029	36.552523	27.632212	25.712310	18.666580	19.338004	26.485426	18.764201	36.686443	13.905714
min	188.000000	170.00000	158.000000	148.000000	106.000000	94.000000	134.000000	161.000000	187.000000	154.000000	...	55.000000	33.000000	88.000000	104.000000	84.000000	95.000000	60.000000	119.000000	89.000000	86.000000
25%	208.500000	179.50000	176.000000	167.000000	179.000000	159.500000	184.750000	186.250000	207.000000	184.500000	...	164.000000	133.000000	167.000000	159.000000	175.000000	170.000000	183.000000	175.000000	162.000000	187.000000
50%	215.000000	204.00000	182.000000	210.000000	191.500000	202.000000	197.000000	193.000000	216.000000	202.000000	...	188.000000	156.000000	181.000000	180.000000	180.000000	186.000000	190.000000	178.000000	174.000000	189.000000
75%	218.500000	209.00000	191.500000	212.500000	209.250000	211.250000	207.000000	200.000000	221.000000	211.000000	...	190.000000	177.000000	185.000000	202.000000	200.000000	199.000000	200.000000	189.250000	205.000000	205.000000
max	228.000000	238.00000	219.000000	227.000000	235.000000	226.000000	225.000000	238.000000	240.000000	218.000000	...	238.000000	239.000000	232.000000	235.000000	224.000000	247.000000	229.000000	228.000000	250.000000	234.000000

8 rows × 57 columns

In [41]:

f,ax=plt.subplots(2,1,figsize=(10,6),sharex=True,gridspec_kw={'height_ratios': [3, 2]})

describe.loc['50%'].plot(ax=ax[0],label='median')

ax[0].fill_between(describe.columns,describe.loc['25%'],describe.loc['75%'],alpha=0.3,label='IQR')
ax[0].plot(lr_years,np.array(lr_years)*slope+intercept,'k--',label=f'Trend (slope={slope:.2f} Days/Year)')
#ax[0].set_xlim([1967,2023])

ax[0].legend()

describe.loc['count'].plot(ax=ax[1])


ax[0].set_title('Trend in DOWY of max SWE')
ax[1].set_title('Number of active stations')

Out[41]:

Text(0.5, 1.0, 'Number of active stations')

Check out trend at each station seperately¶

calculate the linear trend in DOWY of max SWE only for stations with over 30 years of data, store in our original all_stations_gdf
- out of 971 stations with SWE data, 645 meet this criteria
plot the trend for each station and plot the trends on a histogram

In [42]:

all_stations_gdf.loc[:,'dowy_max_swe_trend'] = all_stations_dowy_max_swe_df.apply(lambda y: np.polyfit(y.dropna().index.values, y.dropna(), 1)[0] if len(y.dropna()) >= 30 else np.nan)

In [43]:

f,ax=plt.subplots(figsize=(10,5.5))

all_stations_gdf.plot(column='dowy_max_swe_trend',ax=ax,legend=True,cmap='RdBu_r',edgecolor='k',markersize=20,vmin=-1,vmax=1,legend_kwds={'label':'[days/year]\n(Red is later in the year, blue is earlier)'})

ctx.add_basemap(ax, crs='EPSG:4326', source=ctx.providers.Esri.WorldImagery, attribution='')

ax.set_title('Trend in DOWY of max SWE\n(only stations with 30+ years of data)')

Out[43]:

Text(0.5, 1.0, 'Trend in DOWY of max SWE\n(only stations with 30+ years of data)')

In [44]:

f,ax=plt.subplots()

ax.hist(all_stations_gdf['dowy_max_swe_trend'],bins=50)

ax.axvline(x=0,color='red')

ax.set_xlim([-1.5,1.5])

ax.set_xlabel('trend [days/year]')
ax.set_ylabel('count')
ax.set_title('Distribution of trends in DOWY of max SWE')

Out[44]:

Text(0.5, 1.0, 'Distribution of trends in DOWY of max SWE')

Let's analyze these trends by mountain range¶

we can DataFrame.groupby() our geodataframe by mountain range to calculate mountain range specific stats, store in mountain_range_trend_df
mountain ranges with more stations (and more spatial coverage) are probably more trustworthy

In [45]:

mountain_range_count = all_stations_gdf.dropna().groupby('mountainRange')['dowy_max_swe_trend'].count()
mountain_range_median = all_stations_gdf.dropna().groupby('mountainRange')['dowy_max_swe_trend'].median()
mountain_range_mean = all_stations_gdf.dropna().groupby('mountainRange')['dowy_max_swe_trend'].mean()
mountain_range_std = all_stations_gdf.dropna().groupby('mountainRange')['dowy_max_swe_trend'].std()

In [46]:

mountain_range_trend_df = pd.concat([mountain_range_count,mountain_range_median,mountain_range_mean,mountain_range_std],axis=1)
mountain_range_trend_df.columns = ['station_count','median','mean','std']

In [47]:

mountain_range_trend_df

Out[47]:

	station_count	median	mean	std
mountainRange
Alaska Intermountain Ranges	5	-0.298031	-0.266592	0.112821
Cascade Range	63	0.072125	0.138312	0.346018
Central Montana Rocky Mountains	48	-0.083318	-0.100606	0.193399
Colorado Plateau	37	-0.282213	-0.348372	0.205635
Columbia Mountains	7	0.143695	0.082937	0.164431
Columbia Plateau	24	-0.138161	-0.029927	0.309255
Great Basin Ranges	37	-0.209381	-0.238700	0.206061
Greater Yellowstone Rockies	51	-0.044327	-0.064588	0.175889
Idaho-Bitterroot Rocky Mountains	57	-0.036092	-0.021299	0.242944
Klamath Mountains	5	-0.122516	0.019894	0.407079
Olympic Mountains	1	0.267235	0.267235	NaN
Oregon Coast Range	1	0.163264	0.163264	NaN
Sierra Nevada	81	-0.177622	-0.176293	0.283223
South-Central Alaska	7	-0.073497	-0.029532	0.230334
Southern Rocky Mountains	90	-0.275773	-0.277181	0.258862
Southwest Basins and Ranges	4	-0.752793	-0.740468	0.252450
Western Rocky Mountains	58	-0.191897	-0.171475	0.199642

In [48]:

f,ax=plt.subplots(1,2,sharey=True,gridspec_kw={'width_ratios': [1, 3]})

mountain_range_trend_df['station_count'].plot.barh(ax=ax[0])
mountain_range_trend_df['median'].plot.barh(ax=ax[1],cmap='RdBu')


ax[0].set_xlabel('[#]')
ax[1].set_xlabel('[days/year]')
ax[0].set_ylabel('')
ax[0].set_title('station count')
ax[1].set_title('trend')

f.suptitle('Trend in DOWY of max SWE by mountain range')

Out[48]:

Text(0.5, 0.98, 'Trend in DOWY of max SWE by mountain range')

Let's visualize this on a map¶

add spatial information to mountain_range_trend_df using DataFrame.join() with mountain range geometries from GMBA Mountain Inventory v2
ceate both a static plot with counts and medians and create an interactive plot so we can explore the trends across mountain ranges and stations

In [49]:

url = f'https://data.earthenv.org/mountains/standard/GMBA_Inventory_v2.0_standard_300.zip'
gmba_gdf = gpd.read_file('zip+'+url)

In [50]:

mountain_range_trend_gdf = gpd.GeoDataFrame(mountain_range_trend_df.join(gmba_gdf[['MapName','geometry']].set_index('MapName')))

In [51]:

f,ax=plt.subplots(1,2,figsize=(14,7))

mountain_range_trend_gdf.plot(ax=ax[0],column='station_count',vmin=0,vmax=100,cmap='viridis',legend=True,edgecolor='k',legend_kwds={'label':'[#]'})
mountain_range_trend_gdf.plot(ax=ax[1],column='median',vmin=-0.3,vmax=0.3,cmap='RdBu_r',legend=True,edgecolor='k',legend_kwds={'label':'[days/year]\n(Red is later in the year, blue is earlier)'})

for axs in ax:
    ctx.add_basemap(ax=axs, crs=mountain_range_trend_gdf.crs, source=ctx.providers.Esri.WorldImagery, attribution=False)
    ctx.add_basemap(ax=axs, crs=mountain_range_trend_gdf.crs, source=ctx.providers.Esri.WorldImagery, attribution=False)
    axs.set_xlim([-125,-104])
    axs.set_ylim([27,55])
    
ax[0].set_title('count')
ax[1].set_title('trend')

f.suptitle('Trend in DOWY of max SWE by mountain range\n(only stations with 30+ years of data)')

Out[51]:

Text(0.5, 0.98, 'Trend in DOWY of max SWE by mountain range\n(only stations with 30+ years of data)')

In [52]:

m = mountain_range_trend_gdf.explore(column='median',cmap='RdBu_r',vmin=-0.5,vmax=0.5)
all_stations_gdf.astype(dict(beginDate=str, endDate=str)).explore(m=m,column='dowy_max_swe_trend',cmap='RdBu_r',vmin=-0.3,vmax=0.3)

Out[52]:

Make this Notebook Trusted to load map: File -> Trust Notebook

Looks like these trends are different by region, but relatively consistent within region. The majority of regions show the timing of maximum SWE happening earlier in the year, with notable exceptions being mountain ranges in the Pacific Northwest which show a reverse trend with smaller magnitude. Also of importance is the number of stations and their spatial dsitribution in each region, as 2 of the 4 regions (Olympic Mountains and Oregon Coast Range) showing the timing of maximum SWE happening later in the year only have one station each with a 30+ year record.

In [ ]:

How to pull in SNOTEL and CCSS data from this repository (and some use cases)¶

View all SNOTEL & CCSS stations¶

Read the geojson stored at https://raw.githubusercontent.com/egagli/snotel_ccss_stations/main/all_stations.geojson¶

Use geopandas GeoDataFrame.explore() on the all_stations_gdf geodataframe to interactively view the stations¶

Read a singular CSV: Is our winter on track with the historical record at the Paradise, WA SNOTEL station?¶

Place a station code (which you can find in this interactive plot, or by other means) in the url: https://raw.githubusercontent.com/egagli/snotel_ccss_stations/main/data/%7Bstation_id%7D.csv¶

Try a simple plot of snow depth and SWE¶

Try a more complex plot that shows current snow depth against statistics calculated from the entire time series for each day of water year¶

Read a variable from multiple CSVs by looping over a list of stations: Does the SNOTEL network and CCSS network list the same station?¶

Create a list of the stations we are interested in, loop through and add data to a dictionary with the station code as the key, then read into pandas using pd.DataFrame.from_dict()¶

Plot the two stations on the same axis¶

These look oddly similar... let's check out their correlation¶

Let's see where they exist spatially¶

Look's like some of the CCSS stations have "Natural Resources Conservation Service" as their operator¶

Read a variable from multiple CSVs by looping over a subset of the geodataframe: How extraordinary was the California 2023 snowpack?¶

As before, create a list of the stations we are interested in, loop through and add data to a dictionary with the station code as the key, then read into pandas using pd.DataFrame.from_dict()¶

Let's check out the percent of normal snow depth for April 1st, 2023¶

View the values in order...¶

Plot the percent of normal snow depths for April 1st, 2023¶

Read a variable from all CSVs by looping over the entire geodataframe: Has the date of maximum SWE changed in the Western US?¶

This time loop through all stations and and add data to a dictionary with the station code as the key, then read into pandas using pd.DataFrame.from_dict()¶

Prepare the data¶

Calculate and plot trend in DOWY of max SWE for all data¶

Check out trend at each station seperately¶

Let's analyze these trends by mountain range¶

Let's visualize this on a map¶

Read the geojson stored at https://raw.githubusercontent.com/egagli/snotel_ccss_stations/main/all_stations.geojson ¶

Use geopandas `GeoDataFrame.explore()` on the `all_stations_gdf` geodataframe to interactively view the stations¶

Place a station code (which you can find in this interactive plot, or by other means) in the url: https://raw.githubusercontent.com/egagli/snotel_ccss_stations/main/data/%7Bstation_id%7D.csv ¶

Create a list of the stations we are interested in, loop through and add data to a dictionary with the station code as the key, then read into pandas using `pd.DataFrame.from_dict()`¶

As before, create a list of the stations we are interested in, loop through and add data to a dictionary with the station code as the key, then read into pandas using `pd.DataFrame.from_dict()`¶

This time loop through all stations and and add data to a dictionary with the station code as the key, then read into pandas using `pd.DataFrame.from_dict()`¶