Developed by: Georgii Bocharov

E-Mail: bocharovgeorgii@gmail.com

coastlib: https://github.com/georgebv/coastlib

License: GNU General Public License v3.0

Introduction

This notebook provides examples for the noaa_ncei module of the coastlib library. Source code for this module can be found in the coastlib repository linked above.

The noaa_ncei module is part of the coastlib.data package. It provides an interface to the NOAA NCEI data portal via the NCEI Data Service API and retrieves environmental data collected by NOAA NCEI sensors (wind, precipitation, air pressure, etc.) as a pandas DataFrame. With this tool one can automate the extraction of large amounts of data from NOAA NCEI stations for further processing and storage.

Basic Usage

Let's start by extracting hourly wind speed and direction data for the La Guardia Airport and Bethel Airport stations for October 2012:

In [1]:
%matplotlib inline

import pandas as pd
from coastlib.data import ncei_api, ncei_api_batch, ncei_datasets, ncei_search


df = ncei_api(
    dataset='local-climatological-data', stations=['72503014732', '70219026615'],
    start_date='2012-10-01', end_date='2012-11-01', datatypes=['HourlyWindSpeed', 'HourlyWindDirection']
)
df
Out[1]:
REPORT_TYPE HourlyWindDirection LONGITUDE ELEVATION SOURCE HourlyWindSpeed LATITUDE
STATION DATE
72503014732 : LA GUARDIA AIRPORT, NY US 2012-10-01 00:51:00 FM-15 270 -73.88 3.4 7 8 40.7792
2012-10-01 01:00:00 FM-12 270 -73.88 3.4 4 8 40.7792
2012-10-01 01:51:00 FM-15 290 -73.88 3.4 7 13 40.7792
2012-10-01 02:51:00 FM-15 260 -73.88 3.4 7 11 40.7792
2012-10-01 03:51:00 FM-15 250 -73.88 3.4 7 10 40.7792
... ... ... ... ... ... ... ... ...
70219026615 : BETHEL AIRPORT, AK US 2012-11-01 21:00:00 FM-12 010 -161.8293 31.1 4 19 60.785
2012-11-01 21:53:00 FM-15 010 -161.8293 31.1 7 21 60.785
2012-11-01 22:53:00 FM-15 010 -161.8293 31.1 7 25 60.785
2012-11-01 23:53:00 FM-15 010 -161.8293 31.1 7 24 60.785
2012-11-01 23:59:00 SOD NaN -161.8293 31.1 6 NaN 60.785

2427 rows × 7 columns

Note that the extracted data is not numeric. This is deliberate: NOAA mixes data types in its responses, so all values are returned as received and type-specific processing is left to the end user.

In order to do something meaningful with the data, some processing is required:

In [2]:
df_processed = (
    df
    .loc[:, ['HourlyWindSpeed', 'HourlyWindDirection']]
    .apply(pd.to_numeric, errors='coerce')
    .dropna(how='any')
)
df_processed
Out[2]:
HourlyWindSpeed HourlyWindDirection
STATION DATE
72503014732 : LA GUARDIA AIRPORT, NY US 2012-10-01 00:51:00 8.0 270.0
2012-10-01 01:00:00 8.0 270.0
2012-10-01 01:51:00 13.0 290.0
2012-10-01 02:51:00 11.0 260.0
2012-10-01 03:51:00 10.0 250.0
... ... ... ...
70219026615 : BETHEL AIRPORT, AK US 2012-11-01 20:53:00 20.0 10.0
2012-11-01 21:00:00 19.0 10.0
2012-11-01 21:53:00 21.0 10.0
2012-11-01 22:53:00 25.0 10.0
2012-11-01 23:53:00 24.0 10.0

2329 rows × 2 columns
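To make the effect of the processing step above more concrete, here is a minimal sketch on a toy DataFrame. The raw values ('M', '5s', 'VRB') are hypothetical placeholders chosen for illustration and are not taken from an actual NOAA response; only the pandas calls mirror the cell above.

```python
import pandas as pd

# Hypothetical raw values stored as text; the non-numeric entries
# ('M', '5s', 'VRB') are illustrative assumptions, not real NOAA output.
raw = pd.DataFrame(
    {
        'HourlyWindSpeed': ['8', '13', 'M', '5s'],
        'HourlyWindDirection': ['270', 'VRB', '290', '260'],
    }
)

# errors='coerce' replaces every value that cannot be parsed with NaN
# instead of raising an exception.
numeric = raw.apply(pd.to_numeric, errors='coerce')

# how='any' drops a row as soon as one of its columns is NaN.
clean = numeric.dropna(how='any')

print(numeric)
print(clean)
```

Only the first row survives, since every other row has at least one value that could not be parsed. This also explains why the processed DataFrame above has fewer rows than the raw one.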

In [3]:
df_processed.loc['72503014732 : LA GUARDIA AIRPORT, NY US', 'HourlyWindSpeed'].plot(color='k', lw=.5)
Out[3]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a323ecd488>
In [4]:
df_processed.loc['70219026615 : BETHEL AIRPORT, AK US', 'HourlyWindSpeed'].plot(
    kind='hist', facecolor='None', edgecolor='k', lw=1, bins=15
)
Out[4]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a323fe5e08>

The ncei_api function is the most direct way of interfacing with the NOAA NCEI API. Its drawback is that a single request can cover only a relatively small time window (the limit depends on the data source and is not clearly documented by NOAA; the largest window appears to be ~10 years). Serious applications often require many years of data, and this is where the ncei_api_batch function becomes much more convenient.
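The general idea behind working around a limited request window is to split a long period into consecutive shorter windows, request each one separately, and combine the results. The sketch below illustrates only the splitting step using the standard library. The helper name split_into_windows is hypothetical, and this is not the implementation used by coastlib.

```python
from datetime import date


def split_into_windows(start, end, years=1):
    """Yield consecutive (window_start, window_end) pairs covering start..end.

    Hypothetical helper for illustration; not part of coastlib.
    """
    window_start = start
    while window_start < end:
        try:
            candidate = window_start.replace(year=window_start.year + years)
        except ValueError:
            # February 29 has no counterpart in a non-leap target year.
            candidate = window_start.replace(
                year=window_start.year + years, day=28
            )
        # The last window is truncated at the requested end date.
        window_end = min(candidate, end)
        yield window_start, window_end
        window_start = window_end


windows = list(split_into_windows(date(2010, 1, 1), date(2012, 7, 1)))
for window_start, window_end in windows:
    print(window_start.isoformat(), window_end.isoformat())
```

Each pair could then be passed as start_date and end_date to a separate request. Adjacent windows share a boundary date here, so records falling on a boundary may need to be de-duplicated after the results are combined.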

Batch Data Extraction

Let's get one year of wind data for the La Guardia station. Note that the station ID used here is not the 5-digit WBAN number on its own but an 11-digit identifier (it appears to be the USAF number followed by the WBAN number). It can be found in PDF printouts for the station or via the NCEI Search API, as shown further in this notebook:

In [5]:
wind = ncei_api_batch(
    dataset='local-climatological-data', stations='72503014732',
    start_date='2010-01-01', end_date='2011-01-01', time_delta='1Y',
    datatypes=['HourlyWindSpeed', 'HourlyWindDirection']
)
wind
Out[5]:
REPORT_TYPE HourlyWindDirection LONGITUDE ELEVATION SOURCE HourlyWindSpeed LATITUDE
STATION DATE
72503014732 : LA GUARDIA AIRPORT, NY US 2010-01-01 00:51:00 FM-15 060 -73.88 3.4 7 3 40.7792
2010-01-01 01:00:00 FM-12 060 -73.88 3.4 4 3 40.7792
2010-01-01 01:04:00 FM-16 000 -73.88 3.4 7 0 40.7792
2010-01-01 01:37:00 FM-16 000 -73.88 3.4 7 0 40.7792
2010-01-01 01:47:00 FM-16 000 -73.88 3.4 7 0 40.7792
... ... ... ... ... ... ... ...
2011-01-01 01:51:00 FM-15 200 -73.88 3.4 7 8 40.7792
2011-01-01 02:51:00 FM-15 200 -73.88 3.4 7 6 40.7792
2011-01-01 03:51:00 FM-15 200 -73.88 3.4 7 5 40.7792
2011-01-01 04:00:00 FM-12 200 -73.88 3.4 4 5 40.7792
2011-01-01 04:51:00 FM-15 180 -73.88 3.4 7 3 40.7792

13482 rows × 7 columns

This output has exactly the same structure as the output of the ncei_api function, except that the DataFrame is much longer. Similar processing is required:

In [6]:
wind_processed = (
    wind
    .loc[:, ['HourlyWindSpeed', 'HourlyWindDirection']]
    .apply(pd.to_numeric, errors='coerce')
    .dropna(how='any')
)
wind_processed
Out[6]:
HourlyWindSpeed HourlyWindDirection
STATION DATE
72503014732 : LA GUARDIA AIRPORT, NY US 2010-01-01 00:51:00 3.0 60.0
2010-01-01 01:00:00 3.0 60.0
2010-01-01 01:04:00 0.0 0.0
2010-01-01 01:37:00 0.0 0.0
2010-01-01 01:47:00 0.0 0.0
... ... ...
2011-01-01 01:51:00 8.0 200.0
2011-01-01 02:51:00 6.0 200.0
2011-01-01 03:51:00 5.0 200.0
2011-01-01 04:00:00 5.0 200.0
2011-01-01 04:51:00 3.0 180.0

12846 rows × 2 columns

In [7]:
wind_processed.loc[:, 'HourlyWindSpeed'].plot(kind='hist', facecolor='None', edgecolor='k', lw=1, bins=20)
Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a3259f5508>

Additional Functions

The coastlib.data.noaa_ncei module has two additional functions, ncei_datasets and ncei_search. They are useful for exploring what information is available for specific regions and which keywords should be used in the ncei_api and ncei_api_batch functions.

NCEI Datasets

The ncei_datasets function should be used to retrieve available NOAA NCEI formats, datasets, and datatypes for given dates, geographical extents, and keywords. Unfortunately, these filters do not appear to work properly at the moment, since the NCEI API is not fully developed and is not properly documented. One way to use the function is as follows:

In [8]:
datasets = ncei_datasets()
print('Formats:', datasets[0], end='\n\n')
print('Datasets (first 3):', datasets[1][:3], end='\n\n')
print('Datatypes:', datasets[2].keys())
Formats: ['csv', 'json', 'native', 'netcdf', 'pdf']

Datasets (first 3): ['avhrr-cloud-properties-patmosx', 'avhrr-reflectance-patmosx', 'daily-summaries']

Datatypes: dict_keys(['global-summary-of-the-month', 'global-summary-of-the-year', 'global-hourly', 'daily-summaries', 'global-marine', 'global-summary-of-the-day', 'mean-layer-temperature-uah', 'avhrr-reflectance-patmosx', 'local-climatological-data', 'normals-annualseasonal'])

The ncei_search function should be used to retrieve available NOAA NCEI formats, datasets, and datatypes for given parameters (station IDs, geographical extents, keywords, etc.). Unfortunately, as with ncei_datasets, the NCEI Search API is effectively undocumented: the current NOAA documentation is wrong (its descriptions don't match its examples), so I had to experiment with different parameters manually to find out how it works. In addition to the absence of usable documentation, this API is also broken: certain options, such as geographical extent, have no effect. It is unknown when or if NOAA will update it.

So far, the only meaningful way to use this is to get metadata for known stations:

In [9]:
ns_data = ncei_search(dataset='local-climatological-data', stations='72503014732', limit=5)
print('Datatypes (first 5):', ns_data[0][:5], end='\n\n')
print('Stations:', ns_data[1], end='\n\n')
print('Results:', ns_data[2]['72503014732'])
Datatypes (first 5): ['AWND', 'CDSD', 'CLDD', 'DSNW', 'DailyAverageDewPointTemperature']

Stations: ['72503014732']

Results: {'name': 'LA GUARDIA AIRPORT, NY US', 'datatypes': {'Short Duration Precipitation Value 150': 'ShortDurationPrecipitationValue150', 'REPORT_TYPE': 'REPORT_TYPE', 'Short Duration Precipitation Value 030': 'ShortDurationPrecipitationValue030', 'Monthly Sea Level Pressure': 'MonthlySeaLevelPressure', 'Monthly Max Sea Level Pressure Value Date': 'MonthlyMaxSeaLevelPressureValueDate', 'Hourly Sea Level Pressure': 'HourlySeaLevelPressure', 'Daily Maximum Dry Bulb Temperature': 'DailyMaximumDryBulbTemperature', 'Monthly Days With Temperature  > 32 Degrees': 'MonthlyDaysWithGT32Temp', 'Monthly Days With Temperature  < 32 Degrees': 'MonthlyDaysWithLT32Temp', 'Monthly Max Sea Level Pressure Value Time': 'MonthlyMaxSeaLevelPressureValueTime', 'Monthly Greatest Precip Date': 'MonthlyGreatestPrecipDate', 'Hourly Precipitation': 'HourlyPrecipitation', 'Daily Peak Wind Speed': 'DailyPeakWindSpeed', 'Monthly Station Pressure': 'MonthlyStationPressure', 'Daily Peak Wind Direction': 'DailyPeakWindDirection', 'Short Duration End Date 100': 'ShortDurationEndDate100', 'Monthly Departure From Normal Maximum Temperature': 'MonthlyDepartureFromNormalMaximumTemperature', 'Daily Average Relative Humidity': 'DailyAverageRelativeHumidity', 'Monthly Maximum Temperature': 'MonthlyMaximumTemperature', 'SOURCE': 'SOURCE', 'Hourly Pressure Tendency': 'HourlyPressureTendency', 'Short Duration End Date 060': 'ShortDurationEndDate060', 'Short Duration End Date 180': 'ShortDurationEndDate180', 'Short Duration Precipitation Value 020': 'ShortDurationPrecipitationValue020', 'Long-term Averages of Monthly Cooling Degree Days with Base 65F': 'NormalsCoolingDegreeDay', 'Monthly Departure From Normal Minimum Temperature': 'MonthlyDepartureFromNormalMinimumTemperature', 'Monthly Greatest Snowfall Date': 'MonthlyGreatestSnowfallDate', 'Hourly Pressure Change': 'HourlyPressureChange', 'Daily Sustained Wind Speed': 'DailySustainedWindSpeed', 'Short Duration Precipitation Value 015': 
'ShortDurationPrecipitationValue015', 'Monthly Days With > 0.001 Precip': 'MonthlyDaysWithGT001Precip', 'Short Duration Precipitation Value 010': 'ShortDurationPrecipitationValue010', 'Short Duration Precipitation Value 080': 'ShortDurationPrecipitationValue080', 'Daily Snow Depth': 'DailySnowDepth', 'Monthly Days With Temperature  > 90 Degrees': 'MonthlyDaysWithGT90Temp', 'Daily Average Wind Speed': 'DailyAverageWindSpeed', 'Hourly Altimeter Setting': 'HourlyAltimeterSetting', 'Heating Degree Days Season to Date': 'HDSD', 'Cooling Degree Days Season to Date': 'CDSD', 'Average Wind Speed for the Month': 'AWND', 'Short Duration End Date 045': 'ShortDurationEndDate045', 'Monthly Departure From Normal Average Temperature': 'MonthlyDepartureFromNormalAverageTemperature', 'Short Duration Precipitation Value 005': 'ShortDurationPrecipitationValue005', 'Sunrise': 'Sunrise', 'REM': 'REM', 'Monthly Departure From Normal Precipitation': 'MonthlyDepartureFromNormalPrecipitation', 'Departure from Long-term Averages of Monthly Heating Degree Days with Base 65F': 'MonthlyDepartureFromNormalHeatingDegreeDays', 'Monthly Max Sea Level Pressure Value': 'MonthlyMaxSeaLevelPressureValue', 'Short Duration Precipitation Value 120': 'ShortDurationPrecipitationValue120', 'Monthly Greatest Snow Depth': 'MonthlyGreatestSnowDepth', 'Monthly Minimum Temperature': 'MonthlyMinimumTemperature', 'Daily Precipitation': 'DailyPrecipitation', 'Daily Average Sea Level Pressure': 'DailyAverageSeaLevelPressure', 'Hourly Wind Speed': 'HourlyWindSpeed', 'Monthly Days With Temperature  < 0 Degrees': 'MonthlyDaysWithLT0Temp', 'Hourly Relative Humidity': 'HourlyRelativeHumidity', 'Short Duration End Date 150': 'ShortDurationEndDate150', 'Hourly Present Weather Type': 'HourlyPresentWeatherType', 'Monthly Min Sea Level Pressure Value': 'MonthlyMinSeaLevelPressureValue', 'Daily Departure From Normal Average Temperature': 'DailyDepartureFromNormalAverageTemperature', 'Short Duration End Date 030': 
'ShortDurationEndDate030', 'Short Duration Precipitation Value 060': 'ShortDurationPrecipitationValue060', 'Monthly Greatest Snowfall': 'MonthlyGreatestSnowfall', 'Short Duration Precipitation Value 180': 'ShortDurationPrecipitationValue180', 'Heating Degree Days': 'HTDD', 'Hourly Wet Bulb Temperature': 'HourlyWetBulbTemperature', 'Departure from Long-term Averages of Monthly Cooling Degree Days with Base 65F': 'MonthlyDepartureFromNormalCoolingDegreeDays', 'Cooling Degree Days': 'CLDD', 'Number of Days with >= 1.0 inch of Snowfall': 'DSNW', 'Daily Heating Degree Days': 'DailyHeatingDegreeDays', 'Short Duration Precipitation Value 100': 'ShortDurationPrecipitationValue100', 'Monthly Total Snowfall': 'MonthlyTotalSnowfall', 'Short Duration End Date 020': 'ShortDurationEndDate020', 'Daily Snowfall': 'DailySnowfall', 'Daily Average Wet Bulb Temperature': 'DailyAverageWetBulbTemperature', 'Monthly Days With > 0.10 Precip': 'MonthlyDaysWithGT010Precip', 'Monthly Total Liquid Precipitation': 'MonthlyTotalLiquidPrecipitation', 'Monthly Greatest Snow Depth Date': 'MonthlyGreatestSnowDepthDate', 'Daily Minimum Dry Bulb Temperature': 'DailyMinimumDryBulbTemperature', 'Short Duration End Date 015': 'ShortDurationEndDate015', 'Hourly Wind Direction': 'HourlyWindDirection', 'Daily Weather': 'DailyWeather', 'Daily Average Station Pressure': 'DailyAverageStationPressure', 'Daily Average Dry Bulb Temperature': 'DailyAverageDryBulbTemperature', 'Short Duration End Date 010': 'ShortDurationEndDate010', 'Monthly Greatest Precip': 'MonthlyGreatestPrecip', 'Daily Average Dew Point Temperature': 'DailyAverageDewPointTemperature', 'Short Duration End Date 080': 'ShortDurationEndDate080', 'Monthly Min Sea Level Pressure Value Time': 'MonthlyMinSeaLevelPressureValueTime', 'Long-term Averages of Monthly Heating Degree Days with Base 65F': 'NormalsHeatingDegreeDay', 'Hourly Dry Bulb Temperature': 'HourlyDryBulbTemperature', 'Hourly Dew Point Temperature': 'HourlyDewPointTemperature', 
'Sunset': 'Sunset', 'Monthly Min Sea Level Pressure Value Date': 'MonthlyMinSeaLevelPressureValueDate', 'Hourly Visibility': 'HourlyVisibility', 'Hourly Wind Gust Speed': 'HourlyWindGustSpeed', 'Hourly Station Pressure': 'HourlyStationPressure', 'Short Duration End Date 005': 'ShortDurationEndDate005', 'Daily Cooling Degree Days': 'DailyCoolingDegreeDays', 'Hourly Sky Conditions': 'HourlySkyConditions', 'Daily Sustained Wind Direction': 'DailySustainedWindDirection', 'Monthly Mean Temperature': 'MonthlyMeanTemperature', 'Short Duration End Date 120': 'ShortDurationEndDate120', 'Short Duration Precipitation Value 045': 'ShortDurationPrecipitationValue045'}}
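Since the 'datatypes' entry printed above maps human-readable descriptions to the keywords used in the datatypes argument, it can be filtered to find the keywords of interest. The sketch below uses a short excerpt of that mapping copied from the output above. The helper find_datatypes is hypothetical and is not part of coastlib.

```python
# Excerpt of the 'datatypes' mapping printed above: description -> keyword.
datatypes = {
    'Hourly Wind Speed': 'HourlyWindSpeed',
    'Hourly Wind Direction': 'HourlyWindDirection',
    'Hourly Wind Gust Speed': 'HourlyWindGustSpeed',
    'Hourly Sea Level Pressure': 'HourlySeaLevelPressure',
    'Daily Peak Wind Speed': 'DailyPeakWindSpeed',
    'Sunrise': 'Sunrise',
}


def find_datatypes(mapping, *terms):
    """Return sorted keywords whose description contains every search term.

    Matching is case-insensitive. Hypothetical helper for illustration.
    """
    lowered = [term.lower() for term in terms]
    return sorted(
        keyword
        for description, keyword in mapping.items()
        if all(term in description.lower() for term in lowered)
    )


hourly_wind = find_datatypes(datatypes, 'hourly', 'wind')
print(hourly_wind)
# ['HourlyWindDirection', 'HourlyWindGustSpeed', 'HourlyWindSpeed']
```

With real data, the same helper could be applied to ns_data[2]['72503014732']['datatypes'] from the cell above.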

Final Remarks

This tutorial doesn't cover all of the functionality of the noaa_ncei module or every way of using it. Please refer to the module source code and the API reference for more information on arguments not covered here.