Developed by: Georgii Bocharov
E-Mail: bocharovgeorgii@gmail.com
coastlib: https://github.com/georgebv/coastlib
License: GNU General Public License v3.0
This notebook provides examples for the noaa_ncei module of the coastlib library. Source code for this module can be found here.
The noaa_ncei module is a part of the coastlib.data package. This module provides an interface to the NOAA NCEI data portal via the NCEI Data Service API. It allows retrieval of environmental data collected by NOAA NCEI sensors (wind, precipitation, air pressure, etc.) in the form of pandas DataFrames. With the help of this tool one can automate extraction of large amounts of data from NOAA NCEI stations for further processing and storage.
Let's start by extracting hourly wind speed and direction data for the La Guardia and Bethel Airport stations for October 2012:
%matplotlib inline
import pandas as pd
from coastlib.data import ncei_api, ncei_api_batch, ncei_datasets, ncei_search
df = ncei_api(
    dataset='local-climatological-data',
    stations=['72503014732', '70219026615'],
    start_date='2012-10-01',
    end_date='2012-11-01',
    datatypes=['HourlyWindSpeed', 'HourlyWindDirection'],
)
df
| STATION | DATE | REPORT_TYPE | HourlyWindDirection | LONGITUDE | ELEVATION | SOURCE | HourlyWindSpeed | LATITUDE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 72503014732 : LA GUARDIA AIRPORT, NY US | 2012-10-01 00:51:00 | FM-15 | 270 | -73.88 | 3.4 | 7 | 8 | 40.7792 |
|  | 2012-10-01 01:00:00 | FM-12 | 270 | -73.88 | 3.4 | 4 | 8 | 40.7792 |
|  | 2012-10-01 01:51:00 | FM-15 | 290 | -73.88 | 3.4 | 7 | 13 | 40.7792 |
|  | 2012-10-01 02:51:00 | FM-15 | 260 | -73.88 | 3.4 | 7 | 11 | 40.7792 |
|  | 2012-10-01 03:51:00 | FM-15 | 250 | -73.88 | 3.4 | 7 | 10 | 40.7792 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 70219026615 : BETHEL AIRPORT, AK US | 2012-11-01 21:00:00 | FM-12 | 010 | -161.8293 | 31.1 | 4 | 19 | 60.785 |
|  | 2012-11-01 21:53:00 | FM-15 | 010 | -161.8293 | 31.1 | 7 | 21 | 60.785 |
|  | 2012-11-01 22:53:00 | FM-15 | 010 | -161.8293 | 31.1 | 7 | 25 | 60.785 |
|  | 2012-11-01 23:53:00 | FM-15 | 010 | -161.8293 | 31.1 | 7 | 24 | 60.785 |
|  | 2012-11-01 23:59:00 | SOD | NaN | -161.8293 | 31.1 | 6 | NaN | 60.785 |
2427 rows × 7 columns
Note that the extracted values are not converted to numeric types. This is deliberate: NOAA mixes data types in its responses, so everything NOAA provides is extracted and the specific processing is left to the end user.
To do something meaningful with the data, convert the columns of interest to numeric values and drop the rows that could not be converted:
df_processed = (
    df
    .loc[:, ['HourlyWindSpeed', 'HourlyWindDirection']]
    .apply(pd.to_numeric, errors='coerce')
    .dropna(how='any')
)
df_processed
| STATION | DATE | HourlyWindSpeed | HourlyWindDirection |
| --- | --- | --- | --- |
| 72503014732 : LA GUARDIA AIRPORT, NY US | 2012-10-01 00:51:00 | 8.0 | 270.0 |
|  | 2012-10-01 01:00:00 | 8.0 | 270.0 |
|  | 2012-10-01 01:51:00 | 13.0 | 290.0 |
|  | 2012-10-01 02:51:00 | 11.0 | 260.0 |
|  | 2012-10-01 03:51:00 | 10.0 | 250.0 |
| ... | ... | ... | ... |
| 70219026615 : BETHEL AIRPORT, AK US | 2012-11-01 20:53:00 | 20.0 | 10.0 |
|  | 2012-11-01 21:00:00 | 19.0 | 10.0 |
|  | 2012-11-01 21:53:00 | 21.0 | 10.0 |
|  | 2012-11-01 22:53:00 | 25.0 | 10.0 |
|  | 2012-11-01 23:53:00 | 24.0 | 10.0 |
2329 rows × 2 columns
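The effect of the conversion step can be illustrated on a small synthetic DataFrame. This is only a sketch: the raw values below (such as 'VRB' and '11s') are hypothetical stand-ins for non-numeric entries and are not taken from an actual API response.

```python
import pandas as pd

# Synthetic stand-in for extracted data: every value is a string and
# some entries are not clean numbers (hypothetical examples).
raw = pd.DataFrame(
    {
        'HourlyWindSpeed': ['8', '13', '5', '11s', '10'],
        'HourlyWindDirection': ['270', '290', 'VRB', '260', '250'],
    }
)

# errors='coerce' turns every value that cannot be parsed into NaN
numeric = raw.apply(pd.to_numeric, errors='coerce')

# how='any' drops a row if at least one of its columns is NaN
clean = numeric.dropna(how='any')

print(numeric)
print(clean)
```

Rows 2 and 3 each contain one unparseable value, so they are dropped and three rows remain. This is the same mechanism that reduces the row count between the raw and the processed DataFrames above.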
df_processed.loc['72503014732 : LA GUARDIA AIRPORT, NY US', 'HourlyWindSpeed'].plot(color='k', lw=.5)
df_processed.loc['70219026615 : BETHEL AIRPORT, AK US', 'HourlyWindSpeed'].plot(
    kind='hist', facecolor='None', edgecolor='k', lw=1, bins=15
)
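The plots above select a single station by its label in the outer level of the (STATION, DATE) index. The sketch below shows two other standard pandas ways of working with such an index, using a small synthetic DataFrame. The index labels mirror the ones shown above, but the Bethel values and all timestamps are made up for illustration.

```python
import pandas as pd

# Synthetic DataFrame with the same (STATION, DATE) index layout
index = pd.MultiIndex.from_tuples(
    [
        ('72503014732 : LA GUARDIA AIRPORT, NY US', pd.Timestamp('2012-10-01 00:51')),
        ('72503014732 : LA GUARDIA AIRPORT, NY US', pd.Timestamp('2012-10-01 01:51')),
        ('70219026615 : BETHEL AIRPORT, AK US', pd.Timestamp('2012-10-01 00:53')),
        ('70219026615 : BETHEL AIRPORT, AK US', pd.Timestamp('2012-10-01 01:53')),
    ],
    names=['STATION', 'DATE'],
)
demo = pd.DataFrame({'HourlyWindSpeed': [8.0, 13.0, 20.0, 24.0]}, index=index)

# Station labels present in the outer index level
stations = demo.index.get_level_values('STATION').unique()
print(list(stations))

# Cross-section for one station; the result is indexed by DATE only
la_guardia = demo.xs('72503014732 : LA GUARDIA AIRPORT, NY US', level='STATION')
print(la_guardia)

# Per-station summary without selecting stations one at a time
mean_speed = demo.groupby(level='STATION')['HourlyWindSpeed'].mean()
print(mean_speed)
```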
The ncei_api function is the most direct way of interfacing with the NOAA NCEI API. Its drawback is that a single call can extract data only for a relatively short time window (the limit depends on the data source and is not clearly documented by NOAA; the largest window appears to be about 10 years). Serious applications often require many years of data, which is where the ncei_api_batch function becomes much more convenient.
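The general idea behind batch extraction is to split a long period into shorter windows, request each window separately, and concatenate the results. The sketch below illustrates only the splitting step. It is not the coastlib implementation, and the helper name split_date_range is hypothetical.

```python
import pandas as pd


def split_date_range(start_date, end_date, years=1):
    """Split a period into consecutive windows of at most `years` years.

    Hypothetical helper for illustration; not part of coastlib.
    """
    start = pd.Timestamp(start_date)
    end = pd.Timestamp(end_date)
    windows = []
    while start < end:
        # The last window is truncated at the requested end date
        stop = min(start + pd.DateOffset(years=years), end)
        windows.append((start.strftime('%Y-%m-%d'), stop.strftime('%Y-%m-%d')))
        start = stop
    return windows


windows = split_date_range('2010-01-01', '2012-07-01', years=1)
for window_start, window_end in windows:
    print(window_start, window_end)
```

In this sketch adjacent windows share a boundary date, so results concatenated from separate requests might contain duplicated rows at the boundaries that would need to be dropped.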
Let's get one year of wind data (2010) for the La Guardia station. Note that the station ID is not the 5-digit WBAN number alone but an 11-digit identifier that starts with the USAF number and ends with the WBAN number. It can be found in the PDF printouts for the station or via the NCEI Search API, as shown further in this notebook:
wind = ncei_api_batch(
    dataset='local-climatological-data', stations='72503014732',
    start_date='2010-01-01', end_date='2011-01-01', time_delta='1Y',
    datatypes=['HourlyWindSpeed', 'HourlyWindDirection']
)
wind
| STATION | DATE | REPORT_TYPE | HourlyWindDirection | LONGITUDE | ELEVATION | SOURCE | HourlyWindSpeed | LATITUDE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 72503014732 : LA GUARDIA AIRPORT, NY US | 2010-01-01 00:51:00 | FM-15 | 060 | -73.88 | 3.4 | 7 | 3 | 40.7792 |
|  | 2010-01-01 01:00:00 | FM-12 | 060 | -73.88 | 3.4 | 4 | 3 | 40.7792 |
|  | 2010-01-01 01:04:00 | FM-16 | 000 | -73.88 | 3.4 | 7 | 0 | 40.7792 |
|  | 2010-01-01 01:37:00 | FM-16 | 000 | -73.88 | 3.4 | 7 | 0 | 40.7792 |
|  | 2010-01-01 01:47:00 | FM-16 | 000 | -73.88 | 3.4 | 7 | 0 | 40.7792 |
|  | ... | ... | ... | ... | ... | ... | ... | ... |
|  | 2011-01-01 01:51:00 | FM-15 | 200 | -73.88 | 3.4 | 7 | 8 | 40.7792 |
|  | 2011-01-01 02:51:00 | FM-15 | 200 | -73.88 | 3.4 | 7 | 6 | 40.7792 |
|  | 2011-01-01 03:51:00 | FM-15 | 200 | -73.88 | 3.4 | 7 | 5 | 40.7792 |
|  | 2011-01-01 04:00:00 | FM-12 | 200 | -73.88 | 3.4 | 4 | 5 | 40.7792 |
|  | 2011-01-01 04:51:00 | FM-15 | 180 | -73.88 | 3.4 | 7 | 3 | 40.7792 |
13482 rows × 7 columns
This output has exactly the same structure as the output of the ncei_api function, except that the DataFrame is much longer. Similar processing is required:
wind_processed = (
    wind
    .loc[:, ['HourlyWindSpeed', 'HourlyWindDirection']]
    .apply(pd.to_numeric, errors='coerce')
    .dropna(how='any')
)
wind_processed
| STATION | DATE | HourlyWindSpeed | HourlyWindDirection |
| --- | --- | --- | --- |
| 72503014732 : LA GUARDIA AIRPORT, NY US | 2010-01-01 00:51:00 | 3.0 | 60.0 |
|  | 2010-01-01 01:00:00 | 3.0 | 60.0 |
|  | 2010-01-01 01:04:00 | 0.0 | 0.0 |
|  | 2010-01-01 01:37:00 | 0.0 | 0.0 |
|  | 2010-01-01 01:47:00 | 0.0 | 0.0 |
|  | ... | ... | ... |
|  | 2011-01-01 01:51:00 | 8.0 | 200.0 |
|  | 2011-01-01 02:51:00 | 6.0 | 200.0 |
|  | 2011-01-01 03:51:00 | 5.0 | 200.0 |
|  | 2011-01-01 04:00:00 | 5.0 | 200.0 |
|  | 2011-01-01 04:51:00 | 3.0 | 180.0 |
12846 rows × 2 columns
wind_processed.loc[:, 'HourlyWindSpeed'].plot(kind='hist', facecolor='None', edgecolor='k', lw=1, bins=20)
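The processed direction column can be summarized in a similar way to the speed histogram. The sketch below counts observations in eight 45-degree compass sectors, using the ten processed rows printed above as input. It rests on two assumptions that are not confirmed by the source: directions are degrees clockwise from north, and rows with zero wind speed are calm records whose direction of 0 should not be counted as northerly wind.

```python
import pandas as pd

# The ten processed rows shown in the table above
demo = pd.DataFrame(
    {
        'HourlyWindSpeed': [3.0, 3.0, 0.0, 0.0, 0.0, 8.0, 6.0, 5.0, 5.0, 3.0],
        'HourlyWindDirection': [60.0, 60.0, 0.0, 0.0, 0.0, 200.0, 200.0, 200.0, 200.0, 180.0],
    }
)

# Assumption: zero speed means calm, so the direction carries no information
moving = demo[demo['HourlyWindSpeed'] > 0].copy()

# Eight sectors of 45 degrees, each centered on a compass point.
# Shifting by half a sector (22.5) makes 'N' cover 337.5 to 22.5 degrees.
labels = ['N', 'NE', 'E', 'SE', 'S', 'SW', 'W', 'NW']
sector_index = ((moving['HourlyWindDirection'] + 22.5) % 360 // 45).astype(int)
moving['sector'] = sector_index.map(lambda i: labels[i])

# Number of observations per sector
counts = moving['sector'].value_counts()
print(counts)
```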
The coastlib.data.noaa_ncei module has two additional functions, ncei_datasets and ncei_search. These functions are useful for exploring what information is available for specific regions and which keywords should be used in the ncei_api and ncei_api_batch functions.
The ncei_datasets function should be used to retrieve available NOAA NCEI formats, datasets, and datatypes for given dates, geographical extents, and keywords. Unfortunately, these options do not appear to work properly at the moment, since the NCEI API is not fully developed and is not properly documented. One way to use the function is as follows:
datasets = ncei_datasets()
print('Formats:', datasets[0], end='\n\n')
print('Datasets (first 3):', datasets[1][:3], end='\n\n')
print('Datatypes:', datasets[2].keys())
Formats: ['csv', 'json', 'native', 'netcdf', 'pdf']

Datasets (first 3): ['avhrr-cloud-properties-patmosx', 'avhrr-reflectance-patmosx', 'daily-summaries']

Datatypes: dict_keys(['global-summary-of-the-month', 'global-summary-of-the-year', 'global-hourly', 'daily-summaries', 'global-marine', 'global-summary-of-the-day', 'mean-layer-temperature-uah', 'avhrr-reflectance-patmosx', 'local-climatological-data', 'normals-annualseasonal'])
The ncei_search function should be used to retrieve available NOAA NCEI formats, datasets, and datatypes for given parameters (station IDs, geographical extents, keywords, etc.). Unfortunately, as with ncei_datasets, the NCEI Search API is effectively undocumented: the current NOAA documentation is wrong (its descriptions don't match its examples), and the usage shown here was worked out by manually trying different parameters. In addition to the absence of reliable documentation, this API is also partly broken: certain options, such as geographical extent, have no effect. It is unknown when or if NOAA will update it.
So far, the only meaningful way to use this function is to get metadata for known stations:
ns_data = ncei_search(dataset='local-climatological-data', stations='72503014732', limit=5)
print('Datatypes (first 5):', ns_data[0][:5], end='\n\n')
print('Stations:', ns_data[1], end='\n\n')
print('Results:', ns_data[2]['72503014732'])
Datatypes (first 5): ['AWND', 'CDSD', 'CLDD', 'DSNW', 'DailyAverageDewPointTemperature']

Stations: ['72503014732']

Results: {'name': 'LA GUARDIA AIRPORT, NY US', 'datatypes': {'Short Duration Precipitation Value 150': 'ShortDurationPrecipitationValue150', 'REPORT_TYPE': 'REPORT_TYPE', 'Short Duration Precipitation Value 030': 'ShortDurationPrecipitationValue030', 'Monthly Sea Level Pressure': 'MonthlySeaLevelPressure', 'Monthly Max Sea Level Pressure Value Date': 'MonthlyMaxSeaLevelPressureValueDate', 'Hourly Sea Level Pressure': 'HourlySeaLevelPressure', 'Daily Maximum Dry Bulb Temperature': 'DailyMaximumDryBulbTemperature', 'Monthly Days With Temperature > 32 Degrees': 'MonthlyDaysWithGT32Temp', 'Monthly Days With Temperature < 32 Degrees': 'MonthlyDaysWithLT32Temp', 'Monthly Max Sea Level Pressure Value Time': 'MonthlyMaxSeaLevelPressureValueTime', 'Monthly Greatest Precip Date': 'MonthlyGreatestPrecipDate', 'Hourly Precipitation': 'HourlyPrecipitation', 'Daily Peak Wind Speed': 'DailyPeakWindSpeed', 'Monthly Station Pressure': 'MonthlyStationPressure', 'Daily Peak Wind Direction': 'DailyPeakWindDirection', 'Short Duration End Date 100': 'ShortDurationEndDate100', 'Monthly Departure From Normal Maximum Temperature': 'MonthlyDepartureFromNormalMaximumTemperature', 'Daily Average Relative Humidity': 'DailyAverageRelativeHumidity', 'Monthly Maximum Temperature': 'MonthlyMaximumTemperature', 'SOURCE': 'SOURCE', 'Hourly Pressure Tendency': 'HourlyPressureTendency', 'Short Duration End Date 060': 'ShortDurationEndDate060', 'Short Duration End Date 180': 'ShortDurationEndDate180', 'Short Duration Precipitation Value 020': 'ShortDurationPrecipitationValue020', 'Long-term Averages of Monthly Cooling Degree Days with Base 65F': 'NormalsCoolingDegreeDay', 'Monthly Departure From Normal Minimum Temperature': 'MonthlyDepartureFromNormalMinimumTemperature', 'Monthly Greatest Snowfall Date': 'MonthlyGreatestSnowfallDate', 'Hourly Pressure Change': 'HourlyPressureChange', 
'Daily Sustained Wind Speed': 'DailySustainedWindSpeed', 'Short Duration Precipitation Value 015': 'ShortDurationPrecipitationValue015', 'Monthly Days With > 0.001 Precip': 'MonthlyDaysWithGT001Precip', 'Short Duration Precipitation Value 010': 'ShortDurationPrecipitationValue010', 'Short Duration Precipitation Value 080': 'ShortDurationPrecipitationValue080', 'Daily Snow Depth': 'DailySnowDepth', 'Monthly Days With Temperature > 90 Degrees': 'MonthlyDaysWithGT90Temp', 'Daily Average Wind Speed': 'DailyAverageWindSpeed', 'Hourly Altimeter Setting': 'HourlyAltimeterSetting', 'Heating Degree Days Season to Date': 'HDSD', 'Cooling Degree Days Season to Date': 'CDSD', 'Average Wind Speed for the Month': 'AWND', 'Short Duration End Date 045': 'ShortDurationEndDate045', 'Monthly Departure From Normal Average Temperature': 'MonthlyDepartureFromNormalAverageTemperature', 'Short Duration Precipitation Value 005': 'ShortDurationPrecipitationValue005', 'Sunrise': 'Sunrise', 'REM': 'REM', 'Monthly Departure From Normal Precipitation': 'MonthlyDepartureFromNormalPrecipitation', 'Departure from Long-term Averages of Monthly Heating Degree Days with Base 65F': 'MonthlyDepartureFromNormalHeatingDegreeDays', 'Monthly Max Sea Level Pressure Value': 'MonthlyMaxSeaLevelPressureValue', 'Short Duration Precipitation Value 120': 'ShortDurationPrecipitationValue120', 'Monthly Greatest Snow Depth': 'MonthlyGreatestSnowDepth', 'Monthly Minimum Temperature': 'MonthlyMinimumTemperature', 'Daily Precipitation': 'DailyPrecipitation', 'Daily Average Sea Level Pressure': 'DailyAverageSeaLevelPressure', 'Hourly Wind Speed': 'HourlyWindSpeed', 'Monthly Days With Temperature < 0 Degrees': 'MonthlyDaysWithLT0Temp', 'Hourly Relative Humidity': 'HourlyRelativeHumidity', 'Short Duration End Date 150': 'ShortDurationEndDate150', 'Hourly Present Weather Type': 'HourlyPresentWeatherType', 'Monthly Min Sea Level Pressure Value': 'MonthlyMinSeaLevelPressureValue', 'Daily Departure From Normal Average 
Temperature': 'DailyDepartureFromNormalAverageTemperature', 'Short Duration End Date 030': 'ShortDurationEndDate030', 'Short Duration Precipitation Value 060': 'ShortDurationPrecipitationValue060', 'Monthly Greatest Snowfall': 'MonthlyGreatestSnowfall', 'Short Duration Precipitation Value 180': 'ShortDurationPrecipitationValue180', 'Heating Degree Days': 'HTDD', 'Hourly Wet Bulb Temperature': 'HourlyWetBulbTemperature', 'Departure from Long-term Averages of Monthly Cooling Degree Days with Base 65F': 'MonthlyDepartureFromNormalCoolingDegreeDays', 'Cooling Degree Days': 'CLDD', 'Number of Days with >= 1.0 inch of Snowfall': 'DSNW', 'Daily Heating Degree Days': 'DailyHeatingDegreeDays', 'Short Duration Precipitation Value 100': 'ShortDurationPrecipitationValue100', 'Monthly Total Snowfall': 'MonthlyTotalSnowfall', 'Short Duration End Date 020': 'ShortDurationEndDate020', 'Daily Snowfall': 'DailySnowfall', 'Daily Average Wet Bulb Temperature': 'DailyAverageWetBulbTemperature', 'Monthly Days With > 0.10 Precip': 'MonthlyDaysWithGT010Precip', 'Monthly Total Liquid Precipitation': 'MonthlyTotalLiquidPrecipitation', 'Monthly Greatest Snow Depth Date': 'MonthlyGreatestSnowDepthDate', 'Daily Minimum Dry Bulb Temperature': 'DailyMinimumDryBulbTemperature', 'Short Duration End Date 015': 'ShortDurationEndDate015', 'Hourly Wind Direction': 'HourlyWindDirection', 'Daily Weather': 'DailyWeather', 'Daily Average Station Pressure': 'DailyAverageStationPressure', 'Daily Average Dry Bulb Temperature': 'DailyAverageDryBulbTemperature', 'Short Duration End Date 010': 'ShortDurationEndDate010', 'Monthly Greatest Precip': 'MonthlyGreatestPrecip', 'Daily Average Dew Point Temperature': 'DailyAverageDewPointTemperature', 'Short Duration End Date 080': 'ShortDurationEndDate080', 'Monthly Min Sea Level Pressure Value Time': 'MonthlyMinSeaLevelPressureValueTime', 'Long-term Averages of Monthly Heating Degree Days with Base 65F': 'NormalsHeatingDegreeDay', 'Hourly Dry Bulb Temperature': 
'HourlyDryBulbTemperature', 'Hourly Dew Point Temperature': 'HourlyDewPointTemperature', 'Sunset': 'Sunset', 'Monthly Min Sea Level Pressure Value Date': 'MonthlyMinSeaLevelPressureValueDate', 'Hourly Visibility': 'HourlyVisibility', 'Hourly Wind Gust Speed': 'HourlyWindGustSpeed', 'Hourly Station Pressure': 'HourlyStationPressure', 'Short Duration End Date 005': 'ShortDurationEndDate005', 'Daily Cooling Degree Days': 'DailyCoolingDegreeDays', 'Hourly Sky Conditions': 'HourlySkyConditions', 'Daily Sustained Wind Direction': 'DailySustainedWindDirection', 'Monthly Mean Temperature': 'MonthlyMeanTemperature', 'Short Duration End Date 120': 'ShortDurationEndDate120', 'Short Duration Precipitation Value 045': 'ShortDurationPrecipitationValue045'}}
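The datatypes mapping printed above pairs human-readable names with the keywords accepted by the datatypes argument, so it can be searched for the keywords of interest. The sketch below does this on a small subset of the mapping copied from the output above. The helper name find_datatypes is hypothetical and not part of coastlib.

```python
# Subset of the 'datatypes' mapping from the output above
datatypes = {
    'Hourly Wind Speed': 'HourlyWindSpeed',
    'Hourly Wind Direction': 'HourlyWindDirection',
    'Hourly Wind Gust Speed': 'HourlyWindGustSpeed',
    'Daily Peak Wind Speed': 'DailyPeakWindSpeed',
    'Average Wind Speed for the Month': 'AWND',
    'Hourly Precipitation': 'HourlyPrecipitation',
    'Hourly Visibility': 'HourlyVisibility',
}


def find_datatypes(mapping, keyword):
    """Return entries whose human-readable name contains `keyword`.

    Hypothetical helper for illustration; matching is case-insensitive.
    """
    keyword = keyword.lower()
    return {name: code for name, code in mapping.items() if keyword in name.lower()}


hourly_wind = find_datatypes(datatypes, 'hourly wind')
for name, code in hourly_wind.items():
    print(f'{name}: {code}')
```

With the real response, the same helper could be applied to ns_data[2]['72503014732']['datatypes'], and the values of the returned dictionary passed as the datatypes argument.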
This tutorial doesn't cover all functionality and ways of using the noaa_ncei module. Please refer to the module source code and the API reference for more information on arguments not covered here.