We need a way to retrieve data from the API. We first import Python's requests library, which lets us make HTTP requests, and the pprint function from the pprint module, which makes the output more readable. We then define the API endpoint by setting API_ENDPOINT to https://api.water.noaa.gov/hefs, along with the location id and parameter id used throughout this notebook:
# start by importing the requests library
import requests
# import the pprint library to make the output more readable
from pprint import pprint
# define the api-endpoint
API_ENDPOINT = "https://api.water.noaa.gov/hefs"
LOCATION_ID = "PGRC2"
PARAMETER_ID = "QINE"
In other cases, you might want to retrieve and plot recent ensemble data. The API can order results by the value of a field: add ordering= followed by the field name to the query string.
For example, to retrieve the latest QINE ensemble forecast for location id PGRC2, specify location_id=PGRC2, parameter_id=QINE, ordering=-start_date_date, and limit=1. Note: the negative sign in front of start_date_date tells the API to sort the results by start_date_date in descending order. The general URI for this would be:
/v1/headers/?location_id={LOCATION_ID}&parameter_id={PARAMETER_ID}&ordering=-start_date_date&limit=1
Where LOCATION_ID and PARAMETER_ID are the location id and parameter id of the request respectively.
# create a series request with location_id, parameter_id, and ordering filters
uri = API_ENDPOINT + f"/v1/headers/?location_id={LOCATION_ID}&parameter_id={PARAMETER_ID}&ordering=-start_date_date&limit=1"
# get the response
response = requests.request("GET", uri)
# get most recent start date
startDate = response.json()['results'][0]['start_date_date']
# print the response
pprint(response.json())
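As an alternative to assembling the query string by hand, the parameters can be kept as keyword arguments and encoded with the standard library. This is a minimal sketch rather than part of the original workflow: build_uri is a hypothetical helper name, and the endpoint and parameter values are the ones used above.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://api.water.noaa.gov/hefs"

def build_uri(path, **params):
    """Join the endpoint, a path, and URL-encoded query parameters.

    build_uri is a hypothetical helper used for illustration only.
    """
    return f"{API_ENDPOINT}{path}?{urlencode(params)}"

# same filters as the request above: newest start_date_date first, one result
latest_header_uri = build_uri(
    "/v1/headers/",
    location_id="PGRC2",
    parameter_id="QINE",
    ordering="-start_date_date",
    limit=1,
)
print(latest_header_uri)
```

The requests library can also perform this encoding itself when the parameters are passed as a dictionary through the params argument of requests.get.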
Now that we have the latest start_date_date, we can retrieve all of the headers for that date. Once we have all of the headers, we can get the total number of ensemble members.
For example, to retrieve the newest data series headers, you can format the URI like this:
/v1/headers/?location_id={LOCATION_ID}&parameter_id={PARAMETER_ID}&forecast_date_date={startDate}
Where startDate is the value obtained in the previous step. The count field in the response to this request gives the total number of ensemble members.
# create a series request with location_id, parameter_id, and forecast_date_date filters
uri = API_ENDPOINT + f"/v1/headers/?location_id={LOCATION_ID}&parameter_id={PARAMETER_ID}&forecast_date_date={startDate}"
# get the response
response = requests.request("GET", uri)
# grab total number of ensemble members
count = response.json()["count"]
# print the response
pprint(response.json())
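If, as the use of limit suggests, a response can hold fewer items in results than the total reported in count, it can be worth confirming that a page is complete before relying on it. The sketch below runs on hand-written dictionaries that only mimic the count and results fields used above; the payload contents and the page_is_complete name are illustrative assumptions, not actual API output.

```python
def page_is_complete(payload):
    """Return True when the page holds as many items as count reports."""
    return len(payload.get("results", [])) == payload.get("count", 0)

# hypothetical payloads shaped like the responses above
full_page = {"count": 2, "results": [{"id": 1}, {"id": 2}]}
partial_page = {"count": 5, "results": [{"id": 1}, {"id": 2}]}

print(page_is_complete(full_page))     # True
print(page_is_complete(partial_page))  # False
```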
Next, with the most recent forecast startDate and the total number of ensemble members, we will retrieve all of the ensemble data for the given start_date_date.
We will use the limit parameter to set the number of series returned to the count obtained above. To retrieve the ensemble members, you can format the URI like this:
/v1/ensembles/?location_id={LOCATION_ID}&parameter_id={PARAMETER_ID}&start_date_date={startDate}&limit={count}
Where count is the total number of ensemble members.
# create a series request with location_id, parameter_id, start_date_date, and limit filters
uri = API_ENDPOINT + f"/v1/ensembles/?location_id={LOCATION_ID}&parameter_id={PARAMETER_ID}&start_date_date={startDate}&limit={count}"
# get the response
response = requests.request("GET", uri)
# print the response
pprint(response.json())
We now use the data above to plot the ensemble forecast. To do this, we parse the response data and plot each ensemble member's data points (time/value pairs) on a single graph. We will use the Matplotlib Python package to plot the data.
import matplotlib.pyplot as plt
import datetime

def graph_ensemble_data(response_data):
    """Graphs ensemble data from an API response.

    Args:
        response_data: A dictionary containing the API response data.
    """
    # check if results is present in the response data
    if 'results' in response_data and response_data['results']:
        # iterate through each result in results
        for result in response_data['results']:
            # check if events is present in the result
            if 'events' in result and result['events']:
                # initialize times list
                times = []
                # initialize values list
                values = []
                # iterate once through the time/value pairs of this forecast
                for value in result['events']:
                    # prepare datetime for reformatting
                    datetime_str = f"{value['date']} {value['time']}"
                    # parse datetime as Year-Month-Day Hour:Minutes:Seconds
                    datetime_obj = datetime.datetime.strptime(datetime_str, "%Y-%m-%d %H:%M:%S")
                    # add values from the forecast to their respective lists
                    times.append(datetime_obj)
                    values.append(float(value['value']))
                # plot the data for this forecast as one line
                plt.plot(times, values)
        # set x graph label
        plt.xlabel('Time')
        # set y graph label
        plt.ylabel('Value')
        # set graph title
        plt.title('Ensemble Data')
        # rotate x text vertically for better visibility
        plt.xticks(rotation='vertical')
        # display the actual plot
        plt.show()

# call graphing function
graph_ensemble_data(response.json())
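Separating the parsing from the plotting makes the parsing step easy to check without a live request. The sketch below assumes the same response layout the plotting function reads (a results list whose items carry an events list with date, time, and value fields). The sample payload is made up for illustration, and extract_series is a hypothetical helper name.

```python
import datetime

def extract_series(response_data):
    """Return one (times, values) pair per result that has events."""
    series = []
    for result in response_data.get("results", []):
        events = result.get("events") or []
        if not events:
            # skip results without any time/value pairs
            continue
        times = [
            datetime.datetime.strptime(f"{e['date']} {e['time']}", "%Y-%m-%d %H:%M:%S")
            for e in events
        ]
        values = [float(e["value"]) for e in events]
        series.append((times, values))
    return series

# made-up payload mimicking the assumed layout
sample = {
    "results": [
        {"events": [
            {"date": "2024-01-01", "time": "00:00:00", "value": "1.5"},
            {"date": "2024-01-01", "time": "06:00:00", "value": "2.0"},
        ]},
        {"events": []},
    ]
}

series = extract_series(sample)
print(len(series))   # 1
print(series[0][1])  # [1.5, 2.0]
```

Each (times, values) pair can then be passed to plt.plot(times, values) to draw one line per ensemble member.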
In this notebook, we learned how to use the HEFS API to retrieve the latest ensemble forecast for a given location. We also learned how to use the API to filter results based on specific fields like location_id, parameter_id, and many more. The API allows filtering on all available fields by appending the field name and value to the URI as query parameters.
These techniques can be used to retrieve, filter, and paginate data for all HEFS ensemble forecasts.