Written by Sage Lichtenwalner, Rutgers University, May 31, 2019
The example was developed for the June 2019 OOI Ocean Data Labs Workshop
In this Python notebook, we will demonstrate some advanced techniques for working with data from the Ocean Observatories Initiative (OOI).
This example was designed to run on Google's Colaboratory platform, though it should also work on any Jupyter notebook platform, assuming the required libraries are installed.
In this notebook, we will demonstrate the following Data Discovery steps:
5. Quick Plots
6. Basic Statistics and Analysis
We will continue to use data from the 30m Dissolved Oxygen sensor on the Global Irminger Sea Flanking Mooring A, also known as GI03FLMA-RIS01-03-DOSTAD000.
As in the first example, the first thing we need to do is load the Python libraries we will need to load, process and plot our data.
import xarray as xr
!pip install netcdf4==1.5.0
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
Here is the URL to the datafile we will use.
single_file = 'https://opendap.oceanobservatories.org/thredds/dodsC/ooi/sage-marine-rutgers/20190307T155319-GI03FLMA-RIS01-03-DOSTAD000-recovered_host-dosta_abcdjm_sio_instrument_recovered/deployment0001_GI03FLMA-RIS01-03-DOSTAD000-recovered_host-dosta_abcdjm_sio_instrument_recovered_20140912T201501-20150818T103001.nc'
And now we can load the file.
# Load the data files
ds = xr.open_dataset(single_file)
ds = ds.swap_dims({'obs': 'time'}) #Swap dimensions
df = ds.to_dataframe() #And convert to a Pandas DataFrame
print('Dataset has %d points' % df.index.size)
# Add your code here to display the first few rows of the dataframe
In the first activity, we used the "built-in" plotting features of Pandas to quickly generate plots. For example...
df.ctdmo_seawater_temperature.plot();
But what if we want to change what is plotted on each axis? For example, what if we wanted to plot temperature against pressure or salinity?
To do this, we need to use Matplotlib's plotting functions explicitly, which allows us to specify both the x and y axes.
# Plot Temperature vs. Pressure
plt.plot(df.ctdmo_seawater_temperature,df.int_ctd_pressure, linestyle='',marker='.');
# Flip the y-axis
ax = plt.gca()
ax.invert_yaxis()
# Label the Plot
plt.ylabel('Pressure')
plt.xlabel('Temperature');
plt.title('Irminger Sea Flanking Mooring A 30m CTD');
We can also use the scatter function, which is sometimes easier to use.
Let's demonstrate this by creating a TS diagram.
# TS Diagram
plt.scatter(df.practical_salinity, df.ctdmo_seawater_temperature, s=5);
# Label the Plot
plt.xlabel('Salinity')
plt.ylabel('Temperature');
plt.title('Irminger Sea Flanking Mooring A 30m CTD');
# Add your code here to plot temperature vs. DO
In the TS diagram above, there are some spurious data points. My guess is that the line of points heading off to the left is from when the instrument was recovered.
Let's subset the data so we can remove that line, and additionally focus in on a narrower time period.
df2 = df.loc['2014-10-01':'2014-10-31']
df2.ctdmo_seawater_temperature.plot();
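The date-string slice above depends on the DataFrame having a time index. As a minimal sketch of how such a slice behaves, the snippet below builds a small synthetic hourly series (the `demo` DataFrame and its values are made up for illustration and are not OOI data) and checks that both end dates are included in the result.

```python
import numpy as np
import pandas as pd

# Synthetic hourly time index (illustrative only, not OOI data)
times = pd.date_range('2014-09-15', '2014-11-15', freq=pd.Timedelta(hours=1))
demo = pd.DataFrame({'temperature': np.linspace(4.0, 8.0, len(times))}, index=times)

# Slicing a time index with date strings keeps both end dates in full
october = demo.loc['2014-10-01':'2014-10-31']

print(october.index.min(), october.index.max())
print('Rows kept:', len(october))
```

Because the end label is a whole day, every sample on October 31 is kept, not just the one at midnight.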
Next we'll make a more advanced TS diagram, coloring each dot by time.
plt.scatter(df2.practical_salinity,df2.ctdmo_seawater_temperature,s=5,c=df2.index, cmap='viridis')
plt.xlabel('Salinity')
plt.ylabel('Temperature')
# Quick title from the file
plt.title(ds.source);
# Add a colorbar
cbar = plt.colorbar(label='Time');
# Fix the colorbar ticks
import pandas as pd # We need pandas for this
cbar.ax.set_yticklabels(pd.to_datetime(cbar.get_ticks()).strftime(date_format='%Y-%m-%d'));
We used Matplotlib's default colormap for this plot (viridis), but there is a much larger colormap collection available.
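As a quick sketch of how to see what is available, Matplotlib can list the names of its registered colormaps. Any of these names can be passed to the `cmap` argument used in the scatter plot above; the filter on `'viridis'` is only there to keep the printed output short.

```python
import matplotlib.pyplot as plt

# plt.colormaps() returns the names of all registered colormaps
names = plt.colormaps()
print(len(names), 'colormaps available')

# Reversed versions of each colormap carry an '_r' suffix
print([n for n in names if n.startswith('viridis')])
```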
# Prepare to be blown away...
df.describe()
There are a lot of variables included here. Let's trim down the DataFrame to just include the variables we really want.
df[['ctdmo_seawater_temperature','dissolved_oxygen']].describe()
You can also pull out a single statistic for a single variable, using the max, mean, std and quantile functions. When in doubt, try tab-complete.
# Add your code here to show statistics for individual variables
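As one possible starting point, here is a minimal sketch of those four functions applied to a small made-up series. The `temps` name and its values are hypothetical; with the real data you would call the same methods on a column such as `df.ctdmo_seawater_temperature`.

```python
import pandas as pd

# Small made-up temperature series (illustrative values only)
temps = pd.Series([4.0, 5.0, 6.0, 7.0, 8.0], name='temperature')

t_max = temps.max()
t_mean = temps.mean()
t_std = temps.std()             # sample standard deviation (ddof=1 by default)
t_median = temps.quantile(0.5)  # the 0.5 quantile is the median

print(t_max, t_mean, t_std, t_median)
```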
We can also easily calculate hourly, daily and monthly averages.
See the pandas.resample doc for more, as well as this list of offset options.
That said, if you want to use centered averaging, moving averages, or other more complicated averaging or filtering routines on irregularly spaced data, you might have to roll your own code.
Here's a quick example... notice the legend labels we've also added.
fig, ax = plt.subplots()
fig.set_size_inches(12, 6)
df['ctdmo_seawater_temperature'].plot(ax=ax,label='Raw',linestyle='None',marker='.',markersize=2)
df['ctdmo_seawater_temperature'].resample('D').mean().plot(ax=ax,label='Daily')
df['ctdmo_seawater_temperature'].resample('5D').mean().plot(ax=ax,label='5 Day')
df['ctdmo_seawater_temperature'].resample('MS').mean().plot(ax=ax,label='Monthly',marker='d') #MS=Month Start
plt.legend();
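For regularly spaced data, one option for the centered moving averages mentioned earlier is the pandas `rolling` method. The sketch below uses a made-up daily series (the `daily` values are hypothetical); irregularly spaced data may still need custom code.

```python
import pandas as pd

# Made-up daily values (illustrative only, not OOI data)
days = pd.date_range('2014-10-01', periods=7, freq='D')
daily = pd.Series([1.0, 2.0, 6.0, 4.0, 5.0, 9.0, 7.0], index=days)

# 3-day centered moving average: each value is the mean of the day
# before, the day itself, and the day after. The first and last days
# have an incomplete window, so they come out as NaN.
smooth = daily.rolling(window=3, center=True).mean()

print(smooth)
```

With real data, you could apply the same call to the output of `resample('D').mean()` so that the window always spans a fixed number of days.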
We can adapt this code, and combine it with the export code from Activity 1 to create a downloadable CSV file.
df[['ctdmo_seawater_temperature','practical_salinity','dissolved_oxygen']].resample('D').mean().to_csv('ctd_daily_average.csv')
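To check what an export like this contains, it can help to read the CSV back in. The sketch below is a hedged illustration using a synthetic hourly series (the `demo` DataFrame and the `time` index name are assumptions for this example) and an in-memory string instead of a file on disk.

```python
import io
import numpy as np
import pandas as pd

# Four days of synthetic hourly data (illustrative only, not OOI data)
times = pd.date_range('2014-10-01', periods=96, freq=pd.Timedelta(hours=1))
demo = pd.DataFrame({'temperature': np.linspace(4.0, 8.0, 96)}, index=times)
demo.index.name = 'time'

# Daily averages, exported as CSV text
daily = demo.resample('D').mean()
csv_text = daily.to_csv()  # with no path, to_csv returns the CSV as a string

# Read the CSV back, restoring the time column as a datetime index
restored = pd.read_csv(io.StringIO(csv_text), index_col='time', parse_dates=True)
print(restored)
```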
If we don't want to include all of the variables in our DataFrame when we first generate it, we can specify just the ones we want. This makes the dataset much smaller to work with and export. Notice that the double brackets [['var1','var2']] are needed when specifying a list.
# Convert to DataFrame
df = ds[['ctdmo_seawater_temperature','practical_salinity','dissolved_oxygen']].to_dataframe()
# Drop unnecessary columns
df = df.drop(columns=['obs','lon','lat'])
df.head()
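The double-bracket pattern works the same way on a pandas DataFrame. As a small sketch with made-up values (the `demo` DataFrame and its column names are hypothetical), single brackets return one column as a Series, while double brackets pass a list of names and return a smaller DataFrame.

```python
import pandas as pd

# Made-up values for illustration only
demo = pd.DataFrame({
    'temperature': [4.1, 4.3],
    'salinity': [34.8, 34.9],
    'oxygen': [290.0, 288.5],
})

one = demo['temperature']                # single brackets: a Series
two = demo[['temperature', 'salinity']]  # double brackets: a DataFrame

print(type(one).__name__, type(two).__name__)
```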
# We can also subset the data to a specific time range
import datetime
df = df.loc[datetime.date(2014,10,1):datetime.date(2014,11,1)]
df.head()
Well, those are all the basics. You can check out Activity 3 to explore some additional datasets, and you can check out the Profile Examples notebook to learn how to load and plot datasets from gliders and profilers.
To continue the fun of playing with OOI data in python, I also recommend checking out these examples:
I'm also working on a number of new examples for the Ocean Data Labs blog. Here are the first few...
Welcome to the OOI Data World. Have fun exploring the deep!