Created by Ethan C. Campbell for NCAT/MATE/GO-BGC Marine Technology Summer Program
Thursday, August 24, 2023
import numpy as np # NumPy is an array and math library
import matplotlib.pyplot as plt # Matplotlib is a visualization (plotting) library
import pandas as pd # Pandas lets us work with spreadsheet (.csv) data
from datetime import datetime, timedelta # Datetime helps us work with dates and times
First, let's download three .csv data files from Google Drive here: https://drive.google.com/drive/folders/1Am6XdlB-APQ3ccOvLeGK8DFPQ2OnPeJD?usp=sharing. Two of the files are CTD casts that were collected from the R/V Rachel Carson off Carkeek Park near Seattle. *Save these two files to your computer.*
Let's first take a look at the raw files, including their headers.
Next, we can upload the files to this Google Colab notebook. *Click the sidebar folder icon on the left, then use the page-with-arrow icon at the top to select the files and upload them.* NOTE: uploaded files will be deleted from Google Colab when you refresh this notebook!
We will specify each filepath using string variables:
filepath_0 = '/content/2023051001001_Carkeek.csv'
filepath_1 = '/content/2023051101001_Carkeek.csv'
Now, we can load the files using pandas:
pd.read_csv(FILEPATH, ARGUMENTS...)
This function is very customizable using the many optional ARGUMENTS, which allow it to handle almost any file. You can find details about the arguments in the pandas documentation for pd.read_csv().
*Let's first take a look at the data file using a simple text editor. Notice the long header. What argument can we use to exclude the header from being loaded?*
Below, we'll load each data file using pd.read_csv() and store each file into a new variable.
We can look at the data using display() (which is a fancy version of print() for DataFrames):
data_0 = pd.read_csv(filepath_0,comment='#')
data_1 = pd.read_csv(filepath_1,comment='#')
display(data_0)
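To see what the comment='#' argument does on its own, here is a minimal sketch that reads a tiny, made-up file held in a string instead of the real CTD file. The header text and data values are invented for illustration; only the column names depSM and t090C are borrowed from the CTD files used in this notebook.

```python
import io
import pandas as pd

# A tiny, made-up file: two header lines starting with '#',
# then the column names, then two rows of data
fake_file = io.StringIO(
    "# Hypothetical header line 1\n"
    "# Hypothetical header line 2\n"
    "depSM,t090C\n"
    "1.0,11.5\n"
    "2.0,11.2\n"
)

# comment='#' tells pandas to ignore everything after a '#' on each line,
# so lines that start with '#' are skipped entirely
demo = pd.read_csv(fake_file, comment='#')
print(demo.columns.tolist())
print(len(demo))
```

Another option is skiprows=N, which skips a fixed number of lines at the top of the file, but that requires counting the header lines first.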
The data in a pandas DataFrame is similar to a NumPy 2-D array, except we use column labels to refer to columns and index values to refer to rows.
To retrieve a specific column, we use bracket notation: data_frame[COLUMN_LABEL].
# For example:
data_0['density00']
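Bracket notation can be tried out on a small, made-up DataFrame like the one sketched below. The values are invented for illustration and the variable names (demo, subset, first_value) are hypothetical; the same pattern should work on data_0 and data_1.

```python
import pandas as pd

# A small, made-up DataFrame standing in for a CTD cast
demo = pd.DataFrame({
    'depSM': [1.0, 2.0, 3.0],
    't090C': [11.5, 11.2, 10.9],
})

temperature = demo['t090C']          # one label returns a single column (a Series)
subset = demo[['depSM', 't090C']]    # a list of labels returns a smaller DataFrame
first_value = demo['t090C'].iloc[0]  # .iloc selects a row by its position
print(temperature.max(), first_value)
```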
With these tools, we can make line plots of temperature vs. depth that include both CTD casts.
*Can you try plotting another parameter vs. depth? Note: the file contains salinity, oxygen, fluorescence, and pH data.*
The following line of code flips the y-axis so the surface is at the top: plt.gca().invert_yaxis().
# Temperature vs. depth profile
plt.plot(data_0['t090C'],data_0['depSM'],label='R/V Carson cast #1 (5/10/23)')
plt.plot(data_1['t090C'],data_1['depSM'],label='R/V Carson cast #2 (5/11/23)')
plt.legend()
plt.gca().invert_yaxis() # This reverses the y-axis
plt.xlabel('Temperature (°C)')
plt.ylabel('Depth (m)');
# Write your code here:
*How do the casts look similar and how do they look different? What could be some causes of the differences?*
*What do you observe in the other parameter(s) that you plotted?*
Next, we'll look at a file with a seaglider (or just "glider") profile from the same cruise, collected one day after the second R/V Carson cast.
If you haven't already, *download* the glider .csv data file from Google Drive here: https://drive.google.com/drive/folders/1Am6XdlB-APQ3ccOvLeGK8DFPQ2OnPeJD?usp=sharing. *Save the file to your computer, then load it into Google Colab.*
Let's load and display the data using pandas:
filepath_2 = '/content/20230512_glider.csv'
data_2 = pd.read_csv(filepath_2,parse_dates=['time'])
display(data_2)
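The parse_dates argument asks pandas to convert the listed columns from text into datetime values. The sketch below shows the effect on a made-up, two-row file; the timestamp format is an assumption and may not match the real glider file.

```python
import io
import pandas as pd

# A tiny, made-up file with a 'time' column (timestamp format is assumed)
fake_file = io.StringIO(
    "time,depth\n"
    "2023-05-12 10:00:00,1.0\n"
    "2023-05-12 10:00:05,2.0\n"
)

demo = pd.read_csv(fake_file, parse_dates=['time'])
print(demo['time'].dtype)  # a datetime type rather than plain text

# Because the column holds real datetimes, we can do arithmetic with it
elapsed = demo['time'].iloc[1] - demo['time'].iloc[0]
print(elapsed.total_seconds())
```

Having real datetimes also lets Matplotlib format a time axis sensibly when the column is plotted.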
*What data parameters do you see, and what do you think each parameter represents?*
As a first step, let's see whether the glider was sampling the same region as the R/V Rachel Carson. We can plot each of their tracks using latitude and longitude:
plt.plot(data_0['longitude'],data_0['latitude'],label='R/V Rachel Carson cast #1')
plt.plot(data_1['longitude'],data_1['latitude'],label='R/V Rachel Carson cast #2')
plt.plot(data_2['longitude'],data_2['latitude'],label='Glider')
plt.legend()
plt.xlabel('Longitude (°E)')
plt.ylabel('Latitude (°N)')
plt.title('Ship and glider tracks');
One way to visualize glider data is in time-depth space. In other words, time is on the x-axis and depth is on the y-axis. If we use a scatter plot (plt.scatter()), we can color the points by another quantity, like temperature or buoyancy.
*Try changing the c (color) argument below to plot different quantities in time-depth space. What do you notice?*
plt.scatter(data_2['time'],data_2['depth'],c=data_2['buoyancy'])
plt.colorbar(label='Buoyancy (g)') # This adds the color bar and color label on the right
plt.gca().invert_yaxis() # This reverses the y-axis
plt.xlabel('Time')
plt.ylabel('Depth (m)');
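plt.scatter() also accepts optional arguments that control how the points look, such as s (marker size) and cmap (the color map). Here is a minimal sketch using a made-up dive, not the real glider data; the times, depths, and temperatures are invented for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up glider dive: descend to 100 m, then climb back to the surface
time = pd.date_range('2023-05-12 10:00', periods=40, freq='min')
depth = np.concatenate([np.linspace(0, 100, 20), np.linspace(100, 0, 20)])
temperature = 12 - 0.03 * depth  # invented temperatures, colder with depth

plt.figure()
points = plt.scatter(time, depth, c=temperature, s=10, cmap='viridis')
plt.colorbar(points, label='Temperature (°C)')
plt.gca().invert_yaxis()  # This reverses the y-axis
plt.xlabel('Time')
plt.ylabel('Depth (m)')
```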
Of course, we can also plot parameter vs. depth profiles.
*Copy the code from above, where you plotted temperature profiles from the R/V Carson ship CTD casts. Then add the glider temperature data.*
# Write your code here:
We can zoom in to certain depths by changing the y-axis scale using:
plt.ylim([LOWER,UPPER])
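One detail to watch for: plt.ylim() sets both ends of the axis directly, so calling plt.ylim([0, 20]) after plt.gca().invert_yaxis() would put the surface back at the bottom. A simple way around this, sketched below with a made-up profile, is to pass the larger depth first.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up profile: temperature decreasing with depth
depth = np.linspace(0, 100, 50)
temperature = 12 - 0.03 * depth

plt.figure()
plt.plot(temperature, depth)
plt.ylim([20, 0])  # larger value first: 20 m at the bottom, surface at the top
plt.xlabel('Temperature (°C)')
plt.ylabel('Depth (m)')

bottom, top = plt.gca().get_ylim()
print(bottom, top)
```

Alternatively, calling plt.ylim([0, 20]) first and plt.gca().invert_yaxis() afterwards gives the same result.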
*By adding plt.ylim() to the plot above, what can we observe about the differences between the casts near the surface and near the bottom? Why do these differences exist?*
*If we wanted to calibrate a glider sensor to the R/V Carson's sensor, what depths would we want to use?*
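To compare two sensors at the same depths, one possible approach is to interpolate one profile onto the other's depths and look at the difference. The sketch below uses made-up numbers and NumPy's np.interp(); it is an illustration under those assumptions, not a full calibration procedure. Note that np.interp() expects the depths it interpolates from to be increasing, so a real glider profile that goes down and back up would need to be split or sorted first.

```python
import numpy as np

# Made-up profiles (depth in m, temperature in °C); the real values would
# come from the ship and glider DataFrames
ship_depth = np.array([0.0, 10.0, 20.0, 30.0])
ship_temp = np.array([12.0, 11.0, 10.0, 9.0])
glider_depth = np.array([0.0, 5.0, 15.0, 25.0, 30.0])
glider_temp = np.array([12.2, 11.6, 10.6, 9.6, 9.1])

# Estimate the glider temperature at each ship depth by linear interpolation
glider_on_ship = np.interp(ship_depth, glider_depth, glider_temp)

# Difference between the two sensors at matching depths
difference = glider_on_ship - ship_temp
print(difference)
```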