Due: Saturday, November 14th at 11:59 pm PT
Objective: This assignment will give you experience loading CSV and netCDF data files using NumPy, Pandas, and xarray, and plotting data using Matplotlib.
Instructions:
Honor code: In the space below, you can acknowledge and describe any assistance you've received on this assignment, whether that was from an instructor, classmate (either directly or on Piazza), and/or online resources other than official Python documentation websites like docs.python.org or numpy.org. Alternatively, if you prefer, you may acknowledge assistance at the relevant point(s) in your code using a Python comment (#). You do not have to acknowledge OCEAN 215 class or lesson resources.
Acknowledge assistance here:
Data is imperfect, and reading it in Python can require a lot of configuration to make sure that your data is being treated properly. Fortunately, the functions we use to read data all have optional arguments that can help us get exactly what we want from our files. These arguments are well-documented and available at each package's API website. Use the linked APIs for each of the three file-reading functions we have learned about in class to understand how to use certain arguments.
1. np.genfromtxt() (API linked here)
a. What argument name allows you to skip lines at the beginning of the file that are not the data you want to read?
b. What argument name allows you to select only certain columns to load data from?
2. pd.read_csv() (API linked here)
a. What argument name allows you to handle an unusual string used as a placeholder for missing data?
b. What argument name tells Pandas to ignore lines or portions of lines beginning with a certain character?
3. xr.open_dataset() (API linked here)
a. What argument name can be used to exclude a variable from being loaded?
b. When using the decode_times argument, what object type is required as the input? What is the default value of decode_times?
Write a print statement with the answer to each part.
Example: What is the argument that specifies the separation character used in a file?
# Example:
print('Part 0a: delimiter')
Part 0a: delimiter
# Your answers below:
Research cruises are invaluable data sources for oceanographers. During a cruise, measurements of the water column known as CTD casts are conducted. In each cast, data are collected on seawater salinity (derived from conductivity), temperature, and pressure (or depth) (hence "CTD"), as well as chemical properties such as dissolved oxygen concentration and chlorophyll fluorescence.
Historically, hydrographic cruise programs such as CLIVAR, WOCE, and GO-SHIP have repeated measurements at approximately the same locations at intervals of a few years or decades. Repeating these cruises and casts enables us to see how the ocean changes over time!
In the Assignment #3 Google Drive folder, we have provided data files from two CTD casts taken during two different cruises along the same ship transect in the Southern Ocean (see figure). One cast is from 2008 and the other cast is from 2019.
Files:
Using these files, complete the following 4 tasks:
1. Using readline() and for loop(s), print the first 15 lines of each of the provided files.
2. Answer the following questions for both files using print statements (in this case, hard-coding numbers into your strings is acceptable):
a. How many header lines do these data files have, including the column labels and column units? (Note that column information is considered part of the header for np.genfromtxt() but not for pd.read_csv().)
b. What columns are the pressure, temperature, and salinity measurements in?
c. What are the latitude and longitude coordinates for these casts?
d. What is the primary delimiter in these data files?
# Write your code below:
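A minimal sketch of the readline() pattern. The file here is a throwaway created by the code itself so the sketch runs on its own; substitute the actual CTD file paths and range(15) for the real task.

```python
# Create a throwaway file so the sketch is runnable; use the real paths instead.
with open('example.csv', 'w') as f:
    f.write('# CTD cast header\npressure,temperature,salinity\n0,10.2,33.9\n5,9.8,34.0\n')

# readline() returns one line per call; loop a fixed number of times.
with open('example.csv', 'r') as f:
    for _ in range(3):          # use range(15) for the first 15 lines
        line = f.readline()
        print(line, end='')     # lines already include their trailing '\n'
```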
3. Using np.genfromtxt() and the information you found for Part 2, load the pressure, temperature, and salinity data from these CTD casts.
4. Plot the data following these parameters:
a. Set up a figure with two subplots: a left and a right subplot.
b. On the left subplot, plot the temperature (x-axis) versus pressure (y-axis) for each of the casts. Use the same marker for both lines, but different colors for the lines. Add a legend on your plot denoting the different cast years.
c. On the right subplot, plot the salinity (x-axis) versus pressure (y-axis) for each of the casts. Use the same marker on both lines, but a different marker than the temperature markers. Have the line colors correspond to the colors used in the temperature plot. Add a legend on your plot denoting the different cast years.
d. Put grids on both plots and reverse the y-axis directions so that pressure is increasing downwards. Don't forget to properly label your plots!
# Write your code below:
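A sketch of the loading-and-plotting pattern. The file contents, delimiter, header length, and column positions below are invented stand-ins, so swap in the values you found in Part 2 for the real CTD files.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')           # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt

# Invented stand-in for a CTD file: 2 header lines, comma-delimited columns.
with open('ctd_demo.csv', 'w') as f:
    f.write('CTD demo cast\npres,temp,sal\n10,5.1,33.8\n20,4.7,34.0\n30,4.2,34.2\n')

# skip_header and usecols values must match what you found in Part 2.
data = np.genfromtxt('ctd_demo.csv', delimiter=',', skip_header=2, usecols=(0, 1, 2))
pres, temp, sal = data[:, 0], data[:, 1], data[:, 2]

fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(8, 4))
ax0.plot(temp, pres, marker='o', color='tab:blue', label='2008')  # temperature profile
ax1.plot(sal, pres, marker='s', color='tab:blue', label='2008')   # salinity profile
for ax in (ax0, ax1):
    ax.grid(True)
    ax.invert_yaxis()           # pressure increases downward
    ax.legend()
ax0.set_xlabel('Temperature (°C)')
ax0.set_ylabel('Pressure (dbar)')
ax1.set_xlabel('Salinity (PSU)')
ax1.set_ylabel('Pressure (dbar)')
```

Plotting both casts means calling plot twice per subplot, once per loaded file, with a shared marker and distinct colors.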
Image: Research scientist Rick Rupan at University of Washington's Argo/SOCCOM float laboratory (credit: Earle Wilson).
Biogeochemical profiling floats, like those built at UW for the SOCCOM project, are deployed from a ship, then they drift with the currents for about 5 years while collecting measurements.
Every 7-10 days, each float sinks to a depth of 2000 m depth, then it measures physical parameters (like temperature and salinity) and biogeochemical parameters (like pH, oxygen, nitrate, and chlorophyll concentrations) as it ascends back to the surface to transmit its measurements by satellite. If a float is unable to transmit its data, its position must be estimated based on its last known position and the position at which it begins transmitting again.
In the Assignment #3 Google Drive folder, we have provided a CSV file (Southern_Ocean_float_9094_time_series.csv) containing near-surface measurements from SOCCOM float #9094, most from the upper 20 m of the ocean. The float drifted around the Weddell Sea, a region of the Southern Ocean offshore of Antarctica, from 2014 to 2019.
Use pd.read_csv() to load the CSV data file from Google Drive. Use the parse_dates and index_col arguments to read values in the "Datetimes" column as datetime objects and set that column as the DataFrame index.
# Write your code for Part 1 here:
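As a sketch of the pattern, with a tiny inline CSV standing in for the real float file (only the column name "Datetimes" matches the assignment description):

```python
import io
import pandas as pd

# Tiny inline CSV standing in for the real float data file.
csv_text = 'Datetimes,Temperature,Salinity\n2015-01-01,-1.2,34.1\n2015-01-11,-1.5,34.0\n'

# parse_dates converts the column to datetimes; index_col makes it the index.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Datetimes'], index_col='Datetimes')
print(df.index.dtype)           # datetime64[ns]
```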
a. Display the Pandas DataFrame containing the data you just loaded.
b. Display the DataFrame's summary statistics using .describe().
# Write your code for Part 2 here:
a. How many parameters are provided in this data set, not including datetimes, latitude, and longitude?
b. The data counts show that three columns are missing data. What are those three columns?
c. What was the coldest temperature measured by the float? Round your answer to two decimal places and include units.
d. What is the standard deviation of dissolved oxygen in the data? Round your answer to one decimal place and include units.
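A small sketch of how .describe(), .min(), and .std() relate, using invented columns and values rather than the float data:

```python
import pandas as pd

# Invented stand-in columns; the real DataFrame comes from Part 1.
df = pd.DataFrame({'Temperature (°C)': [-1.83, 0.52, 1.07],
                   'Oxygen (µmol/kg)': [310.2, 295.8, 301.5]})

stats = df.describe()           # count, mean, std, min, quartiles, max per column
coldest = round(df['Temperature (°C)'].min(), 2)
oxygen_std = round(df['Oxygen (µmol/kg)'].std(), 1)
print(f'Coldest temperature: {coldest} °C')
print(f'Oxygen standard deviation: {oxygen_std} µmol/kg')
```

The same numbers can be read directly off the .describe() table (rows 'min' and 'std').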
# Write your print statements for Part 3 here:
Set up a figure with two subplots and keep the provided call to plt.tight_layout(). This adjusts the spacing between subplots to make them look nicer.
a. On the left subplot, make a black line plot of time vs. chlorophyll-a concentration. On top of the line, add a scatter plot of time vs. chlorophyll-a, with the markers colored according to nitrate concentration. Add a colorbar and label it appropriately, including units. Add axis labels, a title, and grid lines.
b. On the right subplot, make a black line plot of the float's geographic track (i.e. longitude vs. latitude). On top of the line, add a scatter plot of the location points, with the markers colored according to temperature. Set the colormap to one of these options from Matplotlib, choosing a colormap that transitions from dark to light colors. Add a colorbar and label it appropriately, including units. Add axis labels, a title, and grid lines.
c. What does the left subplot reveal about the relationship between chlorophyll (a pigment found in phytoplankton) and nitrate concentration (a dissolved nutrient)? Provide your answer in a print statement.
d. The freezing point of seawater is about –2°C. Sea ice forms when the surface ocean is close to freezing. With this in mind, what does the right subplot reveal about the float's ability to transmit accurate lat/lon positions year-round? Provide your answer in a print statement.
# Write your code for Part 4 here:
# Set up the subplots canvas:
# Keep (and uncomment) this line of code:
# plt.tight_layout()
# Draw the two subplots:
# Write your print statements for Parts 4c and 4d here:
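A sketch of the line-plus-colored-scatter pattern with synthetic data; the variable names, units, and colormaps below are placeholders, not the assignment's required choices:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')           # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt

t = np.arange(10)
chla = np.abs(np.sin(t))        # stand-in chlorophyll-a time series
nitrate = 30 - 5 * chla         # stand-in nitrate, anti-correlated with chl-a
lon = np.cos(t / 3.0)           # stand-in float track
lat = np.sin(t / 3.0)

fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(9, 4))

# Left: black line with a scatter on top, colored by a second variable.
ax0.plot(t, chla, color='black')
sc0 = ax0.scatter(t, chla, c=nitrate, cmap='viridis')
fig.colorbar(sc0, ax=ax0, label='Nitrate (µmol/kg)')
ax0.set_xlabel('Time')
ax0.set_ylabel('Chl-a (mg/m$^3$)')
ax0.set_title('Chl-a colored by nitrate')
ax0.grid(True)

# Right: geographic track, markers colored with a dark-to-light colormap.
ax1.plot(lon, lat, color='black')
sc1 = ax1.scatter(lon, lat, c=t, cmap='cividis')
fig.colorbar(sc1, ax=ax1, label='Temperature (°C)')
ax1.set_xlabel('Longitude')
ax1.set_ylabel('Latitude')
ax1.set_title('Float track colored by temperature')
ax1.grid(True)

plt.tight_layout()              # tidy the spacing between subplots
```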
a. What was the salinity measurement on July 10, 2016? Use .loc[] selection to identify and print this in 1-2 lines of code. Round your answer to one decimal place, and include units.
b. What was the highest chlorophyll-a concentration in 2018? Use .loc[] and slicing to calculate and print this in 1-2 lines of code. Round your answer to one decimal place, and include units. You may want to check that this makes sense with your plot from Part 4a.
c. What was the average nitrate concentration during all days that chlorophyll-a was greater than 4 mg/m^3? Calculate and print this in 1-3 lines of code. Round your answer to one decimal place, and include units. You may want to check that this matches your plot from Part 4a.
d. Calculate the correlation coefficient (r) between the chlorophyll-a and nitrate time series. For this, use the .corr() method. Calculate and print this in 1-2 lines of code, and round your answer to three decimal places.
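A sketch of these selection patterns on a toy datetime-indexed DataFrame (all values are invented; note that the correlation comes from the Series .corr() method, since corr is a method of DataFrames/Series rather than a top-level pandas function):

```python
import pandas as pd

# Toy datetime-indexed DataFrame; all values are invented.
idx = pd.to_datetime(['2016-07-10', '2018-03-01', '2018-12-15'])
df = pd.DataFrame({'Salinity': [34.12, 33.95, 34.30],
                   'Chl_a':    [0.4,   5.2,   1.1],
                   'Nitrate':  [29.4,  22.2,  28.35]}, index=idx)

sal = round(df.loc['2016-07-10', 'Salinity'], 1)                       # a. one date
chl_2018 = round(df.loc['2018-01-01':'2018-12-31', 'Chl_a'].max(), 1)  # b. slice a year
mean_no3 = round(df.loc[df['Chl_a'] > 4, 'Nitrate'].mean(), 1)         # c. boolean filter
r = round(df['Chl_a'].corr(df['Nitrate']), 3)                          # d. .corr() method
print(sal, chl_2018, mean_no3, r)
```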
# Write 1-2 lines of code for Part 5a:
# Write 1-2 lines of code for Part 5b:
# Write 1-3 lines of code for Part 5c:
# Write 1-2 lines of code for Part 5d:
Image: Red Square at University of Washington (credit: Edward Aites, YouTube).
Numerical models are used to simulate the Earth's atmosphere and ocean and to predict future weather and ocean conditions. The same types of models are used to re-analyze past conditions with higher accuracy using observations from satellites, weather stations, and ocean platforms like profiling floats. The model output from these "reanalyses" offers a trustworthy, global view of past states of the atmosphere, land, and ocean.
In the Assignment #3 Google Drive folder, we have provided a netCDF file (era5_puget_sound_weather.nc) containing 3-dimensional model output (2-D space + time) from the ECMWF ERA5 global atmospheric reanalysis from 2018 to 2020 for the Puget Sound region around Seattle. Each grid cell is about 30 km x 30 km. Included are a few relevant weather variables.
Run this line of code (!pip install netcdf4) once to install the netCDF4 library. Then import NumPy, Pandas, xarray, Matplotlib, and datetime. Then use xr.open_dataset() to load the netCDF data file from Google Drive. Save it as a variable called weather_data.
# Run this line of code once for this notebook, then delete or comment it out:
# !pip install netcdf4
# Write your code for Part 1 here:
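A self-contained sketch of the pattern: it first writes a tiny netCDF file so xr.open_dataset() has something to open; with the real file you would skip the writing step and open era5_puget_sound_weather.nc directly.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Write a tiny netCDF file so the sketch is self-contained.
demo = xr.Dataset(
    {'t2m': (('time', 'latitude', 'longitude'), np.zeros((4, 2, 3)))},
    coords={'time': pd.date_range('2018-01-01', periods=4, freq='D'),
            'latitude': [47.5, 47.75],
            'longitude': [-122.5, -122.25, -122.0]})
demo.to_netcdf('demo_weather.nc')

weather_data = xr.open_dataset('demo_weather.nc')
print(weather_data.sizes)       # dimension names and lengths
```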
Display weather_data using xarray's interactive interface.
# Write your code for Part 2 here:
a. What variables are provided in this data set? Write both their abbreviations (variable names) and long names.
b. Taking into account only the number of variables and the dimensions of the data, how many total data points does this netCDF file contain? You may use a calculator or Python code to get this answer.
c. What time interval (spacing) are the data provided at? Use only the display interface to answer this question.
# Write your print statements for Part 3 here:
# Write your print statement for Part 4 here:
Use .sel() indexing with the 'nearest' option to select all the data inside weather_data that are nearest to Red Square. Save the resulting xarray Dataset into a new variable called uw_weather_data. Check that uw_weather_data now has a single dimension: time.
# Write your code for Part 5 here:
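A sketch of nearest-neighbor selection on a toy Dataset; the coordinates passed to .sel() below are rough placeholders for Red Square, so use the coordinates given in the assignment:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy Dataset with the same dimension layout as the weather data.
weather_data = xr.Dataset(
    {'t2m': (('time', 'latitude', 'longitude'), np.zeros((3, 2, 2)))},
    coords={'time': pd.date_range('2018-01-01', periods=3, freq='D'),
            'latitude': [47.5, 47.75],
            'longitude': [-122.5, -122.25]})

# method='nearest' snaps to the closest grid point; coordinates are placeholders.
uw_weather_data = weather_data.sel(latitude=47.66, longitude=-122.31, method='nearest')
print(uw_weather_data.sizes)    # only the time dimension remains
```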
# Write your code for Part 6 here:
Use .sel() indexing with a slice() object to answer this question. Express both answers in °C, rounded to one decimal place. (If you're used to °F, you may want to convert to °F to check that your answers make sense.)
# Write your code for Part 7 here:
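A sketch of time slicing with .sel() and slice(), plus a Kelvin-to-Celsius conversion (ERA5 stores temperature in Kelvin); the dates and values here are invented:

```python
import pandas as pd
import xarray as xr

# Invented daily 2-m temperatures in Kelvin.
uw_weather_data = xr.Dataset(
    {'t2m': ('time', [283.1, 290.4, 295.2, 288.0, 284.6])},
    coords={'time': pd.date_range('2019-06-01', periods=5, freq='D')})

# slice() selects an inclusive date range along the time dimension.
subset = uw_weather_data['t2m'].sel(time=slice('2019-06-02', '2019-06-04'))
max_c = round(float(subset.max()) - 273.15, 1)
print(f'Maximum temperature in range: {max_c} °C')
```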
Using the weather_data Dataset, calculate the average snowfall rate within the Puget Sound region. In other words, calculate an average over latitude and longitude, but keep the time dimension. Save the resulting Dataset as a new variable.
# Write your code for Part 8 here:
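A sketch of a spatial average that preserves the time dimension; the dimension names ('latitude', 'longitude') are assumptions that should be checked against weather_data's display:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy snowfall field: 2 time steps x 2 latitudes x 2 longitudes.
weather_data = xr.Dataset(
    {'snowfall': (('time', 'latitude', 'longitude'),
                  np.array([[[1.0, 3.0], [2.0, 2.0]],
                            [[0.0, 4.0], [4.0, 0.0]]]))},
    coords={'time': pd.date_range('2018-01-01', periods=2, freq='D'),
            'latitude': [47.5, 47.75],
            'longitude': [-122.5, -122.25]})

# Averaging over the spatial dimensions leaves one value per time step.
regional_mean = weather_data.mean(dim=['latitude', 'longitude'])
print(regional_mean['snowfall'].values)
```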
When labeling your plot, obtain the variable names and units from the attributes stored in weather_data rather than simply copying and pasting them into a string. Feel free to get creative with your line color using the options here: https://matplotlib.org/3.1.0/gallery/color/named_colors.html.
# Write your code for Part 9 here:
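A sketch of building labels from stored metadata instead of hard-coded strings; the attribute keys ('long_name', 'units') follow common netCDF conventions and should be verified against the actual attributes in weather_data:

```python
import matplotlib
matplotlib.use('Agg')           # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr

# Toy DataArray with netCDF-style metadata; attribute keys are assumptions.
da = xr.DataArray(np.array([0.1, 0.3, 0.0]), dims='time',
                  coords={'time': pd.date_range('2018-01-01', periods=3, freq='D')},
                  name='sf',
                  attrs={'long_name': 'Snowfall', 'units': 'm of water equivalent'})

# Build the label from the stored metadata rather than a hand-typed string.
label = f"{da.attrs['long_name']} ({da.attrs['units']})"

fig, ax = plt.subplots()
ax.plot(da['time'], da, color='steelblue')
ax.set_ylabel(label)
ax.set_title(label)
```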