Due: Saturday, November 14th at 11:59 pm PT
Objective: This assignment will give you experience loading CSV and netCDF data files using NumPy, Pandas, and xarray, and plotting data using Matplotlib.
Instructions:
Honor code: In the space below, you can acknowledge and describe any assistance you've received on this assignment, whether that was from an instructor, classmate (either directly or on Piazza), and/or online resources other than official Python documentation websites like docs.python.org or numpy.org. Alternatively, if you prefer, you may acknowledge assistance at the relevant point(s) in your code using a Python comment (#). You do not have to acknowledge OCEAN 215 class or lesson resources.
Acknowledge assistance here:
Data is imperfect, and reading it in Python can require a lot of configuration to make sure that your data is being treated properly. Fortunately, the functions we use to read data all have optional arguments that can help us get exactly what we want from our files. These arguments are well-documented and available at each package's API website. Use the linked APIs for each of the three file-reading functions we have learned about in class to understand how to use certain arguments.
1. np.genfromtxt() (API linked here)
a. What argument name allows you to skip lines at the beginning of the file that are not the data you want to read?
b. What argument name allows you to select only certain columns to load data from?
2. pd.read_csv() (API linked here)
a. What argument name allows you to handle an unusual string used as a placeholder for missing data?
b. What argument name tells Pandas to ignore lines or portions of lines beginning with a certain character?
3. xr.open_dataset() (API linked here)
a. What argument name can be used to exclude a variable from being loaded?
b. When using the decode_times argument, what object type is required as the input? What is the default value of decode_times?
Write a print statement with the answer to each part.
Example: What is the argument that specifies the separation character used in a file?
# Example:
print('Part 0a: delimiter')
Part 0a: delimiter
# Your answers below:
Research cruises are invaluable data sources for oceanographers. During a cruise, measurements of the water column known as CTD casts are conducted. In each cast, data are collected on seawater salinity (derived from conductivity), temperature, and pressure (or depth) (hence "CTD"), as well as chemical properties such as dissolved oxygen concentration and chlorophyll fluorescence.
Historically, hydrographic cruise programs such as CLIVAR, WOCE, and GO-SHIP have repeated measurements at approximately the same locations at intervals of a few years or decades. Repeating these cruises and casts enables us to see how the ocean changes over time!
In the Assignment #3 Google Drive folder, we have provided data files from two CTD casts taken during two different cruises along the same ship transect in the Southern Ocean (see figure). One cast is from 2008 and the other cast is from 2019.
Files:
Using these files, complete the following 4 tasks:
1. Using readline() and for loop(s), print the first 15 lines of each of the provided files.
2. Answer the following questions for both files using print statements (in this case, hard-coding numbers into your strings is acceptable):
a. How many header lines do these data files have, including the column labels and column units? (Note that column information is considered part of the header for np.genfromtxt() but not for pd.read_csv().)
b. What columns are the pressure, temperature, and salinity measurements in?
c. What are the latitude and longitude coordinates for these casts?
d. What is the primary delimiter in these data files?
# Write your code below:
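A minimal sketch of the readline() pattern. The file here is a throwaway created by the code itself so the sketch runs on its own; substitute the actual CTD file paths and range(15) for the real task.

```python
# Create a throwaway file so the sketch is runnable; use the real paths instead.
with open('example.csv', 'w') as f:
    f.write('# CTD cast header\npressure,temperature,salinity\n0,10.2,33.9\n5,9.8,34.0\n')

# readline() returns one line per call; loop a fixed number of times.
with open('example.csv', 'r') as f:
    for _ in range(3):          # use range(15) for the first 15 lines
        line = f.readline()
        print(line, end='')     # lines already include their trailing '\n'
```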
3. Using np.genfromtxt() and the information you found for Part 2, load the pressure, temperature, and salinity data from these CTD casts.
4. Plot the data following these parameters:
a. Set up a figure with two subplots: a left and a right subplot.
b. On the left subplot, plot the temperature (x-axis) versus pressure (y-axis) for each of the casts. Use the same marker for both lines, but different colors for the lines. Add a legend on your plot denoting the different cast years.
c. On the right subplot, plot the salinity (x-axis) versus pressure (y-axis) for each of the casts. Use the same marker on both lines, but a different marker than the temperature markers. Have the line colors correspond to the colors used in the temperature plot. Add a legend on your plot denoting the different cast years.
d. Put grids on both plots and reverse the y-axis directions so that pressure is increasing downwards. Don't forget to properly label your plots!
# Write your code below:
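A sketch of the loading-and-plotting pattern. The file contents, delimiter, header length, and column positions below are invented stand-ins, so swap in the values you found in Part 2 for the real CTD files.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')           # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt

# Invented stand-in for a CTD file: 2 header lines, comma-delimited columns.
with open('ctd_demo.csv', 'w') as f:
    f.write('CTD demo cast\npres,temp,sal\n10,5.1,33.8\n20,4.7,34.0\n30,4.2,34.2\n')

# skip_header and usecols values must match what you found in Part 2.
data = np.genfromtxt('ctd_demo.csv', delimiter=',', skip_header=2, usecols=(0, 1, 2))
pres, temp, sal = data[:, 0], data[:, 1], data[:, 2]

fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(8, 4))
ax0.plot(temp, pres, marker='o', color='tab:blue', label='2008')  # temperature profile
ax1.plot(sal, pres, marker='s', color='tab:blue', label='2008')   # salinity profile
for ax in (ax0, ax1):
    ax.grid(True)
    ax.invert_yaxis()           # pressure increases downward
    ax.legend()
ax0.set_xlabel('Temperature (°C)')
ax0.set_ylabel('Pressure (dbar)')
ax1.set_xlabel('Salinity (PSU)')
ax1.set_ylabel('Pressure (dbar)')
```

Plotting both casts means calling plot twice per subplot, once per loaded file, with a shared marker and distinct colors.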
Image: Research scientist Rick Rupan at University of Washington's Argo/SOCCOM float laboratory (credit: Earle Wilson).
Biogeochemical profiling floats, like those built at UW for the SOCCOM project, are deployed from a ship, then they drift with the currents for about 5 years while collecting measurements.
Every 7-10 days, each float sinks to a depth of 2000 m depth, then it measures physical parameters (like temperature and salinity) and biogeochemical parameters (like pH, oxygen, nitrate, and chlorophyll concentrations) as it ascends back to the surface to transmit its measurements by satellite. If a float is unable to transmit its data, its position must be estimated based on its last known position and the position at which it begins transmitting again.
In the Assignment #3 Google Drive folder, we have provided a CSV file (Southern_Ocean_float_9094_time_series.csv) containing near-surface measurements from SOCCOM float #9094, most from the upper 20 m of the ocean. The float drifted around the Weddell Sea, a region of the Southern Ocean offshore of Antarctica, from 2014 to 2019.
Use pd.read_csv() to load the CSV data file from Google Drive. Use the parse_dates and index_col arguments to read values in the "Datetimes" column as datetime objects and set that column as the DataFrame index.
# Write your code for Part 1 here:
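As a sketch of the pattern, with a tiny inline CSV standing in for the real float file (only the column name "Datetimes" matches the assignment description):

```python
import io
import pandas as pd

# Tiny inline CSV standing in for the real float data file.
csv_text = 'Datetimes,Temperature,Salinity\n2015-01-01,-1.2,34.1\n2015-01-11,-1.5,34.0\n'

# parse_dates converts the column to datetimes; index_col makes it the index.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Datetimes'], index_col='Datetimes')
print(df.index.dtype)           # datetime64[ns]
```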
a. Display the Pandas DataFrame containing the data you just loaded.
b. Display the DataFrame's summary statistics using .describe().
# Write your code for Part 2 here:
a. How many parameters are provided in this data set, not including datetimes, latitude, and longitude?
b. The data counts show that three columns are missing data. What are those three columns?
c. What was the coldest temperature measured by the float? Round your answer to two decimal places and include units.
d. What is the standard deviation of dissolved oxygen in the data? Round your answer to one decimal place and include units.
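A small sketch of how .describe(), .min(), and .std() relate, using invented columns and values rather than the float data:

```python
import pandas as pd

# Invented stand-in columns; the real DataFrame comes from Part 1.
df = pd.DataFrame({'Temperature (°C)': [-1.83, 0.52, 1.07],
                   'Oxygen (µmol/kg)': [310.2, 295.8, 301.5]})

stats = df.describe()           # count, mean, std, min, quartiles, max per column
coldest = round(df['Temperature (°C)'].min(), 2)
oxygen_std = round(df['Oxygen (µmol/kg)'].std(), 1)
print(f'Coldest temperature: {coldest} °C')
print(f'Oxygen standard deviation: {oxygen_std} µmol/kg')
```

The same numbers can be read directly off the .describe() table (rows 'min' and 'std').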
# Write your print statements for Part 3 here:
Set up a figure with two subplots and keep the provided call to plt.tight_layout(). This adjusts the spacing between subplots to make them look nicer.
a. On the left subplot, make a black line plot of time vs. chlorophyll-a concentration. On top of the line, add a scatter plot of time vs. chlorophyll-a, with the markers colored according to nitrate concentration. Add a colorbar and label it appropriately, including units. Add axis labels, a title, and grid lines.
b. On the right subplot, make a black line plot of the float's geographic track (i.e. longitude vs. latitude). On top of the line, add a scatter plot of the location points, with the markers colored according to temperature. Set the colormap to one of these options from Matplotlib, choosing a colormap that transitions from dark to light colors. Add a colorbar and label it appropriately, including units. Add axis labels, a title, and grid lines.
c. What does the left subplot reveal about the relationship between chlorophyll (a pigment found in phytoplankton) and nitrate concentration (a dissolved nutrient)? Provide your answer in a print statement.
d. The freezing point of seawater is about –2°C. Sea ice forms when the surface ocean is close to freezing. With this in mind, what does the right subplot reveal about the float's ability to transmit accurate lat/lon positions year-round? Provide your answer in a print statement.
# Write your code for Part 4 here:
# Set up the subplots canvas:
# Keep (and uncomment) this line of code:
# plt.tight_layout()
# Draw the two subplots:
# Write your print statements for Parts 4c and 4d here:
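A sketch of the line-plus-colored-scatter pattern with synthetic data; the variable names, units, and colormaps below are placeholders, not the assignment's required choices:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')           # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt

t = np.arange(10)
chla = np.abs(np.sin(t))        # stand-in chlorophyll-a time series
nitrate = 30 - 5 * chla         # stand-in nitrate, anti-correlated with chl-a
lon = np.cos(t / 3.0)           # stand-in float track
lat = np.sin(t / 3.0)

fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(9, 4))

# Left: black line with a scatter on top, colored by a second variable.
ax0.plot(t, chla, color='black')
sc0 = ax0.scatter(t, chla, c=nitrate, cmap='viridis')
fig.colorbar(sc0, ax=ax0, label='Nitrate (µmol/kg)')
ax0.set_xlabel('Time')
ax0.set_ylabel('Chl-a (mg/m$^3$)')
ax0.set_title('Chl-a colored by nitrate')
ax0.grid(True)

# Right: geographic track, markers colored with a dark-to-light colormap.
ax1.plot(lon, lat, color='black')
sc1 = ax1.scatter(lon, lat, c=t, cmap='cividis')
fig.colorbar(sc1, ax=ax1, label='Temperature (°C)')
ax1.set_xlabel('Longitude')
ax1.set_ylabel('Latitude')
ax1.set_title('Float track colored by temperature')
ax1.grid(True)

plt.tight_layout()              # tidy the spacing between subplots
```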
a. What was the salinity measurement on July 10, 2016? Use .loc[] selection to identify and print this in 1-2 lines of code. Round your answer to one decimal place, and include units.
b. What was the highest chlorophyll-a concentration in 2018? Use .loc[] and slicing to calculate and print this in 1-2 lines of code. Round your answer to one decimal place, and include units. You may want to check that this makes sense with your plot from Part 4a.
c. What was the average nitrate concentration during all days that chlorophyll-a was greater than 4 mg/m^3? Calculate and print this in 1-3 lines of code. Round your answer to one decimal place, and include units. You may want to check that this matches your plot from Part 4a.
d. Calculate the correlation coefficient (r) between the chlorophyll-a and nitrate time series. For this, use the .corr() method. Calculate and print this in 1-2 lines of code, and round your answer to three decimal places.
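A sketch of these selection patterns on a toy datetime-indexed DataFrame (all values are invented; note that the correlation comes from the Series .corr() method, since corr is a method of DataFrames/Series rather than a top-level pandas function):

```python
import pandas as pd

# Toy datetime-indexed DataFrame; all values are invented.
idx = pd.to_datetime(['2016-07-10', '2018-03-01', '2018-12-15'])
df = pd.DataFrame({'Salinity': [34.12, 33.95, 34.30],
                   'Chl_a':    [0.4,   5.2,   1.1],
                   'Nitrate':  [29.4,  22.2,  28.35]}, index=idx)

sal = round(df.loc['2016-07-10', 'Salinity'], 1)                       # a. one date
chl_2018 = round(df.loc['2018-01-01':'2018-12-31', 'Chl_a'].max(), 1)  # b. slice a year
mean_no3 = round(df.loc[df['Chl_a'] > 4, 'Nitrate'].mean(), 1)         # c. boolean filter
r = round(df['Chl_a'].corr(df['Nitrate']), 3)                          # d. .corr() method
print(sal, chl_2018, mean_no3, r)
```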
# Write 1-2 lines of code for Part 5a:
# Write 1-2 lines of code for Part 5b:
# Write 1-3 lines of code for Part 5c:
# Write 1-2 lines of code for Part 5d:
Image: Red Square at University of Washington (credit: Edward Aites, YouTube).
Numerical models are used to simulate the Earth's atmosphere and ocean and to predict future weather and ocean conditions. The same types of models are used to re-analyze past conditions with higher accuracy using observations from satellites, weather stations, and ocean platforms like profiling floats. The model output from these "reanalyses" offers a trustworthy, global view of past states of the atmosphere, land, and ocean.
In the Assignment #3 Google Drive folder, we have provided a netCDF file (era5_puget_sound_weather.nc) containing 3-dimensional model output (2-D space + time) from the ECMWF ERA5 global atmospheric reanalysis from 2018 to 2020 for the Puget Sound region around Seattle. Each grid cell is about 30 km x 30 km. Included are a few relevant weather variables.
Run this line of code (!pip install netcdf4) once to install the netCDF4 library. Then import NumPy, Pandas, xarray, Matplotlib, and datetime. Then use xr.open_dataset() to load the netCDF data file from Google Drive. Save it as a variable called weather_data.
# Run this line of code once for this notebook, then delete or comment it out:
# !pip install netcdf4
# Write your code for Part 1 here:
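A self-contained sketch of the pattern: it first writes a tiny netCDF file so xr.open_dataset() has something to open; with the real file you would skip the writing step and open era5_puget_sound_weather.nc directly.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Write a tiny netCDF file so the sketch is self-contained.
demo = xr.Dataset(
    {'t2m': (('time', 'latitude', 'longitude'), np.zeros((4, 2, 3)))},
    coords={'time': pd.date_range('2018-01-01', periods=4, freq='D'),
            'latitude': [47.5, 47.75],
            'longitude': [-122.5, -122.25, -122.0]})
demo.to_netcdf('demo_weather.nc')

weather_data = xr.open_dataset('demo_weather.nc')
print(weather_data.sizes)       # dimension names and lengths
```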
Display weather_data using xarray's interactive interface.
# Write your code for Part 2 here:
a. What variables are provided in this data set? Write both their abbreviations (variable names) and long names.
b. Taking into account only the number of variables and the dimensions of the data, how many total data points does this netCDF file contain? You may use a calculator or Python code to get this answer.
c. What time interval (spacing) are the data provided at? Use only the display interface to answer this question.
# Write your print statements for Part 3 here:
# Write your print statement for Part 4 here:
Use .sel() indexing with the 'nearest' option to select all the data inside weather_data that are nearest to Red Square. Save the resulting xarray Dataset into a new variable called uw_weather_data. Check that uw_weather_data now has a single dimension: time.
# Write your code for Part 5 here:
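A sketch of nearest-neighbor selection on a toy Dataset; the coordinates passed to .sel() below are rough placeholders for Red Square, so use the coordinates given in the assignment:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy Dataset with the same dimension layout as the weather data.
weather_data = xr.Dataset(
    {'t2m': (('time', 'latitude', 'longitude'), np.zeros((3, 2, 2)))},
    coords={'time': pd.date_range('2018-01-01', periods=3, freq='D'),
            'latitude': [47.5, 47.75],
            'longitude': [-122.5, -122.25]})

# method='nearest' snaps to the closest grid point; coordinates are placeholders.
uw_weather_data = weather_data.sel(latitude=47.66, longitude=-122.31, method='nearest')
print(uw_weather_data.sizes)    # only the time dimension remains
```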
# Write your code for Part 6 here:
Use .sel() indexing with a slice() object to answer this question. Express both answers in °C, rounded to one decimal place. (If you're used to °F, you may want to convert to °F to check that your answers make sense.)
# Write your code for Part 7 here:
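A sketch of time slicing with .sel() and slice(), plus a Kelvin-to-Celsius conversion (ERA5 stores temperature in Kelvin); the dates and values here are invented:

```python
import pandas as pd
import xarray as xr

# Invented daily 2-m temperatures in Kelvin.
uw_weather_data = xr.Dataset(
    {'t2m': ('time', [283.1, 290.4, 295.2, 288.0, 284.6])},
    coords={'time': pd.date_range('2019-06-01', periods=5, freq='D')})

# slice() selects an inclusive date range along the time dimension.
subset = uw_weather_data['t2m'].sel(time=slice('2019-06-02', '2019-06-04'))
max_c = round(float(subset.max()) - 273.15, 1)
print(f'Maximum temperature in range: {max_c} °C')
```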
Using the weather_data Dataset, calculate the average snowfall rate within the Puget Sound region. In other words, calculate an average over latitude and longitude, but keep the time dimension. Save the resulting Dataset as a new variable.
# Write your code for Part 8 here:
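A sketch of a spatial average that preserves the time dimension; the dimension names ('latitude', 'longitude') are assumptions that should be checked against weather_data's display:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy snowfall field: 2 time steps x 2 latitudes x 2 longitudes.
weather_data = xr.Dataset(
    {'snowfall': (('time', 'latitude', 'longitude'),
                  np.array([[[1.0, 3.0], [2.0, 2.0]],
                            [[0.0, 4.0], [4.0, 0.0]]]))},
    coords={'time': pd.date_range('2018-01-01', periods=2, freq='D'),
            'latitude': [47.5, 47.75],
            'longitude': [-122.5, -122.25]})

# Averaging over the spatial dimensions leaves one value per time step.
regional_mean = weather_data.mean(dim=['latitude', 'longitude'])
print(regional_mean['snowfall'].values)
```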
When labeling your plot, obtain the variable names and units from the attributes stored in weather_data rather than simply copying and pasting them into a string. Feel free to get creative with your line color using the options here: https://matplotlib.org/3.1.0/gallery/color/named_colors.html.
# Write your code for Part 9 here:
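A sketch of building labels from stored metadata instead of hard-coded strings; the attribute keys ('long_name', 'units') follow common netCDF conventions and should be verified against the actual attributes in weather_data:

```python
import matplotlib
matplotlib.use('Agg')           # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr

# Toy DataArray with netCDF-style metadata; attribute keys are assumptions.
da = xr.DataArray(np.array([0.1, 0.3, 0.0]), dims='time',
                  coords={'time': pd.date_range('2018-01-01', periods=3, freq='D')},
                  name='sf',
                  attrs={'long_name': 'Snowfall', 'units': 'm of water equivalent'})

# Build the label from the stored metadata rather than a hand-typed string.
label = f"{da.attrs['long_name']} ({da.attrs['units']})"

fig, ax = plt.subplots()
ax.plot(da['time'], da, color='steelblue')
ax.set_ylabel(label)
ax.set_title(label)
```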