Extract LOPC data from a Dorado mission, explore it, and add total LOPC count Parameters to your STOQS database.
Executing this Notebook requires a personal STOQS database. Follow the steps to build your own development system; this will take a few hours and depends on a good Internet connection. Once your server is up, log into it (after a `cd ~/Vagrants/stoqsvm`) and activate your virtual environment with the usual commands:
vagrant ssh -- -X
cd /vagrant/dev/stoqsgit
source venv-stoqs/bin/activate
Then load the `stoqs_simz_aug2013` database with the commands:
cd stoqs
ln -s mbari_campaigns.py campaigns.py
export DATABASE_URL=postgis://stoqsadm:CHANGEME@127.0.0.1:5432/stoqs
loaders/load.py --db stoqs_simz_aug2013
loaders/load.py --db stoqs_simz_aug2013 --updateprovenance
Loading this database takes an hour or so. Once it's finished you can interact with the data quite efficiently, as this Notebook demonstrates. Launch Jupyter Notebook with:
cd contrib/notebooks
../../manage.py shell_plus --notebook
then navigate to this file and open it. You will then be able to execute the cells and experiment with different settings and code.
We will look in detail at the LOPC data that is rendered in the left-hand panels of this quick look plot from Survey Dorado389_2013_225_01_225_01:
Set `db` and `survey` variables and construct a Django query set template for getting LOPC data (where the `dataarray` field is not null) from the STOQS database. We use the database diagram for help in navigating the relationships and constructing the list of values we want to retrieve.
db = 'stoqs_simz_aug2013'
survey = 'Dorado389_2013_225_01_225_01'
lopc = MeasuredParameter.objects.filter(dataarray__isnull=False).values(
        'measurement__instantpoint__timevalue', 'measurement__depth',
        'measurement__geom', 'parameter__domain', 'dataarray')
Let's look at the LOPC's Single Element Plankton count data, which is loaded into STOQS as the `sepCountList` Parameter. We modify the `lopc` query set template with our database name, survey name (which in STOQS is part of the Activity name), and Parameter name constraints, and feed the records into a Pandas DataFrame object named `sep`:
import pandas as pd
sep = pd.DataFrame.from_records(lopc.using(db).filter(
        measurement__instantpoint__activity__name__contains=survey,
        parameter__name='sepCountList'))
Let's see how many records we got and look at the first 2:
print(len(sep))
sep[:2]
676
| | dataarray | measurement__depth | measurement__geom | measurement__instantpoint__timevalue | parameter__domain |
|---|---|---|---|---|---|
| 0 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2585.0, 1261.0,... | 0.042208 | [-121.88311757276813, 36.90482998210013] | 2013-08-13 22:08:45 | [108.0, 123.0, 138.0, 153.0, 168.0, 183.0, 198... |
| 1 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2539.0, 1112.0,... | 0.571604 | [-121.88314317796421, 36.90485415120313] | 2013-08-13 22:08:38 | [108.0, 123.0, 138.0, 153.0, 168.0, 183.0, 198... |
We can now see what we're dealing with: at each point in space and time where Dorado collected data, we have an array of SEP counts (the `dataarray` column) in the size classes identified by the array in the `parameter__domain` column. Let's plot some of the data.
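As a concrete illustration, here is a minimal, self-contained sketch (using made-up numbers, not real LOPC counts) of the record structure that `from_records` builds the DataFrame from:

```python
import pandas as pd

# Hypothetical records mimicking the shape of the STOQS query results:
# each row pairs a counts array (dataarray) with its size-class bins
# (parameter__domain), plus depth metadata
records = [
    {'measurement__depth': 0.04,
     'parameter__domain': [108.0, 123.0, 138.0],
     'dataarray': [0.0, 2585.0, 1261.0]},
    {'measurement__depth': 0.57,
     'parameter__domain': [108.0, 123.0, 138.0],
     'dataarray': [0.0, 2539.0, 1112.0]},
]
sep_demo = pd.DataFrame.from_records(records)
print(len(sep_demo))             # 2
print(sep_demo['dataarray'][0])  # counts array for the first measurement
```

Each row's `dataarray` and `parameter__domain` are parallel lists, which is why they can be plotted directly against each other below.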
%matplotlib inline
import pylab
pylab.rcParams['figure.figsize'] = (12.0, 4.0)
def label_plot(yaxis_name):
    pylab.title('LOPC data from ' + survey)
    pylab.ylabel(yaxis_name)
    pylab.xlabel('Size class (microns)')

for i,s in sep.iterrows():
    pylab.plot(s['parameter__domain'], s['dataarray'])
label_plot(Parameter.objects.using(db).get(name='sepCountList').long_name)
It looks like all the counts are for plankton less than about 1 mm in size; let's zoom in on the small stuff:
label_plot(Parameter.objects.using(db).get(name='sepCountList').long_name)
for i,s in sep.iterrows():
    pylab.plot(s['parameter__domain'][5:30], s['dataarray'][5:30])
This looks moderately interesting. Let's do the same thing with the Multiple Element Plankton count data.
label_plot(Parameter.objects.using(db).get(name='mepCountList').long_name)
mep = pd.DataFrame.from_records(lopc.using(db).filter(
        measurement__instantpoint__activity__name__contains=survey,
        parameter__name='mepCountList'))
for i,m in mep.iterrows():
    pylab.plot(m['parameter__domain'], m['dataarray'])
The MEP data have significant counts of plankton up to 5 mm in size. Let's zoom in:
label_plot(Parameter.objects.using(db).get(name='mepCountList').long_name)
for i,m in mep.iterrows():
    pylab.plot(m['parameter__domain'][:70], m['dataarray'][:70])
Now let's summarize these data and put the summary back into STOQS as scalar Parameters so that we can compare with other data using the STOQS User Interface.
# Create new Parameters in the database
p, created = Parameter.objects.using(db).get_or_create(
        name='lopc_total_count', units='count',
        long_name='Total of SEP and MEP Counts')
logp, created = Parameter.objects.using(db).get_or_create(
        name='log_lopc_total_count', units='log_count',
        long_name='Log of Total of SEP and MEP Counts')

# Assign new Parameters to the ParameterGroup 'Measured in situ'
for px in [p, logp]:
    ParameterGroupParameter.objects.using(db).get_or_create(
            parametergroup=ParameterGroup.objects.using(db).get(
                name='Measured in situ'), parameter=px)

# Construct 3 query sets
mps = MeasuredParameter.objects.using(db).filter(dataarray__isnull=False,
        measurement__instantpoint__activity__name__contains=survey).order_by(
        'measurement__instantpoint__timevalue')
seps = mps.filter(parameter__name='sepCountList').values_list('dataarray', flat=True)
meps = mps.filter(parameter__name='mepCountList').values_list('dataarray', flat=True)

# Create new summarized MeasuredParameters
import math
for mp, sep, mep in zip(mps.filter(parameter__name='sepCountList'), seps, meps):
    MeasuredParameter.objects.using(db).get_or_create(parameter=p,
            measurement=mp.measurement, datavalue=sum(sep) + sum(mep))
    MeasuredParameter.objects.using(db).get_or_create(parameter=logp,
            measurement=mp.measurement, datavalue=math.log(sum(sep) + sum(mep), 10))

# Use core STOQS loader software to update the database with descriptive statistics
from loaders import STOQS_Loader
STOQS_Loader.update_ap_stats(db, activity=Activity.objects.using(db
        ).get(name__contains=survey), parameters=[p, logp])

# Report count of records loaded
print("Number of records loaded: {:d}".format(
        MeasuredParameter.objects.using(db).filter(parameter__name='lopc_total_count'
        ).count() +
        MeasuredParameter.objects.using(db).filter(parameter__name='log_lopc_total_count'
        ).count()))
Number of records loaded: 1352
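The per-measurement arithmetic in the loop above (sum the SEP and MEP arrays, then take the base-10 log of the total) can be checked in isolation with made-up count arrays:

```python
import math

# Hypothetical SEP and MEP count arrays for a single measurement
sep_counts = [0.0, 2585.0, 1261.0]
mep_counts = [3.0, 1.0, 0.0]

total = sum(sep_counts) + sum(mep_counts)  # datavalue stored for p
log_total = math.log(total, 10)            # datavalue stored for logp

print(total)      # 3850.0
print(log_total)
```

Note that `math.log(x, 10)` raises `ValueError` if a measurement's total count is zero, so the log Parameter implicitly assumes every binned record has at least one count.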
Now open a browser pointing to the development server running on your system, e.g. http://localhost:8000/. Let's compare what we loaded into the database with the data displayed in the quick look plot above. Select the log_lopc_total_count Parameter in the Measured Parameter section to restrict the display to just Activities having that Parameter, then click the radio button in the Plot Data column and the 'contour' radio button. You should see something like this:
from IPython.display import Image
Image('../../../doc/Screenshots/Screen_Shot_2015-10-03_at_3.38.57_PM.png')
That looks pretty close! Now that the summarized and original data are in our database we can interact with them. For example, let's examine the particle size distributions for some of the Gulper water samples. First make a dictionary of the times of the Gulper samples:
samples = {}
for s in Sample.objects.using(db).select_related('instantpoint').filter(
        instantpoint__activity__name__contains=survey, sampletype__name='Gulper'):
    samples[int(s.name)] = s.instantpoint.timevalue
All LOPC data are processed with a binning over time (hard-coded to 20 seconds in the lopcToNetCDF.py script) in order to collect MEP data. Let's query the database for SEP data within 20 seconds of the sample time and plot the size class distribution for each Gulper sample in one plot:
from datetime import timedelta
binning_secs = 20
label_plot(Parameter.objects.using(db).get(name='sepCountList').long_name)
for sa,tv in samples.items():
    trange = [tv - timedelta(seconds=binning_secs), tv + timedelta(seconds=binning_secs)]
    sep = pd.DataFrame.from_records(lopc.using(db).filter(
            measurement__instantpoint__timevalue__range=trange,
            parameter__name='sepCountList'))
    for i,se in sep.iterrows():
        pylab.plot(se['parameter__domain'][:70], se['dataarray'][:70],
                label='Gulper {:d}'.format(sa))
pylab.legend()
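The ±`binning_secs` time window used in the query above can be sketched on its own with a made-up sample time:

```python
from datetime import datetime, timedelta

binning_secs = 20
# Hypothetical Gulper sample time (not taken from the database)
tv = datetime(2013, 8, 13, 22, 10, 0)
trange = [tv - timedelta(seconds=binning_secs), tv + timedelta(seconds=binning_secs)]

print(trange[0])  # 2013-08-13 22:09:40
print(trange[1])  # 2013-08-13 22:10:20
```

Django's `__range` lookup translates this two-element list into a SQL `BETWEEN`, so each Gulper sample picks up the binned LOPC record(s) within 20 seconds of its timestamp.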
Now, let's do the same thing for MEP data:
label_plot(Parameter.objects.using(db).get(name='mepCountList').long_name)
for sa,tv in samples.items():
    trange = [tv - timedelta(seconds=binning_secs), tv + timedelta(seconds=binning_secs)]
    mep = pd.DataFrame.from_records(lopc.using(db).filter(
            measurement__instantpoint__timevalue__range=trange,
            parameter__name='mepCountList'))
    for i,me in mep.iterrows():
        pylab.plot(me['parameter__domain'][:70], me['dataarray'][:70],
                label='Gulper {:d}'.format(sa))
pylab.legend()
At this point it would be interesting to compare the Gulper water sample analyses with these measurements. This notebook may also be re-executed for different databases and surveys. It likewise serves as a test bed for developing classification algorithms that would produce more interesting Parameters than the simple lopc_total_count Parameter we created here.