Use the Measured Parameter Data Access Parquet format to visualize STOQS data
Executing this Notebook requires a personal STOQS server - these instructions are for a Docker installation. This Notebook builds on issues raised in https://github.com/stoqs/stoqs/issues/227.
Install and start the software as
detailed in the README. (Note that on MacOS you will need to modify settings in your docker-compose.yml
and .env
files — look for comments referencing 'HOST_UID'.)
Then, from your $STOQS_HOME/docker
directory start the Jupyter Notebook server - you can query from the remote database or from a copy that you've made to your local system:
Start the Jupyter Notebook server pointing to MBARI's master STOQS database server. (Note: firewall rules limit unprivileged access to such resources):
docker-compose exec \
-e DATABASE_URL=postgis://everyone:guest@kraken.shore.mbari.org:5432/stoqs \
stoqs stoqs/manage.py shell_plus --notebook
Restore a database of your choice from https://stoqs.shore.mbari.org/, for example below is how to make a local copy of the stoqs_september2013
database from MBARI's server onto your local database and then start the Jupyter Notebook server using the default DATABASE_URL (which should be your local system) also make sure that your Docker Desktop has at least 16 GB of RAM allocated to it:
cd $STOQS_HOME/docker
docker-compose exec stoqs createdb -U postgres stoqs_september2013
curl -k https://stoqs.shore.mbari.org/media/pg_dumps/stoqs_september2013.pg_dump | \
docker exec -i stoqs pg_restore -Fc -U postgres -d stoqs_september2013
docker-compose exec stoqs stoqs/manage.py shell_plus --notebook
Following execution of the stoqs/manage.py shell_plus --notebook
command a message is displayed giving a URL for you to use in a browser on your host, e.g.:
http://127.0.0.1:8888/?token=<a_token_generated_upon_server_start>
In the browser window opened to this URL navigate to this file (visualize_parquet.ipynb
) and open it. You will then be able to execute the cells and modify the code to suit your needs.
The information in the output cells result from execution on a 2019 MacBook Pro with a 2.4 GHz 8-Core Intel Core i9 processor, 32 GB 2667 MHz DDR4 RAM, running Docker Desktop 3.1.0 with 16 GB with 4 CPUs and 16 GB allocated.
In a browser navigate to https://stoqs.mbari.org/stoqs_september2013 and make selections as shown in this screen grab:
We will attempt to recreate this image from Issue 227:
but this time using Datashader which can handle a lot more data.
import time
t_start = time.time()
# Issuing a STOQS api request from inside the stoqs container - where this
# notebook is running - is not really possible. For testing with a
# host='localhost' url you need to make that request from your system and
# then copy the .parquet file to this directory: stoqs/contrib/notebooks.
# We have make two downloads as lrauvs and dorado have different Parameter names.
# It's theoretically possible to download all Parameter names from all three
# platforms, but that exceeds the container's RAM in my 16 GB Docker machine.
# It's more efficient to download just what we need.
##host = 'localhost'
host = 'stoqs.shore.mbari.org'
url_dorado = (f'https://{host}/stoqs_september2013/api/measuredparameter.parquet?'
'parameter__name=bbp420¶meter__name=fl700_uncorr&'
'parameter__name=salinity¶meter__name=temperature&'
'measurement__instantpoint__activity__platform__name=dorado&'
'collect=name')
##print(url_dorado) # Uncoment for 'localhost' download from system browser
url_lrauvs = (f'https://{host}/stoqs_september2013/api/measuredparameter.parquet?'
'parameter__name=bb470¶meter__name=chlorophyll&'
'parameter__name=salinity¶meter__name=temperature&'
'measurement__instantpoint__activity__platform__name=daphne&'
'measurement__instantpoint__activity__platform__name=tethys&'
'collect=name')
##print(url_lrauv) # Uncoment for 'localhost' download from system browser
!time wget --no-check-certificate -O stoqs_september2013_dorado.parquet "{url_dorado}"
!time wget --no-check-certificate -O stoqs_september2013_lrauvs.parquet "{url_lrauvs}"
--2021-03-17 17:52:07-- https://stoqs.shore.mbari.org/stoqs_september2013/api/measuredparameter.parquet?parameter__name=bbp420¶meter__name=fl700_uncorr¶meter__name=salinity¶meter__name=temperature&measurement__instantpoint__activity__platform__name=dorado&collect=name Resolving stoqs.shore.mbari.org (stoqs.shore.mbari.org)... 134.89.12.71 Connecting to stoqs.shore.mbari.org (stoqs.shore.mbari.org)|134.89.12.71|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 8334441 (7.9M) [application/octet-stream] Saving to: ‘stoqs_september2013_dorado.parquet’ stoqs_september2013 100%[===================>] 7.95M 4.80MB/s in 1.7s 2021-03-17 17:52:16 (4.80 MB/s) - ‘stoqs_september2013_dorado.parquet’ saved [8334441/8334441] 0.01user 0.11system 0:08.53elapsed 1%CPU (0avgtext+0avgdata 6860maxresident)k 0inputs+0outputs (0major+364minor)pagefaults 0swaps --2021-03-17 17:52:16-- https://stoqs.shore.mbari.org/stoqs_september2013/api/measuredparameter.parquet?parameter__name=bb470¶meter__name=chlorophyll¶meter__name=salinity¶meter__name=temperature&measurement__instantpoint__activity__platform__name=daphne&measurement__instantpoint__activity__platform__name=tethys&collect=name Resolving stoqs.shore.mbari.org (stoqs.shore.mbari.org)... 134.89.12.71 Connecting to stoqs.shore.mbari.org (stoqs.shore.mbari.org)|134.89.12.71|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 60694551 (58M) [application/octet-stream] Saving to: ‘stoqs_september2013_lrauvs.parquet’ stoqs_september2013 100%[===================>] 57.88M 4.42MB/s in 14s 2021-03-17 17:53:46 (4.18 MB/s) - ‘stoqs_september2013_lrauvs.parquet’ saved [60694551/60694551] 0.14user 0.76system 1:29.96elapsed 1%CPU (0avgtext+0avgdata 6948maxresident)k 0inputs+0outputs (0major+363minor)pagefaults 0swaps
import pandas as pd
%time df_dorado = pd.read_parquet('stoqs_september2013_dorado.parquet')
print(f"dorado data: {df_dorado.shape}")
##print(df_dorado.head(2))
%time df_lrauvs = pd.read_parquet('stoqs_september2013_lrauvs.parquet')
print(f"lrauv data: {df_lrauvs.shape}")
##print(df_lrauvs.head(2))
# Combine into single DataFrame for more generalized follow-on processing
df = df_dorado.append(df_lrauvs)
CPU times: user 152 ms, sys: 32 ms, total: 184 ms Wall time: 226 ms dorado data: (120505, 4) CPU times: user 1.7 s, sys: 204 ms, total: 1.9 s Wall time: 2.05 s lrauv data: (1531962, 4)
# Commit with do_plots = False, change to True for plots, but don't check it in that way
do_plots = False
plots = None
if do_plots:
import colorcet
import holoviews as hv
from holoviews.operation.datashader import datashade
hv.extension("bokeh")
pts_dorado = hv.Points(df, kdims=['bbp420', 'fl700_uncorr'])
pts_daphne = hv.Points(df, kdims=['bb470', 'chlorophyll'])
pts_tethys = hv.Points(df, kdims=['bb470', 'chlorophyll'])
plots = ( datashade(pts_dorado, cmap=colorcet.fire).opts(title='dorado')
+ datashade(pts_daphne, cmap=colorcet.fire).opts(title='daphne')
+ datashade(pts_tethys, cmap=colorcet.fire).opts(title='tethys') )
plots
print(f"Time to execute this notebook: {(time.time() - t_start):.1f} seconds")
Time to execute this notebook: 103.8 seconds