In [1]:

from vortexasdk import Products, CargoTimeSeries, Geographies
from datetime import datetime
import pandas as pd
import matplotlib.pyplot as plt

Products tutorial

First I’m going to show you how to get the Vortexa ID for a product you are interested in studying. There are many ways of doing this, and today I’m going to show you one option I have used.

The examples in the docs (https://vortechsa.github.io/python-sdk/endpoints/products/) include a line of code that shows how to search for several products at once by passing a list of search terms:

In [2]:

df = Products().search(term=['diesel', 'fuel oil', 'grane']).to_df()

2020-07-15 15:25:42,239 vortexasdk.operations — INFO — Searching Products
2020-07-15 15:25:42,240 vortexasdk.client — INFO — Creating new VortexaClient
2020-07-15 15:25:42,795 vortexasdk.client — WARNING — You are using vortexasdk version 0.21.1, however version 0.23.0 is available.
You should consider upgrading via the 'pip install vortexasdk --upgrade' command.
2020-07-15 15:25:42,796 vortexasdk.client — INFO — Payload: {'term': ['diesel', 'fuel oil', 'grane'], 'allowTopLevelProducts': True}
2020-07-15 15:25:43,452 vortexasdk.client — INFO — 71 Results to retrieve. Sending 1 post requests in parallel using 6 threads.

In [3]:

df.head()

Out[3]:

	id	name	layer.0	parent.0.name
0	1c107b4317bc2c85fb6c13cd7b28e8e0a02ec7fecc68af...	Fuel Oil	group_product	Dirty Petroleum Products
1	da9e67b1dd201a1ad3ae251e6dce8fad404ce50d53adba...	RMD Fuel Oil	grade	High Sulphur Fuel Oil
2	e6e879021c45470d84ee2f182a42a13537b489ae730640...	Other Fuel Oils	grade	High Sulphur Fuel Oil
3	d31139c20953fae7510f0982c922736391b752dced644c...	Cracked Fuel Oil	grade	High Sulphur Fuel Oil
4	0dd7857f328177189e20a54af1dc599c8b7e04e79e61cd...	Fuel Oil Cut	grade	High Sulphur Fuel Oil
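The layer.0 column shows where each match sits in the product hierarchy, which can help narrow a broad search. Here is a minimal sketch using pandas only, with a hand-built stand-in for df: the names and layers are copied from the output above, while the id values are placeholders because the real ones are truncated in the display.

```python
import pandas as pd

# Stand-in for the search result above. The ids are placeholders,
# not real Vortexa IDs (the real ones are truncated in the display).
df = pd.DataFrame({
    "id": ["placeholder-0", "placeholder-1", "placeholder-2",
           "placeholder-3", "placeholder-4"],
    "name": ["Fuel Oil", "RMD Fuel Oil", "Other Fuel Oils",
             "Cracked Fuel Oil", "Fuel Oil Cut"],
    "layer.0": ["group_product", "grade", "grade", "grade", "grade"],
    "parent.0.name": ["Dirty Petroleum Products", "High Sulphur Fuel Oil",
                      "High Sulphur Fuel Oil", "High Sulphur Fuel Oil",
                      "High Sulphur Fuel Oil"],
})

# A dot in a column name needs backtick quoting inside DataFrame.query,
# so a plain boolean mask is simpler for filtering on 'layer.0'.
grades = df[df["layer.0"] == "grade"]
print(grades[["name", "parent.0.name"]])

# Count how many matches fall in each layer.
layer_counts = df["layer.0"].value_counts()
print(layer_counts)
```

On the full 71-row result the same two steps would show at a glance which matches are individual grades and which are broader groups.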

For my study I want to focus on crude/condensates, so I am going to change the list of search terms like this:

In [4]:

crude_search_df = Products().search(term=['crude']).to_df()

2020-07-15 15:25:43,920 vortexasdk.operations — INFO — Searching Products
2020-07-15 15:25:43,922 vortexasdk.client — INFO — Payload: {'term': ['crude'], 'allowTopLevelProducts': True}
2020-07-15 15:25:44,351 vortexasdk.client — INFO — 18 Results to retrieve. Sending 1 post requests in parallel using 6 threads.

In [5]:

crude_search_df

Out[5]:

	id	name	layer.0	parent.0.name
0	6f11b0724c9a4e85ffa7f1445bc768f054af755a090118...	Crude	group_product	Crude/Condensates
1	e1b3d075a9340969322024a35b8c720e5065d8c95a91b6...	Crude Vegoil	grade	Biodiesel Feedstock
2	4fe046d7478c4072b9ecbaa2c03d3b977bbfec5077ee44...	Crude Butadiene	grade	Olefins/Other Chemicals
3	54af755a090118dcf9b0724c9a4e9f14745c26165385ff...	Crude/Condensates	group	NaN
4	9fda040ee8844e47b5239051e322d06dd9d2b96f0c3249...	TPAO crude	grade	Medium-Sour
5	2b76490350005604507ca64567101df7d3db80a973f462...	Crude Vegetable Oil	grade	Biodiesel Feedstock
6	f26cd12252b0bb23b4ab4ab590eaee11037d78a7ac5cc6...	Crude Benzene	grade	Chemicals
7	e9d1031a9167fff80ef089f2d5591deb1833ee34f1d028...	Crude Palm Oil	grade	Biodiesel Feedstock
8	c2aa8895d3d176868cbb0519f2bb5318a74a2a66ac9034...	Eagle Ford crude	grade	Light-Sweet
9	164a4a510343458bbf949a79964ce3bb8efa8d14276c48...	US Shale Crude	grade	Light-Sweet
10	4ff810b966104a658d22e7155596c11bedcd1e3e81594e...	Yuri Korchagin Crude	grade	Light-Sweet
11	6fd8a34225de493ab9bff7f95880d96a3e34737db24849...	Domestic Sweet Crude (DSW)	grade	Light-Sour
12	c49a65c1651913a1cb3b185760ab952827b5b9ddc03e1e...	Crude Soybean Oil (CSBO)	grade	Biodiesel Feedstock
13	c7bee2499ebba9677e11930891fdd067c293b8536efe53...	Crude Sunflower Oil (CSFO)	grade	Biodiesel Feedstock
14	c4ad8221d48642b2adc7b363c68a3d729b3ef7f2d9aa42...	Diluted Crude Oil (DCO)	grade	Heavy-Sour
15	bc9deabd28d74b5985056195dcbcd33ccae7c616e2584a...	Reconstituted Crude Oil (Recon)	grade	Dirty Condensates
16	98fa8034b122632b13564878dd75d902faa735b822cac9...	Crude Degummed Soybean Oil	grade	Biodiesel Feedstock
17	ca15a14cae72539854f53413a0f8668bcca2ec90053722...	Crude Blendstock for Oxygenate Blending	grade	Blending Components

Here we can see that there are 18 rows, and we only want the id where the name column is equal to Crude/Condensates. We can query the DataFrame like this to get just the row we are interested in.

In [6]:

crude_search_df.query("name=='Crude/Condensates'")

Out[6]:

	id	name	layer.0	parent.0.name
3	54af755a090118dcf9b0724c9a4e9f14745c26165385ff...	Crude/Condensates	group	NaN

If you look at the end of the id you can see it finishes with ..., which means we can’t see the full length of the column. If we increase the maximum column width with pd.set_option('max_colwidth', 75) and run the same query again, we can see the full id. This setting stays in effect for the rest of the notebook, so we will not need to set it again.

In [7]:

pd.set_option('max_colwidth', 75)
crude_search_df.query("name=='Crude/Condensates'")

Out[7]:

	id	name	layer.0	parent.0.name
3	54af755a090118dcf9b0724c9a4e9f14745c26165385ffa7f1445bc768f06f11	Crude/Condensates	group	NaN
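If you would rather not copy the id from the display by hand, it can also be read straight out of the DataFrame. The sketch below assumes a small stand-in for crude_search_df (one placeholder row plus the Crude/Condensates row shown above), and also shows pd.option_context as a way to widen columns temporarily instead of for the whole notebook.

```python
import pandas as pd

# Stand-in for crude_search_df. The first id is a placeholder; the second
# is the full Crude/Condensates id shown in the output above.
crude_search_df = pd.DataFrame({
    "id": ["placeholder-crude-id",
           "54af755a090118dcf9b0724c9a4e9f14745c26165385ffa7f1445bc768f06f11"],
    "name": ["Crude", "Crude/Condensates"],
})

# Select the 'id' column for the matching row and take the single value.
matches = crude_search_df.loc[crude_search_df["name"] == "Crude/Condensates", "id"]
crude_condensates_id = matches.iloc[0]
print(crude_condensates_id)

# Widen the column display only inside this block; the global option is
# restored automatically when the block ends.
with pd.option_context("display.max_colwidth", 75):
    print(crude_search_df)
```

Reading the value with .iloc[0] returns the complete string regardless of the display settings, since truncation only affects how the table is printed.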

Geographies

Just like before, we are going to take one of the examples from the documentation and tweak it slightly to fit our study. The docs are here: https://vortechsa.github.io/python-sdk/endpoints/geographies/.

In [8]:

df = Geographies().search(term=["Liverpool", "Southampton"]).to_df()

2020-07-15 15:25:46,414 vortexasdk.operations — INFO — Searching Geographies
2020-07-15 15:25:46,415 vortexasdk.client — INFO — Payload: {'term': ['Liverpool', 'Southampton']}
2020-07-15 15:25:46,565 vortexasdk.client — INFO — 4 Results to retrieve. Sending 1 post requests in parallel using 6 threads.

In [9]:

df

Out[9]:

	id	name	layer
0	b600e4f54cbdef578b2c6bdd7f508212ee4fb9552991d82e180a3d8f625669fd	Liverpool [GB]	[port]
1	0c69b0aaf2d110e102dd3b3f05d5540b8e5346d27f344a71983ab7d4566de0f2	Southampton [GB]	[port]
2	8baf08eff90bf9a1677ca0e55aae7e139bfaf155563b1ee1e0824273e3181f2d	Liverpool Docks	[terminal]
3	98db74d66fac18f2b0d8488f46af96fec3fc6edb7bc267cf75a1c50b0d2ee2b1	Liverpool Bulk Liquids	[terminal]

In [10]:

china_search_df = Geographies().search(term=["China"]).to_df()

2020-07-15 15:25:47,216 vortexasdk.operations — INFO — Searching Geographies
2020-07-15 15:25:47,217 vortexasdk.client — INFO — Payload: {'term': ['China']}
2020-07-15 15:25:47,363 vortexasdk.client — INFO — 13 Results to retrieve. Sending 1 post requests in parallel using 6 threads.

In [11]:

china_search_df.head(5)

Out[11]:

	id	name	layer
0	934c47f36c16a58d68ef5e007e62a23f5f036ee3f3d1f5f85a48c572b90ad8b2	China	[country]
1	781cacc7033f877caa4b4106d096b74afe006a96391bf5a56a4f55b849359a42	South China	[shipping_region]
2	a63890260e29d859390fd1a23c690181afd4bd152943a04c00cd6a5ecf3f7d1e	North China	[shipping_region]
3	b5fafce6e20de2dc307fb7e0b89978ee91a49a7b6ec6f5461daf2633f3c56674	China (excl. HK & Macau)	[shipping_region]
4	9a021f43c397b175ddfff7a91d46ee6e6e16d37e9f9d52398ac6895656109d86	China Steel Chemical	[terminal]

In [12]:

china_search_df.query("name=='China'")

Out[12]:

	id	name	layer
0	934c47f36c16a58d68ef5e007e62a23f5f036ee3f3d1f5f85a48c572b90ad8b2	China	[country]
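Filtering on the name works here, but filtering on the layer can be handy when several results share similar names. The sketch below assumes that each cell in the layer column holds a Python list (as the [country] display suggests) and uses a three-row stand-in for china_search_df copied from the output above.

```python
import pandas as pd

# Stand-in for china_search_df, copied from rows of the output above.
china_search_df = pd.DataFrame({
    "id": [
        "934c47f36c16a58d68ef5e007e62a23f5f036ee3f3d1f5f85a48c572b90ad8b2",
        "781cacc7033f877caa4b4106d096b74afe006a96391bf5a56a4f55b849359a42",
        "9a021f43c397b175ddfff7a91d46ee6e6e16d37e9f9d52398ac6895656109d86",
    ],
    "name": ["China", "South China", "China Steel Chemical"],
    # Assumption: each cell is a list of layer names.
    "layer": [["country"], ["shipping_region"], ["terminal"]],
})

# Each 'layer' cell is a list, so test membership row by row.
is_country = china_search_df["layer"].apply(lambda layers: "country" in layers)
countries = china_search_df[is_country]

# Pull out the id as a plain string for use in later queries.
china_id = countries["id"].iloc[0]
print(china_id)
```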

Chinese floating storage study

For my study I want to look at crude and condensates currently in floating storage situated in China, and how this has changed over 2020. So once again I’m going to take the code provided in the documentation and adapt it to my specific needs.

Let’s break down the query below line by line to understand what’s going on.

1. The first line finds the ID for the port of Rotterdam using the Geographies endpoint and assigns it to a variable called rotterdam.
2. The second line finds the ID for Crude using the Products endpoint and assigns it to a variable called crude.
3. The third line calls the CargoTimeSeries endpoint.
4. The timeseries_unit argument is set to bpd, which means the quantities are in barrels per day.
5. The timeseries_frequency argument is set to month, which means the results are grouped by month.
6. The filter_origins argument is set to rotterdam, the variable defined in the 1st line.
7. The filter_products argument is set to crude, which was defined in the 2nd line.
8. The filter_activity argument is set to loading_state.
9. The filter_time_min argument, the start time for the query, is set to the beginning of 2018.
10. The filter_time_max argument, the end time for the query, is set to the end of 2018 (31 December).
11. The search result is turned into a DataFrame.

In [13]:

rotterdam = [g.id for g in Geographies().search("rotterdam").to_list() if "port" in g.layer]
crude = [p.id for p in Products().search("crude").to_list() if "Crude" == p.name]
search_result = CargoTimeSeries().search(
    timeseries_unit='bpd',
    timeseries_frequency='month',
    filter_origins=rotterdam,
    filter_products=crude,
    filter_activity='loading_state',
    filter_time_min=datetime(2018, 1, 1),
    filter_time_max=datetime(2018, 12, 31))
df = search_result.to_df()

2020-07-15 15:25:49,480 vortexasdk.operations — INFO — Searching Geographies
2020-07-15 15:25:49,481 vortexasdk.client — INFO — Payload: {'term': ['rotterdam']}
2020-07-15 15:25:49,649 vortexasdk.client — INFO — 11 Results to retrieve. Sending 1 post requests in parallel using 6 threads.
2020-07-15 15:25:51,124 vortexasdk.operations — INFO — Searching Products
2020-07-15 15:25:51,125 vortexasdk.client — INFO — Payload: {'term': ['crude'], 'allowTopLevelProducts': True}
2020-07-15 15:25:51,275 vortexasdk.client — INFO — 18 Results to retrieve. Sending 1 post requests in parallel using 6 threads.
2020-07-15 15:25:52,744 vortexasdk.operations — INFO — Searching CargoTimeSeries
2020-07-15 15:25:52,745 vortexasdk.client — INFO — Payload: {'filter_activity': 'loading_state', 'filter_time_min': '2018-01-01T00:00:00.000Z', 'filter_time_max': '2018-12-31T00:00:00.000Z', 'filter_products': ['6f11b0724c9a4e85ffa7f1445bc768f054af755a090118dcf99f14745c261653'], 'filter_origins': ['68faf65af1345067f11dc6723b8da32f00e304a6f33c000118fccd81947deb4e'], 'timeseries_frequency': 'month', 'timeseries_unit': 'bpd', 'timeseries_activity': 'loading_state', 'size': 500}

So how can we change that query to get crude/condensates in floating storage situated in China? As we already have the IDs for our geography and product, we don’t need to call those endpoints in our first two lines.

1. Assign the ID for China, which we found earlier in the notebook, to a variable called china_id.
2. Assign the ID for Crude/Condensates to a variable called crude_condesates_id.
3. The 3rd line stays the same as before, as we are calling the same endpoint.
4. For the 4th line, I prefer to think in tonnes, so I’m going to change the timeseries_unit to t.
5. For the 5th line, I’m going to change month to day, as I’d like to see the change on a daily basis.
6. The 6th line changes slightly: as I’m not concerned where the crude/condensates have come from, I’m going to remove the filter_origins argument and replace it with filter_storage_locations, set to china_id, which we defined in the 1st line.
7. Set the filter_products argument to crude_condesates_id, which we defined in the 2nd line.
8. For the 8th line, I’m going to set filter_activity to 'storing_state'.
9. For filter_time_min, I have changed the date to the start of this year (2020).
10. For filter_time_max, datetime.today().date() gives us today’s date.
11. Finally, I’m going to keep the 11th line the same, as I would like the results to be a DataFrame just like in the first query.

Let’s see what happens.

In [14]:

china_id = '934c47f36c16a58d68ef5e007e62a23f5f036ee3f3d1f5f85a48c572b90ad8b2'
crude_condesates_id = '54af755a090118dcf9b0724c9a4e9f14745c26165385ffa7f1445bc768f06f11'
search_result = CargoTimeSeries().search(
    timeseries_unit='t',
    timeseries_frequency='day',
    filter_storage_locations=china_id,
    filter_products=crude_condesates_id,
    filter_activity='storing_state',
    filter_time_min=datetime(2020, 1, 1),
    filter_time_max=datetime.today().date())
df_fs = search_result.to_df()

2020-07-15 15:25:53,405 vortexasdk.operations — INFO — Searching CargoTimeSeries
2020-07-15 15:25:53,406 vortexasdk.client — INFO — Payload: {'filter_activity': 'storing_state', 'filter_time_min': '2020-01-01T00:00:00.000Z', 'filter_time_max': '2020-07-15T00:00:00.000Z', 'filter_products': ['54af755a090118dcf9b0724c9a4e9f14745c26165385ffa7f1445bc768f06f11'], 'filter_storage_locations': ['934c47f36c16a58d68ef5e007e62a23f5f036ee3f3d1f5f85a48c572b90ad8b2'], 'timeseries_frequency': 'day', 'timeseries_unit': 't', 'timeseries_activity': 'storing_state', 'size': 500}
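As a recap of the edits described above, the two sets of keyword arguments can be compared with plain Python dictionaries. This is only an illustrative sketch: it does not call the SDK, the ID values are replaced by placeholder strings, and a fixed end date stands in for datetime.today().date() so the comparison is reproducible.

```python
from datetime import datetime

# Keyword arguments of the Rotterdam example. The ID lists are replaced by
# placeholder strings because only the argument names and settings matter here.
rotterdam_query = {
    "timeseries_unit": "bpd",
    "timeseries_frequency": "month",
    "filter_origins": "<rotterdam ids>",
    "filter_products": "<crude ids>",
    "filter_activity": "loading_state",
    "filter_time_min": datetime(2018, 1, 1),
    "filter_time_max": datetime(2018, 12, 31),
}

# Keyword arguments of the China floating storage query.
china_query = {
    "timeseries_unit": "t",
    "timeseries_frequency": "day",
    "filter_storage_locations": "<china id>",
    "filter_products": "<crude/condensates id>",
    "filter_activity": "storing_state",
    "filter_time_min": datetime(2020, 1, 1),
    # Fixed stand-in for datetime.today().date() used in the notebook.
    "filter_time_max": datetime(2020, 7, 15),
}

shared = set(rotterdam_query) & set(china_query)
removed = sorted(set(rotterdam_query) - set(china_query))
added = sorted(set(china_query) - set(rotterdam_query))
changed = sorted(k for k in shared if rotterdam_query[k] != china_query[k])

print("removed:", removed)
print("added:  ", added)
print("changed:", changed)
```

Printing the three lists gives a compact checklist of what was edited between the two queries.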

In [15]:

df_fs

Out[15]:

	key	value	count
0	2020-01-01T00:00:00.000Z	1696002	25
1	2020-01-02T00:00:00.000Z	1390004	20
2	2020-01-03T00:00:00.000Z	1206009	20
3	2020-01-04T00:00:00.000Z	1687160	22
4	2020-01-05T00:00:00.000Z	2076389	22
...	...	...	...
192	2020-07-11T00:00:00.000Z	11141339	85
193	2020-07-12T00:00:00.000Z	11196668	84
194	2020-07-13T00:00:00.000Z	11312825	83
195	2020-07-14T00:00:00.000Z	11333020	84
196	2020-07-15T00:00:00.000Z	10820104	80

197 rows × 3 columns
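Before plotting, a couple of quick calculations can give a feel for the numbers. The sketch below uses only the ten rows visible in the output above (the middle of the series is elided there), so it compares the first and last days and works out the average tonnes per cargo movement for those rows only.

```python
import pandas as pd

# The ten rows visible in the output above; the elided middle rows are omitted.
df_fs = pd.DataFrame({
    "key": ["2020-01-01T00:00:00.000Z", "2020-01-02T00:00:00.000Z",
            "2020-01-03T00:00:00.000Z", "2020-01-04T00:00:00.000Z",
            "2020-01-05T00:00:00.000Z", "2020-07-11T00:00:00.000Z",
            "2020-07-12T00:00:00.000Z", "2020-07-13T00:00:00.000Z",
            "2020-07-14T00:00:00.000Z", "2020-07-15T00:00:00.000Z"],
    "value": [1696002, 1390004, 1206009, 1687160, 2076389,
              11141339, 11196668, 11312825, 11333020, 10820104],
    "count": [25, 20, 20, 22, 22, 85, 84, 83, 84, 80],
})

# Compare the first and last day of the series.
first_t = df_fs["value"].iloc[0]
last_t = df_fs["value"].iloc[-1]
growth = last_t / first_t
print(f"first day: {first_t / 1000:.0f} kt, last day: {last_t / 1000:.0f} kt")
print(f"last day is {growth:.1f} times the first day")

# Average tonnes per cargo movement on each day.
df_fs["t_per_movement"] = df_fs["value"] / df_fs["count"]
print(df_fs[["key", "t_per_movement"]])
```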

Displaying this data in a graph

Here I’m going to show you how to display the data as a graph in a notebook, but first I’m going to show you how to export it as a CSV so you can look at the data in Excel or Google Sheets.

To export the DataFrame to your desktop as a CSV add .to_csv('~/Desktop/chinese_floating_storage.csv') to the DataFrame in a cell.

Like this:

In [16]:

df_fs.to_csv('~/Desktop/chinese_floating_storage.csv')

Now if you look on your desktop there should be a file called chinese_floating_storage.csv, and you’ll be able to open this file in Excel.
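By default to_csv also writes the row labels (0, 1, 2, ...) as an unnamed first column. If you don’t want that column in your spreadsheet, one option is index=False. The sketch below writes a two-row stand-in for df_fs to a temporary folder (rather than the desktop) and reads it back to check the round trip.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Two-row stand-in for df_fs, copied from the output earlier in the notebook.
df_fs = pd.DataFrame({
    "key": ["2020-01-01T00:00:00.000Z", "2020-01-02T00:00:00.000Z"],
    "value": [1696002, 1390004],
    "count": [25, 20],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "chinese_floating_storage.csv"

    # index=False leaves out the 0, 1, 2, ... row labels.
    df_fs.to_csv(csv_path, index=False)

    # Read the file back, parsing the 'key' column as dates.
    reloaded = pd.read_csv(csv_path, parse_dates=["key"])

print(reloaded)
print(reloaded.dtypes)
```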

Using a Python library called matplotlib, which we imported at the top of this notebook, you can also display the results of the query like this:

In [17]:

# rename columns
df_fs = df_fs.rename(columns={'key': 'date',
                              'value': 't',
                              'count': 'number_of_cargo_movements'})

# remove time zone from timestamp
df_fs['date'] = pd.to_datetime(df_fs['date']).dt.tz_localize(None)
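The key strings end in Z, so pd.to_datetime returns timezone-aware UTC timestamps, and tz_localize(None) then drops the timezone while keeping the same clock time. Here is that step on its own, using two of the timestamps from the earlier output:

```python
import pandas as pd

# Two ISO 8601 timestamps as they appear in the 'key' column.
keys = pd.Series(["2020-01-01T00:00:00.000Z", "2020-01-02T00:00:00.000Z"])

# The trailing 'Z' marks UTC, so the parsed values are timezone-aware.
aware = pd.to_datetime(keys)
print(aware.dt.tz)

# Passing None removes the timezone and keeps the wall-clock time unchanged.
naive = aware.dt.tz_localize(None)
print(naive.dt.tz)
print(naive)
```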

In [18]:

floating_storage = df_fs.set_index('date')['t'] / 1000
floating_storage.plot(title='Chinese crude oil floating storage', grid=True)
plt.xlabel('date')
plt.ylabel('kt');
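The same chart can also be drawn through an explicit Axes object, which some people find easier to extend, and saved to an image file. The sketch below is one way to do that, using a five-row stand-in for the renamed df_fs (the last five days from the earlier output); the temporary output folder and the Agg backend are choices made here so the example also runs outside a notebook.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Five-row stand-in for the renamed df_fs (last five days of the output).
df_fs = pd.DataFrame({
    "date": pd.to_datetime(["2020-07-11", "2020-07-12", "2020-07-13",
                            "2020-07-14", "2020-07-15"]),
    "t": [11141339, 11196668, 11312825, 11333020, 10820104],
})

# Convert tonnes to kilotonnes, indexed by date.
floating_storage = df_fs.set_index("date")["t"] / 1000

# Draw on an explicit Axes and label it through that object.
fig, ax = plt.subplots()
floating_storage.plot(ax=ax, title="Chinese crude oil floating storage", grid=True)
ax.set_xlabel("date")
ax.set_ylabel("kt")

# Save the figure as a PNG in a temporary folder.
with tempfile.TemporaryDirectory() as tmp_dir:
    png_path = Path(tmp_dir) / "chinese_floating_storage.png"
    fig.savefig(png_path)
    saved_ok = png_path.exists()

plt.close(fig)
```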