# initial imports
import pandas as pd
import numpy as np
from datetime import datetime
from dateutil.relativedelta import relativedelta
import vortexasdk as v
We start by storing the datetimes and IDs needed to filter for crude flows out of the US in the last 7 weeks.
# datetimes to access last 7 weeks of data
now = datetime.utcnow()
seven_weeks_ago = now - relativedelta(weeks=7)
# Find US ID
us = [g.id for g in v.Geographies().search('united states').to_list() if 'country' in g.layer]
assert len(us) == 1
# Find crude ID
crude = [p.id for p in v.Products().search('crude').to_list() if p.name=='Crude']
assert len(crude) == 1
# Columns to pull out, and shortened names
required_columns = [
# primary vessel info
"vessels.0.name",
"vessels.0.vessel_class",
# product information and quantity
"product.group.label",
"product.category.label",
"quantity",
"status",
# loading info
"events.cargo_port_load_event.0.location.port.label",
"events.cargo_port_load_event.0.end_timestamp",
# unloading info
"events.cargo_port_unload_event.0.location.port.label",
"events.cargo_port_unload_event.0.location.country.label",
"events.cargo_port_unload_event.0.end_timestamp",
]
new_labels = [
"vessel_name",
"vessel_class",
"product_group",
"product_category",
"quantity",
"status",
"loading_port",
"loading_finish",
"unloading_port",
"unloading_country",
"unloading_finish",
]
relabel = dict(zip(required_columns,new_labels))
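As a quick standalone illustration of how this rename mapping works, here is a minimal sketch with a toy DataFrame (the column names below are illustrative, not real API output):

```python
import pandas as pd

# toy DataFrame with verbose, nested-style column names
df = pd.DataFrame({"vessels.0.name": ["Tanker A"], "quantity": [500000]})

# build a mapping from old names to short names, then rename
mapping = dict(zip(["vessels.0.name", "quantity"], ["vessel_name", "quantity"]))
df = df.rename(mapping, axis=1)
print(df.columns.tolist())  # ['vessel_name', 'quantity']
```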
Now we search for crude flows out of the US (excluding movements destined for another US location) over the last 7 weeks, and convert the result to a DataFrame.
cms = v.CargoMovements().search(
filter_activity = 'loading_end',
filter_origins = us,
exclude_destinations = us,
filter_products = crude,
filter_time_min = seven_weeks_ago,
filter_time_max = now,
cm_unit = 'b'
).to_df(columns=required_columns).rename(relabel,axis=1)
# How many associated movements?
len(cms)
180
We add a new column to the DataFrame which gets the week of the year in which loading was completed:
cms['loading_week'] = cms['loading_finish'].dt.isocalendar().week
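To see what `dt.isocalendar().week` produces, here is a small self-contained sketch on two hypothetical loading-completion dates:

```python
import pandas as pd

# two sample timestamps (hypothetical loading-completion dates)
dates = pd.Series(pd.to_datetime(["2020-10-21", "2020-12-07"]))

# isocalendar() returns a DataFrame with year/week/day; take the ISO week
weeks = dates.dt.isocalendar().week
print(weeks.tolist())  # [43, 50]
```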
We then use `groupby` to aggregate the data by week, summing the quantity column. This shows the total volume (in barrels) exported each week.
# select the quantity column before summing, so non-numeric columns are not aggregated
weekly_quantity = cms.groupby('loading_week')[['quantity']].sum()
weekly_quantity
| loading_week | quantity |
|---|---|
| 43 | 10466025 |
| 44 | 15567944 |
| 45 | 16264815 |
| 46 | 18324452 |
| 47 | 17003930 |
| 48 | 18275876 |
| 49 | 15558908 |
| 50 | 11001331 |
We could also break this information down further. Let's say we wanted to know the weekly exported quantities of each category of crude separately. It's possible to group by multiple categories:
quantity_by_category = cms.groupby(['loading_week','product_category'])[['quantity']].sum().reset_index()
quantity_by_category
| | loading_week | product_category | quantity |
|---|---|---|---|
| 0 | 43 | Light-Sweet | 7775458 |
| 1 | 43 | Medium-Sour | 2690567 |
| 2 | 44 | Light-Sour | 156307 |
| 3 | 44 | Light-Sweet | 15411637 |
| 4 | 45 | Light-Sweet | 12888609 |
| 5 | 45 | Medium-Sour | 2973678 |
| 6 | 46 | Light-Sweet | 17338331 |
| 7 | 46 | Medium-Sour | 986121 |
| 8 | 47 | Light-Sour | 1121194 |
| 9 | 47 | Light-Sweet | 15882736 |
| 10 | 48 | Light-Sweet | 17518824 |
| 11 | 48 | Medium-Sour | 757052 |
| 12 | 49 | Light-Sweet | 13420357 |
| 13 | 49 | Medium-Sour | 2138551 |
| 14 | 50 | Light-Sweet | 10015210 |
| 15 | 50 | Medium-Sour | 986121 |
This is not the most intuitive way of displaying the data. We can 'pivot' the table, so that each product category gets its own column:
quantity_by_category = quantity_by_category.pivot(index = 'loading_week',
columns = 'product_category',
values = 'quantity')
quantity_by_category
| loading_week | Light-Sour | Light-Sweet | Medium-Sour |
|---|---|---|---|
| 43 | NaN | 7775458.0 | 2690567.0 |
| 44 | 156307.0 | 15411637.0 | NaN |
| 45 | NaN | 12888609.0 | 2973678.0 |
| 46 | NaN | 17338331.0 | 986121.0 |
| 47 | 1121194.0 | 15882736.0 | NaN |
| 48 | NaN | 17518824.0 | 757052.0 |
| 49 | NaN | 13420357.0 | 2138551.0 |
| 50 | NaN | 10015210.0 | 986121.0 |
This is better. However, several entries are `NaN`. These represent weeks with no exports of that category, so we replace them with zeroes.
quantity_by_category = quantity_by_category.fillna(0)
quantity_by_category
| loading_week | Light-Sour | Light-Sweet | Medium-Sour |
|---|---|---|---|
| 43 | 0.0 | 7775458.0 | 2690567.0 |
| 44 | 156307.0 | 15411637.0 | 0.0 |
| 45 | 0.0 | 12888609.0 | 2973678.0 |
| 46 | 0.0 | 17338331.0 | 986121.0 |
| 47 | 1121194.0 | 15882736.0 | 0.0 |
| 48 | 0.0 | 17518824.0 | 757052.0 |
| 49 | 0.0 | 13420357.0 | 2138551.0 |
| 50 | 0.0 | 10015210.0 | 986121.0 |
Another way of breaking down the exports data is by receiving country.
quantity_by_destination = cms.groupby('unloading_country')[['quantity']].sum()
quantity_by_destination.sort_values(by='quantity',ascending = False, inplace=True)
quantity_by_destination.head()
| unloading_country | quantity |
|---|---|
| India | 17764529 |
| China | 17334013 |
| Canada | 13019738 |
| Netherlands | 10866913 |
| United Kingdom | 9787720 |
We may not want a separate row for every recipient country, so we can collect all countries outside the top 10 into an 'Other' category.
# grab the top 10
top_destination_countries = quantity_by_destination.head(10)
# grab the bottom, sum their quantities and put into a DataFrame
rest = pd.DataFrame(index = ['Other'], columns = ['quantity'])
rest.loc['Other'] = quantity_by_destination.iloc[10:].sum().values
# join the two DataFrames
top_destination_countries = pd.concat([top_destination_countries, rest])
Finally, we can add a new column containing the percentage of total exports represented by exports to that country:
top_destination_countries['%'] = round(top_destination_countries['quantity']*100 / top_destination_countries['quantity'].sum(),2)
top_destination_countries
| | quantity | % |
|---|---|---|
| India | 17764529 | 15.55 |
| China | 17334013 | 15.17 |
| Canada | 13019738 | 11.40 |
| Netherlands | 10866913 | 9.51 |
| United Kingdom | 9787720 | 8.57 |
| France | 5914144 | 5.18 |
| Germany | 5415834 | 4.74 |
| South Korea | 4871768 | 4.26 |
| Italy | 4582142 | 4.01 |
| Taiwan | 3728685 | 3.26 |
| Other | 20950574 | 18.34 |
Note: in the next module, we will see how the Cargo Time Series endpoint offers another approach to this same kind of analysis.
Exercise: make a pivoted table showing how exports to each of the top 5 destination countries varied over the 7 weeks.
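One possible way to approach this, sketched on a small stand-in DataFrame so it runs on its own (in practice you would use the real `cms` DataFrame from above, and `nlargest(5)` would pick out the genuine top 5):

```python
import pandas as pd

# toy stand-in for the cms DataFrame built earlier in the notebook
cms = pd.DataFrame({
    "loading_week": [43, 43, 44, 44],
    "unloading_country": ["India", "China", "India", "Canada"],
    "quantity": [100, 200, 300, 400],
})

# identify the top 5 destination countries by total quantity
top5 = cms.groupby("unloading_country")["quantity"].sum().nlargest(5).index

# restrict to those countries, aggregate by week and country, then pivot
pivoted = (cms[cms["unloading_country"].isin(top5)]
           .groupby(["loading_week", "unloading_country"])["quantity"].sum()
           .reset_index()
           .pivot(index="loading_week",
                  columns="unloading_country",
                  values="quantity")
           .fillna(0))
print(pivoted)
```

Each row is a loading week and each column one of the top destination countries, mirroring the product-category pivot above.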