# initial imports
import pandas as pd
import numpy as np
from datetime import datetime
from dateutil.relativedelta import relativedelta
import vortexasdk as v
We start by storing the datetimes and IDs needed to filter for crude flows out of the US in the last 7 weeks.
# datetimes to access last 7 weeks of data
now = datetime.utcnow()
seven_weeks_ago = now - relativedelta(weeks=7)
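As a quick aside, `relativedelta(weeks=7)` behaves exactly like a plain 49-day `timedelta` (weeks are fixed 7-day spans, unlike months or years). A sanity check using a hypothetical fixed date for reproducibility:

```python
from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta

# hypothetical fixed "now" so the result is reproducible
now = datetime(2021, 7, 12)
seven_weeks_ago = now - relativedelta(weeks=7)

# weeks are exact 7-day spans, so this matches a plain 49-day timedelta
assert seven_weeks_ago == now - timedelta(days=49)
print(seven_weeks_ago)  # 2021-05-24 00:00:00
```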
# Find US ID
us = [g.id for g in v.Geographies().search('united states').to_list() if 'country' in g.layer]
assert len(us) == 1
# Find crude ID
crude = [p.id for p in v.Products().search('crude').to_list() if p.name=='Crude']
assert len(crude) == 1
# Columns to pull out, and shortened names
required_columns = [
# primary vessel info
"vessels.0.name",
"vessels.0.vessel_class",
# product information and quantity
"product.group.label",
"product.category.label",
"quantity",
"status",
# loading info
"events.cargo_port_load_event.0.location.port.label",
"events.cargo_port_load_event.0.end_timestamp",
# unloading info
"events.cargo_port_unload_event.0.location.port.label",
"events.cargo_port_unload_event.0.location.country.label",
"events.cargo_port_unload_event.0.end_timestamp",
]
new_labels = [
"vessel_name",
"vessel_class",
"product_group",
"product_category",
"quantity",
"status",
"loading_port",
"loading_finish",
"unloading_port",
"unloading_country",
"unloading_finish",
]
relabel = dict(zip(required_columns, new_labels))
Now we search for crude flows out of the US (excluding those destined for another US location) in the last 7 weeks, and convert the result to a DataFrame.
cms = v.CargoMovements().search(
filter_activity = 'loading_end',
filter_origins = us,
exclude_destinations = us,
filter_products = crude,
filter_time_min = seven_weeks_ago,
filter_time_max = now,
cm_unit = 'b'
).to_df(columns=required_columns).rename(relabel, axis=1)
# How many associated movements?
len(cms)
228
We add a new column to the DataFrame which gets the week of the year in which loading was completed:
cms['loading_week'] = cms['loading_finish'].dt.isocalendar().week
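`Series.dt.isocalendar()` returns a DataFrame of ISO year, week, and weekday, from which we take the `week` column. A small self-contained illustration (dates invented for the example): the Monday and Sunday of ISO week 21, 2021, fall in the same week, while the next day starts week 22:

```python
import pandas as pd

# hypothetical dates: Monday and Sunday of ISO week 21, 2021, then Monday of week 22
dates = pd.Series(pd.to_datetime(["2021-05-24", "2021-05-30", "2021-05-31"]))

# .dt.isocalendar() returns year/week/day columns; we keep the ISO week number
weeks = dates.dt.isocalendar().week
print(list(weeks))  # [21, 21, 22]
```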
We then use `groupby` to aggregate the data by week, summing the `quantity` column. This gives the total volume (in barrels) exported each week.
weekly_quantity = cms.groupby('loading_week')[['quantity']].sum()
weekly_quantity
| loading_week | quantity |
|---|---|
| 21 | 16203392 |
| 22 | 19765442 |
| 23 | 21530598 |
| 24 | 21794264 |
| 25 | 25614818 |
| 26 | 19651398 |
| 27 | 24615334 |
| 28 | 5623297 |
We could also break this information down further. Let's say we wanted to know the weekly exported quantities of each category of crude separately. It's possible to group by multiple categories:
quantity_by_category = cms.groupby(by=['loading_week', 'product_category'])[['quantity']].sum().reset_index()
quantity_by_category
| | loading_week | product_category | quantity |
|---|---|---|---|
| 0 | 21 | Light-Sweet | 14457314 |
| 1 | 21 | Medium-Sour | 1557064 |
| 2 | 22 | Light-Sweet | 18779321 |
| 3 | 22 | Medium-Sour | 986121 |
| 4 | 23 | Heavy-Sour | 1584078 |
| 5 | 23 | Light-Sweet | 17238857 |
| 6 | 23 | Medium-Sour | 1959945 |
| 7 | 24 | Heavy-Sour | 483880 |
| 8 | 24 | Light-Sour | 1443458 |
| 9 | 24 | Light-Sweet | 19866926 |
| 10 | 25 | Heavy-Sour | 298671 |
| 11 | 25 | Light-Sweet | 24730561 |
| 12 | 25 | Medium-Sour | 357009 |
| 13 | 26 | Light-Sour | 815856 |
| 14 | 26 | Light-Sweet | 14701277 |
| 15 | 26 | Medium-Sour | 3447476 |
| 16 | 27 | Light-Sweet | 23007185 |
| 17 | 27 | Medium-Sour | 1275061 |
| 18 | 28 | Light-Sweet | 5623297 |
This is not the most intuitive way of displaying the data. We can 'pivot' the table, so that each product category gets its own column:
quantity_by_category = quantity_by_category.pivot(index = 'loading_week',
columns = 'product_category',
values = 'quantity')
quantity_by_category
| loading_week | Heavy-Sour | Light-Sour | Light-Sweet | Medium-Sour |
|---|---|---|---|---|
| 21 | NaN | NaN | 14457314.0 | 1557064.0 |
| 22 | NaN | NaN | 18779321.0 | 986121.0 |
| 23 | 1584078.0 | NaN | 17238857.0 | 1959945.0 |
| 24 | 483880.0 | 1443458.0 | 19866926.0 | NaN |
| 25 | 298671.0 | NaN | 24730561.0 | 357009.0 |
| 26 | NaN | 815856.0 | 14701277.0 | 3447476.0 |
| 27 | NaN | NaN | 23007185.0 | 1275061.0 |
| 28 | NaN | NaN | 5623297.0 | NaN |
This is better. However, a number of entries are `NaN`. These represent the absence of any exports of that category in that week, so we replace them with zeroes.
quantity_by_category = quantity_by_category.fillna(0)
quantity_by_category
| loading_week | Heavy-Sour | Light-Sour | Light-Sweet | Medium-Sour |
|---|---|---|---|---|
| 21 | 0.0 | 0.0 | 14457314.0 | 1557064.0 |
| 22 | 0.0 | 0.0 | 18779321.0 | 986121.0 |
| 23 | 1584078.0 | 0.0 | 17238857.0 | 1959945.0 |
| 24 | 483880.0 | 1443458.0 | 19866926.0 | 0.0 |
| 25 | 298671.0 | 0.0 | 24730561.0 | 357009.0 |
| 26 | 0.0 | 815856.0 | 14701277.0 | 3447476.0 |
| 27 | 0.0 | 0.0 | 23007185.0 | 1275061.0 |
| 28 | 0.0 | 0.0 | 5623297.0 | 0.0 |
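The whole groupby → pivot → fillna pipeline can be seen end-to-end on a tiny hypothetical table (categories and quantities invented for illustration):

```python
import pandas as pd

# hypothetical mini cargo table
df = pd.DataFrame({
    "loading_week": [21, 21, 22],
    "product_category": ["Light-Sweet", "Medium-Sour", "Light-Sweet"],
    "quantity": [100, 50, 200],
})

# aggregate by week and category, pivot categories into columns,
# then replace missing week/category combinations with zero
wide = (
    df.groupby(["loading_week", "product_category"])[["quantity"]].sum()
      .reset_index()
      .pivot(index="loading_week", columns="product_category", values="quantity")
      .fillna(0)
)
```

The resulting `wide` frame has one row per week and one column per category, with a 0 where week 22 had no Medium-Sour exports.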
Another way of breaking down the exports data is by receiving country.
quantity_by_destination = cms.groupby('unloading_country')[['quantity']].sum()
quantity_by_destination.sort_values(by='quantity', ascending=False, inplace=True)
quantity_by_destination.head()
| unloading_country | quantity |
|---|---|
| South Korea | 19482797 |
| India | 15210800 |
| Canada | 11880161 |
| Netherlands | 11127371 |
| United Kingdom | 10571957 |
We may not want a separate row for every recipient country, so we can collect those outside the top 10 into an 'Other' category.
# grab the top 10
top_destination_countries = quantity_by_destination.head(10)
# grab the bottom, sum their quantities and put into a DataFrame
rest = pd.DataFrame(index = ['Other'], columns = ['quantity'])
rest.loc['Other'] = quantity_by_destination[10:].sum().values
# join the two DataFrames
top_destination_countries = pd.concat([top_destination_countries, rest]).astype(int)
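The same top-N-plus-'Other' pattern can be demonstrated on a small invented table (the index labels and quantities are made up, and it is assumed to be pre-sorted in descending order, as `quantity_by_destination` is above):

```python
import pandas as pd

# hypothetical export totals for five destinations, sorted descending
q = pd.DataFrame({"quantity": [500, 300, 200, 80, 20]},
                 index=["A", "B", "C", "D", "E"])

top = q.head(2)                                    # keep the top 2
rest = pd.DataFrame(index=["Other"], columns=["quantity"])
rest.loc["Other"] = q[2:].sum().values             # lump the remainder together
combined = pd.concat([top, rest]).astype(int)      # A: 500, B: 300, Other: 300
```

The grand total is preserved: the 'Other' row is exactly the sum of the rows it replaces.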
Finally, we can add a new column containing the percentage of total exports represented by exports to that country:
top_destination_countries['%'] = round(top_destination_countries['quantity']*100 / top_destination_countries['quantity'].sum(),2)
top_destination_countries
| | quantity | % |
|---|---|---|
| South Korea | 19482797 | 14.79 |
| India | 15210800 | 11.54 |
| Canada | 11880161 | 9.02 |
| Netherlands | 11127371 | 8.44 |
| United Kingdom | 10571957 | 8.02 |
| China | 8629115 | 6.55 |
| Italy | 6716011 | 5.10 |
| France | 6286669 | 4.77 |
| Taiwan | 6243591 | 4.74 |
| Germany | 4338556 | 3.29 |
| Other | 31278613 | 23.74 |
Note: in the next module, we will see how the Cargo Time Series endpoint offers another approach to this same kind of analysis.
Make a pivoted table showing how exports to each of the top 5 destination countries varied over the 7 weeks.
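One possible sketch for this exercise, shown on a tiny invented table standing in for `cms` (the column names follow the relabelling above; the countries and quantities are made up):

```python
import pandas as pd

# hypothetical stand-in for the cms DataFrame
df = pd.DataFrame({
    "loading_week": [21, 21, 22, 22],
    "unloading_country": ["South Korea", "India", "South Korea", "Canada"],
    "quantity": [10, 20, 30, 40],
})

# rank destinations by total quantity and keep the top 5
totals = df.groupby("unloading_country")["quantity"].sum().sort_values(ascending=False)
top5 = totals.head(5).index

# restrict to those countries, then aggregate by week and pivot
pivoted = (
    df[df["unloading_country"].isin(top5)]
      .groupby(["loading_week", "unloading_country"])[["quantity"]].sum()
      .reset_index()
      .pivot(index="loading_week", columns="unloading_country", values="quantity")
      .fillna(0)
)
```

Applied to the real `cms` DataFrame, this gives one row per week and one column per top-5 destination country.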