# initial imports
import pandas as pd
import numpy as np
from datetime import datetime
from dateutil.relativedelta import relativedelta
import vortexasdk as v
We start by storing the datetimes and IDs needed to filter for crude flows out of the US in the last 7 weeks.
# datetimes to access last 7 weeks of data
now = datetime.utcnow()
seven_weeks_ago = now - relativedelta(weeks=7)
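As a quick aside, `relativedelta(weeks=7)` behaves exactly like a plain 49-day `timedelta` (weeks are fixed 7-day spans, unlike months or years). A sanity check using a hypothetical fixed date for reproducibility:

```python
from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta

# hypothetical fixed "now" so the result is reproducible
now = datetime(2021, 7, 12)
seven_weeks_ago = now - relativedelta(weeks=7)

# weeks are exact 7-day spans, so this matches a plain 49-day timedelta
assert seven_weeks_ago == now - timedelta(days=49)
print(seven_weeks_ago)  # 2021-05-24 00:00:00
```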
# Find US ID
us = [g.id for g in v.Geographies().search('united states').to_list() if 'country' in g.layer]
assert len(us) == 1
# Find crude ID
crude = [p.id for p in v.Products().search('crude').to_list() if p.name=='Crude']
assert len(crude) == 1
# Columns to pull out, and shortened names
required_columns = [
# primary vessel info
"vessels.0.name",
"vessels.0.vessel_class",
# product information and quantity
"product.group.label",
"product.category.label",
"quantity",
"status",
# loading info
"events.cargo_port_load_event.0.location.port.label",
"events.cargo_port_load_event.0.end_timestamp",
# unloading info
"events.cargo_port_unload_event.0.location.port.label",
"events.cargo_port_unload_event.0.location.country.label",
"events.cargo_port_unload_event.0.end_timestamp",
]
new_labels = [
"vessel_name",
"vessel_class",
"product_group",
"product_category",
"quantity",
"status",
"loading_port",
"loading_finish",
"unloading_port",
"unloading_country",
"unloading_finish",
]
relabel = dict(zip(required_columns, new_labels))
Now we search for crude flows out of the US (excluding those destined for another US location) in the last 7 weeks, and convert the result to a DataFrame.
cms = v.CargoMovements().search(
filter_activity = 'loading_end',
filter_origins = us,
exclude_destinations = us,
filter_products = crude,
filter_time_min = seven_weeks_ago,
filter_time_max = now,
cm_unit = 'b'
).to_df(columns=required_columns).rename(relabel, axis=1)
# How many associated movements?
len(cms)
228
We add a new column to the DataFrame which gets the week of the year in which loading was completed:
cms['loading_week'] = cms['loading_finish'].dt.isocalendar().week
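`Series.dt.isocalendar()` returns a DataFrame of ISO year, week, and weekday, from which we take the `week` column. A small self-contained illustration (dates invented for the example): the Monday and Sunday of ISO week 21, 2021, fall in the same week, while the next day starts week 22:

```python
import pandas as pd

# hypothetical dates: Monday and Sunday of ISO week 21, 2021, then Monday of week 22
dates = pd.Series(pd.to_datetime(["2021-05-24", "2021-05-30", "2021-05-31"]))

# .dt.isocalendar() returns year/week/day columns; we keep the ISO week number
weeks = dates.dt.isocalendar().week
print(list(weeks))  # [21, 21, 22]
```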
We then use `groupby` to aggregate the data by week, summing the `quantity` column. This gives the total volume (in barrels) exported each week.
weekly_quantity = cms.groupby('loading_week')[['quantity']].sum()
weekly_quantity
| loading_week | quantity |
|---|---|
| 21 | 16203392 |
| 22 | 19765442 |
| 23 | 21530598 |
| 24 | 21794264 |
| 25 | 25614818 |
| 26 | 19651398 |
| 27 | 24615334 |
| 28 | 5623297 |
We could also break this information down further. Let's say we wanted to know the weekly exported quantities of each category of crude separately. It's possible to group by multiple categories:
quantity_by_category = cms.groupby(by=['loading_week', 'product_category'])[['quantity']].sum().reset_index()
quantity_by_category
| | loading_week | product_category | quantity |
|---|---|---|---|
| 0 | 21 | Light-Sweet | 14457314 |
| 1 | 21 | Medium-Sour | 1557064 |
| 2 | 22 | Light-Sweet | 18779321 |
| 3 | 22 | Medium-Sour | 986121 |
| 4 | 23 | Heavy-Sour | 1584078 |
| 5 | 23 | Light-Sweet | 17238857 |
| 6 | 23 | Medium-Sour | 1959945 |
| 7 | 24 | Heavy-Sour | 483880 |
| 8 | 24 | Light-Sour | 1443458 |
| 9 | 24 | Light-Sweet | 19866926 |
| 10 | 25 | Heavy-Sour | 298671 |
| 11 | 25 | Light-Sweet | 24730561 |
| 12 | 25 | Medium-Sour | 357009 |
| 13 | 26 | Light-Sour | 815856 |
| 14 | 26 | Light-Sweet | 14701277 |
| 15 | 26 | Medium-Sour | 3447476 |
| 16 | 27 | Light-Sweet | 23007185 |
| 17 | 27 | Medium-Sour | 1275061 |
| 18 | 28 | Light-Sweet | 5623297 |
This is not the most intuitive way of displaying the data. We can 'pivot' the table, so that each product category gets its own column:
quantity_by_category = quantity_by_category.pivot(index = 'loading_week',
columns = 'product_category',
values = 'quantity')
quantity_by_category
| loading_week | Heavy-Sour | Light-Sour | Light-Sweet | Medium-Sour |
|---|---|---|---|---|
| 21 | NaN | NaN | 14457314.0 | 1557064.0 |
| 22 | NaN | NaN | 18779321.0 | 986121.0 |
| 23 | 1584078.0 | NaN | 17238857.0 | 1959945.0 |
| 24 | 483880.0 | 1443458.0 | 19866926.0 | NaN |
| 25 | 298671.0 | NaN | 24730561.0 | 357009.0 |
| 26 | NaN | 815856.0 | 14701277.0 | 3447476.0 |
| 27 | NaN | NaN | 23007185.0 | 1275061.0 |
| 28 | NaN | NaN | 5623297.0 | NaN |
This is better. However, a number of entries are `NaN`. These represent the absence of any exports of that category in that week, so we replace them with zeroes.
quantity_by_category = quantity_by_category.fillna(0)
quantity_by_category
| loading_week | Heavy-Sour | Light-Sour | Light-Sweet | Medium-Sour |
|---|---|---|---|---|
| 21 | 0.0 | 0.0 | 14457314.0 | 1557064.0 |
| 22 | 0.0 | 0.0 | 18779321.0 | 986121.0 |
| 23 | 1584078.0 | 0.0 | 17238857.0 | 1959945.0 |
| 24 | 483880.0 | 1443458.0 | 19866926.0 | 0.0 |
| 25 | 298671.0 | 0.0 | 24730561.0 | 357009.0 |
| 26 | 0.0 | 815856.0 | 14701277.0 | 3447476.0 |
| 27 | 0.0 | 0.0 | 23007185.0 | 1275061.0 |
| 28 | 0.0 | 0.0 | 5623297.0 | 0.0 |
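The whole groupby → pivot → fillna pipeline can be seen end-to-end on a tiny hypothetical table (categories and quantities invented for illustration):

```python
import pandas as pd

# hypothetical mini cargo table
df = pd.DataFrame({
    "loading_week": [21, 21, 22],
    "product_category": ["Light-Sweet", "Medium-Sour", "Light-Sweet"],
    "quantity": [100, 50, 200],
})

# aggregate by week and category, pivot categories into columns,
# then replace missing week/category combinations with zero
wide = (
    df.groupby(["loading_week", "product_category"])[["quantity"]].sum()
      .reset_index()
      .pivot(index="loading_week", columns="product_category", values="quantity")
      .fillna(0)
)
```

The resulting `wide` frame has one row per week and one column per category, with a 0 where week 22 had no Medium-Sour exports.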
Another way of breaking down the exports data is by receiving country.
quantity_by_destination = cms.groupby('unloading_country')[['quantity']].sum()
quantity_by_destination.sort_values(by='quantity', ascending=False, inplace=True)
quantity_by_destination.head()
| unloading_country | quantity |
|---|---|
| South Korea | 19482797 |
| India | 15210800 |
| Canada | 11880161 |
| Netherlands | 11127371 |
| United Kingdom | 10571957 |
We may not want a separate row for every recipient country, so we can collect those outside the top 10 into an 'Other' category.
# grab the top 10
top_destination_countries = quantity_by_destination.head(10)
# grab the bottom, sum their quantities and put into a DataFrame
rest = pd.DataFrame(index = ['Other'], columns = ['quantity'])
rest.loc['Other'] = quantity_by_destination[10:].sum().values
# join the two DataFrames
top_destination_countries = pd.concat([top_destination_countries, rest]).astype(int)
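The same top-N-plus-'Other' pattern can be demonstrated on a small invented table (the index labels and quantities are made up, and it is assumed to be pre-sorted in descending order, as `quantity_by_destination` is above):

```python
import pandas as pd

# hypothetical export totals for five destinations, sorted descending
q = pd.DataFrame({"quantity": [500, 300, 200, 80, 20]},
                 index=["A", "B", "C", "D", "E"])

top = q.head(2)                                    # keep the top 2
rest = pd.DataFrame(index=["Other"], columns=["quantity"])
rest.loc["Other"] = q[2:].sum().values             # lump the remainder together
combined = pd.concat([top, rest]).astype(int)      # A: 500, B: 300, Other: 300
```

The grand total is preserved: the 'Other' row is exactly the sum of the rows it replaces.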
Finally, we can add a new column containing the percentage of total exports represented by exports to that country:
top_destination_countries['%'] = round(top_destination_countries['quantity']*100 / top_destination_countries['quantity'].sum(),2)
top_destination_countries
| | quantity | % |
|---|---|---|
| South Korea | 19482797 | 14.79 |
| India | 15210800 | 11.54 |
| Canada | 11880161 | 9.02 |
| Netherlands | 11127371 | 8.44 |
| United Kingdom | 10571957 | 8.02 |
| China | 8629115 | 6.55 |
| Italy | 6716011 | 5.10 |
| France | 6286669 | 4.77 |
| Taiwan | 6243591 | 4.74 |
| Germany | 4338556 | 3.29 |
| Other | 31278613 | 23.74 |
Note: in the next module, we will see how the Cargo Time Series endpoint offers another approach to this same kind of analysis.
Make a pivoted table showing how exports to each of the top 5 destination countries varied over the 7 weeks.
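One possible sketch for this exercise, shown on a tiny invented table standing in for `cms` (the column names follow the relabelling above; the countries and quantities are made up):

```python
import pandas as pd

# hypothetical stand-in for the cms DataFrame
df = pd.DataFrame({
    "loading_week": [21, 21, 22, 22],
    "unloading_country": ["South Korea", "India", "South Korea", "Canada"],
    "quantity": [10, 20, 30, 40],
})

# rank destinations by total quantity and keep the top 5
totals = df.groupby("unloading_country")["quantity"].sum().sort_values(ascending=False)
top5 = totals.head(5).index

# restrict to those countries, then aggregate by week and pivot
pivoted = (
    df[df["unloading_country"].isin(top5)]
      .groupby(["loading_week", "unloading_country"])[["quantity"]].sum()
      .reset_index()
      .pivot(index="loading_week", columns="unloading_country", values="quantity")
      .fillna(0)
)
```

Applied to the real `cms` DataFrame, this gives one row per week and one column per top-5 destination country.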