# initial imports
import pandas as pd
import numpy as np
from datetime import datetime
from dateutil.relativedelta import relativedelta
import vortexasdk as v
We start by storing the datetimes and IDs needed to filter for crude flows out of the US in the last 7 weeks.
# datetimes to access last 7 weeks of data
now = datetime.utcnow()
seven_weeks_ago = now - relativedelta(weeks=7)
# Find US ID
us = [g.id for g in v.Geographies().search('united states').to_list() if 'country' in g.layer]
assert len(us) == 1
# Find crude ID
crude = [p.id for p in v.Products().search('crude').to_list() if p.name=='Crude']
assert len(crude) == 1
# Columns to pull out, and shortened names
required_columns = [
# primary vessel info
"vessels.0.name",
"vessels.0.vessel_class",
# product information and quantity
"product.group.label",
"product.category.label",
"quantity",
"status",
# loading info
"events.cargo_port_load_event.0.location.port.label",
"events.cargo_port_load_event.0.end_timestamp",
# unloading info
"events.cargo_port_unload_event.0.location.port.label",
"events.cargo_port_unload_event.0.location.country.label",
"events.cargo_port_unload_event.0.end_timestamp",
]
new_labels = [
"vessel_name",
"vessel_class",
"product_group",
"product_category",
"quantity",
"status",
"loading_port",
"loading_finish",
"unloading_port",
"unloading_country",
"unloading_finish",
]
relabel = dict(zip(required_columns,new_labels))
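As a quick standalone illustration of how this rename mapping works, here is a minimal sketch with a toy DataFrame (the column names below are illustrative, not real API output):

```python
import pandas as pd

# toy DataFrame with verbose, nested-style column names
df = pd.DataFrame({"vessels.0.name": ["Tanker A"], "quantity": [500000]})

# build a mapping from old names to short names, then rename
mapping = dict(zip(["vessels.0.name", "quantity"], ["vessel_name", "quantity"]))
df = df.rename(mapping, axis=1)
print(df.columns.tolist())  # ['vessel_name', 'quantity']
```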
Now we search for crude flows out of the US (excluding movements destined for another US location) over the last 7 weeks, and convert the result to a DataFrame.
cms = v.CargoMovements().search(
filter_activity = 'loading_end',
filter_origins = us,
exclude_destinations = us,
filter_products = crude,
filter_time_min = seven_weeks_ago,
filter_time_max = now,
cm_unit = 'b'
).to_df(columns=required_columns).rename(relabel,axis=1)
# How many associated movements?
len(cms)
180
We add a new column to the DataFrame which gets the week of the year in which loading was completed:
cms['loading_week'] = cms['loading_finish'].dt.isocalendar().week
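To see what `dt.isocalendar().week` produces, here is a small self-contained sketch on two hypothetical loading-completion dates:

```python
import pandas as pd

# two sample timestamps (hypothetical loading-completion dates)
dates = pd.Series(pd.to_datetime(["2020-10-21", "2020-12-07"]))

# isocalendar() returns a DataFrame with year/week/day; take the ISO week
weeks = dates.dt.isocalendar().week
print(weeks.tolist())  # [43, 50]
```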
We then use `groupby` to aggregate the data by week, summing the quantity column. This shows the total volume (in barrels) exported each week.
# select the quantity column before summing, so non-numeric columns are not aggregated
weekly_quantity = cms.groupby('loading_week')[['quantity']].sum()
weekly_quantity
| loading_week | quantity |
|---|---|
| 43 | 10466025 |
| 44 | 15567944 |
| 45 | 16264815 |
| 46 | 18324452 |
| 47 | 17003930 |
| 48 | 18275876 |
| 49 | 15558908 |
| 50 | 11001331 |
We could also break this information down further. Let's say we wanted to know the weekly exported quantities of each category of crude separately. It's possible to group by multiple categories:
quantity_by_category = cms.groupby(['loading_week','product_category'])[['quantity']].sum().reset_index()
quantity_by_category
| | loading_week | product_category | quantity |
|---|---|---|---|
| 0 | 43 | Light-Sweet | 7775458 |
| 1 | 43 | Medium-Sour | 2690567 |
| 2 | 44 | Light-Sour | 156307 |
| 3 | 44 | Light-Sweet | 15411637 |
| 4 | 45 | Light-Sweet | 12888609 |
| 5 | 45 | Medium-Sour | 2973678 |
| 6 | 46 | Light-Sweet | 17338331 |
| 7 | 46 | Medium-Sour | 986121 |
| 8 | 47 | Light-Sour | 1121194 |
| 9 | 47 | Light-Sweet | 15882736 |
| 10 | 48 | Light-Sweet | 17518824 |
| 11 | 48 | Medium-Sour | 757052 |
| 12 | 49 | Light-Sweet | 13420357 |
| 13 | 49 | Medium-Sour | 2138551 |
| 14 | 50 | Light-Sweet | 10015210 |
| 15 | 50 | Medium-Sour | 986121 |
This is not the most intuitive way of displaying the data. We can 'pivot' the table, so that each product category gets its own column:
quantity_by_category = quantity_by_category.pivot(index = 'loading_week',
columns = 'product_category',
values = 'quantity')
quantity_by_category
| loading_week | Light-Sour | Light-Sweet | Medium-Sour |
|---|---|---|---|
| 43 | NaN | 7775458.0 | 2690567.0 |
| 44 | 156307.0 | 15411637.0 | NaN |
| 45 | NaN | 12888609.0 | 2973678.0 |
| 46 | NaN | 17338331.0 | 986121.0 |
| 47 | 1121194.0 | 15882736.0 | NaN |
| 48 | NaN | 17518824.0 | 757052.0 |
| 49 | NaN | 13420357.0 | 2138551.0 |
| 50 | NaN | 10015210.0 | 986121.0 |
This is better. However, several entries are `NaN`. These represent weeks with no exports of that category, so we replace them with zeroes.
quantity_by_category = quantity_by_category.fillna(0)
quantity_by_category
| loading_week | Light-Sour | Light-Sweet | Medium-Sour |
|---|---|---|---|
| 43 | 0.0 | 7775458.0 | 2690567.0 |
| 44 | 156307.0 | 15411637.0 | 0.0 |
| 45 | 0.0 | 12888609.0 | 2973678.0 |
| 46 | 0.0 | 17338331.0 | 986121.0 |
| 47 | 1121194.0 | 15882736.0 | 0.0 |
| 48 | 0.0 | 17518824.0 | 757052.0 |
| 49 | 0.0 | 13420357.0 | 2138551.0 |
| 50 | 0.0 | 10015210.0 | 986121.0 |
Another way of breaking down the exports data is by receiving country.
quantity_by_destination = cms.groupby('unloading_country')[['quantity']].sum()
quantity_by_destination.sort_values(by='quantity',ascending = False, inplace=True)
quantity_by_destination.head()
| unloading_country | quantity |
|---|---|
| India | 17764529 |
| China | 17334013 |
| Canada | 13019738 |
| Netherlands | 10866913 |
| United Kingdom | 9787720 |
We may not want a separate row for every recipient country, so we can collect all countries outside the top 10 into an 'Other' category.
# grab the top 10
top_destination_countries = quantity_by_destination.head(10)
# grab the bottom, sum their quantities and put into a DataFrame
rest = pd.DataFrame(index = ['Other'], columns = ['quantity'])
rest.loc['Other'] = quantity_by_destination.iloc[10:].sum().values
# join the two DataFrames
top_destination_countries = pd.concat([top_destination_countries, rest])
Finally, we can add a new column containing the percentage of total exports represented by exports to that country:
top_destination_countries['%'] = round(top_destination_countries['quantity']*100 / top_destination_countries['quantity'].sum(),2)
top_destination_countries
| | quantity | % |
|---|---|---|
| India | 17764529 | 15.55 |
| China | 17334013 | 15.17 |
| Canada | 13019738 | 11.40 |
| Netherlands | 10866913 | 9.51 |
| United Kingdom | 9787720 | 8.57 |
| France | 5914144 | 5.18 |
| Germany | 5415834 | 4.74 |
| South Korea | 4871768 | 4.26 |
| Italy | 4582142 | 4.01 |
| Taiwan | 3728685 | 3.26 |
| Other | 20950574 | 18.34 |
Note: in the next module, we will see how the Cargo Time Series endpoint offers another approach to this same kind of analysis.
Exercise: make a pivoted table showing how exports to each of the top 5 destination countries varied over the 7 weeks.
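One possible way to approach this, sketched on a small stand-in DataFrame so it runs on its own (in practice you would use the real `cms` DataFrame from above, and `nlargest(5)` would pick out the genuine top 5):

```python
import pandas as pd

# toy stand-in for the cms DataFrame built earlier in the notebook
cms = pd.DataFrame({
    "loading_week": [43, 43, 44, 44],
    "unloading_country": ["India", "China", "India", "Canada"],
    "quantity": [100, 200, 300, 400],
})

# identify the top 5 destination countries by total quantity
top5 = cms.groupby("unloading_country")["quantity"].sum().nlargest(5).index

# restrict to those countries, aggregate by week and country, then pivot
pivoted = (cms[cms["unloading_country"].isin(top5)]
           .groupby(["loading_week", "unloading_country"])["quantity"].sum()
           .reset_index()
           .pivot(index="loading_week",
                  columns="unloading_country",
                  values="quantity")
           .fillna(0))
print(pivoted)
```

Each row is a loading week and each column one of the top destination countries, mirroring the product-category pivot above.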