Module 7

Video 30: Working with Aggregate Flows

Python for the Energy Industry

As we've seen, the CargoMovements endpoint gives access to data at the level of individual cargos. In the previous two lessons, we looked at how we would aggregate this data. The Vortexa SDK also conveniently offers a way of directly accessing aggregated flows data: the CargoTimeSeries endpoint.

Cargo Time Series documentation.

So how does this work?

In [1]:

# initial imports
import pandas as pd
import numpy as np
from datetime import datetime, date
from dateutil.relativedelta import relativedelta
import vortexasdk as v

Two parameters need to be defined when requesting time series data: the unit in which cargo quantities are measured, and the granularity of the time buckets.

In [2]:

# The cargo unit for the time series (barrels)
TS_UNIT = 'b'

# The granularity of the time series
TS_FREQ = 'week'

As before, we also need to define the date range, geography and products of interest. Here we will once again consider crude exports from the US, so we look up the corresponding IDs:

In [3]:

# datetimes to access last 7 weeks of data
now = datetime.utcnow()
seven_weeks_ago = now - relativedelta(weeks=7)

# Find US ID
us = [g.id for g in v.Geographies().search('united states').to_list() if 'country' in g.layer]
assert len(us) == 1

# Find crude ID
crude = [p.id for p in v.Products().search('crude').to_list() if p.name=='Crude']
assert len(crude) == 1
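A seven-week window will usually touch eight weekly buckets, because the first and last weeks are only partly covered. The sketch below illustrates this using only the standard library. It assumes weekly buckets start on Monday at 00:00 (an assumption about the API's bucketing, not something confirmed here), and it uses a fixed, hypothetical "now" rather than the current time so the result is reproducible. The helper `week_starts` is illustrative and not part of the SDK.

```python
from datetime import datetime, timedelta

def week_starts(start, end):
    """Return the Monday 00:00 datetime of every week touched by [start, end].

    Illustrative helper: assumes weeks begin on Monday.
    """
    # Step back to the Monday of the starting week, at midnight
    first = (start - timedelta(days=start.weekday())).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    starts = []
    current = first
    while current <= end:
        starts.append(current)
        current += timedelta(weeks=1)
    return starts

# Fixed, hypothetical "now" (a Monday at midday) for reproducibility
now = datetime(2021, 2, 8, 12, 0)
seven_weeks_ago = now - timedelta(weeks=7)

buckets = week_starts(seven_weeks_ago, now)
print(len(buckets))
for b in buckets:
    print(b.date())
```

Under these assumptions, the first and last buckets each cover only part of a week, which is worth keeping in mind when comparing their totals with those of the weeks in between.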

We then make a search call to the CargoTimeSeries endpoint, specifying our date range, geography and product IDs as usual. We also specify the unit (barrels) and granularity (weekly) of the data.

In [4]:

df_exports = v.CargoTimeSeries().search( 
            filter_activity = 'loading_end',
            filter_origins = us,
            filter_products = crude, 
            # Measure in barrels
            timeseries_unit = TS_UNIT,
            # Aggregate into weekly buckets
            timeseries_frequency = TS_FREQ,
            # Set the date range
            filter_time_min = seven_weeks_ago,
            filter_time_max = now).to_df()

df_exports

Out[4]:

	key	value	count
0	2020-12-21 00:00:00+00:00	28153014	55
1	2020-12-28 00:00:00+00:00	28800130	48
2	2021-01-04 00:00:00+00:00	17902289	26
3	2021-01-11 00:00:00+00:00	24413069	45
4	2021-01-18 00:00:00+00:00	19569612	31
5	2021-01-25 00:00:00+00:00	29752386	47
6	2021-02-01 00:00:00+00:00	22664959	35
7	2021-02-08 00:00:00+00:00	986121	1
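One way to read the three columns is that `key` is the start of each weekly bucket, `value` is the total quantity loaded in that bucket, and `count` is the number of cargo movements contributing to it. That reading is an assumption based on the output above. The sketch below reproduces this kind of aggregation by hand on a small set of invented cargo records (hypothetical values, not Vortexa data), assuming Monday-start weeks and using timezone-naive timestamps for simplicity.

```python
import pandas as pd

# Hypothetical cargo-level records (invented values, not Vortexa data)
cargos = pd.DataFrame({
    'loading_end': pd.to_datetime([
        '2021-01-04 08:00', '2021-01-06 17:30', '2021-01-10 23:00',
        '2021-01-11 01:00', '2021-01-15 12:00',
    ]),
    'barrels': [500_000, 700_000, 300_000, 1_000_000, 250_000],
})

# Label each cargo with the Monday that starts its week.
# A 'W' period runs Monday to Sunday, so start_time is the Monday at 00:00.
week_start = cargos['loading_end'].dt.to_period('W').dt.start_time

# Sum the quantities and count the cargos in each weekly bucket
weekly = (
    cargos.groupby(week_start)['barrels']
          .agg(value='sum', count='count')
          .rename_axis('key')
)
print(weekly)
```

The endpoint performs the aggregation server-side, so in practice none of this is needed; the sketch is only meant to show what the returned columns plausibly represent.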

To make the DataFrame more readable, we convert the timestamps in the key column to plain dates, rename the key and value columns, and set the date as the index.

In [5]:

df_exports['key'] = pd.to_datetime(df_exports['key']).dt.date
df_exports = df_exports.rename(columns = {'key': 'date', 'value': 'barrels'})
df_exports = df_exports.set_index('date')

df_exports

Out[5]:

	barrels	count
date
2020-12-21	28153014	55
2020-12-28	28800130	48
2021-01-04	17902289	26
2021-01-11	24413069	45
2021-01-18	19569612	31
2021-01-25	29752386	47
2021-02-01	22664959	35
2021-02-08	986121	1
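With the data in this shape, a few derived figures are easy to compute. The sketch below rebuilds the table above from its printed values so that it runs without API access, then adds an average cargo size and an average daily rate. The column names `barrels_per_cargo` and `avg_bpd` are illustrative choices, and treating the final row as an incomplete week is an assumption based on its very low count.

```python
import pandas as pd

# Values copied from the weekly table above
df = pd.DataFrame(
    {
        'barrels': [28153014, 28800130, 17902289, 24413069,
                    19569612, 29752386, 22664959, 986121],
        'count': [55, 48, 26, 45, 31, 47, 35, 1],
    },
    # Eight consecutive Mondays, stored as plain date objects
    index=pd.Index(
        pd.date_range('2020-12-21', periods=8, freq='7D').date, name='date'
    ),
)

# Average cargo size in each week
df['barrels_per_cargo'] = df['barrels'] / df['count']

# Average daily rate, treating every bucket as a full 7-day week
df['avg_bpd'] = df['barrels'] / 7

# Assumption: the last bucket covers only part of a week, so set it
# aside before averaging across weeks
complete_weeks = df.iloc[:-1]
print(df)
print(complete_weeks['barrels'].mean())
```

Dividing by 7 understates the daily rate for any bucket that does not cover a full week, which is one reason to exclude the final row from summary statistics.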

It's possible to access a week's data from this DataFrame by indexing it with the date of the start of the week:

In [13]:

date_to_locate = df_exports.index[0]

In [15]:

# one week's data
print(df_exports.loc[date_to_locate])

barrels    28153014
count            55
Name: 2020-12-21, dtype: int64
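To look up the week containing an arbitrary date, that date first has to be mapped to the start of its week. The following is a minimal sketch, assuming weekly buckets begin on Monday (as the dates in the table suggest). It rebuilds the first three rows of the table from their printed values, and `week_start` is an illustrative helper rather than part of the SDK.

```python
from datetime import date, timedelta
import pandas as pd

# First three rows of the table above, indexed by plain date objects
df = pd.DataFrame(
    {'barrels': [28153014, 28800130, 17902289], 'count': [55, 48, 26]},
    index=pd.Index(
        [date(2020, 12, 21), date(2020, 12, 28), date(2021, 1, 4)],
        name='date',
    ),
)

def week_start(d):
    """Return the Monday of the week containing d (illustrative helper)."""
    return d - timedelta(days=d.weekday())

some_day = date(2020, 12, 30)   # a Wednesday
key = week_start(some_day)      # the Monday of that week
row = df.loc[key]
print(key, row['barrels'], row['count'])
```

Because the index holds `datetime.date` objects after the conversion step, the lookup key should also be a `date`; a plain string such as `'2020-12-28'` is not expected to match.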

Exercise

Use the CargoTimeSeries endpoint to pull the last 2 weeks' US crude exports data, but with a daily frequency.

In [ ]: