Module 5¶

Video 23: Filtering by Geography¶

Python for the Energy Industry

In previous lessons, we have seen that the Geography endpoint of the Vortexa SDK can be used to search for, and get ID numbers for different types of location - countries, shipping regions, ports, etc. These IDs can be used to filter searches based on e.g. origin and destination.

Geographies endpoint documentation.

In [1]:

# Some initial imports / settings
from datetime import datetime
from dateutil.relativedelta import relativedelta
import vortexasdk as v

now = datetime.utcnow()
one_month_ago = now - relativedelta(months=1)

If we use the Geographies endpoint search function to search for a high-level term like 'china', we will likely get a scattering of results at different levels that match that search term:

In [2]:

china_search = v.Geographies().search("china")
china_search.to_df(columns=['name','layer'])

2020-12-10 16:24:14,147 vortexasdk.client — WARNING — You are using vortexasdk version 0.26.0, however version 0.27.1 is available.
You should consider upgrading via the 'pip install vortexasdk --upgrade' command.

Out[2]:

	name	layer
0	China	[country]
1	South China	[shipping_region]
2	North China	[shipping_region]
3	China (excl. HK & Macau)	[shipping_region]
4	China Steel Chemical	[terminal]
5	China Energy Services Ningbo	[terminal]
6	East China Sea STS	[sts_zone]
7	Multipurpose (China Merchants) Terminal	[terminal]
8	China Union, Freeport Of Monrovia	[terminal]
9	China Resources Chemical Holding Terminal	[terminal]
10	China General Terminal Distribution Corp	[terminal]
11	China Bay(Ioc Terminal) (Ex-Trincomalee)	[terminal]
12	Wenzhou China Petroleum Fuel Bitumen Co., Ltd.	[terminal]

Having done this search, we can of course pick the one we want and grab its ID.

In [3]:

china_ID = china_search[0]['id']
print('China ID:',china_ID)

China ID: 934c47f36c16a58d68ef5e007e62a23f5f036ee3f3d1f5f85a48c572b90ad8b2

There are a couple of ways of attempting to get the right ID directly, without having to inspect the search results. One of these is to require that the geography name is a direct match to our search query:

In [4]:

china_search = v.Geographies().search("China",exact_term_match=True)
assert len(china_search) == 1

china_ID = china_search[0]['id']
print('China ID:',china_ID)

China ID: 934c47f36c16a58d68ef5e007e62a23f5f036ee3f3d1f5f85a48c572b90ad8b2

Here, the line assert len(china_search) == 1 is used to ensure that we have only got one matching result. This line will throw an error if this is not the case.

We can also do this without requiring an exact match, by specifying the particular geographic layer:

In [5]:

china_geo = [g for g in v.Geographies().search("china") if 'country' in g['layer']]
assert len(china_geo) == 1

china_ID = china_search[0]['id']
print('China ID:',china_ID)

China ID: 934c47f36c16a58d68ef5e007e62a23f5f036ee3f3d1f5f85a48c572b90ad8b2

We can see what our assert statement does if our search yields multiple results:

In [6]:

korea_geo = [g for g in v.Geographies().search("korea") if 'country' in g['layer']]
# Uncomment and run the following line     
#assert len(korea_geo) == 1

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-6-88c81d8c0f4b> in <module>
      1 korea_geo = [g for g in v.Geographies().search("korea") if 'country' in g['layer']]
----> 2 assert len(korea_geo) == 1

AssertionError:

Filtering Cargo Movements¶

We will now look at filtering Cargo Movements searches based on geographies. Let's say we want so find all cargoes that were loading in the last month, and that are destined for China:

In [7]:

cm_china_query = v.CargoMovements().search(
    filter_activity="loading_state",
    filter_time_min=one_month_ago,
    filter_time_max=now,
    filter_destinations=china_ID)

Loading from API: 2500it [00:02, 966.33it/s]

We'll convert this into a DataFrame. In a previous lesson, we saw that we can specify the particular columns that we want to put in our Cargo Movements DataFrame. We also saw that these columns can be given shorter, less cumbersome labels. We make two lists, with the full labels and our chosen new labels in corresponding order:

In [8]:

required_columns = [
    # Primary vessel details
    "vessels.0.name",
    "vessels.0.vessel_class",
    # Product information
    "product.group.label",
    "product.grade.label",
    "quantity",
    # Vessel status
    "status",
    # Loading port and time
    "events.cargo_port_load_event.0.location.port.label",
    "events.cargo_port_load_event.0.end_timestamp",
    # Discharge port and time
    "events.cargo_port_unload_event.0.location.port.label",
    "events.cargo_port_unload_event.0.end_timestamp",
]

new_labels = [
    "vessel_name",
    "vessel_class",
    "product_group",
    "product_grade",
    "quantity",
    "status",
    "loading_port",
    "loading_finish",
    "unloading_port",
    "unloading_finish",
]

We can then make a dictionary for renaming, by zipping these two lists together:

In [9]:

relabel = dict(zip(required_columns,new_labels))

cm_china_query.to_df(columns=required_columns).rename(relabel,axis=1)

Out[9]:

	vessel_name	vessel_class	product_group	product_grade	quantity	status	loading_port	loading_finish	unloading_port	unloading_finish
0	ATLANTICOS	handysize	Dirty Petroleum Products	NaN	167491	unloaded_state	Slavyanka [RU]	2020-11-19 10:59:27+00:00	Ningbo (Beilun) [CN]	2020-12-03 02:16:24+00:00
1	SW SOUTHPORT I	handymax	Clean Petroleum Products	NaN	273706	unloaded_state	Yeosu (Yosu), Gwangyang [KR]	2020-11-25 15:27:26+00:00	Dongguan, Humen [CN]	2020-12-03 04:13:26+00:00
2	NEW TINOS	vlcc_plus	Dirty Petroleum Products	NaN	670951	unloaded_state	NaN	NaT	Hong Kong [CN]	2020-11-25 19:11:45+00:00
3	HANG HAI YOU 17	general_purpose	Crude/Condensates	NaN	147432	unloaded_state	Dongjiakou [CN]	2020-12-08 04:04:06+00:00	Rizhao - Lanshan Area [CN]	2020-12-09 23:47:47+00:00
4	QUANLINLIANGYOU 5	tiny_tanker	Clean Petroleum Products	NaN	35952	unloaded_state	Taixing [CN]	2020-11-21 03:39:28+00:00	Nantong [CN]	2020-11-22 05:44:38+00:00
...	...	...	...	...	...	...	...	...	...	...
2294	XIN DE116	tiny_tanker	Clean Petroleum Products	NaN	35398	unloaded_state	Dalian [CN]	2020-11-29 19:46:12+00:00	Shanghai [CN]	2020-12-04 05:44:43+00:00
2295	SC TIANJIN	general_purpose	Clean Petroleum Products	Monoethylene Glycol (MEG)	14774	unloaded_state	Mai Liao - Taiwan [TW]	2020-12-01 11:38:16+00:00	Jiangyin [CN]	2020-12-10 05:46:48+00:00
2296	TIAN EN 1001	tiny_tanker	Clean Petroleum Products	NaN	15850	unloaded_state	Nanjing [CN]	2020-12-03 21:04:10+00:00	Zhenjiang [CN]	2020-12-05 08:43:42+00:00
2297	SC DALIAN	general_purpose	Clean Petroleum Products	NaN	38952	unloaded_state	Ulsan, Onsan [KR]	2020-11-25 23:36:28+00:00	Jingjiang [CN]	2020-11-30 14:27:25+00:00
2298	BLUE SPRING	handymax	Clean Petroleum Products	NaN	175340	unloaded_state	NaN	2020-11-16 15:52:16+00:00	Nantong [CN]	2020-11-26 08:20:41+00:00

2299 rows × 10 columns

We could choose to filter out intranational transfers by also using the ID for China with the exclude_origins argument.

In [10]:

cm_china_query = v.CargoMovements().search(
    filter_activity="loading_state",
    filter_time_min=one_month_ago,
    filter_time_max=now,
    filter_destinations=china_ID,
    exclude_origins=china_ID)

cm_china_query.to_df(columns=required_columns).rename(relabel,axis=1)

Loading from API: 1500it [00:00, 1762.69it/s]

Out[10]:

	vessel_name	vessel_class	product_group	product_grade	quantity	status	loading_port	loading_finish	unloading_port	unloading_finish
0	SC XIAMEN	general_purpose	Clean Petroleum Products	NaN	88906	unloaded_state	Ulsan, Onsan [KR]	2020-11-25 23:29:57+00:00	Pinghu, Jiaxing [CN]	2020-12-09 01:18:41+00:00
1	HEUNG-A PIONEER	general_purpose	Clean Petroleum Products	NaN	76018	transiting_state	Singapore [SG]	2020-12-05 10:00:50+00:00	Zhangjiagang [CN]	NaT
2	STX IRIS	general_purpose	Clean Petroleum Products	NaN	97850	unloaded_state	Daesan [KR]	2020-11-12 21:04:00+00:00	Pinghu, Jiaxing [CN]	2020-11-21 06:31:10+00:00
3	SEA CRYSTAL	general_purpose	Clean Petroleum Products	NaN	102649	unloaded_state	Incheon [KR]	2020-11-27 22:34:18+00:00	Jiangyin [CN]	2020-12-03 05:40:32+00:00
4	JS NANJING	tiny_tanker	Dirty Petroleum Products	NaN	15460	unloaded_state	Yeosu (Yosu), Gwangyang [KR]	2020-11-19 00:00:45+00:00	Nanjing [CN]	2020-11-25 18:12:21+00:00
...	...	...	...	...	...	...	...	...	...	...
1053	NAVE ATROPOS	panamax	Clean Petroleum Products	NaN	295545	transiting_state	Singapore [SG]	2020-11-21 15:05:26+00:00	Dalian [CN]	NaT
1054	ASIA DAWN	vlcc_plus	Crude/Condensates	Oman Blend	1931245	transiting_state	Mina Al Fahal, Muscat [OM]	2020-11-27 21:14:27+00:00	Rizhao - Lanshan Area [CN]	NaT
1055	SC TIANJIN	general_purpose	Clean Petroleum Products	Monoethylene Glycol (MEG)	14774	unloaded_state	Mai Liao - Taiwan [TW]	2020-12-01 11:38:16+00:00	Jiangyin [CN]	2020-12-10 05:46:48+00:00
1056	SC DALIAN	general_purpose	Clean Petroleum Products	NaN	38952	unloaded_state	Ulsan, Onsan [KR]	2020-11-25 23:36:28+00:00	Jingjiang [CN]	2020-11-30 14:27:25+00:00
1057	BLUE SPRING	handymax	Clean Petroleum Products	NaN	175340	unloaded_state	NaN	2020-11-16 15:52:16+00:00	Nantong [CN]	2020-11-26 08:20:41+00:00

1058 rows × 10 columns

Note: there is another way of excluding intranational transfers: use the disable_geographic_exclusion_rules argument, and set this to True.

The full list of filters on Geography is:

filter_destinations
filter_origins
filter_storage_locations
filter_ship_to_ship_locations
filter_waypoints
exclude_origins
exclude_destinations

Each of these will also accept a list of IDs, rather than just a single ID.

In [11]:

south_korea_geo = v.Geographies().search("South Korea",exact_term_match=True)
assert len(south_korea_geo) == 1

japan_geo = v.Geographies().search("Japan",exact_term_match=True)
assert len(japan_geo) == 1

south_korea_ID = south_korea_geo[0]['id']
japan_ID = japan_geo[0]['id']

cm_query = v.CargoMovements().search(
    filter_activity="loading_state",
    filter_time_min=one_month_ago,
    filter_time_max=now,
    filter_destinations=china_ID,
    filter_origins=[south_korea_ID,japan_ID])

cm_query.to_df(columns=required_columns).rename(relabel,axis=1)

Out[11]:

	vessel_name	vessel_class	product_group	product_grade	quantity	status	loading_port	loading_finish	unloading_port	unloading_finish
0	SC XIAMEN	general_purpose	Clean Petroleum Products	NaN	88906	unloaded_state	Ulsan, Onsan [KR]	2020-11-25 23:29:57+00:00	Pinghu, Jiaxing [CN]	2020-12-09 01:18:41+00:00
1	STX IRIS	general_purpose	Clean Petroleum Products	NaN	97850	unloaded_state	Daesan [KR]	2020-11-12 21:04:00+00:00	Pinghu, Jiaxing [CN]	2020-11-21 06:31:10+00:00
2	SEA CRYSTAL	general_purpose	Clean Petroleum Products	NaN	102649	unloaded_state	Incheon [KR]	2020-11-27 22:34:18+00:00	Jiangyin [CN]	2020-12-03 05:40:32+00:00
3	JS NANJING	tiny_tanker	Dirty Petroleum Products	NaN	15460	unloaded_state	Yeosu (Yosu), Gwangyang [KR]	2020-11-19 00:00:45+00:00	Nanjing [CN]	2020-11-25 18:12:21+00:00
4	GAS EMERALD	sgc	Clean Petroleum Products	NaN	7610	unloaded_state	Ulsan, Onsan [KR]	2020-11-17 17:27:54+00:00	Zhapu [CN]	2020-11-21 06:42:21+00:00
...	...	...	...	...	...	...	...	...	...	...
366	HUA HANG HE CHANG	tiny_tanker	Dirty Petroleum Products	NaN	34310	unloaded_state	Yeosu (Yosu), Gwangyang [KR]	2020-11-23 07:50:40+00:00	Nanjing [CN]	2020-11-30 06:21:44+00:00
367	CRANE MIMOSA	tiny_tanker	Clean Petroleum Products	NaN	48745	unloaded_state	Anegasaki-Chiba [JP]	2020-11-25 21:47:22+00:00	Zhangjiagang [CN]	2020-12-02 00:44:23+00:00
368	NEW SILVER	general_purpose	Clean Petroleum Products	NaN	18831	unloaded_state	Ulsan, Onsan [KR]	2020-11-19 01:29:57+00:00	Dalian [CN]	2020-11-26 08:01:41+00:00
369	WOOSHIN ACE	tiny_tanker	Clean Petroleum Products	NaN	21328	unloaded_state	Yeosu (Yosu), Gwangyang [KR]	2020-11-23 17:53:11+00:00	Nantong [CN]	2020-11-26 23:46:12+00:00
370	SC DALIAN	general_purpose	Clean Petroleum Products	NaN	38952	unloaded_state	Ulsan, Onsan [KR]	2020-11-25 23:36:28+00:00	Jingjiang [CN]	2020-11-30 14:27:25+00:00

371 rows × 10 columns

Exercise¶

Assemble a DataFrame of all Cargo Movements that were loading in the last month and that are destined for the United Kingdom. Make a pie chart of the country of origin (think about changing columns in the DataFrame to make this information easier to access).

In [ ]: