When working with Vortexa’s Python SDK, one of the foundational components you'll encounter is reference data. But what exactly is reference data, and why is it crucial for your analytics and data manipulation tasks?
Reference data, also known as master data, encompasses the core datasets that define and categorize the entities within Vortexa’s ecosystem. These datasets standardize and provide context to the event data you’ll be working with. In essence, reference data acts as the backbone for ensuring consistency, accuracy, and reliability in your data operations. This tutorial covers three types of reference data:
1. Geographies: Detailed information about various locations, such as ports, terminals, and regions. This allows you to accurately track and analyze movements and activities related to these locations.
2. Vessels: Comprehensive data about vessels, including their names, types, sizes, and other attributes. This is essential for tracking vessel movements, understanding fleet compositions, and analyzing shipping trends.
3. Products: Information about different types of products, their classifications, and hierarchies. This helps in analyzing trade flows, market dynamics, and commodity-specific trends.
import vortexasdk as v
import pandas as pd
To start with, let's explore what you can extract from the Geographies endpoint.
# To load all geographies
all_geographies = v.Geographies().load_all().to_df()
# To load all geographies with a specific type (e.g. country)
country_list = v.Geographies().search(filter_layer=['country']).to_df()
# To load all geographies with a name containing 'China'
china_list = v.Geographies().search(term='China').to_df()
all_geographies.head(5)
| | id | name | layer |
---|---|---|---|
0 | 0eb7b43e3d4e62db74e187bc4eadadfb878a210201e14a... | 21st Century Shipbuilding | [terminal] |
1 | b27430b57d617a855d44f91fb70441bae69c19a3c0deb4... | 2x1 Holding Cape Midia Shipyard | [terminal] |
2 | d38a8f7bf8ed422b439ad5270be65b60b964bed9568936... | A Pobra Do Caraminal [ES] | [port] |
3 | 3ebcf6f2e43e3a8b06e9b0ae31df3d87f3fbc8d032b72d... | A and P Group Falmouth Shipyard | [terminal] |
4 | 0c8a30f40639257e5352e2c6ac52af2f93b2c6bfed7187... | A and P Group Tees Shipyard | [terminal] |
country_list.head(5)
| | id | name | layer |
---|---|---|---|
0 | 1cd3c07221f9e9b3296c859d0bcd3da17ac6072bfdcc84... | Afghanistan | [country] |
1 | 5e4e7b5040b933b5a0f0d2357ed27ebec432c749b9d63a... | Albania | [country] |
2 | 87269b28eaea324d2c35e97b0ecc837ebc9a244faf94e2... | Algeria | [country] |
3 | f4435d4fffa5b2ba7a340e3a8e7d421f619d9f7832fa0c... | American Samoa | [country] |
4 | db3cb74043b8fd438087a3e0e04e3e498b78c3c9790fce... | Andorra | [country] |
china_list.head(5)
| | id | name | layer |
---|---|---|---|
0 | 934c47f36c16a58d68ef5e007e62a23f5f036ee3f3d1f5... | China | [country] |
1 | 781cacc7033f877caa4b4106d096b74afe006a96391bf5... | South China | [alternative_region] |
2 | a63890260e29d859390fd1a23c690181afd4bd152943a0... | North China | [alternative_region] |
3 | d1d5a3d3666d6ebb65a1b53e626b07f6b8540e8048a524... | Shipyard Nansha China | [terminal] |
4 | ae0f224030f7337d0ffe5a54d290a9f0bd029f636eaf12... | China Yangfan Group | [terminal] |
These results show how to obtain the ids, which other endpoints may require as inputs. To retrieve more information about each location, such as centroids and hierarchies, pass columns='all' to the .to_df() method.
china_list_enhanced = v.Geographies().search(term='China').to_df(columns='all')
china_list_enhanced.head(5)
| | id | name | layer | leaf | parent | exclusion_rule | ref_type | hierarchy | pos | aliases | tags |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 934c47f36c16a58d68ef5e007e62a23f5f036ee3f3d1f5... | China | [country] | False | [{'name': 'Far East', 'layer': ['alternative_r... | [{'name': 'China', 'layer': ['country'], 'id':... | geography | [{'label': 'China', 'layer': 'country', 'id': ... | [105.4525116754, 35.4496039032] | [] | {'importProductTags': [], 'exportProductTags':... |
1 | 781cacc7033f877caa4b4106d096b74afe006a96391bf5... | South China | [alternative_region] | False | [{'name': 'China (excl. HK & Macau)', 'layer':... | [{'name': 'South China', 'layer': ['alternativ... | geography | [{'id': 'b5fafce6e20de2dc307fb7e0b89978ee91a49... | [106.3418341843, 28.1283930445] | [] | {'importProductTags': [], 'exportProductTags':... |
2 | a63890260e29d859390fd1a23c690181afd4bd152943a0... | North China | [alternative_region] | False | [{'name': 'China (excl. HK & Macau)', 'layer':... | [{'name': 'North China', 'layer': ['alternativ... | geography | [{'id': 'b5fafce6e20de2dc307fb7e0b89978ee91a49... | [104.7737174548, 40.9901656588] | [] | {'importProductTags': [], 'exportProductTags':... |
3 | d1d5a3d3666d6ebb65a1b53e626b07f6b8540e8048a524... | Shipyard Nansha China | [terminal] | True | [{'name': 'Nansha [CN]', 'layer': ['port', 'st... | [{'name': 'Shipyard Nansha China', 'layer': ['... | geography | [{'label': 'Shipyard Nansha China', 'layer': '... | [113.5214914215, 22.7496639804] | [] | {'importProductTags': [], 'exportProductTags':... |
4 | ae0f224030f7337d0ffe5a54d290a9f0bd029f636eaf12... | China Yangfan Group | [terminal] | True | [{'name': 'Zhoushan [CN]', 'layer': ['port', '... | [{'name': 'China Yangfan Group', 'layer': ['te... | geography | [{'label': 'China Yangfan Group', 'layer': 'te... | [122.2921859199, 29.9282176371] | [] | {'importProductTags': [], 'exportProductTags':... |
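As a minimal sketch of picking a single id out of search results like these, the snippet below works on plain Python records (the shape that pandas' `DataFrame.to_dict("records")` returns), so it runs without API credentials. The ids are the truncated values from the output above, kept as placeholders. `find_id` is a hypothetical helper, not part of the SDK, and reading `pos` as [longitude, latitude] is an assumption based on the values shown.

```python
# Sample rows copied from the output above; ids are truncated placeholders.
# With a real DataFrame, something like china_list_enhanced.to_dict("records")
# would give records of this shape.
rows = [
    {"id": "934c47f36c16a58d68ef5e007e62a23f5f036ee3f3d1f5...",
     "name": "China", "layer": ["country"],
     "pos": [105.4525116754, 35.4496039032]},
    {"id": "781cacc7033f877caa4b4106d096b74afe006a96391bf5...",
     "name": "South China", "layer": ["alternative_region"],
     "pos": [106.3418341843, 28.1283930445]},
    {"id": "d1d5a3d3666d6ebb65a1b53e626b07f6b8540e8048a524...",
     "name": "Shipyard Nansha China", "layer": ["terminal"],
     "pos": [113.5214914215, 22.7496639804]},
]


def find_id(records, name, layer):
    """Hypothetical helper: id of the first record with this exact name and layer."""
    for record in records:
        # "layer" is a list, so test membership rather than equality.
        if record["name"] == name and layer in record["layer"]:
            return record["id"]
    return None


china_id = find_id(rows, "China", "country")

# Assumption: "pos" holds [longitude, latitude].
lon, lat = rows[0]["pos"]
print(china_id, lon, lat)
```

Matching on both name and layer avoids accidentally picking up a terminal or region whose name merely contains the search term.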
That covers extracting reference data from the Geographies endpoint. The same approach works for the Products and Vessels endpoints.
# To load all products
products = v.Products().load_all().to_df()
products.head(5)
| | id | name | layer.0 | parent.0.name |
---|---|---|---|---|
0 | 887940a6cf2d527a20d82a5f163ecce502878ceb1cd59f... | 0.005 / 50ppm | grade | Gasoil |
1 | 8bb096fb847f92af86235002b2a78ca0437543722cdb8c... | 0.05 / 500ppm | grade | Gasoil |
2 | 35f8222ff81fe5befafee9c64c1d76618e4cc53e74021a... | 0.1 / 1000ppm | grade | Gasoil |
3 | 881be476857ff08dcf6a8708a2fc279d26770cecf245f7... | 0.1+ / 1000ppm Plus (HS) | grade | Gasoil |
4 | a6ef13d2f3145a1b67a81300c1cfa4f21874f24fab4f8f... | 180 CST | grade | High Sulphur Fuel Oil |
# To load Gasoil only
gasoil = v.Products().search(term='Gasoil').to_df()
gasoil.head(5)
| | id | name | layer.0 | parent.0.name |
---|---|---|---|---|
0 | b2034f1ad3a4ac269e962f00b9914d6b909923cf904d99... | Gasoil | category | Diesel/Gasoil |
1 | deda35eb9ca56b54e74f0ff370423f9a8c61cf6a3796fc... | Diesel/Gasoil | group_product | Clean Petroleum Products |
2 | e06296595e1d554008a70172440d5582c923bdb8182af5... | Coker Gasoil | grade | Dirty Feedstocks |
3 | feb8190865392ab6caecd7709077a58645ec4828c23d94... | Marine Gasoil | grade | Gasoil |
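The `layer.0` and `parent.0.name` columns describe where each product sits in the hierarchy. As a rough sketch of how you might use them, the snippet below groups product names by their parent. It uses a few rows copied from the outputs above as plain Python records rather than a live API call, and `children_by_parent` is a hypothetical helper, not an SDK function.

```python
from collections import defaultdict

# Rows copied from the product outputs above (ids omitted for brevity).
product_rows = [
    {"name": "0.005 / 50ppm", "layer.0": "grade", "parent.0.name": "Gasoil"},
    {"name": "0.05 / 500ppm", "layer.0": "grade", "parent.0.name": "Gasoil"},
    {"name": "180 CST", "layer.0": "grade", "parent.0.name": "High Sulphur Fuel Oil"},
    {"name": "Marine Gasoil", "layer.0": "grade", "parent.0.name": "Gasoil"},
    {"name": "Gasoil", "layer.0": "category", "parent.0.name": "Diesel/Gasoil"},
]


def children_by_parent(records):
    """Hypothetical helper: map each parent name to the names listed under it."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["parent.0.name"]].append(record["name"])
    return dict(grouped)


tree = children_by_parent(product_rows)
print(tree["Gasoil"])
```

With a real DataFrame, `products.to_dict("records")` should yield records of the same shape, assuming the default columns shown above.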
# To load all vessels
vessels = v.Vessels().load_all().to_df()
vessels.head(5)
| | id | name | imo | vessel_class |
---|---|---|---|---|
0 | 62f3f3c1f5a663d621fe6cf9537c7d936b547497932f5d... | \tATHINEA | 9291248.0 | oil_lr2 |
1 | e6b259c04da30a57db353665e7e61f67a0a3222b96c457... | \tBORA | 9276004.0 | oil_mr2 |
2 | f351708121bce4d357ac5fad967cb1bf7fe5072773f05a... | 0051-04 | | oil_coastal |
3 | 1761da4fb069cd6ce153b6ad1c48e15cdb994eb386e4aa... | 058 | | oil_coastal |
4 | c817b5994efe14621949533d6777b22ce11db1c6bf9e48... | 1011 | | oil_coastal |
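Two details in this output may need tidying before further analysis: the `imo` values appear as floats (pandas stores a numeric column as float when some values are missing, and several rows here have no IMO), and some names appear to carry a leading tab. The sketch below shows one way to normalize both in plain Python. `clean_imo` and `clean_name` are hypothetical helpers, and it is an assumption that missing IMOs arrive as NaN.

```python
import math


def clean_imo(value):
    """Hypothetical helper: 9291248.0 -> 9291248; missing (None/NaN) -> None."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    return int(value)


def clean_name(name):
    """Hypothetical helper: strip stray whitespace such as a leading tab."""
    return name.strip()


# Values taken from the output above; NaN stands in for the empty IMO cells.
sample_imos = [9291248.0, 9276004.0, float("nan")]
sample_names = ["\tATHINEA", "\tBORA", "0051-04"]

cleaned_imos = [clean_imo(x) for x in sample_imos]
cleaned_names = [clean_name(n) for n in sample_names]
print(cleaned_imos, cleaned_names)
```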
# To load all vessels with a name containing 'Maersk' (currently named or previously named)
maersk_vessels = v.Vessels().search(term='Maersk').to_df()
maersk_vessels.head(5)
| | id | name | imo | vessel_class |
---|---|---|---|---|
0 | 00d89be99f08890c9122c326aa32c83ae6d557629bc124... | VS REMLIN | 9252307 | oil_mr1 |
1 | 03c191caf28a554c8c1adfc11ff2a7a08ad5fa21a83892... | SOUTH LOYALTY | 9537769 | oil_vlcc |
2 | 07e562b509f2617c18f9e848c51bdec8e97488c4a41bbc... | HENRIETTE MAERSK | 9399349 | oil_mr1 |
3 | 085ab83b592713e314070620a8cb9f4d795a2b486075f4... | VS | 9252292 | oil_mr1 |
4 | 09332aaa6067574eac586030b5b81cf5554337c4f48d17... | BULL SULAWESI | 9180920 | oil_lr2 |
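Because the search also matches previous names, most rows above no longer carry 'Maersk' in their current name. If you only want vessels currently named with the term, one option is to filter the results afterwards. The sketch below does this on rows copied from the output above, held as plain Python records; `currently_named` is a hypothetical helper, not part of the SDK.

```python
# Rows copied from the search output above (ids omitted for brevity).
vessel_rows = [
    {"name": "VS REMLIN", "imo": 9252307, "vessel_class": "oil_mr1"},
    {"name": "SOUTH LOYALTY", "imo": 9537769, "vessel_class": "oil_vlcc"},
    {"name": "HENRIETTE MAERSK", "imo": 9399349, "vessel_class": "oil_mr1"},
    {"name": "VS", "imo": 9252292, "vessel_class": "oil_mr1"},
    {"name": "BULL SULAWESI", "imo": 9180920, "vessel_class": "oil_lr2"},
]


def currently_named(records, term):
    """Hypothetical helper: keep records whose current name contains the term."""
    needle = term.casefold()  # case-insensitive comparison
    return [r for r in records if needle in r["name"].casefold()]


current = currently_named(vessel_rows, "Maersk")
current_names = [r["name"] for r in current]
print(current_names)
```

The case-insensitive match matters here because the names in the output are upper case while the search term is not.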
In this tutorial, we covered the essentials of working with reference data using the Vortexa Python SDK. We began by discussing the importance of reference data and its role in supporting accurate and consistent data operations. Through various examples, including locations, vessels, and products, we demonstrated how reference data can be effectively applied in your analyses.
You’ve learned how to query and retrieve reference data based on different criteria such as name or type. Mastering the use of reference data enables you to drive more accurate insights, improve data consistency, and enhance your understanding of the energy markets.