#!/usr/bin/env python
# coding: utf-8

# This notebook was generated on 14 July 2020. Vortexa is constantly improving the quality of our data and models, so some historical data points may change, causing future runs of the notebook to yield different results.
#
# The version of the Vortexa SDK used to generate this notebook was: vortexasdk-0.21.1
#
# The following packages were installed to run this notebook:
#
# pandas==0.25.2
# matplotlib==3.2.2

# In[1]:

from vortexasdk import Products, CargoTimeSeries, Geographies
from datetime import datetime
import pandas as pd
import matplotlib.pyplot as plt

# # Products tutorial

# First I'm going to show you how to get the Vortexa ID for a product you are interested in studying. There are many ways of doing this; today I'm going to show you one option I have used.

# From the examples in the docs found here: https://vortechsa.github.io/python-sdk/endpoints/products/, we can see an example line of code which shows us how to search for several different products at once.

# In[2]:

df = Products().search(term=['diesel', 'fuel oil', 'grane']).to_df()

# In[3]:

df.head()

# For my study I want to focus on crude/condensates, so I am going to modify the list of product names like this:

# In[4]:

crude_search_df = Products().search(term=['crude']).to_df()

# In[5]:

crude_search_df

# Here we can see that there are 18 rows, and we only want the id where the name column is equal to `Crude/Condensates`. We can query the DataFrame like this to get just the row we are interested in.

# In[6]:

crude_search_df.query("name=='Crude/Condensates'")

# If you look at the end of the id you can see it finishes with `...`, which means we can't see the full length of the column. If we increase the column width with `pd.set_option('max_colwidth', 75)` and run the same query on the DataFrame, we can see the full id.
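# As a quick aside, here is a minimal, self-contained sketch of what that display option does, using a made-up DataFrame rather than the Vortexa data:

```python
import pandas as pd

# A made-up 64-character id, standing in for a real Vortexa product id
toy = pd.DataFrame({'id': ['a' * 64]})

# At a narrow column width, the repr truncates the value with '...'
pd.set_option('max_colwidth', 20)
narrow = repr(toy)

# Widening the column reveals the full id
pd.set_option('max_colwidth', 75)
wide = repr(toy)

print('...' in narrow)      # the narrow repr is truncated
print(('a' * 64) in wide)   # the wide repr shows the full id
```

# Note that `pd.set_option` changes a global display setting, so it affects every DataFrame shown after it in the session.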
# The rest of the notebook will use the column width setting we define here, so we will not need to set it again.

# In[7]:

pd.set_option('max_colwidth', 75)
crude_search_df.query("name=='Crude/Condensates'")

# # Geographies

# Just like before, we are going to take one of the examples from the documentation and slightly tweak it to what we need for our study. Docs found here: https://vortechsa.github.io/python-sdk/endpoints/geographies/.

# In[8]:

df = Geographies().search(term=["Liverpool", "Southampton"]).to_df()

# In[9]:

df

# In[10]:

china_search_df = Geographies().search(term=["China"]).to_df()

# In[11]:

china_search_df.head(5)

# In[12]:

china_search_df.query("name=='China'")

# # Chinese floating storage study

# For my study I want to look at crude/condensates currently in floating storage situated in China, and how this has changed over 2020. So once again I'm going to take the code provided in the documentation and change it to my specific needs.

# Let's break down the query below line by line to understand what's going on.
#
# 1) The first line finds the ID for Rotterdam using the geographies endpoint and assigns it to a variable called `rotterdam`
#
# 2) The second line finds the ID for crude using the products endpoint and assigns it to a variable called `crude`
#
# 3) Then it calls the CargoTimeSeries endpoint
#
# 4) The `timeseries_unit` argument is set to `bpd`, which means the unit is barrels per day
#
# 5) The `timeseries_frequency` argument is set to `month`, which means the time scale is months
#
# 6) The `filter_origins` argument is set to `rotterdam`, the variable defined in the 1st line
#
# 7) The `filter_products` argument is set to `crude`, which was defined in the 2nd line
#
# 8) The `filter_activity` argument is set to `loading_state`
#
# 9) `filter_time_min`, the start time for the query, is set to the beginning of 2018
#
# 10) `filter_time_max`, the end time for the query, is set to the end of 2018
#
# 11) The search result is turned into a DataFrame

# In[13]:

rotterdam = [g.id for g in Geographies().search("rotterdam").to_list() if "port" in g.layer]

crude = [p.id for p in Products().search("crude").to_list() if "Crude" == p.name]

search_result = CargoTimeSeries().search(
    timeseries_unit='bpd',
    timeseries_frequency='month',
    filter_origins=rotterdam,
    filter_products=crude,
    filter_activity='loading_state',
    filter_time_min=datetime(2018, 1, 1),
    filter_time_max=datetime(2018, 12, 31))

df = search_result.to_df()

# So how can we change that query to get crude/condensates in floating storage situated in China?

# As we already have the IDs for our geography and product, we don't need to call those endpoints in our first two lines.
#
# 1) We can assign the ID for China to a variable called `china_id`, using the ID we found earlier in the notebook
#
# 2) Assign the ID for Crude/Condensates to a variable called `crude_condesates_id`
#
# 3) We keep this the same as before, as we are calling the same endpoint
#
# 4) For our 4th line, I prefer to think of things in terms of tonnes, so I'm going to change `timeseries_unit` to `t`
#
# 5) For the 5th line, I'm going to change `month` to `day`, as I'd like to see the change on a daily basis
#
# 6) Here I'm going to change things slightly: as I'm not concerned where the crude/condensates have come from, I'm going to remove the `filter_origins` argument and replace it with `filter_storage_locations`, set to the `china_id` we defined in the 1st line
#
# 7) Set the `filter_products` argument to `crude_condesates_id`, which we defined in the 2nd line
#
# 8) This time, for the 8th line, I'm going to set `filter_activity` to `'storing_state'`
#
# 9) Here I have changed the start date to the beginning of this year
#
# 10) Using `datetime.today().date()` we get today's date
#
# 11) Finally, I'm going to keep the 11th line the same, as I would like the results to be a DataFrame just like in the first query
#
# Let's see what happens.

# In[14]:

china_id = '934c47f36c16a58d68ef5e007e62a23f5f036ee3f3d1f5f85a48c572b90ad8b2'
crude_condesates_id = '54af755a090118dcf9b0724c9a4e9f14745c26165385ffa7f1445bc768f06f11'

search_result = CargoTimeSeries().search(
    timeseries_unit='t',
    timeseries_frequency='day',
    filter_storage_locations=china_id,
    filter_products=crude_condesates_id,
    filter_activity='storing_state',
    filter_time_min=datetime(2020, 1, 1),
    filter_time_max=datetime.today().date())

df_fs = search_result.to_df()

# In[15]:

df_fs

# # Displaying this data in a graph

# Here I'm going to show you how to display the data in a graph within the notebook, but first I'm going to show you how to export it as a CSV so you can look at the data in Excel or Google Sheets.
#
# To export the DataFrame to your desktop as a CSV, add `.to_csv('~/Desktop/chinese_floating_storage.csv')` to the DataFrame in a cell, like this:

# In[16]:

df_fs.to_csv('~/Desktop/chinese_floating_storage.csv')

# Now if you look on your desktop there should be a file called `chinese_floating_storage.csv`, which you'll be able to open in Excel.

# Using matplotlib, the Python plotting library we imported at the top of this notebook, you can also display the results of the query like this:

# In[17]:

# rename columns to something more descriptive
df_fs = df_fs.rename(columns={'key': 'date',
                              'value': 't',
                              'count': 'number_of_cargo_movements'})

# remove the time zone from the timestamp
df_fs['date'] = pd.to_datetime(df_fs['date']).dt.tz_localize(None)

# In[18]:

# convert tonnes to kilotonnes and plot the daily series
floating_storage = df_fs.set_index('date')['t'] / 1000
floating_storage.plot(title='Chinese crude oil floating storage', grid=True)
plt.xlabel('date')
plt.ylabel('kt');
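# As a possible next step (not part of the original notebook), daily floating-storage figures can be noisy, so you might overlay a 7-day rolling mean on the daily series. The sketch below uses synthetic data in place of the query result, since `df_fs` depends on a live API call; the column names match those used after the renaming step above.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # render off-screen so this sketch runs anywhere
import matplotlib.pyplot as plt

# Synthetic stand-in for df_fs after the renaming step: a random walk in tonnes
dates = pd.date_range('2020-01-01', periods=180, freq='D')
rng = np.random.default_rng(0)
df_fs = pd.DataFrame({'date': dates,
                      't': 5e6 + rng.normal(0, 2e5, len(dates)).cumsum()})

# Convert tonnes to kilotonnes, then smooth with a centred 7-day rolling mean
floating_storage = df_fs.set_index('date')['t'] / 1000
smoothed = floating_storage.rolling(window=7, center=True).mean()

# Plot the raw daily series faintly, with the smoothed series on top
ax = floating_storage.plot(alpha=0.4, grid=True, label='daily',
                           title='Chinese crude oil floating storage')
smoothed.plot(ax=ax, label='7-day rolling mean')
ax.set_xlabel('date')
ax.set_ylabel('kt')
ax.legend()
```

# With `center=True` and a window of 7, the first and last three days have no smoothed value (NaN), which is why the orange line stops short of both ends of the blue one.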