Python for the Energy Industry
In the 'Cargo Movements Example' video, we saw the datetime object used to specify a particular date and time to look for cargo movements. In this lesson we explore the datetime object in more detail, and how it is used for filtering.
When given 3 arguments, a datetime object represents midnight at the beginning of the day specified by datetime(YYYY,MM,DD):
from datetime import datetime
# 00:00 November 1st, 2020
print(datetime(2020,11,1))
2020-11-01 00:00:00
Additional arguments represent hours, minutes, and seconds respectively:
# 12:00 November 1st, 2020
print(datetime(2020,11,1,12))
2020-11-01 12:00:00
# 12:30 November 1st, 2020
print(datetime(2020,11,1,12,30))
2020-11-01 12:30:00
# 12:30:09 November 1st, 2020
print(datetime(2020,11,1,12,30,9))
2020-11-01 12:30:09
It's straightforward to get the current date/time:
print(datetime.utcnow())
2021-01-24 14:58:59.263863
print(datetime.utcnow() - datetime(2020,11,1))
84 days, 14:59:24.868841
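Subtracting one datetime from another, as above, gives a timedelta object. The sketch below uses two fixed datetimes (chosen here purely for illustration, so the result is reproducible) to show how the difference can be read off in days, seconds, or hours:

```python
from datetime import datetime

# Fixed example values, picked for illustration only
start = datetime(2020, 11, 1)
end = datetime(2021, 1, 24, 15, 0, 0)

elapsed = end - start  # a datetime.timedelta

print(elapsed)                         # 84 days, 15:00:00
print(elapsed.days)                    # whole days only: 84
print(elapsed.total_seconds())         # the full difference in seconds
print(elapsed.total_seconds() / 3600)  # the full difference in hours
```

Note that .days drops the partial day, while total_seconds() covers the whole difference.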
Say you want data over a time period stretching from 1 day, 1 week, or 1 month ago up to the current time. The relativedelta object can be used for this.
from dateutil.relativedelta import relativedelta
now = datetime.utcnow()
one_day_ago = now - relativedelta(days=1)
one_week_ago = now - relativedelta(weeks=1)
one_month_ago = now - relativedelta(months=1)
print(one_day_ago)
print(one_week_ago)
print(one_month_ago)
2021-01-23 15:00:28.838017
2021-01-17 15:00:28.838017
2020-12-24 15:00:28.838017
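For fixed-length offsets such as days and weeks, the standard library's timedelta gives the same result as relativedelta. It has no notion of calendar months, though, which is where relativedelta earns its place. A minimal sketch, using a fixed reference time (an illustrative value, not taken from the API):

```python
from datetime import datetime, timedelta

# Fixed reference time, for illustration only
reference = datetime(2021, 1, 24, 15, 0, 28)

one_day_ago = reference - timedelta(days=1)
one_week_ago = reference - timedelta(weeks=1)
# timedelta cannot express "one month"; 30 days is only an approximation
thirty_days_ago = reference - timedelta(days=30)

print(one_day_ago)      # 2021-01-23 15:00:28
print(one_week_ago)     # 2021-01-17 15:00:28
print(thirty_days_ago)  # 2020-12-25 15:00:28
```

Thirty days before 24 January is 25 December, whereas one calendar month before is 24 December, so the two approaches are not interchangeable for monthly windows.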
When pulling Cargo Movements data from the Vortexa API, we are generally only interested in some subset of the data. This may be data from a particular time window, originating from or destined for a particular location, carrying a particular product, involving a particular vessel, or some combination of these conditions. This is called 'filtering'.
Filtering by location, product, or vessel is done using the associated IDs that we can access from the relevant endpoints. Filtering by time is a bit different: as you've seen, datetime objects are used for this.
As a reminder, documentation for the Cargo Movements endpoint can be found here.
The meaning of filter_time_min and filter_time_max depends on the filter_activity corresponding to these times. Some activities, such as loading_start and loading_end in the examples below, correspond to an exact timestamp at which the event occurred. Filtering on these will give Cargo Movements where the timestamp of the corresponding activity is between filter_time_min and filter_time_max.
import vortexasdk as v
cm_query = v.CargoMovements().search(
filter_activity="loading_start",
filter_time_min=one_day_ago,
filter_time_max=now)
print(len(cm_query))
2021-01-24 15:02:08,726 vortexasdk.client — WARNING — You are using vortexasdk version 0.28.0, however version 0.28.5 is available. You should consider upgrading via the 'pip install vortexasdk --upgrade' command.
274
This means that 274 Cargo Movements started loading in the 24 hours up to now. Obviously, if the same time is given as both the min and max for a timestamp filter, zero results will be returned:
cm_query = v.CargoMovements().search(
filter_activity="loading_end",
filter_time_min=now,
filter_time_max=now)
print(len(cm_query))
Loading from API: 0it [00:00, ?it/s]
0
Note: you can of course use specific datetime objects, rather than relative dates, for filtering.
Certain activities, such as loading_state in the examples below, correspond to states that last for some time, rather than instantaneous timestamps.
When filtering on a state, you will get all Cargo Movements which were in that state at any point between filter_time_min and filter_time_max. This means even if filter_time_min and filter_time_max are the same time, you will still get back any Cargo Movements that were in that state at that time:
cm_query = v.CargoMovements().search(
filter_activity="loading_state",
filter_time_min=now,
filter_time_max=now)
print(len(cm_query))
427
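The difference between timestamp filters and state filters can be modelled locally without calling the API. The sketch below is only an illustration of the idea: the loading events are made up, the two helper functions are hypothetical (they are not part of vortexasdk), and treating the window bounds as inclusive is an assumption.

```python
from datetime import datetime

# Made-up loading operations as (start, end) pairs, for illustration only
loadings = [
    (datetime(2020, 11, 1, 2), datetime(2020, 11, 1, 10)),
    (datetime(2020, 11, 1, 11), datetime(2020, 11, 2, 3)),
    (datetime(2020, 11, 2, 5), datetime(2020, 11, 2, 9)),
]

def timestamp_filter(events, t_min, t_max):
    """Keep events whose start timestamp lies inside the window (bounds assumed inclusive)."""
    return [e for e in events if t_min <= e[0] <= t_max]

def state_filter(events, t_min, t_max):
    """Keep events that were in progress at any point inside the window."""
    return [e for e in events if e[0] <= t_max and e[1] >= t_min]

instant = datetime(2020, 11, 1, 12)

# Same time for min and max: no loading started at exactly 12:00 ...
print(len(timestamp_filter(loadings, instant, instant)))  # 0
# ... but one loading was under way at 12:00
print(len(state_filter(loadings, instant, instant)))      # 1
```

A timestamp filter asks whether a single point falls inside the window, while a state filter asks whether two intervals overlap, which is why a zero-width window can still return results.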
Naturally, the number of Cargo Movements returned by a general query like this will become quite large as the filter window is expanded:
cm_query = v.CargoMovements().search(
filter_activity="loading_state",
filter_time_min=one_day_ago,
filter_time_max=now)
print('last day:',len(cm_query))
cm_query = v.CargoMovements().search(
filter_activity="loading_state",
filter_time_min=one_week_ago,
filter_time_max=now)
print('last week:',len(cm_query))
cm_query = v.CargoMovements().search(
filter_activity="loading_state",
filter_time_min=one_month_ago,
filter_time_max=now)
print('last month:',len(cm_query))
Loading from API: 1000it [00:01, 865.14it/s]
last day: 835
Loading from API: 4000it [00:04, 883.84it/s]
last week: 3804
Loading from API: 16500it [00:10, 1501.56it/s]
last month: 16169
Note of caution: be careful about passing datetime.utcnow() directly as the filter_time_max argument, or defining now = datetime.utcnow() in the same cell in which now is passed as an argument. Small differences between the clock on your computer and the clocks on the Vortexa servers can mean that now is judged to be in the future, giving an error!
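One possible way to reduce this risk is to step the upper bound back by a small safety margin. The helper below is a hypothetical sketch, not part of vortexasdk, and the 60 second default is an assumed value rather than a documented tolerance:

```python
from datetime import datetime, timedelta, timezone

def utc_now_with_margin(margin_seconds=60):
    """Return a naive UTC datetime slightly in the past (hypothetical helper).

    The margin is an assumption meant to absorb small clock differences
    between this machine and a remote server.
    """
    # Naive UTC, equivalent in value to datetime.utcnow()
    now_utc = datetime.now(timezone.utc).replace(tzinfo=None)
    return now_utc - timedelta(seconds=margin_seconds)

safe_now = utc_now_with_margin()
print(safe_now)
```

The result could then be passed as filter_time_max in place of now, at the cost of missing events from the last minute.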
Create a pandas DataFrame that gives the number of cargos that are being loaded at 00:00UTC on each day of the previous month.
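As a starting point for this exercise, the sketch below builds the list of midnight timestamps to query. It is a hint rather than a full solution: the helper name is hypothetical, and it assumes 'previous month' means the previous calendar month rather than the last 30 days.

```python
from datetime import datetime, timedelta

def midnights_of_previous_month(today):
    """Return 00:00 datetimes for each day of the calendar month before `today`."""
    first_of_this_month = datetime(today.year, today.month, 1)
    # Stepping back one day always lands in the previous month
    last_of_previous = first_of_this_month - timedelta(days=1)
    first_of_previous = datetime(last_of_previous.year, last_of_previous.month, 1)
    n_days = (first_of_this_month - first_of_previous).days
    return [first_of_previous + timedelta(days=i) for i in range(n_days)]

# Fixed example date, for illustration only
days = midnights_of_previous_month(datetime(2021, 1, 24))
print(len(days), days[0], days[-1])
# 31 2020-12-01 00:00:00 2020-12-31 00:00:00
```

Each timestamp can then be used as both filter_time_min and filter_time_max with filter_activity="loading_state", as in the state example in this lesson, and the resulting counts collected into a DataFrame.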