This is a "parameters" cell, which defines the default parameters for the notebook.
# Our default parameters
# This cell has a "parameters" tag, which means that it defines the parameters used in the notebook
start_date = "2001-08-05"
stop_date = "2016-01-01"
We'll run plt.ioff() so that we don't get double plots in the notebook.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import papermill as pm
plt.ioff()
np.random.seed(1337)
# Generate some fake data by date
dates = pd.date_range("2010-01-01", "2017-01-01")
data = pd.DataFrame(np.random.randn(len(dates)), index=dates, columns=['mydata'])
data = data.rolling(100).mean() # Smooth it so it looks purdy
Here we use the start_date and stop_date parameters, which are defined above by default but can be overridden at runtime by papermill.
data_highlight = data.loc[start_date: stop_date]
We use the pm.record() function to keep track of how many records were included in the highlighted section. This lets us inspect the value after running the notebook with papermill.
We also raise a ValueError if there's a bug in the start/stop times and no records fall inside the range. Papermill will capture and display the error if it's triggered.
num_records = len(data_highlight)
pm.record('num_records', num_records)
if num_records == 0:
    raise ValueError("I have no data to highlight! Check that your dates are correct!")
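One likely way to end up with an empty selection is a start date that falls after the stop date, or a date string that isn't valid at all. As a minimal sketch, assuming the parameters are ISO-formatted strings like the defaults above, the dates could be checked before slicing. The helper name check_date_range is hypothetical and is not part of papermill or pandas.

```python
from datetime import date


def check_date_range(start_date, stop_date):
    """Parse two ISO date strings and make sure they are in order.

    Hypothetical helper for illustration only.
    """
    # date.fromisoformat raises ValueError on malformed strings such as "2016-13-01"
    start = date.fromisoformat(start_date)
    stop = date.fromisoformat(stop_date)
    if start > stop:
        raise ValueError(
            "start_date {} is after stop_date {}".format(start_date, stop_date)
        )
    return start, stop


# The notebook defaults pass the check
check_date_range("2001-08-05", "2016-01-01")
```

This only catches ordering and formatting problems. A valid range that simply lies outside the generated data would still need the num_records check.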
Below we'll generate a matplotlib figure with our highlighted dates. Calling pm.display() makes papermill store the figure under the key that we've specified (highlight_dates_fig). This will let us inspect the output later on.
fig, ax = plt.subplots()
ax.plot(data.index, data['mydata'], c='k', alpha=.5)
ax.plot(data_highlight.index, data_highlight['mydata'], c='r', lw=3)
ax.set(title="Start: {}\nStop: {}".format(start_date, stop_date))
pm.display('highlight_dates_fig', fig)
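The start_date and stop_date defaults at the top of the notebook are meant to be overridden when the notebook is executed by papermill. A minimal sketch of doing that from a separate script, using papermill's execute_notebook function, might look like the following. The helper names build_parameters and run_notebook, and the notebook file names in the comment, are assumptions made for illustration.

```python
def build_parameters(start_date, stop_date):
    """Return overrides whose keys match the names in the parameters cell."""
    return {"start_date": start_date, "stop_date": stop_date}


def run_notebook(input_path, output_path, start_date, stop_date):
    """Execute the notebook with new dates and write the result to output_path."""
    # Imported here so build_parameters can be used without papermill installed
    import papermill as pm

    return pm.execute_notebook(
        input_path,
        output_path,
        parameters=build_parameters(start_date, stop_date),
    )


# Example call (hypothetical file names):
# run_notebook("highlight_dates.ipynb", "highlight_dates_output.ipynb",
#              "2012-01-01", "2014-06-01")
```

The keys of the parameters dictionary have to match the variable names assigned in the tagged cell. Otherwise the defaults stay in effect.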