Adapted from Ravin Kumar's PyData Global 2020 talk on Bayesian Decision Science (YouTube)(GitHub), updated for pymc v4.0 with some notes on learning the new syntax.
His original talk used pymc3; since then the project was rebranded as pymc, and the recent v4.0 release introduced some big changes.
import matplotlib.pyplot as plt
from scipy import stats, optimize
import numpy as np
import pandas as pd
# np.set_printoptions(precision=2, floatmode="fixed")
import arviz as az
%config InlineBackend.figure_format='retina'
Note the new way of importing pymc:
import pymc as pm
print(f"Running on PyMC v{pm.__version__}")
Running on PyMC v4.0.0
The next few cells follow Ravin's example of a newsvendor problem. In the first section he discusses a very simple case where demand is fixed, so you can write the problem out directly in code:
newspaper_cost = 5
customer_price = 7
anishas_inventory = 42
demand = 40
profit = demand*customer_price - newspaper_cost*anishas_inventory
f"Anisha makes ${profit} in profit"
'Anisha makes $70 in profit'
The next point is that you can use numpy and turn this into a function:
def daily_profit(inventory, demand, newsvendor_cost=5, customer_price=7):
    """Calculates profit for a given day given inventory and demand"""
    return customer_price*np.min([inventory, demand]) - newsvendor_cost*inventory
f"Decision 1: profit ${daily_profit(42, 40)}, Decision 2 profit ${daily_profit(38, 40)}"
'Decision 1: profit $70, Decision 2 profit $76'
scipy.optimize
So he then demonstrates that you could just plot a whole bunch of inventory values and graphically inspect where profit is maximized, but we can also turn this into an optimization problem using scipy.optimize.
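For reference, here's a minimal sketch of that plot-and-inspect approach (my reconstruction, not his exact code):
# Sweep a range of inventory choices at a fixed demand of 40
# and plot the profit curve to eyeball the maximum
inventories = np.arange(0, 80)
profits = [daily_profit(i, 40) for i in inventories]
plt.plot(inventories, profits)
plt.xlabel("Inventory")
plt.ylabel("Profit")
plt.title("Profit vs. inventory when demand is fixed at 40");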
# Restating here so future slides won't have the print
def objective(inventory: int, demands: iter):
    """Takes an iterable of sales and returns the total profit"""
    # Make reward function negative to turn this into a minimization problem
    return -np.sum([daily_profit(inventory, d) for d in demands])

# Let scipy minimize the objective function (negative profit) when demand is fixed at 40
opt_stoch = optimize.minimize_scalar(objective, bounds=(0, np.inf), args=([40, 40]))
opt_stoch
     fun: -159.9999981773509
 message: '\nOptimization terminated successfully;\nThe returned value satisfies the termination criteria\n(using xtol = 1.48e-08 )'
    nfev: 43
     nit: 38
 success: True
       x: 40.00000018226491
Great, so we have an optimal objective value of -160 (i.e. $160 in profit) and it found the optimal inventory is 40. Makes sense - if demand is exactly 40 then you optimize profit when you buy and sell exactly that number of newspapers.
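As a quick sanity check (my addition), the objective at the optimum matches two days of $80 profit:
# Two days of demand 40 with inventory 40 yields 2 * (7*40 - 5*40) = $160,
# so the negated objective is -160
objective(40, [40, 40])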
But real life is a little more complicated, so what if we simulate random demand?
Here, we simulate that demand is drawn from a normal distribution: $\mathcal{N}(40,20)$
np.random.seed(seed=0)
random_seeds = [1,2,3,4,5]
demand = np.round(stats.norm(40, 20).rvs(15))
demand_seen, demand_unseen = demand[:5], demand[5:]
f"The first five days demand we have seen {demand_seen}"
'The first five days demand we have seen [75. 48. 60. 85. 77.]'
f"The UNSEEN next ten days of demand {demand_unseen}"
'The UNSEEN next ten days of demand [20. 59. 37. 38. 48. 43. 69. 55. 42. 49.]'
He then goes on to show how the sample mean (the mean of the 5 days of demand that was seen) and the critical fractile equation can be used to calculate an optimal inventory strategy.
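The critical fractile rule says to stock up to the demand quantile at the ratio of underage cost to total unit cost. A minimal sketch of that calculation (my addition, assuming unsold papers have no salvage value):
# Critical fractile: stock up to the demand quantile at Cu / (Cu + Co),
# assuming unsold papers have no salvage value
underage_cost = customer_price - newspaper_cost  # profit lost per unit of unmet demand
overage_cost = newspaper_cost                    # loss per unsold paper
critical_ratio = underage_cost / (underage_cost + overage_cost)  # 2/7 here

# Estimate the demand distribution from the five observed days
stats.norm(demand_seen.mean(), demand_seen.std()).ppf(critical_ratio)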
I included his next point here, which is that just 5 samples (i.e. seeing only 5 days of demand) can produce wide differences in sample mean and standard deviation. Very nice plot.
simulations = 100
mean, std = [], []
for _ in range(simulations):
    true_mean, true_std = 40, 20
    values = stats.norm(true_mean, true_std).rvs(5)
    mean.append(values.mean())
    std.append(values.std())
fig, ax = plt.subplots(2,1, figsize = (18, 8))
ax[0].plot(range(simulations), mean)
ax[0].set_xticks([])
ax[0].axhline(40, 0, 1, c="black", linestyle="--")
ax[0].set_ylabel("Estimate of Mean")
ax[1].plot(range(simulations), std)
ax[1].axhline(20, 0, 1, c="black", linestyle="--")
ax[1].set_xlabel("Calculation iteration")
ax[1].set_ylabel("Estimate of Std")
ax[0].set_title("Instability in Sample Mean and Standard Deviation from just 5 samples");
Bayesian modeling offers advantages here because it handles the low-sample-size problem natively. He doesn't discuss how to choose informative priors and all that, but in this case he goes ahead and specifies the generative model below:
$$ \begin{aligned} \sigma &\sim \text{HalfStudentT}(\sigma=10, \nu=20) & \text{Prior} \\ \mu &\sim \mathcal{N}(\mu=\bar{x}, \sigma=20) & \text{Prior} \\ \text{Demand} &\sim \text{TruncatedNormal}(\mu, \sigma) & \text{Likelihood} \end{aligned} $$

demand_mean, demand_std = demand_seen.mean(), demand_seen.std()
pymc
Below I make some slight modifications to follow the pymc docs. The workflow starts with specifying the generative model:
newsvendor = pm.Model()

with newsvendor:
    # These are our priors that are set through "experience". We'll get back to this
    sd = pm.HalfStudentT("standard_deviation_of_newspaper_demand", sigma=10, nu=20)
    mu = pm.Normal("mean_of_newspaper_demand", demand_seen.mean(), 20)
    demand = pm.TruncatedNormal("demand", mu=mu, sigma=sd, lower=0, observed=demand_seen)
So first the model is instantiated, and then everything follows inside the with block. The priors and the likelihood are specified as a series of distributions. Each distribution is given a name, and the last one, demand, gets an additional observed argument where you pass in the observed data.
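As an optional check (my addition, not in the talk), pymc can render the model structure if the graphviz dependency is installed:
# Draw the model as a directed graph of priors and likelihood
# (requires the optional graphviz package)
pm.model_to_graphviz(newsvendor)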
Next, we use pm.sample:
with newsvendor:
    idata = pm.sample(tune=5000, draws=10000, chains=2)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 4 jobs)
NUTS: [standard_deviation_of_newspaper_demand, mean_of_newspaper_demand]
Sampling 2 chains for 5_000 tune and 10_000 draw iterations (10_000 + 20_000 draws total) took 17 seconds.
Here, idata is an arviz.InferenceData object, which is like a dataframe for storing the results of our sampling process. Note that pymc automatically assigned the NUTS sampler for this problem, which is a nice 'batteries included' feature.
We can inspect this new object below:
idata
Inference data with groups:
    > posterior       (chain: 2, draw: 10000): mean_of_newspaper_demand, standard_deviation_of_newspaper_demand
    > log_likelihood  (chain: 2, draw: 10000, demand_dim_0: 5): demand
    > sample_stats    (chain: 2, draw: 10000): step_size, tree_depth, acceptance_rate, diverging, energy, lp, ...
    > observed_data   (demand_dim_0: 5): demand = [75. 48. 60. 85. 77.]
And we can use arviz to call some helpful functions on this object.
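For example, az.summary gives a quick tabular readout of the posterior (my addition):
# Posterior means, standard deviations, HDIs, and convergence
# diagnostics (ESS, r_hat) for each parameter
az.summary(idata)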
Here we can plot the posterior distribution of our parameters:
az.plot_trace(idata, combined=True, figsize=(12,8));