Pyro is a framework for probabilistic programming built on top of PyTorch. It has drawn attention for PyTorch's performance, for shipping a NUTS implementation early on, and for a modeling style that is more concise than TensorFlow Probability's. It is developed and used at Uber; Zoubin Ghahramani, known for work such as dropout as a variational Bayesian approximation, is apparently Uber's Chief Scientist, so he may well be involved in its development.
The published example estimates batting averages from baseball players' in-season at-bats and hits. Following the stan case study, it walks through fetching the data, defining and running the models, and displaying and evaluating the posterior predictive distribution; below I work through it step by step.
The stan case study also covers some additional material beyond this.
from __future__ import absolute_import, division, print_function
import argparse
import logging
import math
import os
import numpy as np
import pandas as pd
import torch
import pyro
import pyro.poutine as poutine
from pyro.distributions import Beta, Binomial, HalfCauchy, Normal, Pareto, Uniform
from pyro.distributions.util import logsumexp
from pyro.infer import EmpiricalMarginal
from pyro.infer.abstract_infer import TracePredictive
from pyro.infer.mcmc import MCMC, NUTS
pyro.__version__
'0.3.0'
import matplotlib.pyplot as plt
plt.style.use("ggplot")
import seaborn as sns
logging.basicConfig(format='%(message)s', level=logging.INFO)
# Enable validation checks
pyro.enable_validation(True)
Downloading and displaying the data
DATA_URL = "https://d2fefpcigoriu7.cloudfront.net/datasets/EfronMorrisBB.txt"
baseball_dataset = pd.read_csv(DATA_URL, sep="\t")
Data for the 18 batters:
baseball_dataset.head()
| | FirstName | LastName | At-Bats | Hits | BattingAverage | RemainingAt-Bats | RemainingAverage | SeasonAt-Bats | SeasonHits | SeasonAverage |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Roberto | Clemente | 45 | 18 | 0.400 | 367 | 0.3460 | 412 | 145 | 0.352 |
| 1 | Frank | Robinson | 45 | 17 | 0.378 | 426 | 0.2981 | 471 | 144 | 0.306 |
| 2 | Frank | Howard | 45 | 16 | 0.356 | 521 | 0.2764 | 566 | 160 | 0.283 |
| 3 | Jay | Johnstone | 45 | 15 | 0.333 | 275 | 0.2218 | 320 | 76 | 0.238 |
| 4 | Ken | Berry | 45 | 14 | 0.311 | 418 | 0.2727 | 463 | 128 | 0.276 |
baseball_dataset["BattingAverage"].plot.bar()
plt.show()
baseball_dataset["RemainingAverage"].plot.bar()
plt.show()
baseball_dataset["SeasonAverage"].plot.bar()
plt.show()
Concatenate first and last names, and split the data into train (the initial 45 at-bats and the hits in them) and test (full-season at-bats and hits).
def train_test_split(pd_dataframe):
    """
    Training data - 45 initial at-bats and hits for each player.
    Validation data - Full season at-bats and hits for each player.
    """
    train_data = torch.tensor(pd_dataframe[["At-Bats", "Hits"]].values, dtype=torch.float)
    test_data = torch.tensor(pd_dataframe[["SeasonAt-Bats", "SeasonHits"]].values, dtype=torch.float)
    first_name = pd_dataframe["FirstName"].values
    last_name = pd_dataframe["LastName"].values
    player_names = [" ".join([first, last]) for first, last in zip(first_name, last_name)]
    return train_data, test_data, player_names
train, _, player_names = train_test_split(baseball_dataset)
at_bats, hits = train[:, 0], train[:, 1]
print(train.shape)
torch.Size([18, 2])
baseball_dataset.columns
Index(['FirstName', 'LastName', 'At-Bats', 'Hits', 'BattingAverage', 'RemainingAt-Bats', 'RemainingAverage', 'SeasonAt-Bats', 'SeasonHits', 'SeasonAverage'], dtype='object')
sns.pairplot(baseball_dataset[['At-Bats', 'Hits', 'BattingAverage',
'RemainingAt-Bats', 'RemainingAverage', 'SeasonAt-Bats', 'SeasonHits',
'SeasonAverage']])
plt.show()
Four models are tried for estimating the number of hits from the number of at-bats (at_bats). They are hierarchical models, but the code simply plugs the output of one pyro.sample call in as a parameter of another random variable.
The first model gives all batters a single shared success probability, so every at-bat is a Bernoulli trial with the same probability and the hit counts are Binomial. In Pyro a statistical model is defined as a plain function.
$\phi \sim \mathrm{Uniform}(0,1)$
$y_n \sim \mathrm{Binomial}(K_n,\phi)$
Here $K_n$ is the number of at-bats for player $n$ (the model's argument). The parameter $\phi$ gets a uniform prior on $[0,1]$; `new_tensor(0)` and `new_tensor(1)` create the constants used in the model.
Observed data are tied to the model by passing them as the `obs` argument to the pyro.sample call that generates them.
def fully_pooled(at_bats, hits):
    r"""
    Number of hits in $K$ at bats for each player has a Binomial
    distribution with a common probability of success, $\phi$.

    :param (torch.Tensor) at_bats: Number of at bats for each player.
    :param (torch.Tensor) hits: Number of hits for the given at bats.
    :return: Number of hits predicted by the model.
    """
    phi_prior = Uniform(at_bats.new_tensor(0), at_bats.new_tensor(1))
    phi = pyro.sample("phi", phi_prior)
    return pyro.sample("obs", Binomial(at_bats, phi), obs=hits)
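Because Uniform(0, 1) is the Beta(1, 1) distribution, which is conjugate to the Binomial likelihood, this model's posterior is actually available in closed form, giving a quick sanity check on the MCMC results further down (the hit counts below are the `Hits` column of the dataset):

```python
# The Uniform(0, 1) prior on phi equals Beta(1, 1), conjugate to the Binomial,
# so phi | data ~ Beta(1 + total_hits, 1 + total_misses) exactly.
hits = [18, 17, 16, 15, 14, 14, 13, 12, 11, 11, 10, 10, 10, 10, 10, 9, 8, 7]
total_hits = sum(hits)            # 215 hits ...
total_at_bats = 45 * len(hits)    # ... in 810 at-bats

a = 1 + total_hits
b = 1 + (total_at_bats - total_hits)
posterior_mean = a / (a + b)      # Beta(a, b) mean: a / (a + b)

print(round(posterior_mean, 4))   # 0.266, in line with the NUTS estimate (~0.267) below
```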
$\theta_n \sim \mathrm{Uniform}(0,1)$
$y_n \sim \mathrm{Binomial}(K_n,\theta_n)$
Here each player gets an independent parameter $\theta_n$.
The pyro.plate context manager declares the players as a batch of conditionally independent units, so the sample statements inside it draw one value per player.
def not_pooled(at_bats, hits):
    r"""
    Number of hits in $K$ at bats for each player has a Binomial
    distribution with independent probability of success, $\phi_i$.

    :param (torch.Tensor) at_bats: Number of at bats for each player.
    :param (torch.Tensor) hits: Number of hits for the given at bats.
    :return: Number of hits predicted by the model.
    """
    num_players = at_bats.shape[0]
    with pyro.plate("num_players", num_players):
        phi_prior = Uniform(at_bats.new_tensor(0), at_bats.new_tensor(1))
        phi = pyro.sample("phi", phi_prior)
        return pyro.sample("obs", Binomial(at_bats, phi), obs=hits)
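The same conjugacy applies per player here, so each $\phi_n$ has an independent Beta(1 + y_n, 1 + K_n − y_n) posterior; a quick check for the top batter:

```python
# Unpooled model: each player's phi_n has its own conjugate posterior,
# phi_n | data ~ Beta(1 + y_n, 1 + K_n - y_n), independently across players.
def unpooled_posterior_mean(y, k):
    a = 1 + y
    b = 1 + k - y
    return a / (a + b)

# Roberto Clemente: 18 hits in 45 at-bats
print(round(unpooled_posterior_mean(18, 45), 3))  # 0.404, matching the NUTS mean of ~0.400
```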
$\theta_n \sim \mathrm{Beta}(\alpha,\beta)$
$\alpha=\kappa\phi,\ \beta=\kappa(1-\phi)$
$\phi \sim \mathrm{Uniform}(0,1)$
$\kappa \sim \mathrm{Pareto}(1,1.5)$, with density $\propto \kappa^{-2.5}$
$y_n \sim \mathrm{Binomial}(K_n,\theta_n)$
This hierarchical model uses a Beta distribution to keep each $\theta_n$ inside $[0,1]$, and parameterizes it through $\phi$ and $\kappa$ so that $\alpha,\beta$ stay in a valid range during sampling. Pyro even has a Pareto distribution!
def partially_pooled(at_bats, hits):
    r"""
    Number of hits has a Binomial distribution with independent
    probability of success, $\phi_i$. Each $\phi_i$ follows a Beta
    distribution with concentration parameters $c_1$ and $c_2$, where
    $c_1 = m * kappa$, $c_2 = (1 - m) * kappa$, $m ~ Uniform(0, 1)$,
    and $kappa ~ Pareto(1, 1.5)$.

    :param (torch.Tensor) at_bats: Number of at bats for each player.
    :param (torch.Tensor) hits: Number of hits for the given at bats.
    :return: Number of hits predicted by the model.
    """
    num_players = at_bats.shape[0]
    m = pyro.sample("m", Uniform(at_bats.new_tensor(0), at_bats.new_tensor(1)))
    kappa = pyro.sample("kappa", Pareto(at_bats.new_tensor(1), at_bats.new_tensor(1.5)))
    with pyro.plate("num_players", num_players):
        phi_prior = Beta(m * kappa, (1 - m) * kappa)
        phi = pyro.sample("phi", phi_prior)
        return pyro.sample("obs", Binomial(at_bats, phi), obs=hits)
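As a quick check on this (m, κ) parameterization: the implied Beta prior on each $\phi_n$ has mean exactly m, and κ only controls how tightly it concentrates around m. This is just the standard Beta moment formulas, nothing Pyro-specific:

```python
# Beta(a, b) with a = m * kappa and b = (1 - m) * kappa has
#   mean     = a / (a + b)                        = m
#   variance = a * b / ((a + b)**2 * (a + b + 1)) = m * (1 - m) / (kappa + 1)
def beta_moments(m, kappa):
    a, b = m * kappa, (1 - m) * kappa
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

mean_small, var_small = beta_moments(0.3, 2.0)
mean_large, var_large = beta_moments(0.3, 50.0)
print(round(mean_small, 3), round(mean_large, 3))  # 0.3 0.3 -> kappa leaves the mean alone
print(var_small > var_large)                       # True -> larger kappa concentrates phi_n near m
```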
$\alpha_n = \mathrm{logit}(\theta_n) = \log\frac{\theta_n}{1-\theta_n}$
A version that samples the parameter on the logit scale. Since $\alpha \in (-\infty,\infty)$, this avoids floating-point overflow/underflow and numerical instability when $\theta$ is near 0 or 1. Binomial accepts a `logits` argument for exactly this purpose.
def partially_pooled_with_logit(at_bats, hits):
    r"""
    Number of hits has a Binomial distribution with a logit link function.
    The logits $\alpha$ for each player is normally distributed with the
    mean and scale parameters sharing a common prior.

    :param (torch.Tensor) at_bats: Number of at bats for each player.
    :param (torch.Tensor) hits: Number of hits for the given at bats.
    :return: Number of hits predicted by the model.
    """
    num_players = at_bats.shape[0]
    loc = pyro.sample("loc", Normal(at_bats.new_tensor(-1), at_bats.new_tensor(1)))
    scale = pyro.sample("scale", HalfCauchy(scale=at_bats.new_tensor(1)))
    with pyro.plate("num_players", num_players):
        alpha = pyro.sample("alpha", Normal(loc, scale))
        return pyro.sample("obs", Binomial(at_bats, logits=alpha), obs=hits)
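To see why the logits form helps numerically (a plain-Python illustration, not Pyro internals): evaluating log-probabilities through $p = \mathrm{sigmoid}(\alpha)$ breaks down for extreme $\alpha$, while working directly on the log scale stays finite:

```python
import math

def log_sigmoid(x):
    # Stable log(sigmoid(x)): -log1p(exp(-x)) for x >= 0, x - log1p(exp(x)) otherwise.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

alpha = -800.0
try:
    p = 1.0 / (1.0 + math.exp(-alpha))  # exp(800) overflows a float64
except OverflowError:
    print("naive sigmoid overflowed")

print(log_sigmoid(alpha))  # -800.0: the log-probability is still exact
```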
The stan case study goes further and applies a change of variables called the Non-Centered Parameterization to make the posterior geometry more uniform (see the reference).
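The idea of the non-centered parameterization, sketched in plain Python (this mirrors the transform described in the stan report, not code from the Pyro example): instead of drawing $\alpha \sim \mathrm{Normal}(loc, scale)$ directly, draw a standard normal $\alpha_{raw}$ and set $\alpha = loc + scale \cdot \alpha_{raw}$, so the variable the sampler explores no longer depends on the hyperparameters:

```python
import random

random.seed(0)
loc, scale = -1.0, 0.5
n = 100_000

# Centered: alpha depends on loc and scale inside the sampler.
centered = [random.gauss(loc, scale) for _ in range(n)]

# Non-centered: sample alpha_raw ~ Normal(0, 1), then transform deterministically.
# The geometry of alpha_raw is fixed, which helps HMC/NUTS in hierarchical models.
non_centered = [loc + scale * random.gauss(0.0, 1.0) for _ in range(n)]

mean_c = sum(centered) / n
mean_nc = sum(non_centered) / n
print(round(mean_c, 2), round(mean_nc, 2))  # both about -1.0: same distribution either way
```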
Helper functions for computing means, standard deviations, and quartiles, and for displaying summary statistics and R-hat from the traces.
def get_site_stats(array, player_names):
    """
    Return the summarized statistics for a given array corresponding
    to the values sampled for a latent or response site.
    """
    # TODO: only use pandas (or any lightweight tabular package) to display final result
    if len(array.shape) == 1:
        df = pd.DataFrame(array).transpose()
    else:
        df = pd.DataFrame(array, columns=player_names).transpose()
    return df.apply(pd.Series.describe, axis=1)[["mean", "std", "25%", "50%", "75%"]]
def summary(trace_posterior, sites, player_names, transforms={}, diagnostics=True):
    """
    Return summarized statistics for each of the ``sites`` in the
    traces corresponding to the approximate posterior.
    """
    marginal = trace_posterior.marginal(sites)
    site_stats = {}
    for site_name in sites:
        marginal_site = marginal.support(flatten=True)[site_name]
        if site_name in transforms:
            marginal_site = transforms[site_name](marginal_site)
        site_stats[site_name] = get_site_stats(marginal_site.numpy(), player_names)
        if diagnostics and trace_posterior.num_chains > 1:
            diag = marginal.diagnostics()[site_name]
            site_stats[site_name] = site_stats[site_name].assign(n_eff=diag["n_eff"].numpy(),
                                                                 r_hat=diag["r_hat"].numpy())
    return site_stats
Functions for evaluating the posterior predictive density.
Pyro's inference utilities already provide things like marginal distributions, which are used here.
http://docs.pyro.ai/en/0.2.1-release/inference_algos.html#module-pyro.infer.abstract_infer
def sample_posterior_predictive(posterior_predictive, baseball_dataset):
    """
    Generate samples from posterior predictive distribution.
    """
    train, test, player_names = train_test_split(baseball_dataset)
    at_bats = train[:, 0]
    at_bats_season = test[:, 0]
    logging.Formatter("%(message)s")
    logging.info("\nPosterior Predictive:")
    logging.info("Hit Rate - Initial 45 At Bats")
    logging.info("-----------------------------")
    # set hits=None to convert it from observation node to sample node
    train_predict = posterior_predictive.run(at_bats, None)
    train_summary = summary(train_predict, sites=["obs"],
                            player_names=player_names, diagnostics=False)["obs"]
    train_summary = train_summary.assign(ActualHits=baseball_dataset[["Hits"]].values)
    logging.info(train_summary)
    logging.info("\nHit Rate - Season Predictions")
    logging.info("-----------------------------")
    test_predict = posterior_predictive.run(at_bats_season, None)
    test_summary = summary(test_predict, sites=["obs"],
                           player_names=player_names, diagnostics=False)["obs"]
    test_summary = test_summary.assign(ActualHits=baseball_dataset[["SeasonHits"]].values)
    logging.info(test_summary)
def evaluate_log_predictive_density(posterior_predictive, baseball_dataset):
    """
    Evaluate the log probability density of observing the unseen data (season hits)
    given a model and empirical distribution over the parameters.
    """
    _, test, player_names = train_test_split(baseball_dataset)
    at_bats_season, hits_season = test[:, 0], test[:, 1]
    test_eval = posterior_predictive.run(at_bats_season, hits_season)
    trace_log_pdf = []
    for tr in test_eval.exec_traces:
        trace_log_pdf.append(tr.log_prob_sum())
    # Use LogSumExp trick to evaluate $log(1/num_samples \sum_i p(new_data | \theta^{i})) $,
    # where $\theta^{i}$ are parameter samples from the model's posterior.
    posterior_pred_density = logsumexp(torch.stack(trace_log_pdf), dim=-1) - math.log(len(trace_log_pdf))
    logging.info("\nLog posterior predictive density")
    logging.info("--------------------------------")
    logging.info("{:.4f}\n".format(posterior_pred_density))
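The LogSumExp trick used above, in isolation (plain Python; the `logsumexp` imported earlier performs the same shifted computation on tensors): averaging $p(\text{new data} \mid \theta^i)$ has to happen in probability space, but each $\exp(\log p)$ underflows to zero for realistic log-likelihoods, so the maximum is factored out first:

```python
import math

def log_mean_exp(log_probs):
    # log((1/N) * sum_i exp(l_i)), computed by shifting with the max so the
    # exponentials stay in a representable range.
    m = max(log_probs)
    return m + math.log(sum(math.exp(l - m) for l in log_probs)) - math.log(len(log_probs))

log_probs = [-1000.0, -1001.0, -1002.0]
print(sum(math.exp(l) for l in log_probs))  # 0.0: every exp underflows, the naive mean is useless
print(round(log_mean_exp(log_probs), 3))    # -1000.691: the shifted computation survives
```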
Create a NUTS kernel object and pass the model into it. Sampling is run through the MCMC class's run method. Afterwards, the functions defined above (sample_posterior_predictive, evaluate_log_predictive_density) are used to display posterior summaries, R-hat, and the log posterior predictive density.
num_samples=200
num_chains=3
warmup_steps=100
num_predictive_samples = num_samples * num_chains
rng_seed=52
pyro.set_rng_seed(rng_seed)
nuts_kernel = NUTS(fully_pooled)
posterior_fully_pooled = MCMC(nuts_kernel,
num_samples=num_samples,
warmup_steps=warmup_steps,
num_chains=num_chains).run(at_bats, hits)
logging.info("\nModel: Fully Pooled")
logging.info("===================")
logging.info("\nphi:")
logging.info(summary(posterior_fully_pooled, sites=["phi"], player_names=player_names)["phi"])
posterior_predictive = TracePredictive(fully_pooled,
posterior_fully_pooled,
num_samples=num_predictive_samples)
sample_posterior_predictive(posterior_predictive, baseball_dataset)
evaluate_log_predictive_density(posterior_predictive, baseball_dataset)
Model: Fully Pooled
===================

phi:
       mean       std       25%       50%       75%  n_eff     r_hat
0  0.267115  0.015634  0.254908  0.267284  0.278384    NaN  1.012708

Posterior Predictive:
Hit Rate - Initial 45 At Bats
-----------------------------
                  mean  std   25%   50%   75%  ActualHits
Roberto Clemente  18.0  0.0  18.0  18.0  18.0          18
Frank Robinson    17.0  0.0  17.0  17.0  17.0          17
Frank Howard      16.0  0.0  16.0  16.0  16.0          16
Jay Johnstone     15.0  0.0  15.0  15.0  15.0          15
Ken Berry         14.0  0.0  14.0  14.0  14.0          14
Jim Spencer       14.0  0.0  14.0  14.0  14.0          14
Don Kessinger     13.0  0.0  13.0  13.0  13.0          13
Luis Alvarado     12.0  0.0  12.0  12.0  12.0          12
Ron Santo         11.0  0.0  11.0  11.0  11.0          11
Ron Swaboda       11.0  0.0  11.0  11.0  11.0          11
Rico Petrocelli   10.0  0.0  10.0  10.0  10.0          10
Ellie Rodriguez   10.0  0.0  10.0  10.0  10.0          10
George Scott      10.0  0.0  10.0  10.0  10.0          10
Del Unser         10.0  0.0  10.0  10.0  10.0          10
Billy Williams    10.0  0.0  10.0  10.0  10.0          10
Bert Campaneris    9.0  0.0   9.0   9.0   9.0           9
Thurman Munson     8.0  0.0   8.0   8.0   8.0           8
Max Alvis          7.0  0.0   7.0   7.0   7.0           7

Hit Rate - Season Predictions
-----------------------------
                  mean  std   25%   50%   75%  ActualHits
Roberto Clemente  18.0  0.0  18.0  18.0  18.0         145
Frank Robinson    17.0  0.0  17.0  17.0  17.0         144
Frank Howard      16.0  0.0  16.0  16.0  16.0         160
Jay Johnstone     15.0  0.0  15.0  15.0  15.0          76
Ken Berry         14.0  0.0  14.0  14.0  14.0         128
Jim Spencer       14.0  0.0  14.0  14.0  14.0         140
Don Kessinger     13.0  0.0  13.0  13.0  13.0         168
Luis Alvarado     12.0  0.0  12.0  12.0  12.0          41
Ron Santo         11.0  0.0  11.0  11.0  11.0         148
Ron Swaboda       11.0  0.0  11.0  11.0  11.0          57
Rico Petrocelli   10.0  0.0  10.0  10.0  10.0         152
Ellie Rodriguez   10.0  0.0  10.0  10.0  10.0          52
George Scott      10.0  0.0  10.0  10.0  10.0         142
Del Unser         10.0  0.0  10.0  10.0  10.0          83
Billy Williams    10.0  0.0  10.0  10.0  10.0         205
Bert Campaneris    9.0  0.0   9.0   9.0   9.0         168
Thurman Munson     8.0  0.0   8.0   8.0   8.0         137
Max Alvis          7.0  0.0   7.0   7.0   7.0          21

Log posterior predictive density
--------------------------------
-77.0128
nuts_kernel = NUTS(not_pooled)
posterior_not_pooled = MCMC(nuts_kernel,
num_samples=num_samples,
warmup_steps=warmup_steps,
num_chains=num_chains).run(at_bats, hits)
logging.info("\nModel: Not Pooled")
logging.info("=================")
logging.info("\nphi:")
logging.info(summary(posterior_not_pooled, sites=["phi"], player_names=player_names)["phi"])
posterior_predictive = TracePredictive(not_pooled,
posterior_not_pooled,
num_samples=num_predictive_samples)
sample_posterior_predictive(posterior_predictive, baseball_dataset)
evaluate_log_predictive_density(posterior_predictive, baseball_dataset)
Model: Not Pooled
=================

phi:
                      mean       std       25%       50%       75%  n_eff     r_hat
Roberto Clemente  0.399943  0.075035  0.346843  0.396658  0.452164    NaN  0.996561
Frank Robinson    0.380921  0.070193  0.334455  0.375778  0.428543    NaN  0.996544
Frank Howard      0.362561  0.071634  0.312462  0.358348  0.413378    NaN  0.997555
Jay Johnstone     0.336355  0.071878  0.289077  0.329880  0.382827    NaN  0.997095
Ken Berry         0.314747  0.065654  0.270413  0.310458  0.356329    NaN  1.001117
Jim Spencer       0.315158  0.062964  0.268278  0.311988  0.358840    NaN  0.997749
Don Kessinger     0.301659  0.068475  0.252892  0.298465  0.349038    NaN  0.998231
Luis Alvarado     0.283721  0.062200  0.239647  0.282782  0.324852    NaN  1.000688
Ron Santo         0.258056  0.059806  0.215130  0.255764  0.297286    NaN  0.999810
Ron Swaboda       0.256684  0.067631  0.208116  0.255430  0.299480    NaN  0.998106
Rico Petrocelli   0.229894  0.059704  0.186203  0.224607  0.270622    NaN  0.996533
Ellie Rodriguez   0.236017  0.057246  0.191330  0.234975  0.274804    NaN  0.999227
George Scott      0.234043  0.059666  0.190660  0.231042  0.274553    NaN  0.998112
Del Unser         0.236534  0.061990  0.193895  0.231431  0.276140    NaN  0.998835
Billy Williams    0.236507  0.057134  0.193631  0.233240  0.275560    NaN  0.995157
Bert Campaneris   0.212033  0.061096  0.168675  0.206219  0.252786    NaN  1.002228
Thurman Munson    0.191130  0.057044  0.146603  0.187605  0.228246    NaN  0.999830
Max Alvis         0.171947  0.054420  0.131149  0.166210  0.210815    NaN  0.996702

Posterior Predictive: the "Initial 45 At Bats" and "Season Predictions" hit-rate tables are identical to those shown for the Fully Pooled model above.

Log posterior predictive density
--------------------------------
-99.6108
The R-hat values for the players' $\phi$ are all close to 1, so the chains appear to have mixed; even so, the log posterior predictive density is clearly worse than the fully pooled model's.
nuts_kernel = NUTS(partially_pooled)
posterior_partially_pooled = MCMC(nuts_kernel,
num_samples=num_samples,
warmup_steps=warmup_steps,
num_chains=num_chains).run(at_bats, hits)
logging.info("\nModel: Partially Pooled")
logging.info("=======================")
logging.info("\nphi:")
logging.info(summary(posterior_partially_pooled, sites=["phi"],
player_names=player_names)["phi"])
posterior_predictive = TracePredictive(partially_pooled,
posterior_partially_pooled,
num_samples=num_predictive_samples)
sample_posterior_predictive(posterior_predictive, baseball_dataset)
evaluate_log_predictive_density(posterior_predictive, baseball_dataset)
Model: Partially Pooled
=======================

phi:
                      mean       std       25%       50%       75%  n_eff     r_hat
Roberto Clemente  0.330732  0.056697  0.287433  0.329061  0.367337    NaN  1.008639
Frank Robinson    0.318623  0.053476  0.281985  0.313461  0.349818    NaN  1.004020
Frank Howard      0.312894  0.053505  0.274333  0.307558  0.347047    NaN  1.005832
Jay Johnstone     0.295636  0.050027  0.261132  0.291267  0.324851    NaN  1.014933
Ken Berry         0.288313  0.046968  0.256531  0.284000  0.318049    NaN  1.005918
Jim Spencer       0.287504  0.045141  0.256558  0.284777  0.318844    NaN  1.021570
Don Kessinger     0.278573  0.048426  0.246480  0.277425  0.309384    NaN  1.002926
Luis Alvarado     0.266445  0.045962  0.236697  0.264992  0.295940    NaN  1.003754
Ron Santo         0.259523  0.047678  0.228490  0.259872  0.289555    NaN  0.997998
Ron Swaboda       0.259049  0.046129  0.228279  0.256043  0.290896    NaN  1.003618
Rico Petrocelli   0.246646  0.045131  0.216728  0.246783  0.272855    NaN  1.004025
Ellie Rodriguez   0.244179  0.048296  0.213908  0.244600  0.277833    NaN  1.004003
George Scott      0.249798  0.044556  0.220636  0.249137  0.280757    NaN  0.999369
Del Unser         0.245764  0.041945  0.217120  0.245693  0.273705    NaN  1.002674
Billy Williams    0.248844  0.043270  0.220262  0.249451  0.277614    NaN  1.000619
Bert Campaneris   0.234063  0.045807  0.203822  0.234369  0.263453    NaN  1.006408
Thurman Munson    0.226682  0.046412  0.198276  0.225546  0.260829    NaN  1.010476
Max Alvis         0.215490  0.043255  0.185727  0.214943  0.244984    NaN  1.008355

Posterior Predictive: the "Initial 45 At Bats" and "Season Predictions" hit-rate tables are identical to those shown for the Fully Pooled model above.

Log posterior predictive density
--------------------------------
-46.9007
nuts_kernel = NUTS(partially_pooled_with_logit)
posterior_partially_pooled_with_logit = MCMC(nuts_kernel,
num_samples=num_samples,
warmup_steps=warmup_steps,
num_chains=num_chains).run(at_bats, hits)
logging.info("\nModel: Partially Pooled with Logit")
logging.info("==================================")
logging.info("\nSigmoid(alpha):")
logging.info(summary(posterior_partially_pooled_with_logit,
sites=["alpha"],
player_names=player_names,
transforms={"alpha": lambda x: 1. / (1 + (-x).exp())})["alpha"])
posterior_predictive = TracePredictive(partially_pooled_with_logit,
posterior_partially_pooled_with_logit,
num_samples=num_predictive_samples)
sample_posterior_predictive(posterior_predictive, baseball_dataset)
evaluate_log_predictive_density(posterior_predictive, baseball_dataset)
Model: Partially Pooled with Logit
==================================

Sigmoid(alpha):
                      mean       std       25%       50%       75%  n_eff     r_hat
Roberto Clemente  0.296156  0.040719  0.266291  0.290531  0.320242    NaN  1.013094
Frank Robinson    0.290568  0.039773  0.264530  0.284151  0.312349    NaN  1.026909
Frank Howard      0.287571  0.039618  0.261698  0.280873  0.308512    NaN  1.026840
Jay Johnstone     0.281311  0.036731  0.254630  0.278680  0.301954    NaN  1.012468
Ken Berry         0.275172  0.034034  0.251361  0.273174  0.296399    NaN  1.004120
Jim Spencer       0.275183  0.038243  0.248527  0.271953  0.294773    NaN  1.003538
Don Kessinger     0.269896  0.034686  0.246947  0.265581  0.290342    NaN  0.998158
Luis Alvarado     0.265240  0.035537  0.244533  0.264933  0.287217    NaN  0.995409
Ron Santo         0.259697  0.034639  0.239275  0.262939  0.279452    NaN  0.998881
Ron Swaboda       0.260685  0.035905  0.235296  0.259989  0.281871    NaN  1.007532
Rico Petrocelli   0.253204  0.034641  0.232819  0.255023  0.273528    NaN  0.998102
Ellie Rodriguez   0.254260  0.031659  0.235744  0.254733  0.273609    NaN  0.999133
George Scott      0.253762  0.032302  0.234679  0.255867  0.275410    NaN  0.998382
Del Unser         0.255131  0.031011  0.237398  0.257930  0.274724    NaN  1.008240
Billy Williams    0.253935  0.032343  0.235494  0.254280  0.274392    NaN  1.000821
Bert Campaneris   0.249635  0.037554  0.228033  0.251631  0.272752    NaN  0.995667
Thurman Munson    0.243726  0.035686  0.222682  0.246175  0.265775    NaN  1.002400
Max Alvis         0.240035  0.036551  0.218575  0.244242  0.265604    NaN  1.012194

Posterior Predictive: the "Initial 45 At Bats" and "Season Predictions" hit-rate tables are identical to those shown for the Fully Pooled model above.

Log posterior predictive density
--------------------------------
-52.6788
Visualization of the results (something like the forestplot in stan or pymc)
Plots of the hyperparameter distributions
Comparison plots of the Centered and Non-Centered Parameterizations
Continuation of the stan example
poutine is a set of functions for tying a statistical model to data; it can be used as a context manager (with statement), as a decorator, and in other ways. Slightly older versions of Pyro used it directly to build the kernel for MCMC. http://docs.pyro.ai/en/0.2.1-release/poutine.html
def conditioned_model(model, at_bats, hits):
    return poutine.condition(model, data={"obs": hits})(at_bats)
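A toy illustration in pure Python of what `poutine.condition` does (this is not Pyro's actual mechanism, which intercepts pyro.sample calls; the toy model cooperates explicitly): it returns the same model with the named sample sites pinned to the supplied values, so downstream inference treats them as observations:

```python
# Toy stand-in for poutine.condition: wrap a model so that the "obs" site
# returns the supplied data instead of a fresh draw.
def toy_condition(model, data):
    def conditioned(*args):
        return model(*args, conditioned_data=data)
    return conditioned

def toy_model(at_bats, conditioned_data=None):
    if conditioned_data is not None and "obs" in conditioned_data:
        return conditioned_data["obs"]      # site fixed by conditioning
    return [k // 4 for k in at_bats]        # placeholder for a random draw

model = toy_condition(toy_model, data={"obs": [18, 17]})
print(model([45, 45]))  # [18, 17]: the obs site now yields the observed hits
```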