import numpy as np
import matplotlib.pyplot as plt

# For preprocessing
from sklearn.preprocessing import StandardScaler
from sklearn_pandas import DataFrameMapper

import torch # For building the networks
import torchtuples as tt # Some useful functions

from pycox.datasets import metabric
from pycox.models import PCHazard
from pycox.evaluation import EvalSurv
## Uncomment to install `sklearn-pandas`
# ! pip install sklearn-pandas
np.random.seed(1234)
_ = torch.manual_seed(123)
We load the METABRIC data set as a pandas DataFrame and split the data into train, test, and validation sets.
duration column gives the observed times and the
event column contains indicators of whether the observation is an event (1) or a censored observation (0).
df_train = metabric.read_df()
df_test = df_train.sample(frac=0.2)
df_train = df_train.drop(df_test.index)
df_val = df_train.sample(frac=0.2)
df_train = df_train.drop(df_val.index)
The METABRIC dataset has 9 covariates:
x0, ..., x8.
We will standardize the 5 numerical covariates, and leave the binary covariates as is.
Note that PyTorch requires variables of type 'float32', so we convert the covariates accordingly.
We like using the
sklearn_pandas.DataFrameMapper to make feature mappers.
cols_standardize = ['x0', 'x1', 'x2', 'x3', 'x8']
cols_leave = ['x4', 'x5', 'x6', 'x7']

standardize = [([col], StandardScaler()) for col in cols_standardize]
leave = [(col, None) for col in cols_leave]

x_mapper = DataFrameMapper(standardize + leave)
x_train = x_mapper.fit_transform(df_train).astype('float32')
x_val = x_mapper.transform(df_val).astype('float32')
x_test = x_mapper.transform(df_test).astype('float32')
The survival methods require individual label transforms, so we have included a proposed
label_transform for each method.
In this case,
label_transform is just a shorthand for the class that performs the label transformation for PCHazard.
PCHazard is a continuous-time method, but requires defined intervals in which the hazard is constant. Hence, we need to perform an operation similar to a discretization of the time scale.
num_durations defines the number of intervals in this (equidistant) grid.
num_durations = 10
labtrans = PCHazard.label_transform(num_durations)
get_target = lambda df: (df['duration'].values, df['event'].values)
y_train = labtrans.fit_transform(*get_target(df_train))
y_val = labtrans.transform(*get_target(df_val))

train = (x_train, y_train)
val = (x_val, y_val)

# We don't need to transform the test labels
durations_test, events_test = get_target(df_test)
y_train now consists of three labels: the interval index, the event indicator, and the proportion of the interval before the event/censoring occurred (i.e., $\rho(t_i)$ in the paper).
(array([2, 2, 6, ..., 1, 5, 3]),
 array([0., 1., 0., ..., 1., 0., 0.], dtype=float32),
 array([0.7965465 , 0.69519496, 0.7370492 , ..., 0.06606603, 0.5865238 ,
        0.96302533], dtype=float32))
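To make these three labels concrete, here is a minimal sketch of how the interval index and the proportion $\rho(t_i)$ could be computed on an equidistant grid. The cut points and the handling of durations beyond the last knot are illustrative assumptions, not the exact labtrans implementation.

```python
import numpy as np

# Illustrative equidistant knots (a stand-in for labtrans.cuts, not the real values)
cuts = np.array([0., 100., 200., 300.])

def pc_hazard_labels(durations, events):
    # Index of the interval [cuts[i], cuts[i+1]) containing each duration
    idx = np.clip(np.searchsorted(cuts, durations, side='right') - 1, 0, len(cuts) - 2)
    # Proportion of the interval passed before the event/censoring (rho)
    rho = (durations - cuts[idx]) / (cuts[idx + 1] - cuts[idx])
    return idx, events.astype('float32'), rho.astype('float32')

# A duration of 150 falls in the second interval, halfway through it
idx, ev, rho = pc_hazard_labels(np.array([150., 50.]), np.array([1., 0.]))
```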
We make a neural net with
torch.
For simple network structures, we can use the
MLPVanilla provided by
torchtuples.
For building more advanced network architectures, see for example the tutorials by PyTorch.
The following net is an MLP with two hidden layers (with 32 nodes each), ReLU activations, and
out_features output nodes.
We also have batch normalization and dropout between the layers.
in_features = x_train.shape[1]
num_nodes = [32, 32]
out_features = labtrans.out_features
batch_norm = True
dropout = 0.1

net = tt.practical.MLPVanilla(in_features, num_nodes, out_features, batch_norm, dropout)
If you instead want to build this network with
torch you can uncomment the following code.
It is essentially equivalent to the
MLPVanilla, but without the
torch.nn.init.kaiming_normal_ weight initialization.
# net = torch.nn.Sequential(
#     torch.nn.Linear(in_features, 32),
#     torch.nn.ReLU(),
#     torch.nn.BatchNorm1d(32),
#     torch.nn.Dropout(0.1),
#     torch.nn.Linear(32, 32),
#     torch.nn.ReLU(),
#     torch.nn.BatchNorm1d(32),
#     torch.nn.Dropout(0.1),
#     torch.nn.Linear(32, out_features)
# )
To train the model we need to define an optimizer. You can choose any
torch.optim optimizer, but here we instead use one from
tt.optim as it has some added functionality.
We use the
Adam optimizer, but instead of choosing a learning rate, we will use the scheme proposed by Smith 2017 to find a suitable learning rate with
model.lr_finder. See this post for an explanation.
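The idea of the range test can be sketched in a few lines: sweep the learning rate exponentially from a very small to a large value over a series of mini-batches, record the loss at each step, and pick a rate somewhat below the point where the loss starts to diverge. The bounds and step count below are illustrative assumptions, not the lr_finder defaults.

```python
import numpy as np

def lr_range(lr_low=1e-7, lr_high=1.0, n_steps=100):
    # Exponentially spaced learning rates, one per mini-batch of the sweep
    return lr_low * (lr_high / lr_low) ** (np.arange(n_steps) / (n_steps - 1))

lrs = lr_range()  # monotonically increasing from lr_low to lr_high
```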
We also set
duration_index, which connects the output nodes of the network to the discretization times. This is only useful for prediction and does not affect the training procedure.
model = PCHazard(net, tt.optim.Adam, duration_index=labtrans.cuts)
batch_size = 256
lr_finder = model.lr_finder(x_train, y_train, batch_size, tolerance=8)
_ = lr_finder.plot()
Often, this learning rate is a little high, so we instead set it manually to 0.01 (e.g., with model.optimizer.set_lr(0.01)).
We include the
EarlyStopping callback to stop training when the validation loss stops improving. After training, this callback will also load the best performing model in terms of validation loss.
epochs = 100
callbacks = [tt.callbacks.EarlyStopping()]
log = model.fit(x_train, y_train, batch_size, epochs, callbacks, val_data=val)
0: [0s / 0s], train_loss: 2.6413, val_loss: 2.5174
1: [0s / 0s], train_loss: 2.3406, val_loss: 2.3192
2: [0s / 0s], train_loss: 2.1494, val_loss: 2.0976
3: [0s / 0s], train_loss: 1.9278, val_loss: 1.8521
4: [0s / 0s], train_loss: 1.7077, val_loss: 1.6468
5: [0s / 0s], train_loss: 1.5645, val_loss: 1.5185
6: [0s / 0s], train_loss: 1.4957, val_loss: 1.4789
7: [0s / 0s], train_loss: 1.4887, val_loss: 1.4919
8: [0s / 0s], train_loss: 1.4821, val_loss: 1.4874
9: [0s / 0s], train_loss: 1.4499, val_loss: 1.4807
10: [0s / 0s], train_loss: 1.4514, val_loss: 1.4747
11: [0s / 0s], train_loss: 1.4425, val_loss: 1.4793
12: [0s / 0s], train_loss: 1.4182, val_loss: 1.4848
13: [0s / 0s], train_loss: 1.4268, val_loss: 1.4878
14: [0s / 0s], train_loss: 1.4193, val_loss: 1.4861
15: [0s / 0s], train_loss: 1.4099, val_loss: 1.4978
16: [0s / 0s], train_loss: 1.3999, val_loss: 1.5006
17: [0s / 0s], train_loss: 1.4148, val_loss: 1.4996
18: [0s / 0s], train_loss: 1.4060, val_loss: 1.5080
19: [0s / 0s], train_loss: 1.3819, val_loss: 1.5106
20: [0s / 0s], train_loss: 1.3740, val_loss: 1.5111
_ = log.plot()
For evaluation we first need to obtain survival estimates for the test set.
This can be done with
model.predict_surv which returns an array of survival estimates, or with
model.predict_surv_df which returns the survival estimates as a dataframe.
However, we need to define at how many points we want to get the predictions.
The default (
model.sub = 1) is just to use the interval knots, but by increasing the
model.sub argument, we replace the knots with an equidistant number of points in each interval.
This is very similar to the interpolation used by the discrete-time methods.
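As a sketch of what increasing model.sub does to the prediction grid: each interval between consecutive knots is subdivided into sub equidistant points. The knots below are illustrative, not the real labtrans.cuts.

```python
import numpy as np

cuts = np.array([0., 100., 200.])  # illustrative knots, not the real labtrans.cuts
sub = 3

# Subdivide each interval [a, b) into `sub` equidistant points, keeping the last knot
grid = np.concatenate(
    [np.linspace(a, b, sub, endpoint=False) for a, b in zip(cuts[:-1], cuts[1:])]
    + [cuts[-1:]]
)
```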
surv = model.predict_surv_df(x_test)
surv.iloc[:, :5].plot(drawstyle='steps-post')
plt.ylabel('S(t | x)')
_ = plt.xlabel('Time')
model.sub = 10
surv = model.predict_surv_df(x_test)
surv.iloc[:, :5].plot(drawstyle='steps-post')
plt.ylabel('S(t | x)')
_ = plt.xlabel('Time')
The EvalSurv class contains some useful evaluation criteria for time-to-event prediction.
We set
censor_surv = 'km' to state that we want to use Kaplan-Meier for estimating the censoring distribution.
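For intuition, the Kaplan-Meier estimator behind censor_surv='km' can be sketched as follows. This is a simplified version without tie handling, and EvalSurv's internals may differ; note that to estimate the censoring distribution, the event indicator is flipped (1 - events).

```python
import numpy as np

def kaplan_meier(durations, observed):
    # At each observed event, multiply the survival estimate by (1 - 1/n_at_risk)
    order = np.argsort(durations)
    t, d = durations[order], observed[order]
    surv = np.empty(len(t))
    s = 1.0
    for i in range(len(t)):
        if d[i]:  # event: the survival curve steps down
            s *= 1.0 - 1.0 / (len(t) - i)
        surv[i] = s  # censoring only removes the individual from the risk set
    return t, surv

t, s = kaplan_meier(np.array([1., 2., 3., 4.]), np.array([1, 0, 1, 1]))
```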
ev = EvalSurv(surv, durations_test, events_test, censor_surv='km')
time_grid = np.linspace(durations_test.min(), durations_test.max(), 100)
ev.brier_score(time_grid).plot()
plt.ylabel('Brier score')
_ = plt.xlabel('Time')
ev.nbll(time_grid).plot()
plt.ylabel('NBLL')
_ = plt.xlabel('Time')