In this notebook we introduce the use of
pycox through an example dataset.
We illustrate the procedure with the
LogisticHazard method (paper_link), but this can easily be replaced by, for example, PMF or DeepHitSingle.
In the following we will:

- load the METABRIC dataset,
- preprocess the covariates and labels,
- build and train a neural network,
- predict and evaluate survival estimates on a test set.
While some knowledge of the PyTorch framework is preferable, it is not required for the use of simple neural networks. For building more advanced network architectures, however, we would recommend looking at the PyTorch tutorials.
# ! pip install sklearn-pandas
import numpy as np
import matplotlib.pyplot as plt

# For preprocessing
from sklearn.preprocessing import StandardScaler
from sklearn_pandas import DataFrameMapper
import torch  # For building the networks
import torchtuples as tt  # Some useful functions
from pycox.datasets import metabric
from pycox.models import LogisticHazard
# from pycox.models import PMF
# from pycox.models import DeepHitSingle
from pycox.evaluation import EvalSurv
# We also set some seeds to make this reproducible.
# Note that on gpu, there is still some randomness.
np.random.seed(1234)
_ = torch.manual_seed(123)
We load the METABRIC dataset as a pandas DataFrame and split the data into train, test, and validation sets.
The duration column gives the observed times, and the event column contains indicators of whether the observation is an event (1) or a censored observation (0).
df_train = metabric.read_df()
df_test = df_train.sample(frac=0.2)
df_train = df_train.drop(df_test.index)
df_val = df_train.sample(frac=0.2)
df_train = df_train.drop(df_val.index)
The METABRIC dataset has 9 covariates:
x0, ..., x8.
We will standardize the 5 numerical covariates, and leave the binary covariates as is.
Note that PyTorch requires variables of type 'float32'.
Here we use the
sklearn_pandas.DataFrameMapper to make feature mappers, but this has nothing to do with pycox, so any other preprocessing tool can be used instead.
cols_standardize = ['x0', 'x1', 'x2', 'x3', 'x8']
cols_leave = ['x4', 'x5', 'x6', 'x7']

standardize = [([col], StandardScaler()) for col in cols_standardize]
leave = [(col, None) for col in cols_leave]

x_mapper = DataFrameMapper(standardize + leave)
x_train = x_mapper.fit_transform(df_train).astype('float32')
x_val = x_mapper.transform(df_val).astype('float32')
x_test = x_mapper.transform(df_test).astype('float32')
The survival methods require individual label transforms, so we have included a proposed
label_transform for each method.
For most of them, the label_transform is just a shorthand for the class LabTransDiscreteTime in pycox.preprocessing.
LogisticHazard is a discrete-time method, meaning it requires discretization of the event times to be applied to continuous-time data.
num_durations defines the size of this (equidistant) discretization grid, meaning our network will have
num_durations output nodes.
num_durations = 10

labtrans = LogisticHazard.label_transform(num_durations)
# labtrans = PMF.label_transform(num_durations)
# labtrans = DeepHitSingle.label_transform(num_durations)

get_target = lambda df: (df['duration'].values, df['event'].values)
y_train = labtrans.fit_transform(*get_target(df_train))
y_val = labtrans.transform(*get_target(df_val))

train = (x_train, y_train)
val = (x_val, y_val)

# We don't need to transform the test labels
durations_test, events_test = get_target(df_test)
labtrans.cuts contains the discretization grid. This will later be used to obtain the right time-scale for survival predictions.
array([ 0. , 39.466667, 78.933334, 118.4 , 157.86667 , 197.33334 , 236.8 , 276.26666 , 315.73334 , 355.2 ], dtype=float32)
y_train is a tuple with the indices of the discretized times, in addition to event indicators.
(array([2, 3, 6, ..., 1, 5, 3]), array([0., 1., 0., ..., 1., 0., 0.], dtype=float32))
array([ 78.933334, 118.4 , 236.8 , ..., 39.466667, 197.33334 , 118.4 ], dtype=float32)
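The array above shows the indices mapped back to the grid: indexing labtrans.cuts with the stored duration indices recovers the discretized times. A minimal standalone numpy illustration, reconstructing the equidistant grid printed earlier with np.linspace:

```python
import numpy as np

# Reconstruct the equidistant grid printed by labtrans.cuts above
cuts = np.linspace(0.0, 355.2, 10, dtype='float32')

# The first few discretized duration indices, as in y_train[0]
idx_durations = np.array([2, 3, 6])

# Indexing the grid maps each index back to its discretized time
discrete_times = cuts[idx_durations]
print(discrete_times)  # approximately [ 78.93, 118.4, 236.8 ]
```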
We make a neural net with torch. For simple network structures, we can use the MLPVanilla provided by torchtuples.
For building more advanced network architectures, see for example the tutorials by PyTorch.
The following net is an MLP with two hidden layers (with 32 nodes each), ReLU activations, and
out_features output nodes.
We also have batch normalization and dropout between the layers.
in_features = x_train.shape[1]
num_nodes = [32, 32]
out_features = labtrans.out_features
batch_norm = True
dropout = 0.1

net = tt.practical.MLPVanilla(in_features, num_nodes, out_features, batch_norm, dropout)
If you instead want to build this network with
torch you can uncomment the following code.
It is essentially equivalent to the
MLPVanilla, but without the
torch.nn.init.kaiming_normal_ weight initialization.
# net = torch.nn.Sequential(
#     torch.nn.Linear(in_features, 32),
#     torch.nn.ReLU(),
#     torch.nn.BatchNorm1d(32),
#     torch.nn.Dropout(0.1),
#
#     torch.nn.Linear(32, 32),
#     torch.nn.ReLU(),
#     torch.nn.BatchNorm1d(32),
#     torch.nn.Dropout(0.1),
#
#     torch.nn.Linear(32, out_features)
# )
To train the model we need to define an optimizer. You can choose any
torch.optim optimizer, or use one from tt.optim.
The latter is built on top of the
torch optimizers, but with some added functionality (such as not requiring
net.parameters() as input and the
model.lr_finder for finding suitable learning rates).
We will here use the
Adam optimizer with learning rate 0.01.
We also set
duration_index, which connects the output nodes of the network to the discretization times. This is only relevant for prediction and does not affect the training procedure.
LogisticHazard can also take the argument
device, which can be used to choose between running on the CPU and GPU. The default behavior is to run on a GPU if one is available, and on the CPU otherwise.
See ?LogisticHazard for details.
model = LogisticHazard(net, tt.optim.Adam(0.01), duration_index=labtrans.cuts)
# model = PMF(net, tt.optim.Adam(0.01), duration_index=labtrans.cuts)
# model = DeepHitSingle(net, tt.optim.Adam(0.01), duration_index=labtrans.cuts)
Next, we set the
batch_size and the number of training epochs.
We also include the
EarlyStopping callback to stop training when the validation loss stops improving.
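Conceptually, this kind of callback tracks the best validation loss seen so far and stops when it has not improved for a number of epochs (the patience). A simplified standalone sketch of that logic (not torchtuples' actual implementation):

```python
def early_stop_epoch(val_losses, patience=10):
    """Return the epoch at which patience-based early stopping would halt."""
    best_loss, best_epoch = float('inf'), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # no improvement for `patience` epochs: stop
    return len(val_losses) - 1  # ran out of epochs without triggering

# Loss improves until epoch 2, then plateaus: training stops 3 epochs later
print(early_stop_epoch([3.0, 2.0, 1.0, 1.5, 1.4, 1.3, 1.2], patience=3))  # 5
```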
batch_size = 256
epochs = 100
callbacks = [tt.cb.EarlyStopping()]
We can now train the network and the
log object keeps track of the training progress.
log = model.fit(x_train, y_train, batch_size, epochs, callbacks, val_data=val)
0:	[0s / 0s],		train_loss: 2.9525,	val_loss: 2.7795
1:	[0s / 0s],		train_loss: 2.6598,	val_loss: 2.5419
2:	[0s / 0s],		train_loss: 2.4027,	val_loss: 2.2536
3:	[0s / 0s],		train_loss: 2.0897,	val_loss: 1.9271
4:	[0s / 0s],		train_loss: 1.7825,	val_loss: 1.6382
5:	[0s / 0s],		train_loss: 1.5542,	val_loss: 1.4681
6:	[0s / 0s],		train_loss: 1.4341,	val_loss: 1.4088
7:	[0s / 0s],		train_loss: 1.4004,	val_loss: 1.3898
8:	[0s / 0s],		train_loss: 1.4069,	val_loss: 1.3798
9:	[0s / 0s],		train_loss: 1.3830,	val_loss: 1.3746
10:	[0s / 0s],		train_loss: 1.3433,	val_loss: 1.3715
11:	[0s / 0s],		train_loss: 1.3466,	val_loss: 1.3710
12:	[0s / 0s],		train_loss: 1.3187,	val_loss: 1.3682
13:	[0s / 0s],		train_loss: 1.3268,	val_loss: 1.3692
14:	[0s / 0s],		train_loss: 1.3174,	val_loss: 1.3669
15:	[0s / 0s],		train_loss: 1.3167,	val_loss: 1.3642
16:	[0s / 0s],		train_loss: 1.3138,	val_loss: 1.3642
17:	[0s / 0s],		train_loss: 1.3035,	val_loss: 1.3680
18:	[0s / 0s],		train_loss: 1.2987,	val_loss: 1.3694
19:	[0s / 1s],		train_loss: 1.2896,	val_loss: 1.3761
20:	[0s / 1s],		train_loss: 1.2905,	val_loss: 1.3760
21:	[0s / 1s],		train_loss: 1.3070,	val_loss: 1.3732
22:	[0s / 1s],		train_loss: 1.2847,	val_loss: 1.3733
23:	[0s / 1s],		train_loss: 1.2907,	val_loss: 1.3821
24:	[0s / 1s],		train_loss: 1.2582,	val_loss: 1.3905
25:	[0s / 1s],		train_loss: 1.2672,	val_loss: 1.3977
26:	[0s / 1s],		train_loss: 1.2636,	val_loss: 1.3996
_ = log.plot()
After termination, the
EarlyStopping callback loads the best performing model (in terms of validation loss).
We can verify this by comparing the minimum validation loss to the validation score of the trained model.
For evaluation we first need to obtain survival estimates for the test set.
This can be done with
model.predict_surv which returns an array of survival estimates, or with
model.predict_surv_df which returns the survival estimates as a dataframe.
surv = model.predict_surv_df(x_test)
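Under the hood, LogisticHazard predicts a discrete hazard h_k for each grid interval, and the survival estimate at grid point j is the cumulative product of (1 - h_k) for k up to j. A minimal numpy illustration with hypothetical hazard values:

```python
import numpy as np

# Hypothetical discrete hazards for one individual, one per grid interval
hazards = np.array([0.1, 0.2, 0.3])

# Survival at each grid point: cumulative product of (1 - hazard)
surv_discrete = np.cumprod(1 - hazards)
print(surv_discrete)  # [0.9   0.72  0.504]
```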
We can plot the survival estimates for the first 5 individuals.
Note that the time scale is correct because we have set
model.duration_index to be the grid points.
We have, however, only defined the survival estimates at the 10 times in our discretization grid, so the survival estimates are step functions.
surv.iloc[:, :5].plot(drawstyle='steps-post')
plt.ylabel('S(t | x)')
_ = plt.xlabel('Time')
It is, therefore, often beneficial to interpolate the survival estimates.
Linear interpolation (constant density interpolation) can be performed with the
interpolate method. We also need to choose how many points we want to replace each grid point with. Here we will use 10.
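Conceptually, this constant-density interpolation corresponds to linearly interpolating the survival curve between the grid points. A standalone numpy sketch of the idea (not pycox's implementation), using hypothetical survival values:

```python
import numpy as np

# Survival estimates on a coarse grid (hypothetical values)
grid = np.array([0.0, 40.0, 80.0])
surv_coarse = np.array([1.0, 0.8, 0.5])

# Replace each grid interval with 10 sub-points and interpolate linearly
fine_grid = np.linspace(grid[0], grid[-1], (len(grid) - 1) * 10 + 1)
fine_surv = np.interp(fine_grid, grid, surv_coarse)
print(fine_surv[:3])  # [1.    0.98  0.96]
```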
surv = model.interpolate(10).predict_surv_df(x_test)
surv.iloc[:, :5].plot(drawstyle='steps-post')
plt.ylabel('S(t | x)')
_ = plt.xlabel('Time')
The EvalSurv class contains some useful evaluation criteria for time-to-event prediction.
We set censor_surv='km' to state that we want to use Kaplan-Meier for estimating the censoring distribution.
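For reference, the Kaplan-Meier estimator has a simple closed form: at each event time, the survival estimate is multiplied by (1 - d/n), where d is the number of events at that time and n is the number still at risk. A minimal standalone sketch (EvalSurv applies this to the censoring distribution internally):

```python
import numpy as np

def kaplan_meier(durations, events):
    """Minimal Kaplan-Meier sketch: survival estimate at each unique event time."""
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=int)
    times, surv, s = [], [], 1.0
    for t in np.unique(durations[events == 1]):
        n_at_risk = np.sum(durations >= t)  # still under observation at t
        n_events = np.sum((durations == t) & (events == 1))
        s *= 1.0 - n_events / n_at_risk
        times.append(t)
        surv.append(s)
    return np.array(times), np.array(surv)

km_times, km_surv = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(km_surv)  # [0.75  0.375 0.   ]
```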
ev = EvalSurv(surv, durations_test, events_test, censor_surv='km')
time_grid = np.linspace(durations_test.min(), durations_test.max(), 100)
ev.brier_score(time_grid).plot()
plt.ylabel('Brier score')
_ = plt.xlabel('Time')
ev.nbll(time_grid).plot()
plt.ylabel('NBLL')
_ = plt.xlabel('Time')
You can now look at other examples of survival methods in the examples folder. Or, alternatively, take a look at the other introductory notebooks.