DeepHit with Competing Risks

In this notebook we give an example of how to apply the DeepHit method for competing risks.

The pycox package has (so far) limited support for competing-risk data, so the evaluation procedure at the end is somewhat limited.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torchtuples as tt

from pycox.preprocessing.label_transforms import LabTransDiscreteTime
from pycox.models import DeepHit
from pycox.evaluation import EvalSurv
In [2]:
np.random.seed(1234)
_ = torch.manual_seed(1234)

Dataset from the DeepHit repo

We download a competing-risks dataset from the DeepHit authors' repo. The dataset is from a simulation study with two event types and censored observations.

We split it into training, validation, and test sets.

In [3]:
url = 'https://raw.githubusercontent.com/chl8856/DeepHit/master/sample%20data/SYNTHETIC/synthetic_comprisk.csv'
df_train = pd.read_csv(url)
df_test = df_train.sample(frac=0.2)
df_train = df_train.drop(df_test.index)
df_val = df_train.sample(frac=0.2)
df_train = df_train.drop(df_val.index)
In [4]:
df_train.head()
Out[4]:
time label true_time true_label feature1 feature2 feature3 feature4 feature5 feature6 feature7 feature8 feature9 feature10 feature11 feature12
1 1 0 1 1 0.015579 -0.84608 0.48753 0.65193 0.20099 -0.11238 -1.39630 -0.188740 -0.30001 -0.24032 -0.38533 -1.02450
2 34 2 34 2 0.446490 1.64100 -1.74500 0.31795 -1.14060 0.36560 0.28110 -0.582530 -1.69070 1.20220 -0.51920 1.78400
3 9 0 9 2 0.629460 -0.61575 -0.32345 -0.90020 0.45360 -0.61992 2.16240 0.198750 -1.11960 -2.73210 -0.25673 -0.81836
5 11 2 11 2 0.487010 0.52086 1.99370 -0.94736 0.24371 1.06550 0.57686 0.019192 0.23212 0.48023 -0.73096 1.43960
6 37 0 40 2 -1.183700 -0.31602 -0.58640 -0.53890 -1.15830 1.04010 0.61938 -0.415420 -0.50700 -2.18300 0.97320 0.97753

Feature transforms

The covariates are standardized, so we don't need to do any preprocessing.

In [5]:
get_x = lambda df: (df
                    .drop(columns=['time', 'label', 'true_time', 'true_label'])
                    .values.astype('float32'))
In [6]:
x_train = get_x(df_train)
x_val = get_x(df_val)
x_test = get_x(df_test)
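
As a quick sanity check (this cell is an addition, not part of the original workflow), we can verify that the covariates are indeed standardized, i.e., that each column has mean close to 0 and standard deviation close to 1:

In [ ]:
# Standardized covariates should have column means ~0 and stds ~1.
print(x_train.mean(axis=0).round(2))
print(x_train.std(axis=0).round(2))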

Label transforms

Currently, the pycox package is mainly focused on single-event data, so there is no dedicated label transformer for competing risks. So, we make a simple one on our own, based on the transform for the discrete methods, LabTransDiscreteTime.

The class returns durations (as integers) and event types (as integers) where 0 is reserved for censored observations.

We discretize the data to num_durations time points, which can be found with labtrans.cuts.

In [7]:
class LabTransform(LabTransDiscreteTime):
    """Discretize durations and keep the event type, with 0 reserved for censoring."""
    def transform(self, durations, events):
        # Discretize with the single-event transform (any event vs. censored).
        durations, is_event = super().transform(durations, events > 0)
        # Where the observation is censored, set the event type to 0.
        events[is_event == 0] = 0
        return durations, events.astype('int64')
In [8]:
num_durations = 10
labtrans = LabTransform(num_durations)
get_target = lambda df: (df['time'].values, df['label'].values)
In [9]:
y_train = labtrans.fit_transform(*get_target(df_train))
y_val = labtrans.transform(*get_target(df_val))
durations_test, events_test = get_target(df_test)
val = (x_val, y_val)
In [10]:
y_train[0][:6], y_train[1][:6]
Out[10]:
(array([0, 2, 0, 1, 1, 1]), array([0, 2, 0, 2, 0, 1]))
In [11]:
labtrans.cuts
Out[11]:
array([  0.        ,  21.33333333,  42.66666667,  64.        ,
        85.33333333, 106.66666667, 128.        , 149.33333333,
       170.66666667, 192.        ])

Network architecture

Below, we describe two networks that can be used for competing risks:

The first, SimpleMLP, is a regular MLP that outputs a tensor of shape [batch_size, num_risks, num_durations], e.g., [64, 2, 10].

The other, CauseSpecificNet, is similar to the cause-specific network described in the DeepHit paper. It has the same output shape as the SimpleMLP, but is a little more involved to construct.

In [12]:
class SimpleMLP(torch.nn.Module):
    """Simple network structure for competing risks.
    """
    def __init__(self, in_features, num_nodes, num_risks, out_features, batch_norm=True,
                 dropout=None):
        super().__init__()
        self.num_risks = num_risks
        self.mlp = tt.practical.MLPVanilla(
            in_features, num_nodes, num_risks * out_features,
            batch_norm, dropout,
        )
        
    def forward(self, input):
        out = self.mlp(input)
        return out.view(out.size(0), self.num_risks, -1)
In [13]:
class CauseSpecificNet(torch.nn.Module):
    """Network structure similar to the DeepHit paper, but without the residual
    connections (for simplicity).
    """
    def __init__(self, in_features, num_nodes_shared, num_nodes_indiv, num_risks,
                 out_features, batch_norm=True, dropout=None):
        super().__init__()
        self.shared_net = tt.practical.MLPVanilla(
            in_features, num_nodes_shared[:-1], num_nodes_shared[-1],
            batch_norm, dropout,
        )
        self.risk_nets = torch.nn.ModuleList()
        for _ in range(num_risks):
            net = tt.practical.MLPVanilla(
                num_nodes_shared[-1], num_nodes_indiv, out_features,
                batch_norm, dropout,
            )
            self.risk_nets.append(net)

    def forward(self, input):
        out = self.shared_net(input)
        # One head per risk; stack to [batch_size, num_risks, num_durations].
        out = [net(out) for net in self.risk_nets]
        out = torch.stack(out, dim=1)
        return out
In [14]:
in_features = x_train.shape[1]
num_nodes_shared = [64, 64]
num_nodes_indiv = [32]
num_risks = y_train[1].max()
out_features = len(labtrans.cuts)
batch_norm = True
dropout = 0.1

# net = SimpleMLP(in_features, num_nodes_shared, num_risks, out_features)
net = CauseSpecificNet(in_features, num_nodes_shared, num_nodes_indiv, num_risks,
                       out_features, batch_norm, dropout)
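
As a quick sanity check (an addition to the notebook, assuming the shapes described above), we can verify that the network maps a small batch of covariates to a [batch_size, num_risks, num_durations] tensor:

In [ ]:
# Forward a small batch without tracking gradients; with the settings above,
# the output should have shape [4, num_risks, num_durations] = [4, 2, 10].
with torch.no_grad():
    out = net(torch.from_numpy(x_train[:4]))
print(out.shape)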

Training

We fit the net with the AdamWR cyclic optimizer, using an initial learning rate of 0.01, a decoupled_weight_decay of 0.01, and an initial cycle length of one epoch, multiplying the learning rate by 0.8 and the cycle length by 2 at each new cycle.

The hyperparameters in the DeepHit loss-function, alpha and sigma, are set to 0.2 and 0.1, respectively. Note that alpha here controls the convex combination of the two losses, $$\text{loss} = \alpha \text{loss}_\text{NLL} + (1 - \alpha) \text{loss}_\text{rank},$$ and therefore has a different interpretation than in the DeepHit paper.
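
To make the roles of the two hyperparameters concrete, here is an illustrative sketch with placeholder tensors (hypothetical values, not pycox's internals): alpha weighs the two loss terms, while sigma scales the pairwise ranking penalty, which for an "acceptable" pair (i, j) is roughly of the form exp(-(F_i - F_j) / sigma), so smaller sigma makes the ranking term sharper.

In [ ]:
# Illustrative sketch only; placeholder tensors, not pycox's implementation.
alpha, sigma = 0.2, 0.1
F_i, F_j = torch.tensor(0.8), torch.tensor(0.5)  # hypothetical CIF values at i's event time
loss_nll = torch.tensor(0.7)                     # placeholder NLL term
loss_rank = torch.exp(-(F_i - F_j) / sigma)      # one pair's ranking contribution
loss = alpha * loss_nll + (1 - alpha) * loss_rank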

In [15]:
optimizer = tt.optim.AdamWR(lr=0.01, decoupled_weight_decay=0.01,
                            cycle_eta_multiplier=0.8)
model = DeepHit(net, optimizer, alpha=0.2, sigma=0.1,
                duration_index=labtrans.cuts)
In [16]:
epochs = 512
batch_size = 256
callbacks = [tt.callbacks.EarlyStoppingCycle()]
verbose = False # set to True if you want printout
In [17]:
%%time
log = model.fit(x_train, y_train, batch_size, epochs, callbacks, verbose, val_data=val)
CPU times: user 3min 53s, sys: 19.9 s, total: 4min 13s
Wall time: 2min 2s
In [18]:
_ = log.plot()

Evaluation

There is, currently, limited support for competing risks in pycox, so the evaluation here is just illustrative and should not be considered best practice.

The survival function obtained with predict_surv_df is the event-free survival probability, i.e., the probability of not yet having experienced any of the events, and therefore does not distinguish between the event types. This means that we evaluate this "single-event case" as before.

In [19]:
surv = model.predict_surv_df(x_test)
ev = EvalSurv(surv, durations_test, events_test != 0, censor_surv='km')
In [20]:
ev.concordance_td()
Out[20]:
0.7202672608491353
In [21]:
ev.integrated_brier_score(np.linspace(0, durations_test.max(), 100))
Out[21]:
0.10413137528575585

The cumulative incidence function

The cumulative incidence function, or CIF, is commonly used in settings with competing risks. We can evaluate the cause-specific concordance using these CIFs.

We use 1 - cif here because the CIF increases with risk (while the survival function decreases with higher risk).

In [22]:
cif = model.predict_cif(x_test)  # shape: [num_risks, num_durations, n_test]
cif1 = pd.DataFrame(cif[0], model.duration_index)  # CIF for event type 1
cif2 = pd.DataFrame(cif[1], model.duration_index)  # CIF for event type 2
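
If the overall survival from predict_surv_df is one minus the total cumulative incidence (an assumption about pycox's internals, but consistent with a discrete-time PMF model), the following sanity check, which is an addition to the notebook, should hold:

In [ ]:
# Sanity check (assumes surv = 1 - sum of cause-specific CIFs).
print(np.allclose(surv.values, 1 - cif.sum(axis=0)))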
In [23]:
ev1 = EvalSurv(1-cif1, durations_test, events_test == 1, censor_surv='km')
ev2 = EvalSurv(1-cif2, durations_test, events_test == 2, censor_surv='km')
In [24]:
ev1.concordance_td()
Out[24]:
0.7150254240668652
In [25]:
ev2.concordance_td()
Out[25]:
0.712722920851343

Plot CIF

Finally, we plot the cumulative incidence functions of six randomly chosen individuals.

In [26]:
sample = np.random.choice(len(durations_test), 6)
fig, axs = plt.subplots(2, 3, sharex=True, sharey=True, figsize=(10, 5))
for ax, idx in zip(axs.flat, sample):
    # cif.transpose()[idx] has shape [num_durations, num_risks],
    # so we get one line per event type.
    pd.DataFrame(cif.transpose()[idx], index=labtrans.cuts).plot(ax=ax)
    ax.set_ylabel('CIF')
    ax.set_xlabel('Time')
    ax.grid(linestyle='--')
In [ ]: