Book Notes: Computer Age Statistical Inference -- Ch9 Survival Analysis

Survival Analysis

Life Table and Kaplan-Meier Estimate

Life table

  • An insurance company’s life table summarizes its clients by age. For each age \(i\), it contains

    • \(n_i\): number of clients
    • \(y_i\): number of deaths
    • \(\hat{h}_i = y_i / n_i\): hazard rate
    • \(\hat{S}_i\): survival probability estimate
  • An example life table

Age \(n\) \(y\) \(\hat{h}\) \(\hat{S}\)
34 120 0 0.000 1.000
35 71 1 0.014 0.986
36 125 0 0.000 0.986

Discrete survival analysis: notations

  • A client’s lifetime (time until event): random variable \(X\)
    • Also called failure time, survival time, or event time
  • Probability of dying at age \(i\) \[ f_i = P(X = i) \]

  • Probability of surviving past age \(i\) \[ S_i = \sum_{j \geq i + 1} f_j = P(X > i) \]

  • Hazard rate at age \(i\): conditional probability \[ h_i = \frac{f_i}{S_{i-1}} = P(X = i \mid X \geq i) \]

Life table estimations

  • Hazard rate estimation: binomial proportions \[ \hat{h}_i = \frac{y_i}{n_i} \]
    • Typical frequentist inference: the probabilistic quantity \(h_i\) is estimated by the plug-in principle
  • Probability of surviving past age \(j\) given survival past age \(i\): \[ P(X > j \mid X > i) = \prod_{k = i+1}^j P(X > k \mid X \geq k) = \prod_{k = i+1}^j (1 - h_k) \]

  • Probability of survival estimation \[ \hat{S}_j = \prod_{k={i_0}}^j \left( 1 - \hat{h}_k\right) \] where \(i_0\) is the starting age
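The two estimation formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the book's code: the function name `life_table` is an assumption, and the input is the three rows of the example life table, with age 34 taken as the starting age \(i_0\).

```python
def life_table(n, y):
    """Return (hazards, survival) lists for per-age client counts n and deaths y."""
    hazards = []
    survival = []
    s = 1.0  # running product of (1 - h_k), starting at the first age
    for n_k, y_k in zip(n, y):
        h_k = y_k / n_k   # binomial proportion estimate of the hazard rate
        s *= 1.0 - h_k    # multiply in this age's survival factor
        hazards.append(h_k)
        survival.append(s)
    return hazards, survival


# The three rows of the example life table (ages 34, 35, 36)
ages = [34, 35, 36]
n = [120, 71, 125]
y = [0, 1, 0]

hazards, survival = life_table(n, y)
for age, h, s in zip(ages, hazards, survival):
    print(f"{age} {h:.3f} {s:.3f}")
```

Rounded to three decimals, the printed values match the \(\hat{h}\) and \(\hat{S}\) columns of the example table.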

Continuous survival analysis: notations

  • Time until event \(T\): a continuous positive random variable, with pdf \(f(t)\) and cdf \(F(t)\)

  • Survival function (i.e., reverse cdf) \[ S(t) = \int_{t}^{\infty} f(x) dx = P(T > t) = 1- F(t) \]

  • Hazard rate, also called hazard function \[ h(t) = \frac{f(t)}{S(t)} = \lim_{\Delta t \rightarrow 0} \frac{P(t < T \leq t + \Delta t \mid T > t)}{\Delta t} \]
    • In some other books, hazard rate is denoted as \(\lambda(t)\)

Hazard rate and cumulative hazard function

  • Connection between hazard rate \(h(t)\) and survival function \(S(t)\) \[ h(t) = -\frac{\partial \log S(t)}{\partial t} \quad \Longleftrightarrow \quad S(t) = \exp\left\{ -\int_0^t h(x)dx \right\} \]

  • Cumulative hazard function \[ \Lambda(t) = \int_0^t h(x) dx = -\log S(t) \]

  • Knowing any of \(S(t)\), \(h(t)\), \(\Lambda(t)\) allows one to derive the other two

  • Example: exponentially distributed \(T\) \[ f(t) = \lambda e^{- \lambda t} \quad \Longrightarrow \quad S(t) = e^{-\lambda t}, \quad h(t) = \lambda \]
    • Constant hazard rate: memoryless
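The relation \(S(t) = \exp\{-\Lambda(t)\}\) can be checked numerically. The sketch below assumes a hypothetical helper `survival_from_hazard` that integrates a hazard function with the trapezoidal rule; the rate `lam = 0.5` is an arbitrary illustrative value. It compares the result with the closed form for the exponential case and checks memorylessness, \(P(T > s + t \mid T > s) = S(s+t)/S(s) = S(t)\).

```python
import math


def survival_from_hazard(h, t, steps=1000):
    """Approximate S(t) = exp(-integral_0^t h(x) dx) using the trapezoidal rule."""
    dx = t / steps
    cum_hazard = 0.0  # running approximation of the cumulative hazard Lambda(t)
    for i in range(steps):
        a, b = i * dx, (i + 1) * dx
        cum_hazard += 0.5 * (h(a) + h(b)) * dx
    return math.exp(-cum_hazard)


lam = 0.5  # hypothetical rate of the exponential distribution


def const_hazard(x):
    """Constant hazard h(x) = lam of the exponential distribution."""
    return lam


s1 = survival_from_hazard(const_hazard, 1.0)
s2 = survival_from_hazard(const_hazard, 2.0)
s3 = survival_from_hazard(const_hazard, 3.0)

closed_form = math.exp(-lam * 2.0)  # S(2) = exp(-lam * 2)
conditional = s3 / s1               # P(T > 3 | T > 1)

print(s2, closed_form, conditional)
```

With a constant hazard, the conditional probability of surviving two more units of time equals the unconditional \(S(2)\), whatever the starting point.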

Censored data

  • Censored data: survival times known only to exceed the reported value
    • E.g., lost to followup, experiment ended with some patients still alive
    • Usually denoted as “number+”
  • Observation \(z_i\) for censored data: \[ z_i = (t_i, d_i), \] where \(t_i\) is the observed time (the survival time if death is observed, otherwise the censoring time), and \(d_i\) is the indicator \[ d_i = \begin{cases} 1 & \text{if death observed}\\ 0 & \text{if death not observed} \end{cases} \]

Kaplan-Meier estimate

  • Among the censored data \(z_1, \ldots, z_n\), we denote the ordered survival times as \[ t_{(1)} < t_{(2)} < \ldots < t_{(n)}, \] assuming no ties.

  • The Kaplan-Meier estimate for survival probability \(S_{(j)} = P(X > t_{(j)})\) is the life table estimate \[ \hat{S}_{(j)} = \prod_{k \leq j} \left( \frac{n-k}{n-k+1} \right)^{d_{(k)}} \]

  • Life table curves are nonparametric: no relationship is assumed between the hazard rates \(h_i\)
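A minimal sketch of the Kaplan-Meier product, assuming distinct observation times as in the formula above. The function name `kaplan_meier` and the four observations are hypothetical illustrations, not data from the book.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates at the ordered observation times.

    times  : observed times t_i, assumed distinct (no ties)
    events : indicators d_i (1 = death observed, 0 = censored)
    Returns a list of (t_(k), S_hat_(k)) pairs in increasing time order.
    """
    n = len(times)
    ordered = sorted(zip(times, events))
    s = 1.0
    curve = []
    for k, (t, d) in enumerate(ordered, start=1):
        if d == 1:
            # n - k + 1 subjects are at risk just before t_(k); one of them dies
            s *= (n - k) / (n - k + 1)
        # a censored time (d == 0) leaves the estimate unchanged
        curve.append((t, s))
    return curve


# Hypothetical data: the observation at time 8 is censored ("8+")
times = [12, 5, 15, 8]
events = [1, 1, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, s)
```

For these four observations the factors are \(3/4\), \(1\) (censored), \(1/2\), and \(0\), so the estimate steps through 0.75, 0.75, 0.375, and 0.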

A parametric approach

  • Death counts \(y_k\) are independent Binomials \[ y_k \stackrel{ind}{\sim} \text{B}(n_k, h_k) \]

  • Logistic regression \[ \log\left( \frac{h_k}{1-h_k} \right) = \mathbf{x}_k' \boldsymbol\alpha \]

    • E.g., cubic regression: \[ \mathbf{x}_k = (1, k, k^2, k^3)' \]

    • E.g., cubic-linear spline: \[ \mathbf{x}_k = (1, k, (k - k_0)_-^2, (k - k_0)_-^3)' \] where \(x_- = x \cdot \mathbf{1}_{x \leq 0}\)
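The two covariate vectors can be built as follows. This sketch constructs the features and inverts the logit for a given coefficient vector; it does not fit the model. All function names are assumptions, and the knot \(k_0 = 5\) in the usage lines is an arbitrary illustrative value.

```python
import math


def cubic_features(k):
    """x_k = (1, k, k^2, k^3) for cubic regression."""
    k = float(k)
    return (1.0, k, k ** 2, k ** 3)


def neg_part(x):
    """x_- = x * 1{x <= 0}, i.e. min(x, 0)."""
    return x if x <= 0 else 0.0


def cubic_linear_spline_features(k, k0):
    """x_k = (1, k, (k - k0)_-^2, (k - k0)_-^3): cubic below the knot k0, linear above."""
    m = neg_part(k - k0)
    return (1.0, float(k), m ** 2, m ** 3)


def hazard(alpha, x):
    """Invert the logit: h = 1 / (1 + exp(-x'alpha))."""
    eta = sum(a * x_j for a, x_j in zip(alpha, x))
    return 1.0 / (1.0 + math.exp(-eta))


below = cubic_linear_spline_features(3, k0=5)  # (k - k0)_- = -2
above = cubic_linear_spline_features(7, k0=5)  # (k - k0)_- = 0
print(below, above)
print(hazard((0.0, 0.0, 0.0, 0.0), below))     # all-zero alpha gives h = 0.5
```

Above the knot the squared and cubed terms vanish, so the logit is linear in \(k\) there, which is what the name "cubic-linear" refers to.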

Cox’s Proportional Hazards Model

Cox’s proportional hazards model

  • Proportional hazards model assumes \[ h_i(t) = h_0(t) \cdot e^{\mathbf{x}_i' \boldsymbol\beta}, \] where \(h_0(t)\) is a baseline hazard, which we don’t need to specify

  • Denote \(\theta_i = e^{\mathbf{x}_i' \boldsymbol\beta}\), then \[ S_i(t) = S_0(t)^{\theta_i}, \] where \(S_0(t)\) is the baseline survival function

    • A larger value of \(\theta_i\) gives a more quickly declining (i.e., worse) survival curve
    • A positive coefficient \(\beta_j\) means that an increase in the corresponding covariate \(x_j\) is associated with a worse survival curve
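
  • The power relation follows from the identity \(S(t) = \exp\left\{ -\int_0^t h(x)dx \right\}\) applied to \(h_i(t) = h_0(t) \cdot \theta_i\); a short derivation, using only the definitions above: \[ S_i(t) = \exp\left\{ -\int_0^t h_0(x)\, \theta_i \, dx \right\} = \left[ \exp\left\{ -\int_0^t h_0(x) dx \right\} \right]^{\theta_i} = S_0(t)^{\theta_i} \]

    • Since \(0 \leq S_0(t) \leq 1\), raising it to a larger power \(\theta_i\) gives a smaller \(S_i(t)\) at every \(t\)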

Proportional hazards model: key results

  • Let \(J\) be the number of observed deaths, occurring at times \[ T_{(1)} < T_{(2)} < \ldots < T_{(J)} \] assuming no ties

  • Just before time \(T_{(j)}\) there is a risk set of individuals still under observation \[ R_j = \{i : t_i \geq T_{(j)}\} \]

  • Key result of the proportional hazards model: given that one person dies at time \(T_{(j)}\), the probability that it is person \(i\), among the set of people at risk, is \[ P(i_j = i \mid R_j) = \frac{e^{\mathbf{x}_i' \boldsymbol\beta}} {\sum_{k \in R_j} e^{\mathbf{x}_k' \boldsymbol\beta}} = \frac{\theta_i}{\sum_{k \in R_j} \theta_k} \]

Parameter estimation: based on the partial likelihood

  • Estimation of \(\boldsymbol\beta\) maximizes the partial likelihood \[ L(\boldsymbol\beta) = \prod_{j=1}^J \frac{e^{\mathbf{x}_{i_j}' \boldsymbol\beta}} {\sum_{k \in R_j} e^{\mathbf{x}_k' \boldsymbol\beta}} \] where individual \(i_j\) dies at time \(T_{(j)}\)

  • Semi-parametric: we do not need to specify the baseline \(h_0(t)\), since it is not contained in the objective function
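
The log of the partial likelihood can be evaluated directly from the risk sets. The sketch below assumes a single scalar covariate and no tied times; the function name `cox_partial_loglik` and the four observations are hypothetical. It only evaluates the objective and leaves the maximization to a standard optimizer.

```python
import math


def cox_partial_loglik(beta, times, events, x):
    """Log partial likelihood for one scalar covariate, assuming no tied times."""
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue  # censored subjects enter only through the risk sets
        # Risk set: everyone still under observation just before t_i
        denom = sum(math.exp(beta * x[k])
                    for k, t_k in enumerate(times) if t_k >= t_i)
        total += beta * x[i] - math.log(denom)
    return total


# Hypothetical data: observed times, death indicators, one covariate per subject
times = [5, 8, 12, 15]
events = [1, 0, 1, 1]
x = [2.0, 1.0, 1.5, 0.0]

ll_zero = cox_partial_loglik(0.0, times, events, x)
ll_pos = cox_partial_loglik(1.0, times, events, x)
ll_neg = cox_partial_loglik(-1.0, times, events, x)
print(ll_zero, ll_pos, ll_neg)
```

At \(\beta = 0\) every subject in a risk set is equally likely to be the one who dies, so the value reduces to \(-\sum_j \log |R_j| = -\log(4 \cdot 2 \cdot 1)\) here. The baseline hazard \(h_0(t)\) never appears in the computation.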


  • Efron, Bradley and Hastie, Trevor (2016), Computer Age Statistical Inference. Cambridge University Press