Book Notes: Computer Age Statistical Inference -- Ch9 Survival Analysis

Survival Analysis

Life Table and Kaplan-Meier Estimate

Life table

  • An insurance company’s life table summarizes its clients by age. For each age i, it contains

    • n_i: number of clients
    • y_i: number of deaths
    • \hat{h}_i = y_i / n_i: hazard rate estimate
    • \hat{S}_i: survival probability estimate
  • An example life table

Age    n    y   \hat{h}   \hat{S}
34   120    0   0.000     1.000
35    71    1   0.014     0.986
36   125    0   0.000     0.986

Discrete survival analysis: notations

  • A client’s lifetime (time until event): random variable X
    • Also called failure time, survival time, or event time
  • Probability of dying at age i: f_i = P(X = i)

  • Probability of surviving past age i: S_i = \sum_{j \geq i+1} f_j = P(X > i)

  • Hazard rate at age i: conditional probability h_i = f_i / S_{i-1} = P(X = i \mid X \geq i)
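
The three definitions above can be checked numerically. The following is a minimal sketch on a made-up pmf (the values in `f` and the helper names `survival` and `hazard` are assumptions for illustration, not from the book):

```python
# Hypothetical pmf f_i = P(X = i) over ages 1..5 (illustration only; sums to 1).
f = {1: 0.10, 2: 0.20, 3: 0.30, 4: 0.25, 5: 0.15}

def survival(f, i):
    """S_i = P(X > i): sum of f_j over all j >= i + 1."""
    return sum(p for j, p in f.items() if j > i)

def hazard(f, i):
    """h_i = f_i / S_{i-1} = P(X = i | X >= i)."""
    return f[i] / survival(f, i - 1)

for i in sorted(f):
    print(i, round(survival(f, i), 4), round(hazard(f, i), 4))
```

At the last age with positive probability the hazard equals 1, since everyone who reaches that age dies at it.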

Life table estimations

  • Hazard rate estimation: binomial proportions \hat{h}_i = y_i / n_i
    • Typical frequentist inference: probabilistic results stated in terms of h_i are estimated by the plug-in principle, substituting \hat{h}_i for h_i
  • Probability of surviving past age j given survival past age i: P(X > j \mid X > i) = \prod_{k=i+1}^{j} P(X > k \mid X \geq k) = \prod_{k=i+1}^{j} (1 - h_k)

  • Probability of survival estimation: \hat{S}_j = \prod_{k=i_0}^{j} (1 - \hat{h}_k), where i_0 is the starting age
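
As a small sketch of the plug-in estimate, the code below recomputes the hazard and survival columns from the three rows of the example life table. The function name `life_table` is an assumption for illustration.

```python
# Rows (age, n, y) copied from the example life table in these notes.
rows = [(34, 120, 0), (35, 71, 1), (36, 125, 0)]

def life_table(rows):
    """Return (age, h_hat, S_hat) per row; S_hat is the running product of 1 - h_hat."""
    out, s_hat = [], 1.0
    for age, n, y in rows:
        h_hat = y / n              # binomial proportion
        s_hat *= 1.0 - h_hat       # plug-in survival estimate
        out.append((age, h_hat, s_hat))
    return out

for age, h_hat, s_hat in life_table(rows):
    print(f"{age}  {h_hat:.3f}  {s_hat:.3f}")
```

Rounded to three decimals this reproduces the table: the single death at age 35 gives a hazard of 0.014 and a survival estimate of 0.986, which stays unchanged at age 36.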

Continuous survival analysis: notations

  • Time until event T: a continuous positive random variable, with pdf f(t) and cdf F(t)

  • Survival function (i.e., reverse cdf) S(t) = \int_t^\infty f(x) dx = P(T > t) = 1 - F(t)

  • Hazard rate, also called hazard function h(t) = \frac{f(t)}{S(t)} = \lim_{\epsilon \to 0} \frac{P(t \leq T < t + \epsilon \mid T \geq t)}{\epsilon}
    • In some other books, hazard rate is denoted as \lambda(t)

Hazard rate and cumulative hazard function

  • Connection between hazard rate h(t) and survival function S(t) h(t) = -\frac{\partial \log S(t)}{\partial t} \quad \Longleftrightarrow \quad S(t) = \exp\left\{ -\int_0^t h(x)dx \right\}

  • Cumulative hazard function \Lambda(t) = \int_0^t h(x) dx = -\log S(t)

  • Knowing any of S(t), h(t), \Lambda(t) allows one to derive the other two

  • Example: exponentially distributed T, f(t) = \lambda e^{- \lambda t} \quad \Longrightarrow \quad S(t) = e^{-\lambda t}, \quad h(t) = \lambda
    • Constant hazard rate: memoryless
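
The relation S(t) = \exp\{-\Lambda(t)\} can be sanity-checked numerically. The sketch below integrates a hazard function with the trapezoid rule and compares the result with the closed form for the exponential case. The rate `lam = 0.5` and the helper names are assumptions chosen only for illustration.

```python
import math

def cumulative_hazard(h, t, steps=1000):
    """Approximate Lambda(t) = integral of h over [0, t] with the trapezoid rule."""
    dx = t / steps
    total = 0.5 * (h(0.0) + h(t))
    for k in range(1, steps):
        total += h(k * dx)
    return total * dx

def survival_from_hazard(h, t):
    """S(t) = exp(-Lambda(t))."""
    return math.exp(-cumulative_hazard(h, t))

lam = 0.5                       # hypothetical rate
const_hazard = lambda t: lam    # exponential distribution: h(t) = lambda

s2 = survival_from_hazard(const_hazard, 2.0)
s3 = survival_from_hazard(const_hazard, 3.0)
s5 = survival_from_hazard(const_hazard, 5.0)
print(s2, math.exp(-lam * 2.0))  # numerical vs. closed-form S(2)
print(s5 / s2, s3)               # memoryless: P(T > 5 | T > 2) vs. P(T > 3)
```

Passing a non-constant function for `h` gives the survival curve of any other hazard, at the accuracy of the trapezoid rule.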

Censored data

  • Censored data: survival times known only to exceed the reported value
    • E.g., patients lost to follow-up, or an experiment that ended with some patients still alive
    • Usually denoted as “number+”
  • Observation z_i for censored data: z_i = (t_i, d_i), where t_i is the observed time (the survival time if death was observed, otherwise the censoring time), and d_i is the indicator d_i = \begin{cases} 1 & \text{if death observed}\\ 0 & \text{if death not observed} \end{cases}
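
A minimal sketch of how the “number+” notation maps to the pairs (t_i, d_i). The function name `parse_observation` and the sample values are made up for illustration.

```python
def parse_observation(token):
    """Turn '12' into (12.0, 1) and '8+' into (8.0, 0)."""
    token = token.strip()
    if token.endswith("+"):
        return float(token[:-1]), 0   # censored: death not observed
    return float(token), 1            # death observed

# Hypothetical survival times, for illustration only.
raw = ["5", "8+", "12", "20+", "21"]
data = [parse_observation(tok) for tok in raw]
print(data)
```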

Kaplan-Meier estimate

  • Among the censored data z_1, \ldots, z_n, we denote the ordered survival times as t_{(1)} < t_{(2)} < \ldots < t_{(n)}, assuming no ties.

  • The Kaplan-Meier estimate for survival probability S_{(j)} = P(X > t_{(j)}) is the life table estimate \hat{S}_{(j)} = \prod_{k \leq j} \left( \frac{n-k}{n-k+1} \right)^{d_{(k)}}

  • Life table curves are nonparametric: no relationship is assumed between the hazard rates h_i
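
The Kaplan-Meier product can be sketched directly from the formula above, assuming distinct observation times as in the notes. The function name `kaplan_meier` and the data are hypothetical.

```python
def kaplan_meier(data):
    """data: list of (t, d) pairs with distinct times; d = 1 for death, 0 for censored.

    Returns [(t_(k), S_hat_(k))] in increasing order of time.
    """
    ordered = sorted(data)
    n = len(ordered)
    curve, s_hat = [], 1.0
    for k, (t, d) in enumerate(ordered, start=1):
        if d == 1:
            # the factor (n - k) / (n - k + 1) enters only at observed deaths
            s_hat *= (n - k) / (n - k + 1)
        curve.append((t, s_hat))
    return curve

# Hypothetical observations, for illustration only.
data = [(5, 1), (8, 0), (12, 1), (20, 0), (21, 1)]
for t, s in kaplan_meier(data):
    print(t, round(s, 4))
```

With no censoring, the estimate reduces to the empirical survival function: after the k-th of n deaths, the estimate is (n - k) / n.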

A parametric approach

  • Death counts y_k are independent Binomials y_k \stackrel{ind}{\sim} \text{B}(n_k, h_k)

  • Logistic regression \log\left( \frac{h_k}{1-h_k} \right) = \boldsymbol\alpha' \mathbf{x}_k

    • E.g., cubic regression: x_k = (1, k, k^2, k^3)'

    • E.g., cubic-linear spline: x_k = (1, k, (k - k_0)_-^2, (k - k_0)_-^3)' where x_- = x \cdot \mathbf{1}_{x \leq 0}
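
To make the cubic-linear spline concrete, the sketch below builds the covariate vector x_k and inverts the logit to obtain h_k. The knot `k0 = 11` and the coefficients in `alpha` are hypothetical placeholders, not fitted to any data, and the function names are assumptions.

```python
import math

def cubic_linear_spline_row(k, k0):
    """x_k = (1, k, (k - k0)_-^2, (k - k0)_-^3), where x_- = x if x <= 0, else 0."""
    neg = min(float(k - k0), 0.0)
    return [1.0, float(k), neg ** 2, neg ** 3]

def hazard_from_logit(alpha, x):
    """Invert log(h / (1 - h)) = alpha' x to recover h."""
    eta = sum(a * xi for a, xi in zip(alpha, x))
    return 1.0 / (1.0 + math.exp(-eta))

k0 = 11                               # hypothetical knot
alpha = [-3.0, 0.05, 0.02, 0.001]     # hypothetical coefficients (not estimated)
for k in (1, 6, 11, 16):
    x = cubic_linear_spline_row(k, k0)
    print(k, x, round(hazard_from_logit(alpha, x), 4))
```

For k \geq k_0 the last two entries vanish, so the logit is linear in k there and cubic before the knot, which is where the name cubic-linear comes from.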

Cox’s Proportional Hazards Model

Cox’s proportional hazards model

  • Proportional hazards model assumes h_i(t) = h_0(t) \cdot e^{\mathbf{x}_i' \boldsymbol\beta}, where h_0(t) is a baseline hazard, which we don’t need to specify

  • Denote \theta_i = e^{\mathbf{x}_i' \boldsymbol\beta}, then S_i(t) = S_0(t)^{\theta_i}, where S_0(t) is the baseline survival function

    • A larger value of \theta_i indicates a more quickly declining (i.e., worse) survival curve
    • A positive coefficient \beta_j indicates that an increase in the corresponding covariate x_j is associated with worse survival curves
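
A small sketch of S_i(t) = S_0(t)^{\theta_i}. The model leaves the baseline unspecified, so the exponential baseline, its rate, and the coefficient value here are assumptions made only to have something to plot or print.

```python
import math

def theta(x, beta):
    """theta = exp(x' beta)."""
    return math.exp(sum(xj * bj for xj, bj in zip(x, beta)))

def baseline_survival(t, lam=0.1):
    """Assumed baseline S_0(t) = exp(-lam * t); any valid survival function would do."""
    return math.exp(-lam * t)

def survival(t, x, beta):
    """S_i(t) = S_0(t) ** theta_i."""
    return baseline_survival(t) ** theta(x, beta)

beta = [0.7]                          # hypothetical coefficient
for x in ([0.0], [1.0]):
    print(x, [round(survival(t, x, beta), 3) for t in (1, 5, 10)])
```

With a positive coefficient, the curve for x = 1 lies below the curve for x = 0 at every t > 0, and the ratio of the cumulative hazards equals \theta_i.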

Proportional hazards model: key results

  • Let J be the number of observed deaths, occurring at times T_{(1)} < T_{(2)} < \ldots < T_{(J)} assuming no ties

  • Just before time T_{(j)} there is a risk set of individuals still under observation R_j = \{i : t_i \geq T_{(j)}\}

  • Key result of the proportional hazards model: given that one person dies at time T_{(j)}, the probability that it is person i, among the set of people at risk, is P(i_j = i \mid R_j) = \frac{e^{\mathbf{x}_i' \boldsymbol\beta}} {\sum_{k \in R_j} e^{\mathbf{x}_k' \boldsymbol\beta}} = \frac{\theta_i}{\sum_{k \in R_j} \theta_k}
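
The risk set and the conditional death probabilities can be sketched as follows. The observation times, covariates, and coefficient are hypothetical, as are the function names.

```python
import math

def risk_set(times, t_death):
    """Indices of individuals still under observation just before t_death."""
    return [i for i, t in enumerate(times) if t >= t_death]

def death_probabilities(times, xs, beta, t_death):
    """P(i_j = i | R_j) = theta_i / (sum of theta_k over k in R_j)."""
    members = risk_set(times, t_death)
    thetas = {i: math.exp(sum(a * b for a, b in zip(xs[i], beta))) for i in members}
    total = sum(thetas.values())
    return {i: th / total for i, th in thetas.items()}

# Hypothetical data: observed times and one binary covariate per person.
times = [5, 8, 12, 20, 21]
xs = [[1.0], [0.0], [1.0], [0.0], [0.0]]
beta = [0.7]                          # hypothetical coefficient
probs = death_probabilities(times, xs, beta, t_death=12)
print(probs)
```

Only individuals 2, 3, and 4 (zero-based) are still at risk at time 12, and the baseline hazard cancels from the ratio, so it never has to be evaluated.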

Parameter estimation: based on the partial likelihood

  • \boldsymbol\beta is estimated by maximizing the partial likelihood L(\boldsymbol\beta) = \prod_{j=1}^J \frac{e^{\mathbf{x}_{i_j}' \boldsymbol\beta}} {\sum_{k \in R_j} e^{\mathbf{x}_k' \boldsymbol\beta}} where individual i_j dies at time T_{(j)}

  • Semi-parametric: we do not need to specify the baseline h_0(t), since it is not contained in the objective function
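
A minimal sketch of partial likelihood maximization for a single scalar covariate, assuming no tied death times. The ternary search is a stand-in chosen for brevity (it relies on the log partial likelihood being concave in \beta). Statistical software typically uses Newton-type iterations instead. The data and function names are hypothetical.

```python
import math

def log_partial_likelihood(beta, times, deaths, x):
    """log L(beta) for one scalar covariate x_i, assuming no tied death times."""
    total = 0.0
    for j, (t_j, d_j) in enumerate(zip(times, deaths)):
        if d_j == 0:
            continue                  # censored observations contribute no factor
        risk = [k for k, t in enumerate(times) if t >= t_j]
        total += beta * x[j] - math.log(sum(math.exp(beta * x[k]) for k in risk))
    return total

def maximize(fn, lo=-5.0, hi=5.0, iters=200):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if fn(m1) < fn(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

# Hypothetical data: observed times, death indicators, one binary covariate.
times = [5, 8, 12, 20, 21, 30]
deaths = [1, 0, 1, 1, 0, 1]
x = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
beta_hat = maximize(lambda b: log_partial_likelihood(b, times, deaths, x))
print(beta_hat)
```

The baseline hazard h_0(t) appears nowhere in the code, which is the semi-parametric point made above.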

References

  • Efron, Bradley and Hastie, Trevor (2016), Computer Age Statistical Inference. Cambridge University Press