Book Notes: Computer Age Statistical Inference -- Ch9 Survival Analysis


Survival Analysis

Life Table and Kaplan-Meier Estimate

Life table

  • An insurance company’s life table summarizes its clients by age. For each age $i$, it contains

    • $n_i$: number of clients
    • $y_i$: number of deaths
    • $\hat{h}_i = y_i / n_i$: hazard rate estimate
    • $\hat{S}_i$: survival probability estimate
  • An example life table

Age     n     y    $\hat{h}$    $\hat{S}$
34     120    0    0.000        1.000
35      71    1    0.014        0.986
36     125    0    0.000        0.986

Discrete survival analysis: notations

  • A client’s lifetime (time until event): random variable X
    • Also called failure time, survival time, or event time
  • Probability of dying at age $i$: $f_i = P(X = i)$

  • Probability of surviving past age $i$: $S_i = \sum_{j \ge i+1} f_j = P(X > i)$

  • Hazard rate at age $i$ (a conditional probability): $h_i = f_i / S_{i-1} = P(X = i \mid X \ge i)$
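To make the three definitions concrete, here is a minimal Python sketch that starts from a probability mass function and derives $S_i$ and $h_i$ from it. The pmf values are hypothetical numbers chosen for illustration, and the helper name `surv_before` is not from the book. The last loop checks the product identity that the life table estimate relies on.

```python
# Hypothetical pmf for a lifetime X on ages 1..4 (illustrative numbers only).
f = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}
ages = sorted(f)

# S_i = P(X > i): total mass at ages j >= i + 1
S = {i: sum(f[j] for j in ages if j >= i + 1) for i in ages}

def surv_before(i):
    """S_{i-1} = P(X >= i): total mass at ages j >= i."""
    return sum(f[j] for j in ages if j >= i)

# h_i = f_i / S_{i-1} = P(X = i | X >= i)
h = {i: f[i] / surv_before(i) for i in ages}

# Check: S_j equals the running product of (1 - h_k) over k <= j
running = 1.0
for j in ages:
    running *= 1.0 - h[j]
    assert abs(running - S[j]) < 1e-12

for i in ages:
    print(i, round(f[i], 3), round(S[i], 3), round(h[i], 3))
```

Note that the hazard at the last age is 1: anyone who reaches the final age in the support must die at that age.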

Life table estimations

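  • Hazard rate estimation: binomial proportions $\hat{h}_i = y_i / n_i$
    • Typical frequentist inference: a probabilistic result is stated in terms of the unknown $h_i$, which is then replaced by its estimate $\hat{h}_i$ (the plug-in principle)
  • Probability of surviving past age $j$ given survival past age $i$: $P(X > j \mid X > i) = \prod_{k=i+1}^{j} P(X > k \mid X \ge k) = \prod_{k=i+1}^{j} (1 - h_k)$

  • Probability of survival estimation: $\hat{S}_j = \prod_{k=i_0}^{j} (1 - \hat{h}_k)$, where $i_0$ is the starting age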
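As a quick check of these two estimates, the following Python sketch recomputes the hazard and survival columns of the example life table. It uses only the three rows shown in that table; the function name `life_table` is an illustrative choice, not something from the book.

```python
# Rows of the example life table: (age, n, y)
rows = [(34, 120, 0), (35, 71, 1), (36, 125, 0)]

def life_table(rows):
    """Return (age, h_hat, S_hat) per row, taking the first row as the starting age."""
    out = []
    surv = 1.0
    for age, n, y in rows:
        h_hat = y / n          # binomial proportion estimate of the hazard
        surv *= 1.0 - h_hat    # running product of (1 - h_hat_k)
        out.append((age, h_hat, surv))
    return out

for age, h_hat, s_hat in life_table(rows):
    print(f"{age}  {h_hat:.3f}  {s_hat:.3f}")
# 34  0.000  1.000
# 35  0.014  0.986
# 36  0.000  0.986
```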

Continuous survival analysis: notations

  • Time until event T: a continuous positive random variable, with pdf f(t) and cdf F(t)

  • Survival function (i.e., reverse cdf): $S(t) = \int_t^\infty f(x)\,dx = P(T > t) = 1 - F(t)$

  • Hazard rate, also called hazard function: $h(t) = \frac{f(t)}{S(t)} = \lim_{\Delta t \to 0} \frac{P(t < T \le t + \Delta t \mid T > t)}{\Delta t}$
    • In some other books, hazard rate is denoted as λ(t)

Hazard rate and cumulative hazard function

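  • Connection between hazard rate $h(t)$ and survival function $S(t)$: $h(t) = -\frac{\partial \log S(t)}{\partial t}$, equivalently $S(t) = \exp\left\{-\int_0^t h(x)\,dx\right\}$

  • Cumulative hazard function: $\Lambda(t) = \int_0^t h(x)\,dx = -\log S(t)$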

  • Knowing any of S(t), h(t), Λ(t) allows one to derive the other two

  • Example: exponentially distributed $T$ with $f(t) = \lambda e^{-\lambda t}$, so that $S(t) = e^{-\lambda t}$ and $h(t) = \lambda$
    • Constant hazard rate: memoryless
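The relationship between the hazard, the cumulative hazard, and the survival function can be checked numerically. The sketch below integrates a hazard function with a simple midpoint rule and compares the result with the closed form for the exponential case. The rate `lam`, the time points, and the function names are hypothetical choices for illustration.

```python
import math

def cumulative_hazard(h, t, steps=10_000):
    """Approximate Lambda(t) = integral of h(x) over [0, t] with the midpoint rule."""
    dx = t / steps
    return sum(h((k + 0.5) * dx) for k in range(steps)) * dx

def survival_from_hazard(h, t):
    """S(t) = exp(-Lambda(t))."""
    return math.exp(-cumulative_hazard(h, t))

lam = 0.5                      # hypothetical rate of the exponential distribution
const_hazard = lambda x: lam   # h(t) = lambda for all t

t = 2.0
s_numeric = survival_from_hazard(const_hazard, t)
s_exact = math.exp(-lam * t)
print(s_numeric, s_exact)

# Memorylessness: P(T > s + t | T > s) = S(s + t) / S(s) should equal S(t)
s = 3.0
conditional = survival_from_hazard(const_hazard, s + t) / survival_from_hazard(const_hazard, s)
print(conditional, s_exact)
```

Passing a non-constant function as `h` gives the survival curve for other distributions without changing the rest of the code.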

Censored data

  • Censored data: survival times known only to exceed the reported value
    • E.g., lost to followup, experiment ended with some patients still alive
    • Usually denoted as “number+”
  • Observation $z_i$ for censored data: $z_i = (t_i, d_i)$, where $t_i$ is the observed survival time and $d_i$ is the indicator $d_i = \begin{cases} 1 & \text{if death observed} \\ 0 & \text{if death not observed} \end{cases}$
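A small Python sketch of how the “number+” notation maps to $(t_i, d_i)$ pairs. The input strings are hypothetical values for illustration, and `parse_observation` is an illustrative helper, not part of any library.

```python
def parse_observation(token):
    """Turn '74' into (74.0, 1) and '74+' into (74.0, 0)."""
    token = token.strip()
    if token.endswith("+"):
        return float(token[:-1]), 0   # censored: death not observed
    return float(token), 1            # death observed

raw = ["7", "34", "42+", "63", "74+"]   # hypothetical survival times
data = [parse_observation(tok) for tok in raw]
print(data)
# [(7.0, 1), (34.0, 1), (42.0, 0), (63.0, 1), (74.0, 0)]
```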

Kaplan-Meier estimate

  • Among the censored data $z_1, \ldots, z_n$, we denote the ordered survival times as $t_{(1)} < t_{(2)} < \cdots < t_{(n)}$, assuming no ties.

  • The Kaplan-Meier estimate for the survival probability $S_{(j)} = P(X > t_{(j)})$ is the life table estimate $\hat{S}_{(j)} = \prod_{k \le j} \left(\frac{n-k}{n-k+1}\right)^{d_{(k)}}$

  • Life table curves are nonparametric: no relationship is assumed between the hazard rates $h_i$
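The Kaplan-Meier product can be sketched in a few lines of Python. This is a minimal illustration under the same no-ties assumption, not a replacement for a survival analysis library; the data are hypothetical.

```python
def kaplan_meier(data):
    """data: list of (t_i, d_i) pairs with distinct times.

    Returns [(t_(j), S_hat_(j))] in increasing time order.
    """
    ordered = sorted(data)
    times = [t for t, _ in ordered]
    if len(set(times)) != len(times):
        raise ValueError("this sketch assumes no tied times")
    n = len(ordered)
    surv = 1.0
    curve = []
    for k, (t, d) in enumerate(ordered, start=1):
        if d == 1:
            # n - k + 1 subjects are at risk just before t_(k); one of them dies
            surv *= (n - k) / (n - k + 1)
        curve.append((t, surv))   # censored times leave the estimate unchanged
    return curve

# Hypothetical observations (t_i, d_i); d_i = 0 marks a censored time
data = [(7, 1), (34, 1), (42, 0), (63, 1), (74, 0)]
for t, s_hat in kaplan_meier(data):
    print(t, round(s_hat, 3))
```

The exponent $d_{(k)}$ in the formula appears in the code as the `if d == 1` test: a censored observation contributes a factor of 1 but still reduces the number at risk for later times.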

A parametric approach

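  • Death counts $y_k$ are independent binomials: $y_k \overset{\text{ind}}{\sim} \text{Bi}(n_k, h_k)$

  • Logistic regression: $\log\left(\frac{h_k}{1 - h_k}\right) = \alpha^\top x_k$

    • E.g., cubic regression: $x_k = (1, k, k^2, k^3)$

    • E.g., cubic-linear spline: $x_k = (1, k, (k - k_0)_-^2, (k - k_0)_-^3)$, where $x_- = x \cdot 1\{x \le 0\}$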
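To illustrate the cubic-linear spline covariates, the Python sketch below builds $x_k$ and maps a coefficient vector to hazards through the inverse logit. The knot `k0` and the coefficients `alpha` are hypothetical values chosen for illustration; they are not fitted to any data, and fitting would require a binomial regression routine that is not shown here.

```python
import math

def minus_part(x):
    """x_- = x if x <= 0, else 0."""
    return x if x <= 0 else 0.0

def cubic_linear_row(k, k0):
    """Covariates x_k = (1, k, (k - k0)_-^2, (k - k0)_-^3)."""
    m = minus_part(k - k0)
    return (1.0, float(k), m ** 2, m ** 3)

def hazard(alpha, x):
    """Invert the logit: h = 1 / (1 + exp(-alpha . x))."""
    eta = sum(a * xi for a, xi in zip(alpha, x))
    return 1.0 / (1.0 + math.exp(-eta))

k0 = 11                              # hypothetical knot location
alpha = (-3.0, 0.05, 0.02, 0.001)    # hypothetical coefficients, not fitted

design = [cubic_linear_row(k, k0) for k in range(1, 16)]
hazards = [hazard(alpha, x) for x in design]

for k, (x, h_k) in enumerate(zip(design, hazards), start=1):
    print(k, x, round(h_k, 4))
```

For $k > k_0$ the last two covariates are zero, so the logit of the hazard is linear in $k$ to the right of the knot and cubic to the left of it.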

Cox’s Proportional Hazards Model

Cox’s proportional hazards model

  • The proportional hazards model assumes $h_i(t) = h_0(t)\, e^{x_i^\top \beta}$, where $h_0(t)$ is a baseline hazard, which we don’t need to specify

  • Denote $\theta_i = e^{x_i^\top \beta}$, then $S_i(t) = S_0(t)^{\theta_i}$, where $S_0(t)$ is the baseline survival function

    • A larger value of $\theta_i$ indicates a more quickly declining (i.e., worse) survival curve
    • A positive coefficient $\beta_j$ means that an increase in the corresponding covariate $x_j$ is associated with a worse survival curve
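The power relation between $S_i(t)$ and $S_0(t)$ follows from the identity linking the survival function to the integrated hazard. A short derivation, written here as a worked step rather than quoted from the book:

```latex
\begin{aligned}
S_i(t) &= \exp\left\{-\int_0^t h_i(x)\,dx\right\}
        = \exp\left\{-\theta_i \int_0^t h_0(x)\,dx\right\} \\
       &= \left[\exp\left\{-\int_0^t h_0(x)\,dx\right\}\right]^{\theta_i}
        = S_0(t)^{\theta_i}.
\end{aligned}
```

Since $0 \le S_0(t) \le 1$, raising it to a larger power $\theta_i$ gives a smaller value, which is why a larger $\theta_i$ corresponds to a worse survival curve.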

Proportional hazards model: key results

  • Let $J$ be the number of observed deaths, occurring at times $T_{(1)} < T_{(2)} < \cdots < T_{(J)}$, assuming no ties

  • Just before time $T_{(j)}$ there is a risk set of individuals still under observation: $R_j = \{i : t_i \ge T_{(j)}\}$

  • Key result of the proportional hazards model: given that one person dies at time $T_{(j)}$, the probability that it is person $i$, among the set of people at risk, is $P(i_j = i \mid R_j) = \frac{e^{x_i^\top \beta}}{\sum_{k \in R_j} e^{x_k^\top \beta}} = \frac{\theta_i}{\sum_{k \in R_j} \theta_k}$

Parameter estimation: based on the partial likelihood

  • Estimation of $\beta$ maximizes the partial likelihood $L(\beta) = \prod_{j=1}^{J} \frac{e^{x_{i_j}^\top \beta}}{\sum_{k \in R_j} e^{x_k^\top \beta}}$, where individual $i_j$ dies at time $T_{(j)}$

  • Semi-parametric: we do not need to specify the baseline $h_0(t)$, since it is not contained in the objective function
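A minimal Python sketch of the log partial likelihood, restricted to a single scalar covariate and to data without tied times. The data values and the function name are hypothetical; in practice $\beta$ would be found by numerical maximization of this function, which is not shown.

```python
import math

def log_partial_likelihood(times, events, x, beta):
    """Cox log partial likelihood for one scalar covariate, assuming no tied times.

    times[i]  : observed time t_i
    events[i] : d_i (1 = death observed, 0 = censored)
    x[i]      : covariate value x_i
    beta      : scalar coefficient
    """
    total = 0.0
    for j, (t_j, d_j) in enumerate(zip(times, events)):
        if d_j != 1:
            continue  # only observed deaths contribute a factor
        # Risk set R_j: everyone still under observation at time t_j
        risk = [k for k, t_k in enumerate(times) if t_k >= t_j]
        denom = sum(math.exp(x[k] * beta) for k in risk)
        total += x[j] * beta - math.log(denom)
    return total

# Hypothetical toy data
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]
x = [1.0, 0.0, 1.0, 0.0]

for beta in (-1.0, 0.0, 1.0):
    print(beta, round(log_partial_likelihood(times, events, x, beta), 4))
```

At $\beta = 0$ every $\theta_i$ equals 1, so each factor is one over the size of the risk set. The baseline hazard $h_0(t)$ never appears in the code, which mirrors the semi-parametric property noted above.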

References

  • Efron, Bradley and Hastie, Trevor (2016), Computer Age Statistical Inference. Cambridge University Press