
Book Notes: Computer Age Statistical Inference -- Ch9 Survival Analysis

Survival Analysis
- Life Table and Kaplan-Meier Estimate
- Cox’s Proportional Hazards Model

Survival Analysis

Life Table and Kaplan-Meier Estimate

Life table

An insurance company’s life table summarizes its clients by age. For each age $i$, it contains
- $n_i$: number of clients
- $y_i$: number of deaths
- $\hat h_i = y_i / n_i$: estimated hazard rate
- $\hat S_i$: survival probability estimate
An example life table

Age	Clients	Deaths	Hazard rate	Survival probability
34	120	0	0.000	1.000
35	71	1	0.014	0.986
36	125	0	0.000	0.986
…	…	…	…	…

Discrete survival analysis: notations

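A client’s lifetime (time until event): random variable $X$
- Also called failure time, survival time, or event time
Probability of dying at age $i$: $f_i = \Pr\{X = i\}$
Probability of surviving past age $i-1$: $S_i = \sum_{j \ge i} f_j = \Pr\{X \ge i\}$
Hazard rate at age $i$: the conditional probability $h_i = f_i / S_i = \Pr\{X = i \mid X \ge i\}$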

Life table estimations

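Hazard rate estimation: binomial proportions $\hat h_i = y_i / n_i$, the number of deaths at age $i$ divided by the number of clients at age $i$
- Typical frequentist inference: probabilistic results are estimated by the plug-in principle
Probability of surviving past age $j$ given survival past age $i-1$: $S_{ij} = \prod_{k=i}^{j} (1 - h_k) = \Pr\{X > j \mid X \ge i\}$
Probability of survival estimation: $\hat S_j = \prod_{k=i_0}^{j} (1 - \hat h_k)$, where $i_0$ is the starting age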
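
As a minimal sketch of the plug-in estimates, the Python snippet below computes the hazard rate (deaths divided by clients) and the running product of one minus the hazard for the three rows shown in the example table. Treating age 34 as the starting age is an assumption made only for illustration, and the function name `life_table` is hypothetical.

```python
def life_table(n, y):
    """Return (hazard, survival) lists from client counts n and death counts y."""
    hazard = [deaths / clients for clients, deaths in zip(n, y)]
    survival = []
    running = 1.0
    for h in hazard:
        running *= 1.0 - h  # plug-in product of (1 - hazard) up to this age
        survival.append(running)
    return hazard, survival


# Rows for ages 34, 35, 36 of the example table
ages = [34, 35, 36]
hazard, survival = life_table(n=[120, 71, 125], y=[0, 1, 0])
for age, h, s in zip(ages, hazard, survival):
    print(age, round(h, 3), round(s, 3))
```

Rounded to three decimals, the values agree with the hazard and survival columns of the three table rows.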

Continuous survival analysis: notations

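Time until event $T$: a continuous positive random variable, with pdf $f(t)$ and cdf $F(t)$
Survival function (i.e., reverse cdf): $S(t) = \int_t^\infty f(x)\,dx = \Pr\{T > t\} = 1 - F(t)$
Hazard rate, also called hazard function: $h(t) = f(t) / S(t)$, so that $h(t)\,dt \approx \Pr\{T \in (t, t + dt) \mid T \ge t\}$
- In some other books, hazard rate is denoted as $\lambda(t)$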

Hazard rate and cumulative hazard function

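Connection between hazard rate and survival function: $h(t) = -\frac{d}{dt} \log S(t)$, equivalently $S(t) = \exp\{-\int_0^t h(x)\,dx\}$
Cumulative hazard function: $H(t) = \int_0^t h(x)\,dx$, so that $S(t) = e^{-H(t)}$
Knowing any one of $S(t)$, $h(t)$, $H(t)$ allows one to derive the other two
Example: exponentially distributed $T$, with $f(t) = \frac{1}{c} e^{-t/c}$ and $S(t) = e^{-t/c}$ for $t \ge 0$
- Constant hazard rate $h(t) = 1/c$: memoryless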
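
The exponential example can be checked numerically. The sketch below integrates a constant hazard with the trapezoid rule, exponentiates the negative cumulative hazard to get the survival function, and compares a conditional survival probability with the unconditional one to illustrate memorylessness. The mean lifetime `c = 2.0` and the time points are hypothetical values.

```python
import math

c = 2.0  # hypothetical mean lifetime of the exponential distribution


def hazard(t):
    return 1.0 / c  # constant hazard rate


def cumulative_hazard(t, steps=1000):
    """Trapezoid-rule approximation of the integral of the hazard from 0 to t."""
    dt = t / steps
    total = 0.0
    for k in range(steps):
        total += 0.5 * (hazard(k * dt) + hazard((k + 1) * dt)) * dt
    return total


def survival(t):
    return math.exp(-cumulative_hazard(t))  # S(t) = exp(-H(t))


s, t = 1.5, 3.0
closed_form = math.exp(-t / c)               # known survival function
conditional = survival(s + t) / survival(s)  # Pr{T > s + t | T > s}
print(survival(t), closed_form, conditional)
```

All three printed numbers should agree up to floating-point error: surviving a further `t` units does not depend on the `s` units already survived.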

Censored data

Censored data: survival times known only to exceed the reported value
- E.g., lost to followup, experiment ended with some patients still alive
- Usually denoted as “number+”
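Observation for censored data: $z_i = (t_i, d_i)$, where $t_i$ is the observed survival time, and $d_i$ is the indicator, with $d_i = 1$ if the death was observed and $d_i = 0$ if the observation was censored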

Kaplan-Meier estimate

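Among the censored data $z_1, \dots, z_n$ with $z_i = (t_i, d_i)$, we denote the ordered survival times as $t_{(1)} < t_{(2)} < \dots < t_{(n)}$, assuming no ties, and write $d_{(k)}$ for the indicator attached to $t_{(k)}$ ($1$ for an observed death, $0$ for censoring).
The Kaplan-Meier estimate for survival probability is the life table estimate $\hat S_{(j)} = \prod_{k \le j} \left( \frac{n-k}{n-k+1} \right)^{d_{(k)}}$
Life table curves are nonparametric: no relationship is assumed between the hazard rates $h_i$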
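
A minimal sketch of the Kaplan-Meier product for untied data follows. After sorting, the observation of rank k out of n contributes the factor (n - k) / (n - k + 1) if it is an observed death and the factor 1 if it is censored. The function name `kaplan_meier` and the four data points are hypothetical.

```python
def kaplan_meier(times, deaths):
    """Kaplan-Meier estimate at each ordered time, assuming no tied times.

    times  : observed survival times
    deaths : indicators (1 = death observed, 0 = censored)
    """
    n = len(times)
    ordered = sorted(zip(times, deaths))
    estimate, running = [], 1.0
    for k, (t, d) in enumerate(ordered, start=1):
        # factor ((n - k) / (n - k + 1)) ** d, which equals 1 when censored
        running *= ((n - k) / (n - k + 1)) ** d
        estimate.append((t, running))
    return estimate


# Hypothetical data: the observation at time 3 is censored ("3+")
km = kaplan_meier(times=[6, 1, 4, 3], deaths=[1, 1, 1, 0])
for t, s in km:
    print(t, s)
```

The estimate only drops at observed deaths. A censored time leaves the curve flat but still shrinks the number at risk for later factors.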

A parametric approach

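Death counts are independent Binomials: $y_k \sim \text{Bi}(n_k, h_k)$
Logistic regression: $\lambda_k = \log\{h_k / (1 - h_k)\}$ with $\lambda = X\alpha$, where the $k$th row of $X$ is $x_k'$
- E.g., cubic regression: $x_k = (1, k, k^2, k^3)'$
- E.g., cubic-linear spline: $x_k = (1, k, (k-11)_-^2, (k-11)_-^3)'$, where $(k-11)_- = \min(k - 11, 0)$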
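
To make the two bases concrete, the sketch below builds one design-matrix row per time index k and maps a linear predictor back to a hazard through the inverse logit. The knot at 11 follows the spline example in the book. The function names and the coefficient vector `alpha` are hypothetical, and no model is fitted here.

```python
import math


def cubic_row(k):
    """Row (1, k, k^2, k^3) of the cubic regression basis."""
    return (1, k, k**2, k**3)


def cubic_linear_spline_row(k, knot=11):
    """Row of the cubic-linear spline basis: cubic below the knot, linear above."""
    below = min(k - knot, 0)  # negative part of (k - knot)
    return (1, k, below**2, below**3)


def hazard(row, alpha):
    """Inverse logit of the linear predictor, giving a hazard in (0, 1)."""
    lam = sum(x * a for x, a in zip(row, alpha))
    return 1.0 / (1.0 + math.exp(-lam))


X = [cubic_linear_spline_row(k) for k in range(1, 15)]
alpha = (-2.0, -0.05, 0.01, 0.0)  # hypothetical coefficients, not fitted values
print(X[4], X[11], hazard(X[4], alpha))
```

Past the knot the last two columns are zero, so the logit of the hazard is linear in k there.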

Cox’s Proportional Hazards Model

Cox’s proportional hazards model

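Proportional hazards model assumes $h_i(t) = h_0(t)\, e^{c_i'\beta}$, where $c_i$ is the covariate vector of individual $i$, $\beta$ is the coefficient vector, and $h_0(t)$ is a baseline hazard, which we don’t need to specify
Denote $\theta_i = e^{c_i'\beta}$, then $S_i(t) = S_0(t)^{\theta_i}$, where $S_0(t)$ is the baseline survival function
- A larger value of $\theta_i$ indicates a more quickly declining (i.e., worse) survival curve
- A positive coefficient $\beta_j$ means that an increase in the corresponding covariate is associated with worse survival curves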
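
The relation between an individual's survival curve and the baseline follows in one step from the general identity $S(t) = \exp\{-\int_0^t h(x)\,dx\}$. A short derivation, writing $\theta_i = e^{c_i'\beta}$ for the hazard multiplier of individual $i$:

$$
\begin{aligned}
S_i(t) &= \exp\left\{-\int_0^t h_i(x)\,dx\right\}
        = \exp\left\{-\theta_i \int_0^t h_0(x)\,dx\right\} \\
       &= \left[\exp\left\{-\int_0^t h_0(x)\,dx\right\}\right]^{\theta_i}
        = S_0(t)^{\theta_i}.
\end{aligned}
$$

Because $0 \le S_0(t) \le 1$, raising it to a larger power $\theta_i$ gives a smaller survival probability at every $t$.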

Proportional hazards model: key results

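Let $J$ be the number of observed deaths, occurring at times $T_{(1)} < T_{(2)} < \dots < T_{(J)}$, assuming no ties
Just before time $T_{(j)}$ there is a risk set of individuals still under observation, $\mathcal{R}_j = \{i : t_i \ge T_{(j)}\}$
Key result of the proportional hazards model: given that one person dies at time $T_{(j)}$, the probability that it is person $i$, among the set of people at risk, is $\Pr\{i_j = i \mid \mathcal{R}_j\} = e^{c_i'\beta} \big/ \sum_{k \in \mathcal{R}_j} e^{c_k'\beta}$, where $c_i$ is the covariate vector of person $i$ and $\beta$ is the coefficient vector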
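
The conditional probability is a softmax over the risk set: each person at risk gets weight exp(covariates times coefficients), and the weights are normalized within the set. A minimal sketch with hypothetical data and function names:

```python
import math


def risk_set(times, death_time):
    """Indices of individuals still under observation just before death_time."""
    return [i for i, t in enumerate(times) if t >= death_time]


def death_probabilities(covariates, beta, at_risk):
    """Probability that each person at risk is the one who dies."""
    weights = {
        i: math.exp(sum(c * b for c, b in zip(covariates[i], beta)))
        for i in at_risk
    }
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


# Hypothetical data: one binary covariate, hazard doubled when it equals 1
times = [2.0, 5.0, 7.0, 9.0]
covariates = [(0.0,), (1.0,), (0.0,), (1.0,)]
beta = (math.log(2.0),)

at_risk = risk_set(times, death_time=5.0)
probs = death_probabilities(covariates, beta, at_risk)
print(at_risk, probs)
```

The baseline hazard never appears: it multiplies every weight equally and cancels in the normalization.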

Parameter estimation: based on the partial likelihood

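Estimation of $\beta$ maximizes the partial likelihood $L(\beta) = \prod_{j=1}^{J} \left( e^{c_{i_j}'\beta} \big/ \sum_{k \in \mathcal{R}_j} e^{c_k'\beta} \right)$, where individual $i_j$ dies at time $T_{(j)}$ and $\mathcal{R}_j$ is the risk set just before $T_{(j)}$
Semi-parametric: we do not need to specify the baseline hazard $h_0(t)$, since it is not contained in the objective function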
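
As a rough illustration of the estimation step, the sketch below evaluates the log partial likelihood for a single scalar covariate and maximizes it by a crude grid search. The data, the grid, and the function name are hypothetical. In practice one would more likely use Newton-type optimization from a statistics package.

```python
import math


def partial_log_likelihood(beta, times, deaths, covariate):
    """Log partial likelihood for one scalar covariate, assuming no tied times."""
    total = 0.0
    for j, (t_j, d_j) in enumerate(zip(times, deaths)):
        if d_j == 0:
            continue  # censored observations enter only through the risk sets
        risk = [k for k, t in enumerate(times) if t >= t_j]
        log_denominator = math.log(sum(math.exp(covariate[k] * beta) for k in risk))
        total += covariate[j] * beta - log_denominator
    return total


# Hypothetical data: six individuals, the third one censored
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
deaths = [1, 1, 0, 1, 1, 1]
covariate = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0]

grid = [b / 100.0 for b in range(-300, 301)]  # candidate values in [-3, 3]
beta_hat = max(grid, key=lambda b: partial_log_likelihood(b, times, deaths, covariate))
print(beta_hat, partial_log_likelihood(beta_hat, times, deaths, covariate))
```

Only the covariates and the ordering of the event times enter the objective, which is why the baseline hazard can be left unspecified.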

References

Efron, Bradley and Hastie, Trevor (2016), Computer Age Statistical Inference. Cambridge University Press