Survival Analysis
Life Table and Kaplan-Meier Estimate
Life table
An insurance company’s life table shows information about its clients by age. For each age $i$, it lists
- $n_i$: number of clients
- $y_i$: number of deaths
- $\hat h_i$: hazard rate estimate
- $\hat S_i$: survival probability estimate
An example life table
Age | $n_i$ | $y_i$ | $\hat h_i$ | $\hat S_i$ |
---|---|---|---|---|
34 | 120 | 0 | 0.000 | 1.000 |
35 | 71 | 1 | 0.014 | 0.986 |
36 | 125 | 0 | 0.000 | 0.986 |
… | … | … | … | … |
Discrete survival analysis: notations
- A client’s lifetime (time until the event): random variable $X$
- Also called failure time, survival time, or event time
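Probability of dying at age $i$: $f_i = \Pr\{X = i\}$
Probability of surviving past age $i-1$: $S_i = \Pr\{X \ge i\} = \sum_{j \ge i} f_j$
Hazard rate at age $i$: the conditional probability of dying at age $i$ given survival past age $i-1$,
$$h_i = \frac{f_i}{S_i} = \Pr\{X = i \mid X \ge i\}$$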
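A minimal Python sketch of these definitions, assuming a hypothetical lifetime distribution on ages 0 to 3 (the probabilities are made up for illustration). It computes $S_i$ and $h_i$ from $f_i$ with exact fractions and checks the identity $S_{i+1} = S_i(1 - h_i)$, which follows from $h_i = f_i/S_i$ and $S_{i+1} = S_i - f_i$.

```python
from fractions import Fraction

def discrete_survival(f):
    """Given f[i] = Pr{X = i} for ages i = 0..m-1, return (S, h) where
    S[i] = Pr{X >= i} and h[i] = f[i] / S[i] (the hazard rate)."""
    m = len(f)
    S = [sum(f[i:], Fraction(0)) for i in range(m)]
    h = [f[i] / S[i] if S[i] > 0 else Fraction(0) for i in range(m)]
    return S, h

# Hypothetical lifetime distribution on ages 0..3 (sums to 1).
f = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
S, h = discrete_survival(f)

# Check S_{i+1} = S_i * (1 - h_i) for each age.
for i in range(len(f) - 1):
    assert S[i + 1] == S[i] * (1 - h[i])

print([str(s) for s in S])  # ['1', '9/10', '7/10', '2/5']
print([str(x) for x in h])  # ['1/10', '2/9', '3/7', '1']
```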
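Life table estimation
- Hazard rate estimation: binomial proportions $\hat h_i = y_i / n_i$
- Typical frequentist inference: probabilistic quantities are estimated by the plug-in principle, replacing each $h_k$ by $\hat h_k$
Probability of surviving past age $j$ given survival past age $i-1$: the client must survive each age $k = i, \dots, j$ in turn, each with conditional probability $1 - h_k$, so
$$S_{ij} = \Pr\{X > j \mid X \ge i\} = \prod_{k=i}^{j} (1 - h_k)$$
Probability of survival estimation, where $i$ is the starting age:
$$\hat S_{ij} = \prod_{k=i}^{j} (1 - \hat h_k)$$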
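The estimation steps can be sketched in Python as a running product. This is an illustrative sketch, assuming the first row shown (age 34) is the starting age $i$; the helper name `life_table` is hypothetical. Applied to the three rows of the example table, it reproduces the printed $\hat h$ and $\hat S$ columns to three decimals.

```python
def life_table(ages, n, y):
    """Life table estimates from clients at risk n_k and deaths y_k per age.

    h_hat_k = y_k / n_k (binomial proportion); S_hat is the running product
    of (1 - h_hat_k), starting from the first age supplied.
    """
    rows = []
    s_hat = 1.0
    for age, n_k, y_k in zip(ages, n, y):
        h_hat = y_k / n_k
        s_hat *= 1.0 - h_hat
        rows.append((age, n_k, y_k, h_hat, s_hat))
    return rows

# The three rows of the example life table, treating age 34 as the starting age.
for age, n_k, y_k, h_hat, s_hat in life_table([34, 35, 36], [120, 71, 125], [0, 1, 0]):
    print(f"{age} | {n_k} | {y_k} | {h_hat:.3f} | {s_hat:.3f}")
# 34 | 120 | 0 | 0.000 | 1.000
# 35 | 71 | 1 | 0.014 | 0.986
# 36 | 125 | 0 | 0.000 | 0.986
```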
Continuous survival analysis: notations
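Time until event $T$: a continuous positive random variable, with pdf $f(t)$ and cdf $F(t) = \Pr\{T \le t\}$
Survival function (i.e., complementary cdf): $S(t) = \Pr\{T \ge t\} = 1 - F(t)$
- Hazard rate, also called hazard function: $h(t) = \dfrac{f(t)}{S(t)}$, the instantaneous death rate at time $t$ among those surviving to $t$, i.e., $h(t) = \lim_{dt \to 0} \Pr\{t \le T < t + dt \mid T \ge t\} / dt$
- In some other books, the hazard rate is denoted $\lambda(t)$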
Hazard rate and cumulative hazard function
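Connection between hazard rate and survival function:
$$S(t) = \exp\left(-\int_0^t h(x)\,dx\right)$$
Cumulative hazard function:
$$H(t) = \int_0^t h(x)\,dx, \qquad S(t) = e^{-H(t)}$$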
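The link between $h(t)$ and $S(t)$ follows in one line from $S(t) = 1 - F(t)$, so that $S'(t) = -f(t)$, together with $S(0) = 1$ for a positive $T$:

$$
h(t) = \frac{f(t)}{S(t)} = -\frac{S'(t)}{S(t)} = -\frac{d}{dt}\log S(t)
\;\Longrightarrow\;
\log S(t) = -\int_0^t h(x)\,dx = -H(t)
\;\Longrightarrow\;
S(t) = e^{-H(t)}.
$$

Conversely, $h(t) = H'(t)$ and $f(t) = h(t)\,S(t)$, which is why any one of the three functions determines the others.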
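Knowing any one of $f(t)$, $S(t)$, $h(t)$ allows one to derive the other two
- Example: $T$ exponentially distributed, $f(t) = \frac{1}{c} e^{-t/c}$, so $S(t) = e^{-t/c}$ and $h(t) = 1/c$
- Constant hazard rate: memoryless, $\Pr\{T \ge s + t \mid T \ge s\} = S(t)$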
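A quick numerical check of the exponential example, as a sketch assuming a hypothetical mean $c = 2$ (so $h(t) = 1/c = 0.5$): the hazard $f(t)/S(t)$ is the same at every $t$, and $\Pr\{T \ge s + t \mid T \ge s\} = S(s+t)/S(s)$ equals $S(t)$.

```python
import math

c = 2.0  # hypothetical mean lifetime; the hazard rate is 1 / c

def f(t):
    """Exponential pdf f(t) = exp(-t / c) / c."""
    return math.exp(-t / c) / c

def S(t):
    """Survival function S(t) = Pr{T >= t} = exp(-t / c)."""
    return math.exp(-t / c)

def h(t):
    """Hazard rate h(t) = f(t) / S(t)."""
    return f(t) / S(t)

# Constant hazard: h(t) = 1 / c at every t.
print([round(h(t), 10) for t in (0.0, 1.0, 5.0)])  # [0.5, 0.5, 0.5]

# Memorylessness: having survived to s, the remaining lifetime behaves like T.
s, t = 3.0, 2.0
print(math.isclose(S(s + t) / S(s), S(t)))  # True
```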
Censored data
- Censored data: survival times known only to exceed the reported value
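- E.g., patients lost to follow-up, or the experiment ended with some patients still alive
- Usually written as “number+”, e.g., 11+ for a survival time known only to exceed 11
- Observation for censored data: $(T_i, d_i)$, where $T_i$ is the observed survival time and $d_i$ is the indicator: $d_i = 1$ if the death was observed, $d_i = 0$ if the time is censored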
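Parsing the “number+” convention into $(T_i, d_i)$ pairs is a small but common first step. A minimal sketch, assuming observations arrive as strings; the function name `parse_survival_times` and the sample times are hypothetical.

```python
def parse_survival_times(tokens):
    """Convert strings such as "7" or "11+" into (T_i, d_i) pairs.

    A trailing "+" marks a censored observation (d_i = 0); otherwise the
    death was observed (d_i = 1).
    """
    data = []
    for tok in tokens:
        tok = tok.strip()
        if tok.endswith("+"):
            data.append((float(tok[:-1]), 0))
        else:
            data.append((float(tok), 1))
    return data

# Hypothetical follow-up times in months.
print(parse_survival_times(["3", "5+", "8", "12+"]))
# [(3.0, 1), (5.0, 0), (8.0, 1), (12.0, 0)]
```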
Kaplan-Meier estimate
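Among the censored data $(T_i, d_i)$, $i = 1, \dots, n$, we denote the ordered survival times as $T_{(1)} < T_{(2)} < \dots < T_{(n)}$, assuming no ties.
The Kaplan-Meier estimate for survival probability is the life table estimate
$$\hat S_{(j)} = \prod_{k \le j} \left(\frac{n - k}{n - k + 1}\right)^{d_{(k)}}$$
Each ordered time is treated as its own age: $n - k + 1$ individuals are at risk at $T_{(k)}$, so $\hat h_{(k)} = d_{(k)} / (n - k + 1)$.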
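A minimal Python sketch of the Kaplan-Meier product, assuming no tied times as above; `kaplan_meier` and the four observations are hypothetical. Each ordered time $T_{(k)}$ has $n-k+1$ subjects at risk, and only observed deaths ($d_{(k)} = 1$) multiply the estimate by $(n-k)/(n-k+1)$.

```python
def kaplan_meier(data):
    """Kaplan-Meier estimate from (T_i, d_i) pairs, assuming no tied times.

    Returns (T_(j), S_hat_(j)) pairs with
    S_hat_(j) = prod over k <= j of ((n - k) / (n - k + 1)) ** d_(k).
    """
    ordered = sorted(data)              # order by survival time
    n = len(ordered)
    surv = 1.0
    curve = []
    for k, (t, d) in enumerate(ordered, start=1):
        at_risk = n - k + 1             # subjects with T_i >= T_(k)
        if d == 1:                      # censored times leave the curve flat
            surv *= (at_risk - 1) / at_risk
        curve.append((t, surv))
    return curve

data = [(3.0, 1), (5.0, 0), (8.0, 1), (12.0, 0)]  # hypothetical
for t, s in kaplan_meier(data):
    print(t, round(s, 4))
# 3.0 0.75
# 5.0 0.75
# 8.0 0.375
# 12.0 0.375
```

Censored subjects still count in the risk sets of earlier deaths; they simply never trigger a drop themselves.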
Life table curves are nonparametric: no relationship is assumed between the hazard rates
A parametric approach
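Death counts are independent binomials: $y_k \overset{\text{ind}}{\sim} \text{Bi}(n_k, h_k)$
Logistic regression on the logit of the hazard: $\lambda_k = \log\dfrac{h_k}{1 - h_k}$
E.g., cubic regression: $\lambda_k = \beta_0 + \beta_1 k + \beta_2 k^2 + \beta_3 k^3$
E.g., cubic-linear spline: $\lambda_k = \beta_0 + \beta_1 k + \beta_2 (k - \kappa)_-^2 + \beta_3 (k - \kappa)_-^3$, where $(k - \kappa)_- = \min(k - \kappa, 0)$ for a knot $\kappa$, so the fit is cubic for $k \le \kappa$ and linear for $k > \kappa$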
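Fitting either curve is an ordinary binomial logistic regression. The sketch below, in plain Python with no libraries, builds the two design matrices and maximizes the binomial log-likelihood by Newton-Raphson (for the logit link this coincides with Fisher scoring, i.e., IRLS). The monthly counts and the knot at $k = 4$ are hypothetical, and all function names are illustrative; a real analysis would more likely use a GLM routine from a statistics package.

```python
import math

def cubic_basis(k):
    """Row for lambda_k = b0 + b1 k + b2 k^2 + b3 k^3."""
    k = float(k)
    return [1.0, k, k ** 2, k ** 3]

def cubic_linear_basis(k, knot):
    """Cubic below the knot, linear above; uses (k - knot)_- = min(k - knot, 0)."""
    k = float(k)
    neg = min(k - knot, 0.0)
    return [1.0, k, neg ** 2, neg ** 3]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            factor = M[r][c] / M[c][c]
            for j in range(c, m + 1):
                M[r][j] -= factor * M[c][j]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        acc = sum(M[r][j] * x[j] for j in range(r + 1, m))
        x[r] = (M[r][m] - acc) / M[r][r]
    return x

def score_and_info(X, y, n, beta):
    """Gradient and information matrix of the binomial log-likelihood."""
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for x, yk, nk in zip(X, y, n):
        eta = sum(xj * bj for xj, bj in zip(x, beta))
        h = 1.0 / (1.0 + math.exp(-eta))   # inverse logit: fitted hazard h_k
        w = nk * h * (1.0 - h)
        for a in range(p):
            grad[a] += (yk - nk * h) * x[a]
            for b in range(p):
                info[a][b] += w * x[a] * x[b]
    return grad, info

def fit_logistic(X, y, n, max_iter=100, tol=1e-10):
    """Newton-Raphson for y_k ~ Bi(n_k, h_k) with logit(h_k) = x_k . beta."""
    pooled = sum(y) / sum(n)               # start at the pooled logit
    beta = [math.log(pooled / (1.0 - pooled))] + [0.0] * (len(X[0]) - 1)
    for _ in range(max_iter):
        grad, info = score_and_info(X, y, n, beta)
        step = solve(info, grad)
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Hypothetical monthly counts: n_k at risk and y_k deaths in month k.
months = list(range(1, 9))
n = [100, 90, 80, 70, 60, 50, 40, 30]
y = [10, 10, 9, 10, 8, 8, 6, 5]

beta_cubic = fit_logistic([cubic_basis(k) for k in months], y, n)
beta_spline = fit_logistic([cubic_linear_basis(k, knot=4) for k in months], y, n)
print(beta_cubic)
print(beta_spline)
```

At the maximum the score $\sum_k (y_k - n_k \hat h_k)\,x_k$ is zero, so with an intercept the fitted deaths $\sum_k n_k \hat h_k$ match the observed total.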
Cox’s Proportional Hazards Model
Cox’s proportional hazards model
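Proportional hazards model assumes
$$h_i(t) = h_0(t)\, e^{c_i^\top \beta}$$
where $c_i$ is the covariate vector of individual $i$ and $h_0(t)$ is a baseline hazard, which we don’t need to specify
Denote $\theta_i = e^{c_i^\top \beta}$; then $H_i(t) = \theta_i H_0(t)$, so $S_i(t) = S_0(t)^{\theta_i}$, where $S_0(t)$ is the baseline survival function
- A larger value of $\theta_i$ indicates a more quickly declining (i.e., worse) survival curve
- A positive coefficient $\beta_j$ indicates that an increase in the corresponding covariate $c_{ij}$ is associated with a worse survival curve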
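To see $S_i(t) = S_0(t)^{\theta_i}$ and the sign interpretation concretely, here is a sketch with a hypothetical baseline hazard $h_0(t) = 2t$ (so $H_0(t) = t^2$ and $S_0(t) = e^{-t^2}$), one covariate, and a hypothetical coefficient $\beta = 0.7$. Because $\beta > 0$, individuals with larger $c_i$ have larger $\theta_i$ and uniformly lower survival curves.

```python
import math

def baseline_survival(t):
    """Hypothetical baseline: h_0(t) = 2t, so H_0(t) = t**2 and S_0(t) = exp(-t**2)."""
    return math.exp(-t ** 2)

def ph_survival(t, c, beta):
    """Proportional hazards survival S_i(t) = S_0(t) ** theta_i, theta_i = exp(c_i . beta)."""
    theta = math.exp(sum(cj * bj for cj, bj in zip(c, beta)))
    return baseline_survival(t) ** theta

beta = [0.7]  # hypothetical positive coefficient
for c in ([0.0], [1.0], [2.0]):
    # Larger covariate -> larger theta_i -> lower survival at every t.
    print(c, [round(ph_survival(t, c, beta), 3) for t in (0.5, 1.0, 1.5)])
```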
Proportional hazards model: key results
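Let $J$ be the number of observed deaths, occurring at times $T_{(1)} < T_{(2)} < \dots < T_{(J)}$, assuming no ties
Just before time $T_{(j)}$ there is a risk set $\mathcal{R}_j = \{i : T_i \ge T_{(j)}\}$ of individuals still under observation
Key result of the proportional hazards model: given that one person dies at time $T_{(j)}$, the probability that it is person $i$, among the set $\mathcal{R}_j$ of people at risk, is
$$\frac{e^{c_i^\top \beta}}{\sum_{k \in \mathcal{R}_j} e^{c_k^\top \beta}}$$
This is $h_i(T_{(j)}) / \sum_{k \in \mathcal{R}_j} h_k(T_{(j)})$, in which the common factor $h_0(T_{(j)})$ cancels.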
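A small sketch of this conditional probability for one risk set, assuming three hypothetical individuals with a single covariate and a hypothetical $\beta = \log 2$, so that $\theta = 1, 2, 2$. The function name `death_probabilities` is illustrative.

```python
import math

def death_probabilities(risk_set, covariates, beta):
    """Pr{the death at T_(j) is individual i | exactly one death in R_j}.

    Each member's hazard is h_0(t) * theta_i with theta_i = exp(c_i . beta);
    h_0(t) cancels in the ratio, leaving theta_i / sum of theta_k over R_j.
    """
    theta = {i: math.exp(sum(cj * bj for cj, bj in zip(covariates[i], beta)))
             for i in risk_set}
    total = sum(theta.values())
    return {i: th / total for i, th in theta.items()}

# Hypothetical risk set of three individuals with one covariate each.
covariates = {"a": [0.0], "b": [1.0], "c": [1.0]}
probs = death_probabilities(["a", "b", "c"], covariates, beta=[math.log(2.0)])
print({i: round(p, 3) for i, p in probs.items()})  # {'a': 0.2, 'b': 0.4, 'c': 0.4}
```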
Parameter estimation: based on the partial likelihood
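Estimation of $\beta$ maximizes the partial likelihood
$$L(\beta) = \prod_{j=1}^{J} \frac{e^{c_{i_j}^\top \beta}}{\sum_{k \in \mathcal{R}_j} e^{c_k^\top \beta}}$$
where individual $i_j$ dies at time $T_{(j)}$
Semi-parametric: we do not need to specify the baseline hazard $h_0(t)$, since it does not appear in the objective function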
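A minimal sketch of partial-likelihood estimation for a single covariate, assuming no tied death times; the data and function names are hypothetical. The baseline hazard never appears: each death contributes only $c_{i_j}\beta - \log\sum_{k \in \mathcal{R}_j} e^{c_k \beta}$. Newton-Raphson uses the score $\sum_j (c_{i_j} - \bar c_j)$ and the information $\sum_j \mathrm{Var}_j(c)$, where $\bar c_j$ and $\mathrm{Var}_j(c)$ are the $e^{c_k\beta}$-weighted mean and variance over $\mathcal{R}_j$.

```python
import math

def risk_covariates(data, t_i):
    """Covariates of the risk set {k : T_k >= t_i}."""
    return [c_k for t_k, _, c_k in data if t_k >= t_i]

def cox_partial_loglik(beta, data):
    """Log partial likelihood for one covariate, assuming no tied death times.

    data holds (T_i, d_i, c_i). Each observed death contributes
    c_i * beta - log(sum of exp(c_k * beta) over its risk set).
    The baseline hazard h_0(t) never enters.
    """
    ll = 0.0
    for t_i, d_i, c_i in data:
        if d_i == 1:
            risk = risk_covariates(data, t_i)
            ll += c_i * beta - math.log(sum(math.exp(c_k * beta) for c_k in risk))
    return ll

def cox_newton(data, beta=0.0, max_iter=50, tol=1e-10):
    """Maximize the log partial likelihood by Newton-Raphson."""
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for t_i, d_i, c_i in data:
            if d_i != 1:
                continue
            risk = risk_covariates(data, t_i)
            w = [math.exp(c_k * beta) for c_k in risk]
            mean = sum(wk * ck for wk, ck in zip(w, risk)) / sum(w)
            second = sum(wk * ck * ck for wk, ck in zip(w, risk)) / sum(w)
            score += c_i - mean            # observed minus expected covariate
            info += second - mean ** 2     # weighted variance over the risk set
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical data: (time, death indicator d_i, binary covariate c_i).
data = [(2.0, 1, 1.0), (3.0, 1, 0.0), (4.0, 0, 1.0), (5.0, 1, 1.0),
        (6.0, 1, 0.0), (7.0, 0, 0.0), (9.0, 1, 0.0)]
beta_hat = cox_newton(data)
print(round(beta_hat, 3))
```

With this toy data the estimate comes out positive (roughly 1.4), i.e., $c_i = 1$ is associated with a higher hazard. Tied death times would need an extra approximation (e.g., Breslow's or Efron's), which this sketch omits.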
References
- Efron, Bradley and Hastie, Trevor (2016), Computer Age Statistical Inference. Cambridge University Press