# Paper Notes: Generalized R Squared

### $$R^2$$ for normal linear regression

• $$R^2$$, also called the coefficient of determination or the squared multiple correlation coefficient, is defined for normal linear regression as the proportion of variance “explained” by the regression model $\begin{equation}\label{eq:R2} R^2 = 1 - \frac{\sum_i \left( y_i - \hat{y}_i \right)^2}{\sum_i \left( y_i - \bar{y} \right)^2} \end{equation}$

• Note that under the MLE, where $$\hat{\sigma}^2 = \sum_i \left( y_i - \hat{y}_i \right)^2 / n$$, the deviance (i.e., negative two times log likelihood) is \begin{align*} -2 l\left(\hat{\beta}\right) & = -2 \log L(\hat{\beta})\\ & = n \log(2\pi\hat{\sigma}^2) + \frac{\sum_i \left( y_i - \hat{y}_i \right)^2}{\hat{\sigma}^2}\\ & = n \left[ \log\left( \frac{\sum_i \left( y_i - \hat{y}_i \right)^2}{n} \right) + \log(2\pi) + 1 \right] \end{align*}

• I list this derivation here to make clear that the generalized $$R^2$$ introduced next contains the classical $$R^2$$ of normal linear regression as a special case
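As a quick numerical check of the deviance identity, here is a minimal sketch in plain Python. The toy data, the helper `fit_line`, and the other function names are illustrative assumptions, not part of the paper. It fits a simple least-squares line, then compares the deviance summed from the normal log densities against the closed form $$n\left[\log(\mathrm{SSE}/n) + \log(2\pi) + 1\right]$$.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = a + b * x; returns the fitted values."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    intercept = y_bar - slope * x_bar
    return [intercept + slope * xi for xi in x]

def sse(y, y_hat):
    """Residual sum of squares."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

def classical_r2(y, y_hat):
    """Classical R^2 = 1 - SSE / SST."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse(y, y_hat) / sst

def deviance_from_density(y, y_hat):
    """-2 * normal log-likelihood, with sigma^2 set to its MLE, SSE / n."""
    sigma2 = sse(y, y_hat) / len(y)
    loglik = sum(
        -0.5 * math.log(2 * math.pi * sigma2) - (yi - fi) ** 2 / (2 * sigma2)
        for yi, fi in zip(y, y_hat)
    )
    return -2.0 * loglik

def deviance_closed_form(y, y_hat):
    """Closed form n * [log(SSE / n) + log(2 * pi) + 1]."""
    n = len(y)
    return n * (math.log(sse(y, y_hat) / n) + math.log(2 * math.pi) + 1.0)

# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.3]
y_hat = fit_line(x, y)

print(classical_r2(y, y_hat))
print(deviance_from_density(y, y_hat), deviance_closed_form(y, y_hat))
```

The two printed deviance values should agree up to floating-point rounding.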

# Generalized $$R^2$$ by Cox and Snell

### Generalized $$R^2$$, proposed by Cox and Snell (and also Magee and Maddala)

• The generalized $$R^2$$ applies to more general models where

1. the concept of residual variance cannot be easily defined, and
2. maximum likelihood is the criterion of fit.

• It is defined as $\begin{equation} \label{eq:generalized_R2_v1} R^2 = 1 - \exp\left\{ -\frac{2}{n}\left[l\left(\hat{\beta}\right) - l(0) \right] \right\} = 1 - \left[L(0)/L\left(\hat{\beta}\right)\right]^{2/n} \end{equation}$
• Here, $$L\left(\hat{\beta}\right)$$ and $$L(0)$$ are the likelihoods of the fitted and the null models, respectively, and $$l = \log L$$ is the corresponding log likelihood.

• For normal linear regression, this generalized $$R^2$$ becomes the classical $$R^2$$
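To see why, here is a short derivation sketch. It assumes the null model is the intercept-only normal model, whose fitted values all equal $$\bar{y}$$ and whose variance is also estimated by maximum likelihood. Write $$\mathrm{SSE} = \sum_i \left( y_i - \hat{y}_i \right)^2$$ and $$\mathrm{SST} = \sum_i \left( y_i - \bar{y} \right)^2$$ (shorthand introduced here). Applying the normal deviance formula to both models gives

\begin{align*} -2 l\left(\hat{\beta}\right) & = n \left[ \log\left( \mathrm{SSE}/n \right) + \log(2\pi) + 1 \right]\\ -2 l(0) & = n \left[ \log\left( \mathrm{SST}/n \right) + \log(2\pi) + 1 \right]\\ -\frac{2}{n}\left[ l\left(\hat{\beta}\right) - l(0) \right] & = \log\left( \mathrm{SSE}/n \right) - \log\left( \mathrm{SST}/n \right) = \log\left( \frac{\mathrm{SSE}}{\mathrm{SST}} \right)\\ R^2 & = 1 - \exp\left\{ \log\left( \frac{\mathrm{SSE}}{\mathrm{SST}} \right) \right\} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}} \end{align*}

The last line is one minus the ratio of the residual sum of squares to the total sum of squares, which is the classical $$R^2$$.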

### Desirable properties of the Cox and Snell generalized $$R^2$$

1. Consistent with classical $$R^2$$

2. Consistent with maximum likelihood as an estimation method

3. Asymptotically independent of the sample size $$n$$

4. $$1-R^2$$ has an interpretation as the proportion of unexplained “variation”

• For example, if we have three nested models, from smallest to largest, $$M_1, M_2$$, and $$M_3$$, and $$R^2_{j, k}$$ denotes the $$R^2$$ of $$M_j$$ measured against $$M_k$$ as the null model, then we have $(1 - R^2_{3, 1}) = (1 - R^2_{3, 2})(1 - R^2_{2, 1})$
• For more desirable properties (7 in total), please check out the Nagelkerke paper
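The multiplicative property for nested models follows directly from the likelihood-ratio form of the definition, and can be checked with a minimal Python sketch. The function name `cox_snell_r2` and the three log-likelihood values are hypothetical, chosen only for illustration.

```python
import math

def cox_snell_r2(loglik_small, loglik_large, n):
    """Cox and Snell R^2 of the larger model against the smaller (null) one.

    Implements 1 - exp{-(2 / n) * [l(large) - l(small)]}.
    """
    return 1.0 - math.exp(-2.0 / n * (loglik_large - loglik_small))

# Hypothetical maximized log-likelihoods of three nested models M1 < M2 < M3.
n = 50
ll1, ll2, ll3 = -80.0, -70.0, -65.0

r2_31 = cox_snell_r2(ll1, ll3, n)  # M3 against M1
r2_32 = cox_snell_r2(ll2, ll3, n)  # M3 against M2
r2_21 = cox_snell_r2(ll1, ll2, n)  # M2 against M1

lhs = 1.0 - r2_31
rhs = (1.0 - r2_32) * (1.0 - r2_21)
print(lhs, rhs)
```

The two printed values agree up to rounding because the likelihood ratios multiply: $$L_1/L_3 = (L_2/L_3)(L_1/L_2)$$.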

# Generalized $$R^2$$ by Nagelkerke

### Generalized $$R^2$$, proposed by Nagelkerke 

• An undesirable property: for discrete models, the maximum $$R^2$$ is always less than 1 $\max(R^2) = 1 - L(0)^{2/n}$

• This is because the likelihood of a discrete target variable is a product of probabilities from a pmf (rather than densities from a pdf, as for continuous targets), so $$L\left(\hat{\beta}\right) \leq 1$$
• A new definition of the generalized $$R^2$$ $\begin{equation}\label{eq:generalized_R2_v2} \bar{R}^2 = \frac{R^2}{\max(R^2)} = \frac{1 - \left[L(0)/L\left(\hat{\beta}\right)\right]^{2/n}}{1 - L(0)^{2/n}} \end{equation}$

• Most of the desirable properties of the Cox and Snell $$R^2$$, including the ones listed above, are still satisfied

• Nagelkerke’s generalized $$R^2$$ seems to be a popular version. For example, the biostatistics textbook by Steyerberg uses this version
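A small worked case makes the bound concrete. The balanced binary outcome is an illustrative assumption, not an example from the paper. Suppose exactly half of the $$n$$ observations equal 1, so the null model assigns probability $$1/2$$ to every observed outcome. Then

\begin{align*} L(0) & = \left( 1/2 \right)^n\\ \max(R^2) & = 1 - L(0)^{2/n} = 1 - \left( 1/2 \right)^2 = 0.75 \end{align*}

So even a model whose likelihood approaches $$L\left(\hat{\beta}\right) = 1$$, i.e. one that predicts every outcome almost perfectly, has a Cox and Snell $$R^2$$ of at most 0.75, whereas the rescaled $$\bar{R}^2$$ approaches 1.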

## Generalized $$R^2$$ for binary data


• Denote the estimated binary probabilities as $$\hat{p}_i$$ for the fitted model, and $$\bar{p}$$ for the null model

• Cox and Snell $$R^2$$ $R^2 = 1 - \left[L(0)/L\left(\hat{\beta}\right)\right]^{2/n} = 1 - \left[ \prod_i \left(\frac{\bar{p}}{\hat{p}_i} \right)^{y_i} \left(\frac{1-\bar{p}}{1-\hat{p}_i} \right)^{1-y_i}\right]^{2/n}$

• Nagelkerke $$R^2$$ $\bar{R}^2 = \frac{1 - \left[L(0)/L\left(\hat{\beta}\right)\right]^{2/n}} {1 - L(0)^{2/n}} = \frac{1 - \left[\prod_i \left(\frac{\bar{p}}{\hat{p}_i} \right)^{y_i} \left(\frac{1-\bar{p}}{1-\hat{p}_i} \right)^{1-y_i}\right]^{2/n}} {1 - \left[\prod_i \bar{p}^{y_i} \left(1-\bar{p}\right)^{1-y_i}\right]^{2/n}}$
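The two formulas above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function names and the toy data are assumptions, and it requires every $$\hat{p}_i$$ and $$\bar{p}$$ to lie strictly between 0 and 1 so that the logarithms are finite. Working with log likelihoods instead of the raw products avoids numerical underflow for large $$n$$.

```python
import math

def bernoulli_loglik(y, p):
    """Log-likelihood of binary outcomes y under predicted probabilities p."""
    return sum(
        yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p)
    )

def binary_r2(y, p_hat):
    """Return (Cox and Snell R^2, Nagelkerke R^2) for binary data.

    Assumes 0 < p_hat[i] < 1 and that y contains both 0s and 1s.
    """
    n = len(y)
    p_bar = sum(y) / n  # null model: one common probability for all observations
    ll_null = bernoulli_loglik(y, [p_bar] * n)
    ll_fit = bernoulli_loglik(y, p_hat)
    cox_snell = 1.0 - math.exp(-2.0 / n * (ll_fit - ll_null))
    max_r2 = 1.0 - math.exp(2.0 / n * ll_null)  # 1 - L(0)^(2/n)
    return cox_snell, cox_snell / max_r2

# Hypothetical outcomes and fitted probabilities, for illustration only.
y = [0, 0, 1, 1, 0, 1, 1, 0]
p_hat = [0.1, 0.3, 0.8, 0.7, 0.4, 0.9, 0.6, 0.2]

cox_snell, nagelkerke = binary_r2(y, p_hat)
print(cox_snell, nagelkerke)
```

Passing the null probabilities themselves as `p_hat` gives zero for both measures, which is a convenient sanity check.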