Paper Notes: Generalized R Squared

$R^2$ for normal linear regression

  • $R^2$, also called the coefficient of determination or the squared multiple correlation coefficient, is defined for normal linear regression as the proportion of variance “explained” by the regression model:
    $$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$

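  • Note that under the MLE, where $\hat{\sigma}^2 = \sum_i (y_i - \hat{y}_i)^2 / n$, the deviance (i.e., negative two times the log-likelihood) is
    $$-2\,l(\hat{\beta}) = -2 \log L(\hat{\beta}) = n \log(2\pi\hat{\sigma}^2) + \frac{\sum_i (y_i - \hat{y}_i)^2}{\hat{\sigma}^2} = n\left[\log\left(\frac{\sum_i (y_i - \hat{y}_i)^2}{n}\right) + \log(2\pi) + 1\right]$$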

    • I list this derivation here to make clear that the generalized $R^2$ defined below contains the classical $R^2$ for normal linear regression as a special case
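
As a quick numerical check of the two expressions above, here is a minimal sketch in plain Python. The toy data and all variable names are hypothetical and chosen only for illustration. It fits a simple linear regression in closed form, computes the classical $R^2$, and compares the deviance obtained from the normal log-density with the closed-form expression.

```python
import math

# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.8]
n = len(y)

# Closed-form least squares (also the MLE of the coefficients) for y = b0 + b1 * x.
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
r2 = 1.0 - sse / sst                                   # classical R^2

sigma2_hat = sse / n  # MLE of the error variance (divides by n, not n - p)

# Deviance computed directly from the normal log-density of each observation.
loglik = sum(
    -0.5 * math.log(2 * math.pi * sigma2_hat) - (yi - fi) ** 2 / (2 * sigma2_hat)
    for yi, fi in zip(y, y_hat)
)
deviance_direct = -2.0 * loglik

# Deviance from the closed form n * [log(SSE / n) + log(2 * pi) + 1].
deviance_closed = n * (math.log(sse / n) + math.log(2 * math.pi) + 1.0)

print(r2, deviance_direct, deviance_closed)
```

The two deviance values should agree up to floating-point error, since plugging $\hat{\sigma}^2 = \mathrm{SSE}/n$ into the second term of the deviance gives exactly $n$.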

Generalized $R^2$ by Cox and Snell

Generalized $R^2$, proposed by Cox and Snell [1989] (and also Magee [1990] and Maddala [1983])

  • The generalized $R^2$ for more general models, where

    1. the concept of residual variance cannot be easily defined, and
    2. maximum likelihood is the criterion of fit,

    is
    $$R^2 = 1 - \exp\left\{-\frac{2}{n}\left[l(\hat{\beta}) - l(0)\right]\right\} = 1 - \left[\frac{L(0)}{L(\hat{\beta})}\right]^{2/n}$$
  • Here, $L(\hat{\beta})$ and $L(0)$ are the likelihoods of the fitted and the null models, respectively, and $l = \log L$.

  • For normal linear regression, this generalized $R^2$ becomes the classical $R^2$
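
To see why the normal case reduces to the classical definition, one can plug the normal deviance into the generalized formula. The short derivation below is a sketch that assumes the null model is the intercept-only model (so all its fitted values equal $\bar{y}$) and that both variances are estimated by maximum likelihood. $\mathrm{SSE}$ and $\mathrm{SST}$ are shorthand introduced here, not notation from the paper.

```latex
\begin{aligned}
-2\,l(\hat{\beta}) &= n\left[\log\frac{\mathrm{SSE}}{n} + \log(2\pi) + 1\right],
  \qquad \mathrm{SSE} = \sum_i (y_i - \hat{y}_i)^2 \\
-2\,l(0) &= n\left[\log\frac{\mathrm{SST}}{n} + \log(2\pi) + 1\right],
  \qquad \mathrm{SST} = \sum_i (y_i - \bar{y})^2 \\
-\frac{2}{n}\left[l(\hat{\beta}) - l(0)\right]
  &= \log\frac{\mathrm{SSE}}{n} - \log\frac{\mathrm{SST}}{n}
   = \log\frac{\mathrm{SSE}}{\mathrm{SST}} \\
R^2 &= 1 - \exp\left\{\log\frac{\mathrm{SSE}}{\mathrm{SST}}\right\}
     = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}
\end{aligned}
```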

Desirable properties of the generalized $R^2$ of Cox and Snell

  1. Consistent with classical $R^2$

  2. Consistent with maximum likelihood as an estimation method

  3. Asymptotically independent of the sample size n

  4. $1 - R^2$ has an interpretation as the proportion of unexplained “variation”

    • For example, if we have three nested models, from smallest to largest, $M_1$, $M_2$, and $M_3$, then $(1 - R^2_{3,1}) = (1 - R^2_{3,2})(1 - R^2_{2,1})$, where $R^2_{j,k}$ denotes the $R^2$ of model $M_j$ relative to model $M_k$
  • For more desirable properties (7 in total), please check out the Nagelkerke [1991] paper
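
The multiplicative property for nested models follows directly from the likelihood-ratio form of the definition, and it can be checked numerically. The sketch below uses a hypothetical helper `cox_snell_r2` and made-up log-likelihood values (not results from any real fit); only their ordering matters.

```python
import math

def cox_snell_r2(loglik_small, loglik_large, n):
    """Generalized R^2 of the larger model relative to the smaller nested model."""
    return 1.0 - math.exp(-2.0 / n * (loglik_large - loglik_small))

# Hypothetical maximized log-likelihoods of three nested models M1, M2, M3
# fitted to the same n observations (illustrative numbers, not real results).
n = 100
l1, l2, l3 = -68.0, -61.5, -57.2

r2_21 = cox_snell_r2(l1, l2, n)  # M2 relative to M1
r2_32 = cox_snell_r2(l2, l3, n)  # M3 relative to M2
r2_31 = cox_snell_r2(l1, l3, n)  # M3 relative to M1

lhs = 1.0 - r2_31
rhs = (1.0 - r2_32) * (1.0 - r2_21)
print(lhs, rhs)  # the two sides agree up to floating-point error
```

The identity holds because each factor $1 - R^2$ is a likelihood ratio raised to the power $2/n$, and the ratio for $M_1$ versus $M_3$ is the product of the ratios for $M_1$ versus $M_2$ and $M_2$ versus $M_3$.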

Generalized $R^2$ by Nagelkerke

Generalized $R^2$, proposed by Nagelkerke [1991]

  • An undesirable property: for discrete models, the maximum $R^2$ is always less than 1:
    $$\max(R^2) = 1 - L(0)^{2/n}$$

    • This is because the likelihood of a discrete target variable is a product of pmf values (rather than pdf values, as for continuous targets), so $L(\hat{\beta}) \le 1$
  • A new definition of the generalized $R^2$:
    $$\bar{R}^2 = \frac{R^2}{\max(R^2)} = \frac{1 - \left[L(0)/L(\hat{\beta})\right]^{2/n}}{1 - L(0)^{2/n}}$$

    • The majority of the desirable properties of $R^2$, including the ones listed above, are still satisfied

    • Nagelkerke’s generalized $R^2$ seems to be a popular version. For example, the biostatistics textbook by Steyerberg uses this version
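
To get a feel for how far below 1 the upper bound can be, the sketch below evaluates it for a binary outcome. It assumes an intercept-only null model with event rate `p_bar` strictly between 0 and 1; the function names are hypothetical and introduced only for this illustration.

```python
import math

def max_cox_snell_r2_binary(p_bar):
    """Upper bound 1 - L(0)^(2/n) for a binary outcome with event rate p_bar.

    Assumes an intercept-only null model, so that
    log L(0) / n = p_bar * log(p_bar) + (1 - p_bar) * log(1 - p_bar).
    """
    mean_loglik_null = p_bar * math.log(p_bar) + (1.0 - p_bar) * math.log(1.0 - p_bar)
    return 1.0 - math.exp(2.0 * mean_loglik_null)

def nagelkerke_r2(loglik_null, loglik_fit, n):
    """Cox and Snell R^2 divided by its maximum attainable value."""
    cox_snell = 1.0 - math.exp(-2.0 / n * (loglik_fit - loglik_null))
    max_r2 = 1.0 - math.exp(2.0 / n * loglik_null)
    return cox_snell / max_r2

# The bound depends only on the event rate, not on the sample size.
for p_bar in (0.5, 0.3, 0.1):
    print(p_bar, max_cox_snell_r2_binary(p_bar))
```

For a balanced outcome (`p_bar = 0.5`) the bound is $1 - 0.5^2 = 0.75$, and it shrinks as the outcome becomes rarer. Dividing by this bound is what lets the rescaled version reach 1 for a model that predicts every observation perfectly (log-likelihood 0).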

Generalized $R^2$ for binary data

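  • Denote the estimated binary probabilities as $\hat{p}_i$ for the fitted model, and $\bar{p}$ for the null model

  • Cox and Snell $R^2$:
    $$R^2 = 1 - \left[\frac{L(0)}{L(\hat{\beta})}\right]^{2/n} = 1 - \left[\prod_i \left(\frac{\bar{p}}{\hat{p}_i}\right)^{y_i} \left(\frac{1 - \bar{p}}{1 - \hat{p}_i}\right)^{1 - y_i}\right]^{2/n}$$

  • Nagelkerke $R^2$:
    $$\bar{R}^2 = \frac{1 - \left[L(0)/L(\hat{\beta})\right]^{2/n}}{1 - L(0)^{2/n}} = \frac{1 - \left[\prod_i \left(\frac{\bar{p}}{\hat{p}_i}\right)^{y_i} \left(\frac{1 - \bar{p}}{1 - \hat{p}_i}\right)^{1 - y_i}\right]^{2/n}}{1 - \left[\prod_i \bar{p}^{y_i} (1 - \bar{p})^{1 - y_i}\right]^{2/n}}$$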
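
The two binary-data formulas can be computed directly from the observed outcomes and the fitted probabilities. The following is a minimal sketch, not a reference implementation: the function name `binary_generalized_r2` and the data are hypothetical, the null model is assumed to be intercept-only (so $\bar{p}$ is the observed event rate), and the fitted probabilities are simply given rather than estimated by maximum likelihood.

```python
import math

def binary_generalized_r2(y, p_hat):
    """Return (Cox-Snell R^2, Nagelkerke R^2) for 0/1 outcomes y and fitted probabilities p_hat.

    Assumes an intercept-only null model, so p_bar is the observed event rate,
    and that p_bar and every p_hat lie strictly between 0 and 1.
    """
    n = len(y)
    p_bar = sum(y) / n
    # Work on the log scale to avoid underflow in the product over observations.
    loglik_null = sum(
        yi * math.log(p_bar) + (1 - yi) * math.log(1 - p_bar) for yi in y
    )
    loglik_fit = sum(
        yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p_hat)
    )
    cox_snell = 1.0 - math.exp(2.0 / n * (loglik_null - loglik_fit))
    max_r2 = 1.0 - math.exp(2.0 / n * loglik_null)
    return cox_snell, cox_snell / max_r2

# Hypothetical outcomes and fitted probabilities, for illustration only.
y = [0, 0, 0, 1, 0, 1, 1, 1]
p_hat = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
cs, nk = binary_generalized_r2(y, p_hat)
print(cs, nk)
```

Summing log-probabilities instead of multiplying probabilities is a design choice for numerical stability: with many observations the raw product in the formulas would underflow to zero.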

References