Book Notes: Intro to Time Series and Forecasting -- Ch5 ARMA Models Estimation and Forecasting


Parameter estimation for ARMA\((p, q)\)

  • When the orders \(p, q\) are known, estimate the parameters \[ \boldsymbol\phi = (\phi_1, \ldots, \phi_p), \quad \boldsymbol\theta = (\theta_1, \ldots, \theta_q), \quad \sigma^2 \]
    • There are \(p+q+1\) parameters in total
  • Preliminary estimations
    • Yule-Walker and Burg’s algorithm: good for AR\((p)\)
    • Innovation algorithm: good for MA\((q)\)
    • Hannan-Rissanen algorithm: good for ARMA\((p, q)\)
  • More efficient estimation: MLE

  • When the orders \(p, q\) are unknown, use model selection methods to select orders
    • Minimize one-step MSE: FPE
    • Penalized likelihood methods: AIC, AICC, BIC

Yule-Walker Estimation

Yule-Walker equations

  • \(\{X_t\}\) is a causal AR\((p)\) process \[ X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + Z_t \]

  • Multiplying each side by \(X_t, X_{t-1}, \ldots, X_{t-p}\), respectively, and taking expectations, we get the Yule-Walker equations \[ \sigma^2 = \gamma(0) - \phi_1 \gamma(1) - \cdots - \phi_p \gamma(p) \] \[ \underbrace{\left[ \begin{array}{cccc} \gamma(0) & \gamma(1) & \cdots &\gamma(p-1) \\ \gamma(1) & \gamma(0) & \cdots &\gamma(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ \gamma(p-1) & \gamma(p-2) & \cdots &\gamma(0) \\ \end{array} \right]}_{\boldsymbol\Gamma_p} \underbrace{ \left[ \begin{array}{c} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_p \\ \end{array} \right]}_{\boldsymbol\phi} = \underbrace{ \left[ \begin{array}{c} \gamma(1) \\ \gamma(2) \\ \vdots \\ \gamma(p) \\ \end{array} \right]}_{\boldsymbol\gamma_p} \]

  • Vector representation \[ \boldsymbol\Gamma_p \boldsymbol\phi = \boldsymbol\gamma_p, \quad \sigma^2 = \gamma(0) - \boldsymbol\phi' \boldsymbol\gamma_p \]

Yule-Walker estimator and its properties

  • Yule-Walker estimators \(\hat{\boldsymbol\phi} = (\hat{\phi}_1, \cdots, \hat{\phi}_p)\) are obtained by solving the hatted version of the Yule-Walker equations \[ \hat{\boldsymbol\phi} = \hat{\boldsymbol\Gamma}_p^{-1}\hat{\boldsymbol\gamma}_p, \quad \hat{\sigma}^2 = \hat{\gamma}(0) - \hat{\boldsymbol\phi}' \hat{\boldsymbol\gamma}_p \]

  • The fitted model is causal and \(\hat{\sigma}^2 \geq 0\) \[ X_t = \hat{\phi}_1 X_{t-1} + \cdots + \hat{\phi}_p X_{t-p} + Z_t, \quad Z_t \sim \textrm{WN}(0, \hat{\sigma}^2) \]

  • Asymptotic normality \[ \hat{\boldsymbol\phi} \stackrel{\cdot}{\sim} \textrm{N}\left( \boldsymbol\phi, \frac{\sigma^2 \boldsymbol\Gamma_p^{-1}}{n}\right) \]
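
A minimal numpy sketch of the Yule-Walker fit above, using a simulated causal AR\((2)\) purely for illustration (the helpers `sample_acvf` and `yule_walker_ar` are my names, not the book's):

```python
import numpy as np

def sample_acvf(x, max_lag):
    """Sample autocovariances gamma_hat(0), ..., gamma_hat(max_lag)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    return np.array([np.sum(xc[: n - h] * xc[h:]) / n for h in range(max_lag + 1)])

def yule_walker_ar(x, p):
    """Solve Gamma_p phi = gamma_p, then sigma^2 = gamma(0) - phi' gamma_p."""
    g = sample_acvf(x, p)
    Gamma_p = np.array([[g[abs(i - j)] for j in range(p)] for i in range(p)])
    gamma_p = g[1 : p + 1]
    phi = np.linalg.solve(Gamma_p, gamma_p)
    sigma2 = g[0] - phi @ gamma_p
    return phi, sigma2

# Simulate X_t = 0.5 X_{t-1} - 0.3 X_{t-2} + Z_t and recover the coefficients
rng = np.random.default_rng(0)
n = 500
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + rng.normal()
phi_hat, sigma2_hat = yule_walker_ar(x, p=2)
print(phi_hat, sigma2_hat)   # roughly (0.5, -0.3) and 1
```

statsmodels also ships a library routine, `statsmodels.regression.linear_model.yule_walker`, if you prefer not to roll your own.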

The Yule-Walker estimator is a moment estimator: it is obtained by equating theoretical moments with sample moments

  • Usually moment estimators have much higher variance than MLE

  • But Yule-Walker estimators of AR\((p)\) process have the same asymptotic distribution as the MLE

  • Moment estimators can fail for MA\((q)\) and general ARMA

    • For example, MA\((1)\): \(X_t = Z_t + \theta Z_{t-1}\) with \(\{Z_t\}\sim \textrm{WN} (0, \sigma^2)\). \[ \gamma(0) = (1+\theta^2)\sigma^2, \quad \gamma(1) = \theta \sigma^2 \quad \Longrightarrow \quad \rho(1) = \frac{\theta}{1+\theta^2} \] The moment estimator of \(\theta\) is obtained by solving \[ \hat{\rho}(1) = \frac{\hat{\theta}}{1+\hat{\theta}^2} \quad \Longrightarrow \quad \hat{\theta} = \frac{1 \pm \sqrt{1 - 4 \hat{\rho}(1)^2}}{2 \hat{\rho}(1)} \] This yields a complex \(\hat{\theta}\) if \(|\hat{\rho}(1)| > 1/2\), which can easily happen when \(\rho(1)\) is close to \(1/2\), i.e., \(\theta\) is close to \(1\) (see the sketch below)
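
A tiny sketch of this failure mode; the \(\hat{\rho}(1)\) values below are made up for illustration:

```python
import numpy as np

def ma1_moment_estimate(rho1_hat):
    """Solve rho_hat(1) = theta / (1 + theta^2) for the root with |theta| <= 1."""
    disc = 1.0 - 4.0 * rho1_hat ** 2
    if disc < 0:
        raise ValueError("no real solution: |rho_hat(1)| > 1/2")
    return (1.0 - np.sqrt(disc)) / (2.0 * rho1_hat)

for r in (0.4, 0.55):
    try:
        print(r, ma1_moment_estimate(r))   # r = 0.4 gives theta_hat = 0.5
    except ValueError as err:
        print(r, err)                      # r = 0.55 has no real solution
```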

Innovations algorithm: estimate MA coefficients

  • Fitted innovations MA\((m)\) model \[ X_t = Z_t + \hat{\theta}_{m1} Z_{t-1} + \cdots + \hat{\theta}_{mm}Z_{t-m}, \quad \{Z_t\} \sim \textrm{WN}(0, \hat{v}_m) \] where \(\hat{\boldsymbol\theta}_m\) and \(\hat{v}_m\) are from the innovations algorithm with ACVF replaced by the sample ACVF

  • For an MA\((q)\) process, the innovations algorithm estimator \(\hat{\boldsymbol\theta}_q = (\hat{\theta}_{q1}, \ldots, \hat{\theta}_{qq})'\) is NOT consistent for \((\theta_1, \ldots, \theta_q)'\)

  • Choice of \(m\): increase \(m\) until the vector \((\hat{\theta}_{m1}, \ldots, \hat{\theta}_{mq})'\) stabilizes
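
A sketch of this procedure on a simulated MA\((1)\) with \(\theta = 0.6\) (illustrative); the helper `innovations` is my own implementation of the recursion, applied to the sample ACVF:

```python
import numpy as np

def sample_acvf(x, max_lag):
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    return np.array([np.sum(xc[: n - h] * xc[h:]) / n for h in range(max_lag + 1)])

def innovations(gamma, m):
    """Innovations algorithm on an ACVF sequence gamma[0..m]:
    returns (theta_{m1}, ..., theta_{mm}) and v_m."""
    v = np.zeros(m + 1)
    theta = np.zeros((m + 1, m + 1))   # theta[n, j] holds theta_{n j}
    v[0] = gamma[0]
    for n in range(1, m + 1):
        for k in range(n):
            s = sum(theta[k, k - j] * theta[n, n - j] * v[j] for j in range(k))
            theta[n, n - k] = (gamma[n - k] - s) / v[k]
        v[n] = gamma[0] - sum(theta[n, n - j] ** 2 * v[j] for j in range(n))
    return theta[m, 1 : m + 1], v[m]

# Simulated MA(1): X_t = Z_t + 0.6 Z_{t-1}
rng = np.random.default_rng(1)
z = rng.normal(size=2001)
x = z[1:] + 0.6 * z[:-1]
for m in (3, 6, 12):                    # increase m until theta_{m1} stabilizes
    th, vm = innovations(sample_acvf(x, m), m)
    print(m, round(th[0], 3), round(vm, 3))   # theta_{m1} roughly 0.6, v_m roughly 1
```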

Maximum Likelihood Estimation

Likelihood function of a Gaussian time series

  • Suppose \(\{X_t\}\) is a Gaussian time series with mean zero

  • Assume that the covariance matrix \(\boldsymbol\Gamma_n = E(\mathbf{X}_n \mathbf{X}_n')\) is nonsingular

  • One-step predictors from the innovations algorithm: \(\hat{X}_1 = 0\) and \[ \hat{X}_{j+1} = P_{j} X_{j+1} \] with MSE \(v_j = E\left(X_{j+1} - \hat{X}_{j+1}\right)^2\), where \(P_j\) denotes prediction based on \(X_1, \ldots, X_j\)

    • Example: AR\((1)\) \[ \hat{X}_j = \begin{cases} 0, & j = 1 \\ \phi X_{j-1}, & j \geq 2 \end{cases}, \quad v_j = \begin{cases} \frac{\sigma^2}{1-\phi^2}, & j = 0 \\ \sigma^2, & j \geq 1 \end{cases} \]
  • Likelihood function \[\begin{align*} L & \propto \left| \boldsymbol\Gamma_n \right|^{-1/2} \exp \left( -\frac{1}{2} \mathbf{X}_n' \boldsymbol\Gamma_n^{-1} \mathbf{X}_n\right)\\ & = \left( v_0 v_1 \cdots v_{n-1} \right)^{-1/2} \exp \left[ -\frac{1}{2} \sum_{j=1}^n \frac{(X_j - \hat{X}_j)^2}{v_{j-1}}\right] \end{align*}\]
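
A sketch for the AR\((1)\) case: evaluating \(-2\log L\) through the innovations form above, then minimizing over a grid of \(\phi\) values (the simulated series with \(\phi = 0.7\) and \(\sigma^2 = 1\) is illustrative):

```python
import numpy as np

def ar1_neg2loglik(x, phi, sigma2):
    """-2 log L of a mean-zero causal AR(1), using Xhat_1 = 0, Xhat_j = phi X_{j-1},
    v_0 = sigma^2 / (1 - phi^2), and v_j = sigma^2 for j >= 1."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xhat = np.concatenate(([0.0], phi * x[:-1]))
    v = np.concatenate(([sigma2 / (1 - phi ** 2)], np.full(n - 1, float(sigma2))))
    return n * np.log(2 * np.pi) + np.sum(np.log(v)) + np.sum((x - xhat) ** 2 / v)

# Grid search over phi: the minimizer should land close to the true value 0.7
rng = np.random.default_rng(2)
n = 400
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + rng.normal()
grid = np.linspace(-0.9, 0.9, 181)
print(grid[np.argmin([ar1_neg2loglik(x, p, 1.0) for p in grid])])
```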

Maximum likelihood estimation of ARMA\((p, q)\)

  • Innovations MSE \(v_j = \sigma^2 r_j\), where \(r_j\) depends on \(\boldsymbol\phi\) and \(\boldsymbol\theta\)

  • Maximizing the likelihood is equivalent to minimizing \[ -2\log L(\boldsymbol\phi, \boldsymbol\theta, \sigma^2) = n\log(\sigma^2) + \sum_{j=1}^n \log(r_{j-1}) + \frac{S(\boldsymbol\phi, \boldsymbol\theta)}{\sigma^2}, \] where \[ S(\boldsymbol\phi, \boldsymbol\theta) = \sum_{j=1}^n \frac{(X_j - \hat{X}_j)^2}{r_{j-1}} \]

  • MLE \(\hat{\sigma}^2\) can be expressed with MLE \(\hat{\boldsymbol\phi}, \hat{\boldsymbol\theta}\) \[ \hat{\sigma}^2 = \frac{S\left(\hat{\boldsymbol\phi}, \hat{\boldsymbol\theta}\right)}{n} \]

  • MLE \(\hat{\boldsymbol\phi}, \hat{\boldsymbol\theta}\) are obtained by minimizing \[ \log\left[ \frac{S(\boldsymbol\phi, \boldsymbol\theta)}{n} \right]+ \frac{1}{n} \sum_{j=1}^n \log(r_{j-1}) \] This does not depend on \(\sigma^2\)!
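
In practice this minimization is done numerically; a sketch using statsmodels (assumed installed), fitting an ARMA\((1,1)\) by Gaussian MLE to a simulated series with \(\phi = 0.5\), \(\theta = 0.4\):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Simulated ARMA(1,1): X_t = 0.5 X_{t-1} + Z_t + 0.4 Z_{t-1}
rng = np.random.default_rng(3)
n = 500
z = rng.normal(size=n + 1)
x = np.zeros(n + 1)
for t in range(1, n + 1):
    x[t] = 0.5 * x[t - 1] + z[t] + 0.4 * z[t - 1]
x = x[1:]

res = ARIMA(x, order=(1, 0, 1), trend="n").fit()   # mean-zero ARMA(1,1), Gaussian MLE
print(res.params)   # estimates of (phi, theta, sigma^2)
print(res.bse)      # standard errors from the asymptotic normal approximation
```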

Asymptotic normality of MLE

  • When \(n\) is large, for a causal and invertible ARMA\((p, q)\) process, \[ \left[ \begin{array}{c} \hat{\boldsymbol\phi}\\ \hat{\boldsymbol\theta} \end{array} \right] \stackrel{\cdot}{\sim}\textrm{N}_{p+q} \left( \left[ \begin{array}{c} \boldsymbol\phi\\ \boldsymbol\theta \end{array} \right], \frac{\mathbf{V}}{n} \right) \]

  • For an AR\((p)\) process, MLE has the same asymptotic distribution as the Yule-Walker estimator \[ \mathbf{V} = \sigma^2 \boldsymbol\Gamma_p^{-1} \quad \Longrightarrow \quad \hat{\boldsymbol\phi} \stackrel{\cdot}{\sim} \textrm{N}\left( \boldsymbol\phi, \frac{\sigma^2 \boldsymbol\Gamma_p^{-1}}{n}\right) \]

Examples of \(\mathbf{V}\)

  • AR\((1)\) \[ \mathbf{V} = 1 - \phi_1^2 \]

  • AR\((2)\) \[ \mathbf{V} = \left[ \begin{array}{cc} 1 - \phi_2^2 & -\phi_1(1 + \phi_2)\\ -\phi_1(1 + \phi_2) & 1 - \phi_2^2\\ \end{array} \right] \]

  • MA\((1)\) \[ \mathbf{V} = 1 - \theta_1^2 \]

  • MA\((2)\) \[ \mathbf{V} = \left[ \begin{array}{cc} 1 - \theta_2^2 & \theta_1(1 - \theta_2)\\ \theta_1(1 - \theta_2)& 1 - \theta_2^2\\ \end{array} \right] \]

  • ARMA\((1,1)\) \[ \mathbf{V} = \frac{1 + \phi \theta}{(\phi + \theta)^2} \left[ \begin{array}{cc} (1 - \phi^2)(1 + \phi \theta) & -(1 - \theta^2)(1 - \phi^2) \\ -(1 - \theta^2)(1 - \phi^2) & (1 - \theta^2)(1 + \phi \theta) \\ \end{array} \right] \]
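
A quick sketch of turning these \(\mathbf{V}\) matrices into approximate standard errors via \(\mathbf{V}/n\) (the values of \(\phi\), \(\theta\), and \(n\) are illustrative):

```python
import numpy as np

n = 500

# AR(1): Var(phi_hat) is approximately (1 - phi^2) / n
phi = 0.7
print(np.sqrt((1 - phi ** 2) / n))

# ARMA(1,1): V / n gives the joint covariance of (phi_hat, theta_hat)
phi, theta = 0.5, 0.4
c = (1 + phi * theta) / (phi + theta) ** 2
V = c * np.array([
    [(1 - phi ** 2) * (1 + phi * theta), -(1 - theta ** 2) * (1 - phi ** 2)],
    [-(1 - theta ** 2) * (1 - phi ** 2), (1 - theta ** 2) * (1 + phi * theta)],
])
print(np.sqrt(np.diag(V) / n))   # s.e.(phi_hat), s.e.(theta_hat)
```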

Order Selection

Order selection

  • Why? Fitting a model with unnecessarily large \(p, q\) causes:

    • Large errors arising from parameter estimation of the model
    • Large MSEs of forecasts
  • FPE: only for AR\((p)\) processes \[ \text{FPE} = \hat{\sigma}^2 \frac{n+p}{n-p} \]

  • AIC: for ARMA\((p, q)\); approximates the Kullback-Leibler discrepancy between the fitted model and the true model; a penalized likelihood method \[ \text{AIC} = -2\log (\hat{L}) + 2(p + q + 1) \]

  • AICC: for ARMA\((p, q)\); a bias-corrected version of AIC, a penalized likelihood method \[ \text{AICC} = -2\log (\hat{L}) + 2(p + q + 1) \cdot \frac{n}{n-p-q-2} \]
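
A sketch of order selection by minimizing AICC over a small grid of candidate orders, assuming statsmodels for the MLE fits (the simulated ARMA\((1,1)\) series is illustrative):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def aicc(loglik, n, k):
    """AICC = -2 log L + 2 k n / (n - k - 1), with k = p + q + 1 parameters."""
    return -2 * loglik + 2 * k * n / (n - k - 1)

# Simulated ARMA(1,1): X_t = 0.5 X_{t-1} + Z_t + 0.4 Z_{t-1}
rng = np.random.default_rng(4)
n = 300
z = rng.normal(size=n + 1)
x = np.zeros(n + 1)
for t in range(1, n + 1):
    x[t] = 0.5 * x[t - 1] + z[t] + 0.4 * z[t - 1]
x = x[1:]

scores = {}
for p in range(3):
    for q in range(3):
        res = ARIMA(x, order=(p, 0, q), trend="n").fit()
        scores[(p, q)] = aicc(res.llf, n, p + q + 1)
print(min(scores, key=scores.get))   # candidate order with the smallest AICC
```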

Diagnostic Checking

Residuals and rescaled residuals

  • Residuals of an ARMA\((p, q)\) process \[ \hat{W}_t = \frac{X_t - \hat{X}_t\left(\hat{\boldsymbol\phi}, \hat{\boldsymbol\theta}\right)} {\sqrt{r_{t-1}\left(\hat{\boldsymbol\phi}, \hat{\boldsymbol\theta}\right)}}, \quad t = 1, \ldots, n \]
    • Residuals \(\{\hat{W}_t\}\) should behave like the white noise \(\{Z_t\}\)
  • Rescaled residuals \[ \hat{R}_t = \frac{\hat{W}_t}{\hat{\sigma}}, \quad \hat{\sigma} = \sqrt{\frac{\sum_{t=1}^n \hat{W}_t^2}{n}} \]
    • Rescaled residuals should be approximately \(\textrm{WN}(0, 1)\)

Residual diagnostics

  1. Plot \(\{\hat{R}_t\}\) and look for patterns

  2. Compute the sample ACF of \(\{\hat{R}_t\}\)
    • It should be close to the \(\textrm{WN}(0, 1)\) sample ACF
  3. Apply the tests for IID noise from Chapter 1
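
A sketch of steps 2 and 3 for an AR\((1)\) fitted by Yule-Walker (simulated data; statsmodels is assumed only for the Ljung-Box portmanteau test):

```python
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(5)
n = 400
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + rng.normal()

# Yule-Walker fit of an AR(1): phi_hat = gamma_hat(1) / gamma_hat(0)
g0 = np.mean(x * x)
g1 = np.sum(x[1:] * x[:-1]) / n
phi_hat = g1 / g0

# Residuals W_t = (X_t - Xhat_t) / sqrt(r_{t-1}); for AR(1), r_0 = 1/(1 - phi^2), r_j = 1
xhat = np.concatenate(([0.0], phi_hat * x[:-1]))
r = np.concatenate(([1 / (1 - phi_hat ** 2)], np.ones(n - 1)))
w = (x - xhat) / np.sqrt(r)
R = w / np.sqrt(np.mean(w ** 2))          # rescaled residuals, roughly WN(0, 1)

# Step 2: sample ACF of the rescaled residuals vs. the +-1.96/sqrt(n) white-noise bounds
acf = np.array([np.mean((R[:n - h] - R.mean()) * (R[h:] - R.mean()))
                for h in range(1, 21)]) / np.var(R)
print(np.abs(acf).max(), 1.96 / np.sqrt(n))

# Step 3: Ljung-Box portmanteau test for whiteness (one of the Chapter 1 style checks)
print(acorr_ljungbox(R, lags=[20], model_df=1))
```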

References

  • Brockwell, Peter J. and Davis, Richard A. (2016), Introduction to Time Series and Forecasting, Third Edition. New York: Springer