# Book Notes: Intro to Time Series and Forecasting -- Ch6 ARIMA Models

### When data is not stationary

• Implication of non-stationarity: the sample ACF or sample PACF does not decrease rapidly to zero as the lag increases

• What shall we do?
• Differencing, then fit an ARMA $$\rightarrow$$ ARIMA
• Transformation, then fit an ARMA
• Seasonal model $$\rightarrow$$ SARIMA

### A non-stationary example: Dow Jones utilities index data

```r
library(itsmr); ## Load the ITSM-R package (provides the dowj data)
par(mfrow = c(1, 3));
plot.ts(dowj, main = 'Raw data');
acf(dowj); pacf(dowj);
```

### After differencing

```r
par(mfrow = c(1, 3));
dowj_diff = diff(dowj); ## lag-1 differencing: X_t - X_{t-1}
plot.ts(dowj_diff, main = 'Data after differencing');
acf(dowj_diff); pacf(dowj_diff);
```

# ARIMA Models

### ARIMA model: definition

• Autoregressive integrated moving-average models (ARIMA): let $$d$$ be a non-negative integer; the series $$\{X_t\}$$ is an ARIMA$$(p,d,q)$$ process if $Y_t = (1 - B)^d X_t$ is a causal ARMA$$(p,q)$$ process.

• Difference equation (DE) for an ARIMA$$(p,d,q)$$ process $\phi^*(B) X_t = \phi(B) (1-B)^d X_t = \theta(B) Z_t, \quad \{Z_t\} \sim \textrm{WN}(0, \sigma^2)$
• $$\phi(z)$$: polynomial of degree $$p$$, and $$\phi(z) \neq 0$$ for $$|z| \leq 1$$
• $$\theta(z)$$: polynomial of degree $$q$$
• $$\phi^*(z) = \phi(z) (1-z)^d$$: has a zero of order $$d$$ at $$z=1$$
• An ARIMA process with $$d > 0$$ is NOT stationary!
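
As a quick illustration of the definition, here is a minimal sketch using base R's `arima.sim` and `arima` (the order and coefficient below are arbitrary choices for demonstration):

```r
## Simulate an ARIMA(1,1,0) process: the differenced series
## Y_t = (1 - B) X_t is a causal AR(1) with phi_1 = 0.6
set.seed(1);
x = arima.sim(model = list(order = c(1, 1, 0), ar = 0.6), n = 200);

## Fitting with d = 1: the ARMA parameters are estimated from the
## differenced series {(1 - B) X_t}
fit = arima(x, order = c(1, 1, 0));
fit;
```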

### ARIMA mean is not determined by the DE

• $$\{X_t\}$$ is an ARIMA$$(p,d,q)$$ process. We can add an arbitrary polynomial trend of degree $$d-1$$, $W_t = X_t + A_0 + A_1 t + \cdots + A_{d-1} t^{d-1},$ where $$A_0, \ldots, A_{d-1}$$ are arbitrary random variables, and $$\{W_t\}$$ still satisfies the same ARIMA$$(p,d,q)$$ difference equation, since $$(1-B)^d$$ annihilates any polynomial of degree $$d-1$$

• In other words, the ARIMA DE determines the second-order properties of $$\{(1-B)^d X_t\}$$ but not those of $$\{X_t\}$$

• For parameter estimation: $$\boldsymbol\phi$$, $$\boldsymbol\theta$$, and $$\sigma^2$$ are estimated based on $$\{(1-B)^d X_t\}$$ rather than $$\{X_t\}$$
• For forecasting, we need additional assumptions

### Fit data using ARIMA processes

• Should we fit a finite time series using
• a non-stationary model (such as ARIMA), or
• a stationary model (such as ARMA) directly?
• If the fitted stationary ARMA model's $$\phi(\cdot)$$ has zeros very close to the unit circle, then fitting an ARIMA model is better
• Parameter estimation is stable
• The differenced series may only need a low-order ARMA
• Limitation of ARIMA: it only permits data to be non-stationary in a very special way
• A general non-stationary model: $$\phi^*(\cdot)$$ can have zeros anywhere on the unit circle $$|z|=1$$
• An ARIMA model: $$\phi^*(\cdot)$$ only has a zero of multiplicity $$d$$ at the point $$z=1$$
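
To see whether an ARIMA fit is warranted, we can inspect the moduli of the zeros of the fitted AR polynomial $$\phi(z)$$. A minimal sketch with base R's `polyroot`, using an illustrative AR(2) fit to the `dowj` data:

```r
library(itsmr); ## for the dowj data
fit = arima(dowj, order = c(2, 0, 0)); ## illustrative stationary fit
## phi(z) = 1 - phi_1 z - ... - phi_p z^p
ar_coefs = fit$coef[grep('^ar', names(fit$coef))];
Mod(polyroot(c(1, -ar_coefs))); ## moduli near 1 suggest differencing
```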

## Transformation and Identification Techniques

### Natural log transformation

• When the variance of the data increases with its mean, it is common to apply a log transformation before fitting the data with an ARIMA or ARMA model.

• When does the log transformation work well? Suppose that $E(X_t) = \mu_t, \quad Var(X_t) = \sigma^2 \mu_t^2$ Then by a first-order Taylor expansion of $$\log(X_t)$$ at $$\mu_t$$: $\log(X_t) \approx \log(\mu_t) + \frac{X_t - \mu_t}{\mu_t} \ \Longrightarrow \ Var\left[\log(X_t)\right] \approx \frac{Var(X_t)}{\mu_t^2} = \sigma^2$ so the log-transformed data $$\log(X_t)$$ has (approximately) constant variance

• Note: log transformation can only be applied to positive data
• Note: If $$Y_t = \log(X_t)$$, then because expectation and logarithm are not interchangeable, $\hat{X}_t \neq \exp(\hat{Y}_t)$
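
A small sketch of this back-transformation caveat in R (the orders are arbitrary; the bias adjustment assumes forecast errors on the log scale are approximately Gaussian, in which case $$E(e^Y) = e^{\mu + \sigma^2/2}$$ for $$Y \sim N(\mu, \sigma^2)$$):

```r
y = log(AirPassengers); ## positive monthly data shipped with base R
fit = arima(y, order = c(1, 1, 0)); ## illustrative orders only
pr = predict(fit, n.ahead = 12);
naive = exp(pr$pred); ## biased: exp(Y-hat) != X-hat
adjusted = exp(pr$pred + pr$se^2 / 2); ## lognormal bias correction
```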

### Generalize the log transformation: Box-Cox transformation

• Box-Cox transformation $f_{\lambda}(x) = \begin{cases} \frac{x^{\lambda} - 1}{\lambda}, & x \geq 0, \lambda > 0 \\ \log(x), & x > 0, \lambda = 0 \\ \end{cases}$

• Usual range: $$0 \leq \lambda \leq 1.5$$
• Common values: $$\lambda = 0, 0.5$$
• Note: $$\lim_{\lambda \rightarrow 0} f_{\lambda}(x) = \log(x)$$

• Box-Cox transformation can only be applied to non-negative data
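
A direct translation of the definition into R (a minimal sketch; the input series and $$\lambda = 0.5$$ are arbitrary illustrations):

```r
## Box-Cox transform as defined above; x must be non-negative
box_cox = function(x, lambda) {
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}
## lambda = 0.5 is one of the common choices
plot.ts(box_cox(AirPassengers, lambda = 0.5));
```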

## Unit Root Test

### Unit root test for AR$$(1)$$ process

• $$\{X_t\}$$ is an AR$$(1)$$ process: $$X_t - \mu = \phi_1(X_{t-1} - \mu) + Z_t$$

• Equivalent DE: $\nabla X_t = X_t - X_{t-1} = \phi_0^* + \phi_1^* X_{t-1} + Z_t$
• where $$\phi_0^* = \mu(1 - \phi_1)$$ and $$\phi_1^* = \phi_1 - 1$$
• Regressing $$\nabla X_t$$ onto $$1$$ and $$X_{t-1}$$, we get the OLS estimator $$\hat{\phi}_1^*$$ and its standard error $$SE(\hat{\phi}_1^*)$$
• Augmented Dickey-Fuller test for AR$$(1)$$
• Hypotheses: $$H_0: \phi_1 = 1 \ \longleftrightarrow \ H_1: \phi_1 < 1$$
• Equivalent hypotheses: $$H_0: \phi_1^* = 0 \ \longleftrightarrow \ H_1: \phi_1^* < 0$$
• Test statistic: the limit distribution under $$H_0$$ is neither normal nor t $\hat{\tau} = \frac{\hat{\phi}_1^*}{SE(\hat{\phi}_1^*)}$
• Rejection region: reject $$H_0$$ if $\begin{cases} \hat{\tau} < -2.57 & (90\%)\\ \hat{\tau} < -2.86 & (95\%)\\ \hat{\tau} < -3.43 & (99\%) \end{cases}$
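
The AR(1) version of this regression and test statistic can be reproduced by hand (a sketch reusing the `dowj` data; only $$\hat{\tau}$$ is computed here, since its p-value requires the Dickey-Fuller distribution rather than the t distribution):

```r
library(itsmr); ## for the dowj data
x = as.numeric(dowj);
dx = diff(x); ## nabla X_t
xlag = x[-length(x)]; ## X_{t-1}
fit = lm(dx ~ xlag); ## regress nabla X_t on 1 and X_{t-1}
tau_hat = summary(fit)$coefficients['xlag', 't value'];
tau_hat; ## compare with -2.57 / -2.86 / -3.43
```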

### Unit root test for AR$$(p)$$ process

• AR$$(p)$$ process: $$X_t - \mu = \phi_1(X_{t-1} - \mu) + \cdots + \phi_p(X_{t-p} - \mu) + Z_t$$

• Equivalent DE: $\nabla X_t = \phi_0^* + \phi_1^* X_{t-1} + \phi_2^* \nabla X_{t-1} + \cdots + \phi_p^* \nabla X_{t-p+1} + Z_t$

• where $$\phi_0^* = \mu(1 - \sum_{i=1}^p \phi_i)$$, $$\phi_1^* = \sum_{i=1}^p \phi_i - 1$$, and $$\phi_j^* = -\sum_{i=j}^p \phi_i$$ for $$j \geq 2$$
• Regressing $$\nabla X_t$$ onto $$1, X_{t-1}, \nabla X_{t-1}, \cdots, \nabla X_{t-p+1}$$, we get the OLS estimator $$\hat{\phi}_1^*$$ and its standard error $$SE(\hat{\phi}_1^*)$$
• Augmented Dickey-Fuller test for AR$$(p)$$
• Hypotheses: $$H_0: \phi_1^* = 0 \ \longleftrightarrow \ H_1: \phi_1^* < 0$$
• Test statistic: $\hat{\tau} = \frac{\hat{\phi}_1^*}{SE(\hat{\phi}_1^*)}$
• Rejection region: same as augmented Dickey-Fuller test for AR$$(1)$$

### Implement augmented Dickey-Fuller test in R

library(tseries);
## Note: the lag k here is the AR order p
adf.test(dowj, k = 2);
##
##  Augmented Dickey-Fuller Test
##
## data:  dowj
## Dickey-Fuller = -1.3788, Lag order = 2, p-value = 0.8295
## alternative hypothesis: stationary

## Forecast ARIMA Models

### Forecast an ARIMA$$(p, 1, q)$$ process

• $$\{X_t\}$$ is ARIMA$$(p, 1, q)$$, and $$\{Y_t = \nabla X_t\}$$ is a causal ARMA$$(p, q)$$ $X_t = X_0 + \sum_{j=1}^t Y_j, \quad t = 1, 2, \ldots$

• Best linear predictor of $$X_{n+1}$$ $P_n X_{n+1} = P_n(X_0 + Y_1 + \cdots + Y_{n+1}) = P_n(X_n + Y_{n+1}) = X_n + P_n(Y_{n+1}),$
• $$P_n$$ denotes the best linear prediction based on $$\{1, X_0, X_1, \ldots, X_n\}$$, or equivalently, $$\{1, X_0, Y_1, \ldots, Y_n\}$$
• To find $$P_n(Y_{n+1})$$, we need to know $$E(X_0^2)$$ and $$E(X_0 Y_j)$$, for $$j = 1, \ldots, n+1$$.
• A sufficient assumption for $$P_n(Y_{n+1})$$ to be the best linear predictor in terms of $$\{Y_1, \ldots, Y_n\}$$: $$X_0$$ is uncorrelated with $$Y_1, Y_2, \ldots$$
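
In R, `predict` on an `arima` fit computes these best linear predictors (via the Kalman filter). A sketch, assuming an illustrative ARIMA(1,1,0) fit to `dowj`:

```r
library(itsmr); ## for the dowj data
fit = arima(dowj, order = c(1, 1, 0)); ## illustrative orders
pr = predict(fit, n.ahead = 5); ## P_n X_{n+1}, ..., P_n X_{n+5}
pr$pred; ## point forecasts
pr$se; ## root mean squared prediction errors
```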

### Forecast an ARIMA$$(p, d, q)$$ process

• The observed ARIMA$$(p, d, q)$$ process $$\{X_t\}$$ satisfies $Y_t = (1-B)^d X_t, \quad t = 1, 2, \ldots, \quad \{Y_t\} \sim \text{causal ARMA}(p, q)$

• Assumption: the random vector $$(X_{1-d}, \ldots, X_0)$$ is uncorrelated with $$Y_t$$ for all $$t > 0$$

• One-step predictors $$\hat{Y}_{n+1} = P_n Y_{n+1}$$ and $$\hat{X}_{n+1} = P_n X_{n+1}$$: $X_{n+1} - \hat{X}_{n+1} = Y_{n+1} - \hat{Y}_{n+1}$

• Recall: the $$h$$-step predictor of ARMA$$(p,q)$$ for $$n > \max(p, q)$$: $P_n Y_{n+h} = \sum_{i=1}^p \phi_i P_n Y_{n+h-i} + \sum_{j=h}^q \theta_{n+h-1, j}(Y_{n+h-j} - \hat{Y}_{n+h-j})$

• $$h$$-step predictor of ARIMA$$(p, d, q)$$ for $$n > \max(p, q)$$: $P_n X_{n+h} = \sum_{i=1}^{p+d} \phi_i^* P_n X_{n+h-i} + \sum_{j=h}^q \theta_{n+h-1, j}(X_{n+h-j} - \hat{X}_{n+h-j})$ where $$\phi^*(z) = (1-z)^d \phi(z) = 1 - \phi_1^*z - \cdots -\phi_{p+d}^* z^{p+d}$$
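
For example, for an ARIMA$$(1,1,0)$$ process we have $$\phi^*(z) = (1-z)(1-\phi_1 z) = 1 - (1+\phi_1)z + \phi_1 z^2$$, so with $$q = 0$$ and $$h = 1$$ the recursion reduces to $P_n X_{n+1} = (1 + \phi_1) X_n - \phi_1 X_{n-1}$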

# Seasonal ARIMA Models

### Seasonal ARIMA (SARIMA) Model: definition

• Suppose $$d, D$$ are non-negative integers. $$\{X_t\}$$ is a seasonal ARIMA$$(p, d, q)$$ $$\times$$ $$(P, D, Q)_s$$ process with period $$s$$ if the differenced series $Y_t = (1-B)^d (1-B^s)^D X_t$ is a causal ARMA process defined by $\phi(B) \Phi(B^s) Y_t = \theta(B) \Theta(B^s) Z_t, \quad \{Z_t\} \sim \textrm{WN}(0, \sigma^2)$ where \begin{align*} \phi(z) & = 1 - \phi_1 z - \cdots - \phi_p z^p, \quad \Phi(z) = 1 - \Phi_1 z - \cdots - \Phi_P z^P \\ \theta(z) & = 1 + \theta_1 z + \cdots + \theta_q z^q, \quad \Theta(z) = 1 + \Theta_1 z + \cdots + \Theta_Q z^Q \end{align*}

• $$\{Y_t\}$$ is causal if and only if neither $$\phi(z)$$ nor $$\Phi(z)$$ has zeros in the closed unit disk $$|z| \leq 1$$

• Usually, $$s=12$$ for monthly data
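
For example, the widely used "airline model" is SARIMA$$(0,1,1) \times (0,1,1)_{12}$$, whose difference equation expands to $(1-B)(1-B^{12}) X_t = (1 + \theta_1 B)(1 + \Theta_1 B^{12}) Z_t$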

### Special case: seasonal ARMA (SARMA)

• Between-year model: for monthly data, each of the 12 month-wise subseries is generated by the same ARMA$$(P, Q)$$ model $\Phi(B^{12}) Y_t = \Theta(B^{12}) U_t, \quad \{U_{j+12t}, t \in \mathbb{Z}\} \sim \textrm{WN}(0, \sigma^2_U) \text{ for each } j = 1, \ldots, 12$

• SARMA$$(P, Q)$$ with period $$s$$: in the above between-year model, the period 12 can be changed to any positive integer $$s$$
• If $$\{U_{t}, t \in \mathbb{Z}\}\sim \textrm{WN}(0, \sigma^2_U)$$, then the ACVF satisfies $$\gamma(h) = 0$$ unless $$s$$ divides $$h$$ evenly, i.e., unless $$h$$ is a multiple of $$s$$. But this may not be ideal for real-life applications! E.g., this Feb would be correlated with last Feb, but not with this Jan (see the example after this list).
• General SARMA$$(p, q)$$ $$\times$$ $$(P, Q)$$ with period $$s$$: incorporate dependency between the 12 series by letting $$\{U_t\}$$ be ARMA: $\Phi(B^{s}) Y_t = \Theta(B^{s}) U_t, \quad \phi(B) U_t = \theta(B)Z_t, \quad \{Z_t\} \sim \textrm{WN}(0, \sigma^2)$
• Equivalent DE for the general SARMA: $\phi(B) \Phi(B^s) Y_t = \theta(B) \Theta(B^s) Z_t$
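
For instance, in the between-year model with $$s = 12$$, $$P = 1$$, $$Q = 0$$, $$|\Phi_1| < 1$$, and $$\{U_t\} \sim \textrm{WN}(0, \sigma^2_U)$$, the series $$Y_t = \Phi_1 Y_{t-12} + U_t$$ behaves like an AR(1) along each month-wise subseries, and $\rho(h) = \begin{cases} \Phi_1^{|h|/12}, & h \text{ a multiple of } 12 \\ 0, & \text{otherwise} \end{cases}$ so this Feb is correlated only with other Februaries.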

### Fit a SARIMA Model

• Period $$s$$ is known
1. Find $$d$$ and $$D$$ that make the differenced series $$\{Y_t\}$$ look stationary

2. Examine the sample ACF and sample PACF of $$\{Y_t\}$$ at lags that are multiples of $$s$$ to find the orders $$P, Q$$

3. Use $$\hat{\rho}(1), \ldots, \hat{\rho}(s-1)$$ to find orders $$p, q$$

4. Use AICC to decide among competing order choices

5. Given orders $$(p, d, q, P, D, Q)$$, compute the MLE of the parameters $$(\phi, \theta, \Phi, \Theta, \sigma^2)$$
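
This procedure maps directly onto base R's `arima` through its `seasonal` argument. A sketch on the monthly `AirPassengers` data (log-transformed; the orders shown are illustrative choices, not claimed to be AICC-optimal):

```r
## d = 1, D = 1 with s = 12; in practice steps 2-4 would compare
## several candidate (p, q, P, Q) by AICC -- one candidate shown here
fit = arima(log(AirPassengers), order = c(0, 1, 1),
            seasonal = list(order = c(0, 1, 1), period = 12));
fit; ## Gaussian MLE of (theta, Theta, sigma^2)
```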

# Regression with ARMA Errors

### Regression with ARMA errors: OLS estimation

• Linear model with ARMA errors $$\mathbf{W} = (W_1, \ldots, W_n)'$$: $Y_t = \mathbf{x}_t' \boldsymbol\beta + W_t, \quad t = 1, \ldots, n, \quad \{W_t\} \sim \text{causal ARMA}(p,q)$
• Note: each row is indexed by a different time $$t$$!
• Error covariance $$\boldsymbol\Gamma_n = E(\mathbf{W} \mathbf{W}')$$
• Ordinary least squares (OLS) estimator $\hat{\boldsymbol\beta}_{\text{OLS}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$
• Obtained by minimizing $$(\mathbf{Y} - \mathbf{X}\boldsymbol\beta)' (\mathbf{Y} - \mathbf{X}\boldsymbol\beta)$$
• OLS is unbiased, even when errors are dependent!
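
A quick sketch of why dependent errors still matter (simulated data with hypothetical names; OLS remains unbiased, but the autocorrelated residuals show its usual standard errors are unreliable):

```r
set.seed(1);
n = 200;
x = seq_len(n) / n; ## a simple deterministic regressor
w = arima.sim(list(ar = 0.7), n = n); ## AR(1) errors
y = 1 + 2 * x + w; ## true beta = (1, 2)
ols = lm(y ~ x);
acf(resid(ols)); ## slowly decaying ACF: errors are not white noise
```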

### Regression with ARMA errors: GLS estimation

• Generalized least squares (GLS) estimator $\hat{\boldsymbol\beta}_{\text{GLS}} = (\mathbf{X}'\boldsymbol\Gamma_n^{-1}\mathbf{X})^{-1}\mathbf{X}'\boldsymbol\Gamma_n^{-1}\mathbf{Y}$
• Obtained by minimizing the weighted sum of squares $(\mathbf{Y} - \mathbf{X}\boldsymbol\beta)' \boldsymbol\Gamma_n^{-1} (\mathbf{Y} - \mathbf{X}\boldsymbol\beta)$

• Covariance $\textrm{Cov}\left( \hat{\boldsymbol\beta}_{\text{GLS}} \right) = (\mathbf{X}'\boldsymbol\Gamma_n^{-1}\mathbf{X})^{-1}$

• GLS is the best linear unbiased estimator, i.e., for any vector $$\mathbf{c}$$ and any linear unbiased estimator $$\hat{\boldsymbol\beta}$$, we have $\textrm{Var}(\mathbf{c}' \hat{\boldsymbol\beta}_{\text{GLS}}) \leq \textrm{Var}(\mathbf{c}' \hat{\boldsymbol\beta})$
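
When $$\boldsymbol\Gamma_n$$ has ARMA structure with unknown parameters, a feasible version of GLS is available through the nlme package (a sketch reusing the simulated `y` and `x` above; `corARMA` specifies the error correlation structure):

```r
library(nlme);
## GLS with AR(1)-correlated errors estimated from the data
gls_fit = gls(y ~ x, correlation = corARMA(p = 1, q = 0));
summary(gls_fit)$tTable; ## coefficients with GLS standard errors
```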

### When $$\{W_t\}$$ is an AR$$(p)$$ process

• We can apply $$\phi(B)$$ to each side of the regression equation and get uncorrelated, zero-mean, constant-variance errors $\phi(B) \mathbf{Y} = \phi(B) \mathbf{X} \boldsymbol\beta + \phi(B) \mathbf{W} = \phi(B) \mathbf{X} \boldsymbol\beta + \mathbf{Z}$

• With the transformed target variable $Y_t^* = \phi(B) Y_t, \quad t = p+1, \ldots, n$ and the transformed design matrix $X_{t, j}^* = \phi(B) X_{t, j}, \quad t = p+1, \ldots, n, \quad j = 1, \ldots, k$ the OLS estimator is the best linear unbiased estimator

• Note: after the transformation, the regression sample size reduces to $$n-p$$
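
A sketch of this transformation for $$p = 1$$ (Cochrane-Orcutt style), reusing the simulated `y`, `x`, `n` above; in practice $$\phi_1$$ would be estimated from the residuals rather than assumed known:

```r
phi1 = 0.7; ## assumed-known AR(1) coefficient of the errors
y_star = y[-1] - phi1 * y[-n]; ## Y*_t = Y_t - phi1 * Y_{t-1}
x_star = x[-1] - phi1 * x[-n]; ## same transform for each regressor
z0 = rep(1 - phi1, n - 1); ## the transformed intercept column
ols_star = lm(y_star ~ 0 + z0 + x_star); ## OLS on n - 1 rows is BLUE
```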

### Regression with ARMA errors: MLE

• The MLE of $$\boldsymbol\beta, \boldsymbol\phi, \boldsymbol\theta, \sigma^2$$ is obtained by maximizing the Gaussian likelihood with error covariance $$\boldsymbol\Gamma_n$$

• An iterative scheme

1. Compute $$\hat{\boldsymbol\beta}_{\text{OLS}}$$ and regression residuals $$\mathbf{Y} - \mathbf{X}\hat{\boldsymbol\beta}_{\text{OLS}}$$

2. Based on the estimated residuals, compute MLE of the ARMA$$(p, q)$$ parameters

3. Based on the fitted ARMA model, compute $$\hat{\boldsymbol\beta}_{\text{GLS}}$$

4. Compute regression residuals $$\mathbf{Y} - \mathbf{X}\hat{\boldsymbol\beta}_{\text{GLS}}$$, and return to Step 2 until estimators stabilize

• Asymptotic properties of MLE: If $$\{W_t\}$$ is a causal and invertible ARMA, then
• MLEs are asymptotically normal
• Estimated regression coefficients are asymptotically independent of estimated ARMA parameters
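
In practice, base R's `arima` carries out this joint Gaussian MLE directly via its `xreg` argument (a sketch reusing the simulated `y` and `x` above):

```r
## Jointly estimates beta (the xreg coefficient and intercept) and
## the AR(1) error parameters by maximizing the Gaussian likelihood
fit = arima(y, order = c(1, 0, 0), xreg = x);
fit$coef; ## ar1, intercept, and the coefficient on x
```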