Parameter estimation for ARMA $(p, q)$

When the orders $p, q$ are known, estimate the parameters $ϕ = (ϕ_{1}, \dots, ϕ_{p}), θ = (θ_{1}, \dots, θ_{q}), σ^{2}$
- There are $p + q + 1$ parameters in total
Preliminary estimations
- Yule-Walker and Burg’s algorithm: good for AR $(p)$
- Innovation algorithm: good for MA $(q)$
- Hannan-Rissanen algorithm: good for ARMA $(p, q)$
More efficient estimation: MLE
When the orders $p, q$ are unknown, use model selection methods to select orders
- Minimize one-step MSE: FPE
- Penalized likelihood methods: AIC, AICC, BIC

Yule-Walker Estimation

Yule-Walker equations

$\{X_{t}\}$ is a causal AR $(p)$ process $X_{t} = ϕ_{1} X_{t - 1} + \dots + ϕ_{p} X_{t - p} + Z_{t}$ with $\{Z_{t}\} \sim WN (0, σ^{2})$
Multiplying each side by $X_{t}, X_{t - 1}, \dots, X_{t - p}$ , respectively, and taking expectations, we get the Yule-Walker equations $σ^{2} = γ (0) - ϕ_{1} γ (1) - \dots - ϕ_{p} γ (p)$ and $\underbrace{\left[\begin{array}{cccc} γ (0) & γ (1) & \dots & γ (p - 1) \\ γ (1) & γ (0) & \dots & γ (p - 2) \\ \vdots & \vdots & \ddots & \vdots \\ γ (p - 1) & γ (p - 2) & \dots & γ (0) \end{array}\right]}_{Γ_{p}} \underbrace{\left[\begin{array}{c} ϕ_{1} \\ ϕ_{2} \\ \vdots \\ ϕ_{p} \end{array}\right]}_{ϕ} = \underbrace{\left[\begin{array}{c} γ (1) \\ γ (2) \\ \vdots \\ γ (p) \end{array}\right]}_{γ_{p}}$
Vector representation $Γ_{p} ϕ = γ_{p}, \quad σ^{2} = γ (0) - ϕ^{'} γ_{p}$

Yule-Walker estimator and its properties

Yule-Walker estimators $\hat{ϕ} = ({\hat{ϕ}}_{1}, \dots, {\hat{ϕ}}_{p})$ are obtained by solving the hatted version of the Yule-Walker equations $\hat{ϕ} = {\hat{Γ}}_{p}^{- 1} {\hat{γ}}_{p}, {\hat{σ}}^{2} = \hat{γ} (0) - {\hat{ϕ}}^{'} {\hat{γ}}_{p}$
The fitted model is causal and ${\hat{σ}}^{2} \geq 0$ $X_{t} = {\hat{ϕ}}_{1} X_{t - 1} + \dots + {\hat{ϕ}}_{p} X_{t - p} + Z_{t}, Z_{t} \sim WN (0, {\hat{σ}}^{2})$
Asymptotic normality $\hat{ϕ} \overset{\cdot}{\sim} N (ϕ, \frac{σ^{2} Γ_{p}^{- 1}}{n})$
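
The Yule-Walker recipe above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the helper names `sample_acvf`, `yule_walker`, and `simulate_ar`, the simulated AR(2) coefficients, and the seed are all assumptions made for the example. The sample ACVF uses divisor $n$, which is what keeps ${\hat{Γ}}_{p}$ nonnegative definite.

```python
import numpy as np

def sample_acvf(x, max_lag):
    """Sample autocovariances gamma_hat(0), ..., gamma_hat(max_lag), divisor n."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    return np.array([xc[: n - h] @ xc[h:] / n for h in range(max_lag + 1)])

def yule_walker(x, p):
    """Return (phi_hat, sigma2_hat) from the sample Yule-Walker equations."""
    g = sample_acvf(x, p)
    # Toeplitz matrix Gamma_hat_p with entries gamma_hat(|i - j|)
    Gamma = np.array([[g[abs(i - j)] for j in range(p)] for i in range(p)])
    gamma_p = g[1 : p + 1]
    phi_hat = np.linalg.solve(Gamma, gamma_p)
    sigma2_hat = g[0] - phi_hat @ gamma_p
    return phi_hat, sigma2_hat

def simulate_ar(phi, n, rng, burn=200):
    """Simulate a Gaussian AR(p) path with unit noise variance (illustration only)."""
    p = len(phi)
    z = rng.normal(size=n + burn)
    x = np.zeros(n + burn)
    for t in range(p, n + burn):
        # x[t - p : t][::-1] is (x[t-1], ..., x[t-p])
        x[t] = np.dot(phi, x[t - p : t][::-1]) + z[t]
    return x[burn:]

rng = np.random.default_rng(0)          # arbitrary seed
phi_true = np.array([0.5, -0.3])        # hypothetical causal AR(2)
x = simulate_ar(phi_true, 5000, rng)
phi_hat, sigma2_hat = yule_walker(x, 2)
print(phi_hat, sigma2_hat)              # should be near (0.5, -0.3) and 1
```

With $n = 5000$ the asymptotic standard error of each ${\hat{ϕ}}_{j}$ is roughly $\sqrt{(1 - ϕ_{2}^{2}) / n} \approx 0.013$, so the estimates should land close to the true values.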

Yule-Walker estimator is a moment estimator: because it is obtained by equating theoretical and sample moments

Usually moment estimators have much higher variance than MLE
But Yule-Walker estimators of AR $(p)$ process have the same asymptotic distribution as the MLE
Moment estimators can fail for MA $(q)$ and general ARMA
- For example, MA $(1)$ : $X_{t} = Z_{t} + θ Z_{t - 1}$ with $\{Z_{t}\} \sim WN (0, σ^{2})$ . Here $γ (0) = (1 + θ^{2}) σ^{2}, γ (1) = θ σ^{2} ⟹ ρ (1) = \frac{θ}{1 + θ^{2}}$ . The moment estimator of $θ$ is obtained by solving $\hat{ρ} (1) = \frac{\hat{θ}}{1 + {\hat{θ}}^{2}} ⟹ \hat{θ} = \frac{1 \pm \sqrt{1 - 4 \hat{ρ} (1)^{2}}}{2 \hat{ρ} (1)}$ . This yields a complex $\hat{θ}$ whenever $| \hat{ρ} (1) | > 1 / 2$ , which can easily happen when $ρ (1) = 1 / 2$ , i.e., $θ = 1$
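
The failure of the MA $(1)$ moment estimator is easy to see numerically. The sketch below solves $\hat{ρ} (1) {\hat{θ}}^{2} - \hat{θ} + \hat{ρ} (1) = 0$ for two illustrative values of $\hat{ρ} (1)$ ; the function name `ma1_moment_roots` and the input values are assumptions chosen for the example.

```python
import cmath

def ma1_moment_roots(rho1):
    """Both roots of rho1 * theta^2 - theta + rho1 = 0 (requires rho1 != 0)."""
    disc = cmath.sqrt(1 - 4 * rho1**2)
    return (1 - disc) / (2 * rho1), (1 + disc) / (2 * rho1)

# |rho1| < 1/2: two real roots that are reciprocals of each other
print(ma1_moment_roots(0.4))
# |rho1| > 1/2: a complex-conjugate pair, so no real moment estimate exists
print(ma1_moment_roots(0.55))
```

When both roots are real, only the one with $| \hat{θ} | \leq 1$ corresponds to an invertible model.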

Innovations algorithm: estimate MA coefficients

Fitted innovations MA $(m)$ model $X_{t} = Z_{t} + {\hat{θ}}_{m 1} Z_{t - 1} + \dots + {\hat{θ}}_{m m} Z_{t - m}, \{Z_{t}\} \sim WN (0, {\hat{v}}_{m})$ where ${\hat{θ}}_{m}$ and ${\hat{v}}_{m}$ are from the innovations algorithm with the ACVF replaced by the sample ACVF
For an MA $(q)$ process, the innovations algorithm estimator ${\hat{θ}}_{q} = ({\hat{θ}}_{q 1}, \dots, {\hat{θ}}_{q q})^{'}$ is NOT consistent for $(θ_{1}, \dots, θ_{q})^{'}$
Choice of $m$ : increase $m$ until the vector $({\hat{θ}}_{m 1}, \dots, {\hat{θ}}_{m q})^{'}$ stabilizes
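
The recursion behind these estimates can be sketched as follows. This is a plain-Python version of the innovations algorithm for a stationary ACVF; the function name `innovations` and the indexing convention are assumptions for illustration. To check the recursion itself, the example feeds in the true ACVF of an MA $(1)$ with $θ = 0.5, σ^{2} = 1$ ; with data one would pass the sample ACVF instead.

```python
def innovations(gamma, m):
    """Innovations algorithm for a stationary ACVF gamma[0..m].

    Returns (theta, v) with theta[n][j] = theta_{n,j} for 1 <= j <= n <= m
    and v[n] = v_n for 0 <= n <= m.
    """
    theta = [[0.0] * (m + 1) for _ in range(m + 1)]
    v = [0.0] * (m + 1)
    v[0] = gamma[0]
    for n in range(1, m + 1):
        for k in range(n):
            # sum over j < k of theta_{k,k-j} * theta_{n,n-j} * v_j
            s = sum(theta[k][k - j] * theta[n][n - j] * v[j] for j in range(k))
            theta[n][n - k] = (gamma[n - k] - s) / v[k]
        v[n] = gamma[0] - sum(theta[n][n - j] ** 2 * v[j] for j in range(n))
    return theta, v

# True ACVF of X_t = Z_t + 0.5 Z_{t-1} with sigma^2 = 1 (zero beyond lag 1)
m = 20
gamma = [1.25, 0.5] + [0.0] * (m - 1)
theta, v = innovations(gamma, m)
print(theta[m][1], v[m])  # approach 0.5 and 1.0 as m grows
```

Watching `theta[m][1]` settle as `m` increases mirrors the stabilization rule for choosing $m$ .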

Maximum Likelihood Estimation

Likelihood function of a Gaussian time series

Suppose $\{X_{t}\}$ is a Gaussian time series with mean zero
Assume that the covariance matrix $Γ_{n} = E (X_{n} X_{n}^{'})$ of $X_{n} = (X_{1}, \dots, X_{n})^{'}$ is nonsingular
One-step predictors using innovations algorithm: ${\hat{X}}_{1} = 0$ and ${\hat{X}}_{j + 1} = P_{j} X_{j + 1}$ with MSE $v_{j} = E {(X_{j + 1} - {\hat{X}}_{j + 1})}^{2}$
- Example: AR $(1)$ ${\hat{X}}_{j} = \begin{cases} 0, & j = 1 \\ ϕ X_{j - 1}, & j \geq 2 \end{cases}, \quad v_{j} = \begin{cases} \frac{σ^{2}}{1 - ϕ^{2}}, & j = 0 \\ σ^{2}, & j \geq 1 \end{cases}$
Likelihood function $\begin{aligned} L & \propto {| Γ_{n} |}^{- 1 / 2} \exp (- \frac{1}{2} X_{n}^{'} Γ_{n}^{- 1} X_{n}) \\ & = {(v_{0} v_{1} \dots v_{n - 1})}^{- 1 / 2} \exp [- \frac{1}{2} \sum_{j = 1}^{n} \frac{(X_{j} - {\hat{X}}_{j})^{2}}{v_{j - 1}}] \end{aligned}$
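
The equality of the two forms of the likelihood can be checked numerically for a zero-mean Gaussian AR $(1)$ , where ${\hat{X}}_{1} = 0$ , ${\hat{X}}_{j} = ϕ X_{j - 1}$ for $j \geq 2$ , $v_{0} = σ^{2} / (1 - ϕ^{2})$ and $v_{j} = σ^{2}$ for $j \geq 1$ . The sketch below is illustrative only: the function names, the parameter values, and the random test vector are assumptions, and both functions include the constant $n \log (2 π)$ that the proportionality sign hides.

```python
import numpy as np

def neg2loglik_innovations(x, phi, sigma2):
    """-2 log L for a zero-mean Gaussian AR(1), via one-step prediction errors."""
    n = len(x)
    xhat = np.concatenate(([0.0], phi * x[:-1]))  # X_hat_1 = 0, X_hat_j = phi X_{j-1}
    v = np.full(n, sigma2)                         # v_1, ..., v_{n-1} = sigma^2
    v[0] = sigma2 / (1 - phi**2)                   # v_0 = gamma(0)
    return n * np.log(2 * np.pi) + np.sum(np.log(v)) + np.sum((x - xhat) ** 2 / v)

def neg2loglik_direct(x, phi, sigma2):
    """-2 log L from the full covariance matrix Gamma_n."""
    n = len(x)
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    Gamma = sigma2 * phi**lags / (1 - phi**2)      # gamma(h) = sigma^2 phi^|h| / (1 - phi^2)
    _, logdet = np.linalg.slogdet(Gamma)
    return n * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(Gamma, x)

rng = np.random.default_rng(1)   # arbitrary seed
x = rng.normal(size=8)           # any data vector works for this identity
a = neg2loglik_innovations(x, 0.6, 2.0)
b = neg2loglik_direct(x, 0.6, 2.0)
print(a, b)                      # the two values agree up to rounding
```

The innovations form costs $O (n)$ here, whereas the direct form needs a determinant and a linear solve with the $n \times n$ matrix $Γ_{n}$ .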

Maximum likelihood estimation of ARMA $(p, q)$

Innovations MSE $v_{j} = σ^{2} r_{j}$ , where $r_{j}$ depends on $ϕ$ and $θ$
Maximizing the likelihood is equivalent to minimizing $- 2 \log L (ϕ, θ, σ^{2}) = n \log (σ^{2}) + \sum_{j = 1}^{n} \log (r_{j - 1}) + \frac{S (ϕ, θ)}{σ^{2}},$ where $S (ϕ, θ) = \sum_{j = 1}^{n} \frac{(X_{j} - {\hat{X}}_{j})^{2}}{r_{j - 1}}$
MLE ${\hat{σ}}^{2}$ can be expressed with MLE $\hat{ϕ}, \hat{θ}$ ${\hat{σ}}^{2} = \frac{S (\hat{ϕ}, \hat{θ})}{n}$
MLE $\hat{ϕ}, \hat{θ}$ are obtained by minimizing $\log [\frac{S (ϕ, θ)}{n}] + \frac{1}{n} \sum_{j = 1}^{n} \log (r_{j - 1})$ . This criterion does not depend on $σ^{2}$ !
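
The step from $- 2 \log L$ to the reduced criterion can be spelled out as follows. This is a short derivation sketch that holds $ϕ, θ$ fixed and optimizes over $σ^{2}$ first:

$\begin{aligned} \frac{∂}{∂ σ^{2}} [n \log (σ^{2}) + \frac{S (ϕ, θ)}{σ^{2}}] & = \frac{n}{σ^{2}} - \frac{S (ϕ, θ)}{σ^{4}} = 0 ⟹ σ^{2} = \frac{S (ϕ, θ)}{n} \\ - 2 \log L (ϕ, θ, \frac{S (ϕ, θ)}{n}) & = n \log [\frac{S (ϕ, θ)}{n}] + \sum_{j = 1}^{n} \log (r_{j - 1}) + n \end{aligned}$

Dividing the last line by $n$ and dropping the additive constant $1$ leaves $\log [\frac{S (ϕ, θ)}{n}] + \frac{1}{n} \sum_{j = 1}^{n} \log (r_{j - 1})$ , a function of $ϕ$ and $θ$ only.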

Asymptotic normality of MLE

When $n$ is large, for a causal and invertible ARMA $(p, q)$ process, $[\begin{matrix} \hat{ϕ} \\ \hat{θ} \end{matrix}] \overset{\cdot}{\sim} N_{p + q} ([\begin{matrix} ϕ \\ θ \end{matrix}], \frac{V}{n})$
For an AR $(p)$ process, MLE has the same asymptotic distribution as the Yule-Walker estimator $V = σ^{2} Γ_{p}^{- 1} ⟹ \hat{ϕ} \overset{\cdot}{\sim} N (ϕ, \frac{σ^{2} Γ_{p}^{- 1}}{n})$

Examples of $V$

AR $(1)$ $V = 1 - ϕ_{1}^{2}$
AR $(2)$ $V = [\begin{array}{cc} 1 - ϕ_{2}^{2} & - ϕ_{1} (1 + ϕ_{2}) \\ - ϕ_{1} (1 + ϕ_{2}) & 1 - ϕ_{2}^{2} \end{array}]$
MA $(1)$ $V = 1 - θ_{1}^{2}$
MA $(2)$ $V = [\begin{array}{cc} 1 - θ_{2}^{2} & θ_{1} (1 - θ_{2}) \\ θ_{1} (1 - θ_{2}) & 1 - θ_{2}^{2} \end{array}]$
ARMA $(1, 1)$ $V = \frac{1 + ϕ θ}{(ϕ + θ)^{2}} [\begin{array}{cc} (1 - ϕ^{2}) (1 + ϕ θ) & - (1 - θ^{2}) (1 - ϕ^{2}) \\ - (1 - θ^{2}) (1 - ϕ^{2}) & (1 - θ^{2}) (1 + ϕ θ) \end{array}]$
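
For pure AR models these closed forms are instances of $V = σ^{2} Γ_{p}^{- 1}$ . The sketch below checks this for AR $(2)$ by building $Γ_{2}$ from the Yule-Walker relations $ρ (1) = ϕ_{1} / (1 - ϕ_{2})$ , $ρ (2) = ϕ_{1} ρ (1) + ϕ_{2}$ and $γ (0) = σ^{2} / (1 - ϕ_{1} ρ (1) - ϕ_{2} ρ (2))$ . The function names and the parameter values are assumptions for illustration.

```python
import numpy as np

def ar2_V_closed_form(phi1, phi2):
    """Closed-form asymptotic covariance matrix V for AR(2)."""
    off = -phi1 * (1 + phi2)
    return np.array([[1 - phi2**2, off], [off, 1 - phi2**2]])

def ar2_V_from_acvf(phi1, phi2, sigma2=1.0):
    """V = sigma^2 * Gamma_2^{-1}, with Gamma_2 built from the Yule-Walker relations."""
    rho1 = phi1 / (1 - phi2)
    rho2 = phi1 * rho1 + phi2
    gamma0 = sigma2 / (1 - phi1 * rho1 - phi2 * rho2)
    Gamma2 = gamma0 * np.array([[1.0, rho1], [rho1, 1.0]])
    return sigma2 * np.linalg.inv(Gamma2)

V1 = ar2_V_closed_form(0.5, -0.3)
V2 = ar2_V_from_acvf(0.5, -0.3, sigma2=2.0)
print(V1)
print(V2)  # same matrix: sigma^2 cancels
```

Note that $σ^{2}$ drops out, which is why the tabulated $V$ depends on the coefficients alone.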

Order Selection

Order selection

Why? Harm of using too large $p, q$ to fit models:
- Large errors arising from parameter estimation of the model
- Large MSEs of forecasts
FPE: only for AR $(p)$ processes $FPE = {\hat{σ}}^{2} \frac{n + p}{n - p}$
AIC: for ARMA $(p, q)$ ; approximate Kullback-Leibler discrepancy of the fitted model and the true model, a penalized likelihood method $AIC = - 2 \log (\hat{L}) + 2 (p + q + 1)$
AICC: for ARMA $(p, q)$ ; a bias-corrected version of AIC, a penalized likelihood method $AICC = - 2 \log (\hat{L}) + 2 (p + q + 1) \cdot \frac{n}{n - p - q - 2}$
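
As a rough illustration of order selection with FPE, the sketch below fits AR $(p)$ models for $p = 1, \dots, 6$ to a simulated AR $(2)$ series and picks the order with the smallest FPE. One loud assumption: ${\hat{σ}}^{2}$ is taken from the Yule-Walker fit rather than from the MLE, purely to keep the example short. The function names, the simulated model, and the seed are likewise hypothetical.

```python
import numpy as np

def yw_sigma2(x, p):
    """Yule-Walker estimate of sigma^2 for an AR(p) fit, p >= 1 (stand-in for the MLE)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    g = np.array([xc[: n - h] @ xc[h:] / n for h in range(p + 1)])
    Gamma = np.array([[g[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(Gamma, g[1:])
    return g[0] - phi @ g[1:]

def fpe(x, p):
    """FPE = sigma2_hat * (n + p) / (n - p)."""
    n = len(x)
    return yw_sigma2(x, p) * (n + p) / (n - p)

# Simulated AR(2): X_t = 0.5 X_{t-1} - 0.3 X_{t-2} + Z_t (illustration only)
rng = np.random.default_rng(0)
n, burn = 2000, 200
z = rng.normal(size=n + burn)
x = np.zeros(n + burn)
for t in range(2, n + burn):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + z[t]
x = x[burn:]

scores = {p: fpe(x, p) for p in range(1, 7)}
best_p = min(scores, key=scores.get)
print(scores)
print("selected order:", best_p)
```

Underfitting ( $p = 1$ ) is penalized through a visibly larger ${\hat{σ}}^{2}$ , while the factor $\frac{n + p}{n - p}$ discourages needlessly large $p$ .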

Diagnostic Checking

Residuals and rescaled residuals

Residuals of an ARMA $(p, q)$ process ${\hat{W}}_{t} = \frac{X_{t} - {\hat{X}}_{t} (\hat{ϕ}, \hat{θ})}{\sqrt{r_{t - 1} (\hat{ϕ}, \hat{θ})}}, t = 1, \dots, n$
- Residuals $\{{\hat{W}}_{t}\}$ should behave like the white noise $\{Z_{t}\}$
Rescaled residuals ${\hat{R}}_{t} = \frac{{\hat{W}}_{t}}{\hat{σ}}, \hat{σ} = \sqrt{\frac{\sum_{t = 1}^{n} {\hat{W}}_{t}^{2}}{n}}$
- Rescaled residuals should be approximately $WN (0, 1)$

Residual diagnostics

Plot $\{{\hat{R}}_{t}\}$ and look for patterns
Compute the sample ACF of $\{{\hat{R}}_{t}\}$
- It should be close to the $WN (0, 1)$ sample ACF
Apply Chapter 1 tests for IID noise
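
A minimal sketch of two such checks, using only the standard library: the sample ACF compared with the bounds $\pm 1.96 / \sqrt{n}$ , and a portmanteau statistic of Ljung-Box type, $Q = n (n + 2) \sum_{k = 1}^{h} \hat{ρ} (k)^{2} / (n - k)$ . It is an assumption here that the Ljung-Box statistic is among the tests intended. Simulated IID $N (0, 1)$ noise stands in for the rescaled residuals, and a strongly autocorrelated series shows what a failed check looks like. All names, seeds, and parameter values are illustrative.

```python
import math
import random

def sample_acf(r, max_lag):
    """Sample autocorrelations rho_hat(0), ..., rho_hat(max_lag)."""
    n = len(r)
    mean = sum(r) / n
    c = [sum((r[t] - mean) * (r[t + h] - mean) for t in range(n - h)) / n
         for h in range(max_lag + 1)]
    return [ch / c[0] for ch in c]

def ljung_box(r, h):
    """Ljung-Box statistic Q = n (n + 2) * sum_{k=1}^{h} rho_hat(k)^2 / (n - k)."""
    n = len(r)
    rho = sample_acf(r, h)
    return n * (n + 2) * sum(rho[k] ** 2 / (n - k) for k in range(1, h + 1))

random.seed(0)                                      # arbitrary seed
n, h = 500, 20
noise = [random.gauss(0.0, 1.0) for _ in range(n)]  # stand-in for rescaled residuals
ar = [0.0] * n                                      # badly fitting case: leftover AR(1) structure
for t in range(1, n):
    ar[t] = 0.9 * ar[t - 1] + noise[t]

bound = 1.96 / math.sqrt(n)
outside = sum(abs(rho) > bound for rho in sample_acf(noise, h)[1:])
crit = 31.41  # 0.95 quantile of chi-square with 20 degrees of freedom
print("lags outside bounds:", outside, "of", h)
print("Q (noise):", ljung_box(noise, h), " Q (AR):", ljung_box(ar, h), " critical:", crit)
```

For residuals of a fitted ARMA $(p, q)$ model, the reference distribution is commonly taken to be chi-square with $h - p - q$ degrees of freedom rather than $h$ .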

References

Brockwell, Peter J. and Davis, Richard A. (2016), Introduction to Time Series and Forecasting, Third Edition. New York: Springer

Book Notes: Intro to Time Series and Forecasting -- Ch5 ARMA Models Estimation and Forecasting
