Book Notes: Intro to Time Series and Forecasting -- Ch5 ARMA Models Estimation and Forecasting

Parameter estimation for ARMA(p,q)

  • When the orders $p, q$ are known, estimate the parameters $\phi = (\phi_1, \ldots, \phi_p)$, $\theta = (\theta_1, \ldots, \theta_q)$, and $\sigma^2$
    • There are $p+q+1$ parameters in total
  • Preliminary estimations
    • Yule-Walker and Burg’s algorithm: good for AR(p)
    • Innovations algorithm: good for MA(q)
    • Hannan-Rissanen algorithm: good for ARMA(p,q)
  • More efficient estimation: MLE

  • When the orders p,q are unknown, use model selection methods to select orders
    • Minimize one-step MSE: FPE
    • Penalized likelihood methods: AIC, AICC, BIC

Yule-Walker Estimation

Yule-Walker equations

  • $\{X_t\}$ is a causal AR(p) process $X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + Z_t$

  • Multiplying each side by $X_t, X_{t-1}, \ldots, X_{t-p}$, respectively, and taking expectations, we get the Yule-Walker equations $$\sigma^2 = \gamma(0) - \phi_1\gamma(1) - \cdots - \phi_p\gamma(p)$$ $$\underbrace{\begin{bmatrix} \gamma(0) & \gamma(1) & \cdots & \gamma(p-1) \\ \gamma(1) & \gamma(0) & \cdots & \gamma(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ \gamma(p-1) & \gamma(p-2) & \cdots & \gamma(0) \end{bmatrix}}_{\Gamma_p} \underbrace{\begin{bmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_p \end{bmatrix}}_{\phi} = \underbrace{\begin{bmatrix} \gamma(1) \\ \gamma(2) \\ \vdots \\ \gamma(p) \end{bmatrix}}_{\gamma_p}$$

  • Vector representation: $\Gamma_p \phi = \gamma_p$, $\sigma^2 = \gamma(0) - \phi^\top \gamma_p$

Yule-Walker estimator and its properties

  • Yule-Walker estimators $\hat\phi = (\hat\phi_1, \ldots, \hat\phi_p)$ are obtained by solving the hatted version of the Yule-Walker equations (see the numpy sketch after this list) $$\hat\phi = \hat\Gamma_p^{-1}\hat\gamma_p, \qquad \hat\sigma^2 = \hat\gamma(0) - \hat\phi^\top \hat\gamma_p$$

  • The fitted model is causal and $\hat\sigma^2 \ge 0$: $$X_t = \hat\phi_1 X_{t-1} + \cdots + \hat\phi_p X_{t-p} + Z_t, \qquad \{Z_t\} \sim \mathrm{WN}(0, \hat\sigma^2)$$

  • Asymptotic normality: $\hat\phi \approx N\left(\phi, \dfrac{\sigma^2 \Gamma_p^{-1}}{n}\right)$
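
A minimal numpy sketch of the procedure above, assuming zero-mean data; the helper names `sample_acvf` and `yule_walker` are illustrative, not from the book:

```python
import numpy as np

def sample_acvf(x, max_lag):
    """Sample autocovariances gamma_hat(0), ..., gamma_hat(max_lag)."""
    x = np.asarray(x) - np.mean(x)
    n = len(x)
    return np.array([x[h:] @ x[:n - h] / n for h in range(max_lag + 1)])

def yule_walker(x, p):
    """Solve the hatted Yule-Walker equations for an AR(p) model."""
    g = sample_acvf(x, p)
    Gamma = np.array([[g[abs(i - j)] for j in range(p)] for i in range(p)])  # Toeplitz Gamma_p hat
    phi = np.linalg.solve(Gamma, g[1:])   # phi_hat = Gamma_p^{-1} gamma_p
    sigma2 = g[0] - phi @ g[1:]           # sigma2_hat = gamma(0) - phi' gamma_p
    return phi, sigma2

# Example: fit AR(2) to a simulated causal AR(2) path
rng = np.random.default_rng(0)
z = rng.standard_normal(500)
x = np.zeros(500)
for t in range(2, 500):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + z[t]
print(yule_walker(x, 2))  # estimates near (0.5, -0.3) and sigma2 near 1
```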

The Yule-Walker estimator is a moment estimator: it is obtained by equating theoretical and sample moments

  • Moment estimators usually have much higher variance than the MLE

  • But Yule-Walker estimators of AR(p) process have the same asymptotic distribution as the MLE

  • Moment estimators can fail for MA(q) and general ARMA

    • For example, MA(1): $X_t = Z_t + \theta Z_{t-1}$ with $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$, so $$\gamma(0) = (1+\theta^2)\sigma^2, \quad \gamma(1) = \theta\sigma^2 \;\Rightarrow\; \rho(1) = \frac{\theta}{1+\theta^2}$$ The moment estimator of $\theta$ is obtained by solving $$\hat\rho(1) = \frac{\hat\theta}{1+\hat\theta^2} \;\Rightarrow\; \hat\theta = \frac{1 \pm \sqrt{1-4\hat\rho(1)^2}}{2\hat\rho(1)}$$ This yields a complex $\hat\theta$ if $|\hat\rho(1)| > 1/2$, which can easily happen when $\rho(1) = 1/2$, i.e., $\theta = 1$ (see the numerical illustration below)
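
A small numerical illustration of this failure mode; the function name is hypothetical, and the minus branch is taken so that the real solution, when it exists, is the invertible one ($|\hat\theta| \le 1$):

```python
import numpy as np

def ma1_moment_estimator(rho1):
    """Moment estimator of theta in MA(1) from the lag-1 sample autocorrelation."""
    disc = 1 - 4 * rho1 ** 2
    if disc < 0:
        return None   # |rho_hat(1)| > 1/2: both roots are complex, the method fails
    return (1 - np.sqrt(disc)) / (2 * rho1)   # invertible root, |theta_hat| <= 1

print(ma1_moment_estimator(0.4))    # 0.5
print(ma1_moment_estimator(0.55))   # None: a likely outcome when theta is near 1
```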

Innovations algorithm: estimate MA coefficients

  • Fitted innovations MA(m) model $$X_t = Z_t + \hat\theta_{m1} Z_{t-1} + \cdots + \hat\theta_{mm} Z_{t-m}, \qquad \{Z_t\} \sim \mathrm{WN}(0, \hat v_m)$$ where $\hat\theta_{m1}, \ldots, \hat\theta_{mm}$ and $\hat v_m$ come from the innovations algorithm with the ACVF replaced by the sample ACVF (a sketch of the recursion follows this list)

  • For an MA(q) process, the innovations algorithm estimator $\hat\theta_q = (\hat\theta_{q1}, \ldots, \hat\theta_{qq})$ with $m = q$ is NOT consistent for $(\theta_1, \ldots, \theta_q)$; one must let $m$ grow with the sample size

  • Choice of m: increase $m$ until the vector $(\hat\theta_{m1}, \ldots, \hat\theta_{mq})$ stabilizes
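
A sketch of the recursion under the same zero-mean assumption; `innovations` and `fit_ma` are illustrative names:

```python
import numpy as np

def innovations(gamma, m):
    """Innovations algorithm: coefficients theta[n, 1..n] and MSEs v[n] from ACVF gamma[0..m]."""
    theta = np.zeros((m + 1, m + 1))
    v = np.zeros(m + 1)
    v[0] = gamma[0]
    for n in range(1, m + 1):
        for k in range(n):
            s = sum(theta[k, k - j] * theta[n, n - j] * v[j] for j in range(k))
            theta[n, n - k] = (gamma[n - k] - s) / v[k]
        v[n] = gamma[0] - sum(theta[n, n - j] ** 2 * v[j] for j in range(n))
    return theta, v

def fit_ma(x, q, m):
    """Estimate MA(q) coefficients from the fitted innovations MA(m) model."""
    x = np.asarray(x) - np.mean(x)
    n = len(x)
    gamma = np.array([x[h:] @ x[:n - h] / n for h in range(m + 1)])  # sample ACVF
    theta, v = innovations(gamma, m)
    return theta[m, 1:q + 1], v[m]   # (theta_hat_m1, ..., theta_hat_mq) and v_hat_m
```

In practice one would call `fit_ma` with increasing `m` and stop once the first $q$ coefficients stop changing appreciably.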

Maximum Likelihood Estimation

Likelihood function of a Gaussian time series

  • Suppose $\{X_t\}$ is a Gaussian time series with mean zero

  • Assume that the covariance matrix $\Gamma_n = E(X_n X_n^\top)$ of $X_n = (X_1, \ldots, X_n)^\top$ is nonsingular

  • One-step predictors using the innovations algorithm: $\hat X_1 = 0$ and $\hat X_{j+1} = P_j X_{j+1}$, with MSE $v_j = E(X_{j+1} - \hat X_{j+1})^2$

    • Example: AR(1) $$\hat X_j = \begin{cases} 0, & j = 1 \\ \phi X_{j-1}, & j \ge 2 \end{cases} \qquad v_j = \begin{cases} \dfrac{\sigma^2}{1-\phi^2}, & j = 0 \\ \sigma^2, & j \ge 1 \end{cases}$$
  • Likelihood function $$L \propto |\Gamma_n|^{-1/2} \exp\left(-\tfrac{1}{2} X_n^\top \Gamma_n^{-1} X_n\right) = (v_0 v_1 \cdots v_{n-1})^{-1/2} \exp\left[-\tfrac{1}{2} \sum_{j=1}^n \frac{(X_j - \hat X_j)^2}{v_{j-1}}\right]$$
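
A sketch of evaluating $-2\log L$ through this innovations factorization for the AR(1) example above (zero mean and $|\phi| < 1$ assumed; the function name is illustrative):

```python
import numpy as np

def ar1_neg2loglik(x, phi, sigma2):
    """-2 log-likelihood of a zero-mean Gaussian AR(1), innovations form."""
    x = np.asarray(x)
    n = len(x)
    xhat = np.concatenate(([0.0], phi * x[:-1]))    # one-step predictors X_hat_1, ..., X_hat_n
    v = np.concatenate(([sigma2 / (1 - phi ** 2)],  # v_0 = Var(X_1)
                        np.full(n - 1, sigma2)))    # v_j = sigma^2 for j >= 1
    return n * np.log(2 * np.pi) + np.sum(np.log(v)) + np.sum((x - xhat) ** 2 / v)
```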

Maximum likelihood estimation of ARMA(p,q)

  • Innovations MSE $v_j = \sigma^2 r_j$, where $r_j$ depends on $\phi$ and $\theta$ but not on $\sigma^2$

  • Maximizing the likelihood is equivalent to minimizing $$-2\log L(\phi, \theta, \sigma^2) = n\log(\sigma^2) + \sum_{j=1}^n \log(r_{j-1}) + \frac{S(\phi, \theta)}{\sigma^2},$$ where $S(\phi, \theta) = \sum_{j=1}^n \dfrac{(X_j - \hat X_j)^2}{r_{j-1}}$

  • The MLE $\hat\sigma^2$ can be expressed in terms of the MLEs $\hat\phi, \hat\theta$: $$\hat\sigma^2 = \frac{S(\hat\phi, \hat\theta)}{n}$$

  • The MLEs $\hat\phi, \hat\theta$ are obtained by minimizing the profile objective $$\log\left[\frac{S(\phi, \theta)}{n}\right] + \frac{1}{n}\sum_{j=1}^n \log(r_{j-1}),$$ which does not depend on $\sigma^2$!
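
In practice this minimization is delegated to a library. A hedged sketch using statsmodels, which fits Gaussian ARMA models by maximum likelihood; the simulated series and orders are illustrative:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Simulate an ARMA(1,1) path: X_t = 0.6 X_{t-1} + Z_t + 0.4 Z_{t-1}
rng = np.random.default_rng(1)
z = rng.standard_normal(500)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.6 * x[t - 1] + z[t] + 0.4 * z[t - 1]

# order=(p, 0, q) fits ARMA(p, q); trend="n" matches the zero-mean assumption
fit = ARIMA(x, order=(1, 0, 1), trend="n").fit()
print(fit.params)   # phi_hat, theta_hat, sigma2_hat
print(fit.llf)      # maximized log-likelihood
```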

Asymptotic normality of MLE

  • When $n$ is large, for a causal and invertible ARMA(p,q) process, $$\begin{bmatrix} \hat\phi \\ \hat\theta \end{bmatrix} \approx N_{p+q}\left(\begin{bmatrix} \phi \\ \theta \end{bmatrix}, \frac{V}{n}\right)$$

  • For an AR(p) process, the MLE has the same asymptotic distribution as the Yule-Walker estimator: $V = \sigma^2 \Gamma_p^{-1}$, so $$\hat\phi \approx N\left(\phi, \frac{\sigma^2 \Gamma_p^{-1}}{n}\right)$$

Examples of V

  • AR(1): $V = 1 - \phi_1^2$

  • AR(2): $V = \begin{bmatrix} 1-\phi_2^2 & -\phi_1(1+\phi_2) \\ -\phi_1(1+\phi_2) & 1-\phi_2^2 \end{bmatrix}$

  • MA(1): $V = 1 - \theta_1^2$

  • MA(2): $V = \begin{bmatrix} 1-\theta_2^2 & \theta_1(1-\theta_2) \\ \theta_1(1-\theta_2) & 1-\theta_2^2 \end{bmatrix}$

  • ARMA(1,1): $V = \dfrac{1+\phi\theta}{(\phi+\theta)^2}\begin{bmatrix} (1-\phi^2)(1+\phi\theta) & -(1-\phi^2)(1-\theta^2) \\ -(1-\phi^2)(1-\theta^2) & (1-\theta^2)(1+\phi\theta) \end{bmatrix}$
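
As a usage sketch: an approximate 95% confidence interval for $\phi$ in a fitted AR(1), plugging $\hat\phi$ into $V = 1 - \phi_1^2$ (the function name is illustrative):

```python
import numpy as np

def ar1_confint(phi_hat, n, z=1.96):
    """Approximate 95% CI for phi in AR(1): phi_hat +/- z * sqrt((1 - phi_hat^2) / n)."""
    se = np.sqrt((1 - phi_hat ** 2) / n)   # sqrt(V / n) with V = 1 - phi^2
    return phi_hat - z * se, phi_hat + z * se

print(ar1_confint(0.6, 500))   # roughly (0.53, 0.67)
```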

Order Selection

Order selection

  • Why? Fitting models with too large $p, q$ causes:

    • Large errors arising from parameter estimation of the model
    • Large MSEs of forecasts
  • FPE: only for AR(p) processes $$\mathrm{FPE} = \hat\sigma^2 \, \frac{n+p}{n-p}$$

  • AIC: for ARMA(p,q); a penalized likelihood method that approximates the Kullback-Leibler discrepancy between the fitted model and the true model $$\mathrm{AIC} = -2\log(\hat L) + 2(p+q+1)$$

  • AICC: for ARMA(p,q); a bias-corrected version of AIC, also a penalized likelihood method $$\mathrm{AICC} = -2\log(\hat L) + \frac{2(p+q+1)n}{n-p-q-2}$$
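
A sketch of grid-based order selection, computing AICC from statsmodels ML fits; the grid bounds and zero-mean assumption are illustrative:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def select_order(x, max_p=3, max_q=3):
    """Return (AICC, p, q) for the ARMA(p, q) fit minimizing AICC over a grid."""
    n = len(x)
    best = None
    for p in range(max_p + 1):
        for q in range(max_q + 1):
            fit = ARIMA(x, order=(p, 0, q), trend="n").fit()
            aicc = -2 * fit.llf + 2 * (p + q + 1) * n / (n - p - q - 2)
            if best is None or aicc < best[0]:
                best = (aicc, p, q)
    return best
```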

Diagnostic Checking

Residuals and rescaled residuals

  • Residuals of a fitted ARMA(p,q) model $$\hat W_t = \frac{X_t - \hat X_t(\hat\phi, \hat\theta)}{\sqrt{r_{t-1}(\hat\phi, \hat\theta)}}, \qquad t = 1, \ldots, n$$
    • Residuals $\{\hat W_t\}$ should behave like the white noise $\{Z_t\}$
  • Rescaled residuals $$\hat R_t = \frac{\hat W_t}{\hat\sigma}, \qquad \hat\sigma = \sqrt{\frac{1}{n}\sum_{t=1}^n \hat W_t^2}$$
    • Rescaled residuals should be approximately $\mathrm{WN}(0, 1)$

Residual diagnostics

  1. Plot $\{\hat R_t\}$ and look for patterns

  2. Compute the sample ACF of $\{\hat R_t\}$
    • It should be close to the sample ACF of $\mathrm{WN}(0, 1)$
  3. Apply the tests for IID noise from Chapter 1 (see the sketch below)
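
A sketch of these three checks, assuming `resid` holds the residuals of a fitted model (e.g., `fit.resid` from the statsmodels sketch in the MLE section); the Ljung-Box portmanteau test stands in for the Chapter 1 IID test battery:

```python
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.stattools import acf

def residual_diagnostics(resid, nlags=20):
    """Rescaled residuals, sample-ACF bound check, and Ljung-Box whiteness test."""
    resid = np.asarray(resid)
    n = len(resid)
    r = resid / np.sqrt(np.mean(resid ** 2))   # rescaled residuals, approx WN(0, 1)
    bound = 1.96 / np.sqrt(n)                  # approximate 95% bounds under white noise
    flagged = np.flatnonzero(np.abs(acf(r, nlags=nlags)[1:]) > bound) + 1
    print("sample ACF exceeds WN bounds at lags:", flagged)
    print(acorr_ljungbox(r, lags=[nlags]))     # portmanteau test of whiteness

# e.g. residual_diagnostics(fit.resid) with `fit` from the MLE sketch above
```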

References

  • Brockwell, Peter J. and Davis, Richard A. (2016), Introduction to Time Series and Forecasting, Third Edition. New York: Springer