Book Notes: Introduction to Time Series and Forecasting -- Ch2 Stationary Processes

Best linear predictor

  • Goal: find a function of $X_n$ that gives the “best” predictor of $X_{n+h}$.

    • By “best” we mean achieving the minimum mean squared error
    • Under a joint normality assumption on $X_n$ and $X_{n+h}$, the best estimator is $m(X_n) = E(X_{n+h} \mid X_n) = \mu + \rho(h)(X_n - \mu)$
  • Best linear predictor $\ell(X_n) = aX_n + b$

    • For Gaussian processes, $\ell(X_n)$ and $m(X_n)$ are the same.
    • The best linear predictor only depends on the mean and ACF of the series $\{X_n\}$

Properties of ACVF $\gamma(\cdot)$ and ACF $\rho(\cdot)$

  • $\gamma(0) \geq 0$

  • $|\gamma(h)| \leq \gamma(0)$ for all $h$

  • $\gamma(\cdot)$ is an even function, i.e., $\gamma(h) = \gamma(-h)$ for all $h$

  • A function $\kappa: \mathbb{Z} \to \mathbb{R}$ is nonnegative definite if $\sum_{i,j=1}^{n} a_i \kappa(i-j) a_j \geq 0$ for all $n \in \mathbb{N}^+$ and vectors $\mathbf{a} = (a_1, \ldots, a_n)' \in \mathbb{R}^n$

  • Theorem: a real-valued function defined on the integers is the autocovariance function of a stationary time series if and only if it is even and nonnegative definite

  • The ACF $\rho(\cdot)$ has all of the above properties of the ACVF $\gamma(\cdot)$

    • Plus one more: $\rho(0) = 1$

MA(q) process, q-dependent, and q-correlated

  • A time series $\{X_t\}$ is

    • $q$-dependent: if $X_s$ and $X_t$ are independent whenever $|t-s| > q$.

    • $q$-correlated: if $\rho(h) = 0$ for all $|h| > q$.

  • Moving-average process of order $q$: $\{X_t\}$ is an MA($q$) process if $X_t = Z_t + \theta_1 Z_{t-1} + \cdots + \theta_q Z_{t-q}$, where $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$

  • An MA($q$) process is $q$-correlated (see the simulation check below)

  • Theorem: a stationary $q$-correlated time series with mean 0 can be represented as an MA($q$) process
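
To see the $q$-correlation cut-off numerically, here is a minimal Python sketch (my own, not from the book): it simulates an MA(2) process with arbitrarily chosen coefficients $\theta = (0.8, -0.4)$ and checks that the sample ACF is close to zero beyond lag 2.

```python
# Simulate an MA(2) process and inspect the sample ACF; coefficients are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
n, q = 5000, 2
theta = np.array([1.0, 0.8, -0.4])        # (1, theta_1, theta_2)

Z = rng.normal(size=n + q)                # white noise
X = np.convolve(Z, theta)[q : n + q]      # X_t = Z_t + theta_1*Z_{t-1} + theta_2*Z_{t-2}

xbar = X.mean()
acvf = np.array([np.sum((X[h:] - xbar) * (X[: n - h] - xbar)) / n for h in range(7)])
acf = acvf / acvf[0]
print(np.round(acf, 3))   # lags 0..6: values at lags 3..6 should be near 0 (within ~2/sqrt(n))
```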

Linear Processes

Linear processes: definitions

  • A time series $\{X_t\}$ is a linear process if $X_t = \sum_{j=-\infty}^{\infty} \psi_j Z_{t-j}$, where $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$ and the constants $\{\psi_j\}$ satisfy $\sum_{j=-\infty}^{\infty} |\psi_j| < \infty$

  • Equivalent representation using the backward shift operator $B$: $X_t = \psi(B) Z_t$, where $\psi(B) = \sum_{j=-\infty}^{\infty} \psi_j B^j$

  • Special case: moving average MA($\infty$), $X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}$

Linear processes: properties

  • In the definition of the linear process $X_t = \sum_{j=-\infty}^{\infty} \psi_j Z_{t-j}$, the condition $\sum_{j=-\infty}^{\infty} |\psi_j| < \infty$ ensures

    • The infinite sum $X_t$ converges with probability 1
    • $\sum_{j=-\infty}^{\infty} \psi_j^2 < \infty$, and hence $X_t$ converges in mean square, i.e., $X_t$ is the mean square limit of the partial sums $\sum_{j=-n}^{n} \psi_j Z_{t-j}$

Applying a linear filter to a stationary time series produces another stationary series

  • Theorem: let $\{Y_t\}$ be a stationary time series with mean 0 and ACVF $\gamma_Y$. If $\sum_{j=-\infty}^{\infty} |\psi_j| < \infty$, then the time series $X_t = \sum_{j=-\infty}^{\infty} \psi_j Y_{t-j} = \psi(B) Y_t$ is stationary with mean 0 and ACVF $$\gamma_X(h) = \sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} \psi_j \psi_k\, \gamma_Y(h + k - j)$$

  • Special case of the above result: if $\{X_t\}$ is a linear process, then its ACVF is $\gamma_X(h) = \sum_{j=-\infty}^{\infty} \psi_j \psi_{j+h}\, \sigma^2$ (see the numerical check below)
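
As a quick sanity check of this formula (not from the book), the sketch below plugs in the truncated causal AR(1) weights $\psi_j = \phi^j$ and compares the resulting $\gamma_X(h)$ with the closed form $\sigma^2\phi^h/(1-\phi^2)$; the values $\phi = 0.6$, $\sigma = 1$, and the truncation point are arbitrary choices.

```python
# Check gamma_X(h) = sigma^2 * sum_j psi_j * psi_{j+h} against the AR(1) closed form.
import numpy as np

phi, sigma, J = 0.6, 1.0, 200
psi = phi ** np.arange(J + 1)                 # truncated MA(infinity) weights psi_0..psi_J

def gamma_linear(h):
    # linear-process ACVF formula, truncated at J
    return sigma**2 * np.sum(psi[: J + 1 - h] * psi[h:])

for h in range(4):
    closed_form = sigma**2 * phi**h / (1 - phi**2)
    print(h, gamma_linear(h), closed_form)    # the two columns should agree
```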

Combining multiple linear filters

  • Linear filters with absolutely summable coefficients, $\alpha(B) = \sum_{j=-\infty}^{\infty} \alpha_j B^j$ and $\beta(B) = \sum_{j=-\infty}^{\infty} \beta_j B^j$, can be applied successively to a stationary series $\{Y_t\}$ to generate a new stationary series $$W_t = \sum_{j=-\infty}^{\infty} \psi_j Y_{t-j}, \quad \psi_j = \sum_k \alpha_k \beta_{j-k} = \sum_k \beta_k \alpha_{j-k}$$ or equivalently, $$W_t = \psi(B) Y_t, \quad \psi(B) = \alpha(B)\beta(B) = \beta(B)\alpha(B)$$

AR(1) process $X_t - \phi X_{t-1} = Z_t$, in linear process formats

  • If $|\phi| < 1$, then $X_t = \sum_{j=0}^{\infty} \phi^j Z_{t-j}$

    • Since $X_t$ only depends on $\{Z_s, s \leq t\}$, we say $\{X_t\}$ is causal or future-independent (illustrated in the sketch below)
  • If $|\phi| > 1$, then $X_t = -\sum_{j=1}^{\infty} \phi^{-j} Z_{t+j}$

    • This is because $X_t = -\phi^{-1} Z_{t+1} + \phi^{-1} X_{t+1}$
    • Since $X_t$ depends on the future values $\{Z_s, s > t\}$, we say $\{X_t\}$ is noncausal
  • If $\phi = \pm 1$, then there is no stationary linear process solution
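
A small simulation sketch (my own set-up, with $\phi = 0.5$ and an arbitrary truncation length $J = 60$) comparing the AR(1) recursion with its truncated causal representation:

```python
# Generate an AR(1) path by the recursion X_t = phi*X_{t-1} + Z_t and compare it with
# the truncated causal representation X_t ~ sum_{j=0}^{J} phi^j Z_{t-j}.
import numpy as np

rng = np.random.default_rng(1)
phi, n, J = 0.5, 200, 60
Z = rng.normal(size=n + J)

# recursion, started early enough that the start-up effect is negligible
X_rec = np.zeros(n + J)
for t in range(1, n + J):
    X_rec[t] = phi * X_rec[t - 1] + Z[t]

# truncated MA(infinity) representation for t = J, ..., n+J-1
psi = phi ** np.arange(J + 1)
X_lin = np.array([psi @ Z[t::-1][: J + 1] for t in range(J, n + J)])

print(np.max(np.abs(X_rec[J:] - X_lin)))   # tiny (of order phi**J), not exactly 0
```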

Introduction to ARMA Processes

ARMA(1,1) process

ARMA(1,1) process: definitions

  • The time series $\{X_t\}$ is an ARMA(1,1) process if it is stationary and satisfies $X_t - \phi X_{t-1} = Z_t + \theta Z_{t-1}$, where $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$ and $\phi + \theta \neq 0$

  • Equivalent representation using the backward shift operator: $\phi(B) X_t = \theta(B) Z_t$, where $\phi(B) = 1 - \phi B$ and $\theta(B) = 1 + \theta B$

ARMA(1,1) process in linear process format

  • If $\phi \neq \pm 1$, by letting $\chi(z) = 1/\phi(z)$, we can write an ARMA(1,1) as $X_t = \chi(B)\theta(B) Z_t = \psi(B) Z_t$, where $\psi(B) = \sum_{j=-\infty}^{\infty} \psi_j B^j$

    • If $|\phi| < 1$, then $\chi(z) = \sum_{j=0}^{\infty} \phi^j z^j$, and $$\psi_j = \begin{cases} 0, & \text{if } j \leq -1, \\ 1, & \text{if } j = 0, \\ (\phi + \theta)\phi^{j-1}, & \text{if } j \geq 1 \end{cases} \quad \Rightarrow \text{Causal}$$ (the causal weights are verified in the sketch below)

    • If $|\phi| > 1$, then $\chi(z) = -\sum_{j=1}^{\infty} \phi^{-j} z^{-j}$, and $$\psi_j = \begin{cases} -(\theta + \phi)\phi^{j-1}, & \text{if } j \leq -1, \\ -\theta\phi^{-1}, & \text{if } j = 0, \\ 0, & \text{if } j \geq 1 \end{cases} \quad \Rightarrow \text{Noncausal}$$

  • If $\phi = \pm 1$, then there is no such stationary ARMA(1,1) process
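
As a quick check of the causal case (not from the book; $\phi = 0.5$ and $\theta = 0.4$ are arbitrary), the sketch below builds the weights $\psi_0 = 1$, $\psi_j = (\phi+\theta)\phi^{j-1}$ and verifies that convolving them with the AR polynomial $(1, -\phi)$ recovers the MA polynomial $(1, \theta, 0, 0, \ldots)$:

```python
# Verify that phi(B) * psi(B) = theta(B) for the causal ARMA(1,1) psi-weights.
import numpy as np

phi, theta, J = 0.5, 0.4, 30
psi = np.empty(J + 1)
psi[0] = 1.0
psi[1:] = (phi + theta) * phi ** np.arange(J)    # psi_j for j = 1..J

check = np.convolve([1.0, -phi], psi)[: J + 1]   # coefficients of phi(B)*psi(B)
print(np.round(check[:5], 10))                   # ~ [1, theta, 0, 0, 0]
```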

Invertibility

  • Invertibility is the dual concept of causality
    • Causal: $X_t$ can be expressed in terms of $\{Z_s, s \leq t\}$
    • Invertible: $Z_t$ can be expressed in terms of $\{X_s, s \leq t\}$
  • For an ARMA(1,1) process,
    • If $|\theta| < 1$, then it is invertible
    • If $|\theta| > 1$, then it is noninvertible

Properties of the Sample ACVF and Sample ACF

Estimation of the series mean $\mu = E(X_t)$

  • The sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ is an unbiased estimator of $\mu$

    • Mean squared error $E(\bar{X}_n - \mu)^2 = \frac{1}{n}\sum_{|h| < n}\left(1 - \frac{|h|}{n}\right)\gamma(h)$
  • Theorem: if $\{X_t\}$ is a stationary time series with mean $\mu$ and ACVF $\gamma(\cdot)$, then as $n \to \infty$, $$V(\bar{X}_n) = E(\bar{X}_n - \mu)^2 \to 0, \quad \text{if } \gamma(n) \to 0,$$ $$n\, E(\bar{X}_n - \mu)^2 \to \sum_{|h| < \infty} \gamma(h), \quad \text{if } \sum_{h=-\infty}^{\infty} |\gamma(h)| < \infty$$

Confidence bounds for $\mu$

  • If $\{X_t\}$ is Gaussian, then $$\sqrt{n}\,(\bar{X}_n - \mu) \sim N\left(0,\ \sum_{|h| < n}\left(1 - \frac{|h|}{n}\right)\gamma(h)\right)$$

  • For many common time series, such as linear and ARMA models, when $n$ is large, $\bar{X}_n$ is approximately normal: $$\bar{X}_n \sim N\left(\mu, \frac{v}{n}\right), \quad v = \sum_{|h| < \infty} \gamma(h)$$
    • An approximate 95% confidence interval for $\mu$ is $\left(\bar{X}_n - 1.96\, v^{1/2}/\sqrt{n},\ \bar{X}_n + 1.96\, v^{1/2}/\sqrt{n}\right)$
    • To estimate $v$, we can use $\hat{v} = \sum_{|h| < \sqrt{n}} \left(1 - \frac{|h|}{n}\right)\hat{\gamma}(h)$ (see the sketch below)
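
Here is a minimal sketch of this interval in Python (all parameter choices are mine): simulate an AR(1) series with mean $\mu = 5$, compute $\hat{v}$ as above, and form $\bar{X}_n \pm 1.96\sqrt{\hat{v}/n}$.

```python
# Approximate 95% CI for the mean of a simulated AR(1) series.
import numpy as np

rng = np.random.default_rng(2)
n, phi, mu = 1000, 0.6, 5.0
Z = rng.normal(size=n)
X = np.empty(n)
X[0] = mu
for t in range(1, n):
    X[t] = mu + phi * (X[t - 1] - mu) + Z[t]

xbar = X.mean()

def gamma_hat(x, h):
    # sample ACVF with the 1/n convention used in the notes
    m = len(x)
    xb = x.mean()
    return np.sum((x[h:] - xb) * (x[: m - h] - xb)) / m

H = int(np.sqrt(n))
v_hat = gamma_hat(X, 0) + 2 * sum((1 - h / n) * gamma_hat(X, h) for h in range(1, H + 1))
half = 1.96 * np.sqrt(v_hat / n)
print(f"95% CI for mu: ({xbar - half:.3f}, {xbar + half:.3f})")   # typically covers mu = 5
```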

Estimation of ACVF $\gamma(\cdot)$ and ACF $\rho(\cdot)$

  • Use the sample ACVF $\hat{\gamma}(\cdot)$ and sample ACF $\hat{\rho}(\cdot)$: $$\hat{\gamma}(h) = \frac{1}{n}\sum_{t=1}^{n-|h|}\left(X_{t+|h|} - \bar{X}_n\right)\left(X_t - \bar{X}_n\right), \quad \hat{\rho}(h) = \hat{\gamma}(h)/\hat{\gamma}(0)$$ (implemented in the sketch below)
    • Even if the factor $1/n$ is replaced by $1/(n-h)$, they are still biased
    • They are nearly unbiased for large $n$
  • When $h$ is only slightly smaller than $n$, the estimators $\hat{\gamma}(\cdot), \hat{\rho}(\cdot)$ are unreliable since there are only a few pairs $(X_{t+h}, X_t)$ available.

    • A useful guide for them to be reliable (due to Box and Jenkins): $n \geq 50$, $h \leq n/4$
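
A direct implementation of $\hat{\gamma}(h)$ and $\hat{\rho}(h)$ as defined above (function names are mine), with a quick check on a simulated AR(1) series:

```python
# Sample ACVF / ACF with the 1/n convention.
import numpy as np

def sample_acvf(x, h):
    """gamma_hat(h) = (1/n) * sum_{t=1}^{n-|h|} (x_{t+|h|} - xbar)(x_t - xbar)."""
    x = np.asarray(x, dtype=float)
    n, h = len(x), abs(h)
    xbar = x.mean()
    return np.sum((x[h:] - xbar) * (x[: n - h] - xbar)) / n

def sample_acf(x, h):
    return sample_acvf(x, h) / sample_acvf(x, 0)

# usage: for a simulated AR(1) with phi = 0.7, rho_hat(h) should be close to 0.7**h
rng = np.random.default_rng(3)
phi, n = 0.7, 2000
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()
print([round(sample_acf(x, h), 3) for h in range(4)])   # ~ [1, 0.7, 0.49, 0.34]
```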

Bartlett’s Formula

Asymptotic distribution of $\hat{\rho}(\cdot)$

  • For linear models, especially ARMA models, when $n$ is large, $\hat{\boldsymbol{\rho}}_k = (\hat{\rho}(1), \ldots, \hat{\rho}(k))'$ is approximately normal: $$\hat{\boldsymbol{\rho}}_k \sim N(\boldsymbol{\rho}_k, n^{-1} W)$$

  • By Bartlett’s formula, $W$ is the covariance matrix with entries $$w_{ij} = \sum_{k=1}^{\infty}\left[\rho(k+i) + \rho(k-i) - 2\rho(i)\rho(k)\right] \times \left[\rho(k+j) + \rho(k-j) - 2\rho(j)\rho(k)\right]$$

  • Special cases
    • Marginally, for any $j \geq 1$, $\hat{\rho}(j) \sim N(\rho(j), n^{-1} w_{jj})$

    • iid noise: $$w_{ij} = \begin{cases} 1, & \text{if } i = j, \\ 0, & \text{otherwise} \end{cases} \quad \Rightarrow \quad \hat{\rho}(k) \sim N(0, 1/n), \ k = 1, \ldots, n$$ (a simulation check follows)
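
A quick simulation check of the iid case (the sample size and number of lags are arbitrary choices): for white noise, roughly 95% of the sample autocorrelations should fall within $\pm 1.96/\sqrt{n}$.

```python
# For white noise, check the fraction of sample autocorrelations inside +/- 1.96/sqrt(n).
import numpy as np

rng = np.random.default_rng(4)
n, max_lag = 2000, 40
x = rng.normal(size=n)
xbar = x.mean()

rho_hat = np.array([np.sum((x[h:] - xbar) * (x[: n - h] - xbar)) for h in range(max_lag + 1)])
rho_hat = rho_hat / rho_hat[0]          # normalise by gamma_hat(0)

bound = 1.96 / np.sqrt(n)
inside = np.mean(np.abs(rho_hat[1:]) <= bound)
print(f"fraction inside +/- 1.96/sqrt(n): {inside:.2f}")   # typically around 0.95
```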

Forecasting Stationary Time Series

Best linear predictor: minimizes MSE

Best linear predictor: definition

  • For a stationary time series $\{X_t\}$ with known mean $\mu$ and ACVF $\gamma$, our goal is to find the linear combination of $1, X_n, X_{n-1}, \ldots, X_1$ that forecasts $X_{n+h}$ with minimum mean squared error

  • Best linear predictor: $$P_n X_{n+h} = a_0 + a_1 X_n + \cdots + a_n X_1 = a_0 + \sum_{i=1}^{n} a_i X_{n+1-i}$$

    • We need to find the coefficients $a_0, a_1, \ldots, a_n$ that minimize $E(X_{n+h} - a_0 - a_1 X_n - \cdots - a_n X_1)^2$
    • We can take partial derivatives and solve a system of equations: $$E\left[X_{n+h} - a_0 - \sum_{i=1}^{n} a_i X_{n+1-i}\right] = 0, \qquad E\left[\left(X_{n+h} - a_0 - \sum_{i=1}^{n} a_i X_{n+1-i}\right) X_{n+1-j}\right] = 0, \ j = 1, \ldots, n$$

Best linear predictor: the solution

  • Plugging the solution $a_0 = \mu\left(1 - \sum_{i=1}^{n} a_i\right)$ in, the linear predictor becomes $$P_n X_{n+h} = \mu + \sum_{i=1}^{n} a_i (X_{n+1-i} - \mu)$$

  • The solution for the coefficients is $\mathbf{a}_n = (a_1, \ldots, a_n)' = \Gamma_n^{-1} \gamma_n(h)$ (a generic solver is sketched below)
    • $\Gamma_n = [\gamma(i-j)]_{i,j=1}^{n}$ and $\gamma_n(h) = (\gamma(h), \gamma(h+1), \ldots, \gamma(h+n-1))'$
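
A generic sketch of this solution (the function name `blp_coefficients` and the interface are mine): given an ACVF $\gamma(\cdot)$ and horizon $h$, build $\Gamma_n$ and $\gamma_n(h)$ and solve the linear system; the MA(1) in the usage line is just an illustration.

```python
# Solve Gamma_n a = gamma_n(h) for the best linear predictor coefficients.
import numpy as np

def blp_coefficients(gamma, n, h=1):
    """Coefficients (a_1, ..., a_n) of P_n X_{n+h}; gamma(k) is the ACVF at lag k >= 0."""
    Gamma_n = np.array([[gamma(abs(i - j)) for j in range(n)] for i in range(n)])
    gamma_nh = np.array([gamma(h + k) for k in range(n)])
    return np.linalg.solve(Gamma_n, gamma_nh)

# usage: one-step prediction coefficients for an MA(1) with theta = 0.5, sigma^2 = 1
theta = 0.5
gamma_ma1 = lambda k: 1 + theta**2 if k == 0 else (theta if abs(k) == 1 else 0.0)
print(np.round(blp_coefficients(gamma_ma1, n=5), 4))   # prints a_1, ..., a_5
```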

Best linear predictor $\hat{X}_{n+h} = P_n X_{n+h}$: properties

  • Unbiasedness: $E(\hat{X}_{n+h} - X_{n+h}) = 0$

  • Mean squared error (MSE): $E(X_{n+h} - \hat{X}_{n+h})^2 = E(X_{n+h}^2) - E(\hat{X}_{n+h}^2) = \gamma(0) - \mathbf{a}_n' \gamma_n(h)$

  • Orthogonality: $E[(\hat{X}_{n+h} - X_{n+h}) X_j] = 0, \quad j = 1, \ldots, n$

    • In general, orthogonality means $E[(\text{Error}) \times (\text{Predictor Variable})] = 0$

Example: one-step prediction of an AR(1) series

  • We predict $X_{n+1}$ from $X_1, \ldots, X_n$: $$\hat{X}_{n+1} = \mu + a_1(X_n - \mu) + \cdots + a_n(X_1 - \mu)$$

  • The coefficients $\mathbf{a}_n = (a_1, \ldots, a_n)'$ satisfy $$\begin{bmatrix} 1 & \phi & \phi^2 & \cdots & \phi^{n-1} \\ \phi & 1 & \phi & \cdots & \phi^{n-2} \\ \vdots & \vdots & \vdots & & \vdots \\ \phi^{n-1} & \phi^{n-2} & \phi^{n-3} & \cdots & 1 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} \phi \\ \phi^2 \\ \vdots \\ \phi^n \end{bmatrix}$$

  • By guessing, we find a solution $(a_1, a_2, \ldots, a_n) = (\phi, 0, \ldots, 0)$, i.e., $\hat{X}_{n+1} = \mu + \phi(X_n - \mu)$ (verified numerically below)

    • Does not depend on $X_{n-1}, \ldots, X_1$
    • MSE: $E(X_{n+1} - \hat{X}_{n+1})^2 = \sigma^2$
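
A quick numerical confirmation of this guess ($\phi = 0.8$ and $n = 6$ are arbitrary): solve the system with the AR(1) autocorrelations $\rho(h) = \phi^{|h|}$ and check that the solution is $(\phi, 0, \ldots, 0)$.

```python
# Solve the one-step prediction equations for an AR(1) and confirm a = (phi, 0, ..., 0).
import numpy as np

phi, n = 0.8, 6
R = phi ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))   # [rho(i - j)]
rhs = phi ** np.arange(1, n + 1)                                   # (rho(1), ..., rho(n))
a = np.linalg.solve(R, rhs)
print(np.round(a, 10))   # ~ [0.8, 0, 0, 0, 0, 0]
```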

WLOG, we can assume $\mu = 0$ while predicting

  • Suppose a stationary time series $\{X_t\}$ has mean $\mu$

  • To predict its future values, we can first create another time series $Y_t = X_t - \mu$ and predict $\hat{Y}_{n+h} = P_n(Y_{n+h})$ by $$\hat{Y}_{n+h} = a_1 Y_n + \cdots + a_n Y_1$$

  • Since the ACVF $\gamma_Y(h) = \gamma_X(h)$, the coefficients $a_1, \ldots, a_n$ are the same for $\{X_t\}$ and $\{Y_t\}$

  • The best linear predictor $\hat{X}_{n+h} = P_n(X_{n+h})$ is then given by $$\hat{X}_{n+h} - \mu = a_1(X_n - \mu) + \cdots + a_n(X_1 - \mu)$$

Prediction operator $P(\cdot \mid \mathbf{W})$

  • $X$ and $W_1, \ldots, W_n$ are random variables with finite second moments

    • Note: $W_1, \ldots, W_n$ do not need to come from a stationary series
  • Best linear predictor: $$\hat{X} = P(X \mid \mathbf{W}) = E(X) + a_1[W_n - E(W_n)] + \cdots + a_n[W_1 - E(W_1)]$$

  • The coefficients $\mathbf{a} = (a_1, \ldots, a_n)'$ satisfy $\Gamma \mathbf{a} = \boldsymbol{\gamma}$, where $\Gamma = [\mathrm{Cov}(W_{n+1-i}, W_{n+1-j})]_{i,j=1}^{n}$ and $\boldsymbol{\gamma} = [\mathrm{Cov}(X, W_n), \ldots, \mathrm{Cov}(X, W_1)]'$

Properties of $\hat{X} = P(X \mid \mathbf{W})$

  • Unbiased: $E(\hat{X} - X) = 0$

  • Orthogonal: $E[(\hat{X} - X) W_i] = 0$ for $i = 1, \ldots, n$

  • MSE: $E(\hat{X} - X)^2 = \mathrm{Var}(X) - (a_1, \ldots, a_n)\,[\mathrm{Cov}(X, W_n), \ldots, \mathrm{Cov}(X, W_1)]'$

  • Linear: $P(\alpha_1 X_1 + \alpha_2 X_2 + \beta \mid \mathbf{W}) = \alpha_1 P(X_1 \mid \mathbf{W}) + \alpha_2 P(X_2 \mid \mathbf{W}) + \beta$

  • Extreme cases
    • Perfect prediction: $P\left(\sum_{i=1}^{n} \alpha_i W_i + \beta \mid \mathbf{W}\right) = \sum_{i=1}^{n} \alpha_i W_i + \beta$
    • Uncorrelated: if $\mathrm{Cov}(X, W_i) = 0$ for all $i = 1, \ldots, n$, then $P(X \mid \mathbf{W}) = E(X)$

Examples: predictions of AR(p) series

  • A time series $\{X_t\}$ is an autoregression of order $p$, i.e., AR($p$), if it is stationary and satisfies $$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + Z_t$$ where $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$ and $\mathrm{Cov}(X_s, Z_t) = 0$ for all $s < t$

  • When $n > p$, the one-step prediction of an AR($p$) series is $$P_n X_{n+1} = \phi_1 X_n + \phi_2 X_{n-1} + \cdots + \phi_p X_{n+1-p}$$ with MSE $E(X_{n+1} - P_n X_{n+1})^2 = E(Z_{n+1}^2) = \sigma^2$

  • $h$-step prediction of an AR(1) series (proof by recursion): $$P_n X_{n+h} = \phi^h X_n, \qquad \mathrm{MSE} = \sigma^2\, \frac{1 - \phi^{2h}}{1 - \phi^2}$$ (checked numerically below)
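
The sketch below (my own check; $\phi = 0.7$, $\sigma = 1$, $n = 5$, $h = 3$ are arbitrary) verifies the $h$-step result against the general solution $\mathbf{a}_n = \Gamma_n^{-1}\gamma_n(h)$ and the MSE formula $\gamma(0) - \mathbf{a}_n'\gamma_n(h)$.

```python
# Check the h-step AR(1) predictor and its MSE against the general Gamma_n-based solution.
import numpy as np

phi, sigma, n, h = 0.7, 1.0, 5, 3
gamma = lambda k: sigma**2 * phi ** abs(k) / (1 - phi**2)    # AR(1) ACVF

Gamma_n = np.array([[gamma(i - j) for j in range(n)] for i in range(n)])
gamma_nh = np.array([gamma(h + k) for k in range(n)])
a = np.linalg.solve(Gamma_n, gamma_nh)

mse = gamma(0) - a @ gamma_nh
print(np.round(a, 8))                                        # ~ [phi**h, 0, 0, 0, 0]
print(mse, sigma**2 * (1 - phi ** (2 * h)) / (1 - phi**2))   # the two values should match
```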

Recursive methods: the Durbin-Levinson and Innovation Algorithms

Recursive methods for one-step prediction

  • The best linear predictor solution $\mathbf{a} = \Gamma^{-1}\boldsymbol{\gamma}$ requires a matrix inversion

  • Alternatively, we can use recursion to simplify the one-step prediction $P_n X_{n+1}$, building on $P_j X_{j+1}$ for $j = 1, \ldots, n-1$

  • We will introduce
    • The Durbin-Levinson algorithm: good for AR($p$)
    • The innovation algorithm: good for MA($q$); innovations are uncorrelated

Durbin-Levinson algorithm

  • Assume $\{X_t\}$ is mean zero and stationary, with ACVF $\gamma(h)$: $$\hat{X}_{n+1} = \phi_{n,1} X_n + \cdots + \phi_{n,n} X_1, \quad \text{with MSE } v_n = E(\hat{X}_{n+1} - X_{n+1})^2$$
  1. Start with $\hat{X}_1 = 0$ and $v_0 = \gamma(0)$

For $n = 1, 2, \ldots$, compute steps 2-4 successively (a sketch implementation follows this list)

  2. Compute $\phi_{n,n}$ (the partial autocorrelation function (PACF) at lag $n$): $$\phi_{n,n} = \left[\gamma(n) - \sum_{j=1}^{n-1} \phi_{n-1,j}\, \gamma(n-j)\right] / v_{n-1}$$

  3. Compute $\phi_{n,1}, \ldots, \phi_{n,n-1}$: $$\begin{bmatrix} \phi_{n,1} \\ \vdots \\ \phi_{n,n-1} \end{bmatrix} = \begin{bmatrix} \phi_{n-1,1} \\ \vdots \\ \phi_{n-1,n-1} \end{bmatrix} - \phi_{n,n} \begin{bmatrix} \phi_{n-1,n-1} \\ \vdots \\ \phi_{n-1,1} \end{bmatrix}$$

  4. Compute $v_n$: $$v_n = v_{n-1}\left(1 - \phi_{n,n}^2\right)$$
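
A sketch implementation of the recursion in Python (the function name and interface are mine, not the book's); the usage line feeds in the AR(1) ACVF with $\phi = 0.6$ and $\sigma^2 = 1$, for which the recursion should return $\phi_{n,1} \approx 0.6$, all other coefficients $\approx 0$, and $v_n \approx 1$.

```python
# Durbin-Levinson recursion for one-step prediction coefficients and MSEs.
import numpy as np

def durbin_levinson(gamma, N):
    """gamma: ACVF values gamma(0), ..., gamma(N).
    Returns (phi, v) with phi[n] = (phi_{n,1}, ..., phi_{n,n}) and v[n] = v_n."""
    gamma = np.asarray(gamma, dtype=float)
    phi = [np.array([])]                     # step 1: X_hat_1 = 0 (no coefficients yet)
    v = [gamma[0]]                           #         v_0 = gamma(0)
    for n in range(1, N + 1):
        prev = phi[n - 1]
        # step 2: phi_{n,n}, the PACF at lag n
        phi_nn = (gamma[n] - prev @ gamma[n - 1:0:-1]) / v[n - 1]
        # step 3: phi_{n,1}, ..., phi_{n,n-1}
        head = prev - phi_nn * prev[::-1]
        phi.append(np.append(head, phi_nn))
        # step 4: v_n
        v.append(v[n - 1] * (1 - phi_nn**2))
    return phi, v

# usage: AR(1) ACVF gamma(h) = phi**h / (1 - phi**2) with phi = 0.6, sigma^2 = 1
g = 0.6 ** np.arange(6) / (1 - 0.6**2)
coefs, mse = durbin_levinson(g, 5)
print(np.round(coefs[5], 6), np.round(mse[5], 6))   # ~ [0.6, 0, 0, 0, 0] and 1.0
```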

Innovation algorithm

  • Assume $\{X_t\}$ is any mean-zero (not necessarily stationary) time series with covariances $\kappa(i,j) = \mathrm{Cov}(X_i, X_j)$

  • Predict $\hat{X}_{n+1} = P_n X_{n+1}$ based on the innovations, or one-step prediction errors, $X_j - \hat{X}_j$, $j = 1, \ldots, n$: $$\hat{X}_{n+1} = \theta_{n,1}(X_n - \hat{X}_n) + \cdots + \theta_{n,n}(X_1 - \hat{X}_1), \quad \text{with MSE } v_n$$

  1. Start with $\hat{X}_1 = 0$ and $v_0 = \kappa(1,1)$

For $n = 1, 2, \ldots$, compute steps 2-3 successively (a sketch implementation follows this list)

  2. For $k = 0, 1, \ldots, n-1$, compute the coefficients $$\theta_{n,n-k} = \left[\kappa(n+1, k+1) - \sum_{j=0}^{k-1} \theta_{k,k-j}\, \theta_{n,n-j}\, v_j\right] / v_k$$

  3. Compute the MSE $$v_n = \kappa(n+1, n+1) - \sum_{j=0}^{n-1} \theta_{n,n-j}^2\, v_j$$
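
A sketch implementation of the innovations recursion (interface and indexing conventions are mine; $\kappa(i,j)$ uses the book's 1-based indexing). The usage line applies it to an MA(1) with $\theta = 0.5$ and $\sigma^2 = 1$, for which $\theta_{n,1}$ should approach $0.5$ and $v_n$ should approach $1$.

```python
# Innovations algorithm for one-step prediction coefficients and MSEs.
import numpy as np

def innovations(kappa, N):
    """kappa(i, j) = Cov(X_i, X_j), 1-based.
    Returns (theta, v) with theta[n][j-1] = theta_{n,j} for j = 1..n and v[n] = v_n."""
    v = [kappa(1, 1)]                              # step 1: v_0 = kappa(1,1)
    theta = [np.array([])]                         #         X_hat_1 = 0, no coefficients
    for n in range(1, N + 1):
        th = np.zeros(n)                           # th[j-1] will hold theta_{n,j}
        for k in range(n):                         # step 2: k = 0, 1, ..., n-1
            s = sum(theta[k][k - 1 - j] * th[n - 1 - j] * v[j] for j in range(k))
            th[n - 1 - k] = (kappa(n + 1, k + 1) - s) / v[k]
        # step 3: v_n
        v.append(kappa(n + 1, n + 1) - sum(th[n - 1 - j] ** 2 * v[j] for j in range(n)))
        theta.append(th)
    return theta, v

# usage: MA(1) with theta = 0.5, sigma^2 = 1; theta_{n,1} -> 0.5 and v_n -> 1
theta0 = 0.5
kappa = lambda i, j: (1 + theta0**2) if i == j else (theta0 if abs(i - j) == 1 else 0.0)
th, v = innovations(kappa, 20)
print(round(th[20][0], 4), round(v[20], 4))
```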

h-step predictors using innovations

  • For any $k \geq 1$, orthogonality ensures $$E[(X_{n+k} - P_{n+k-1} X_{n+k})\, X_j] = 0, \quad j = 1, \ldots, n.$$ Thus, we have $P_n(X_{n+k} - P_{n+k-1} X_{n+k}) = 0$

  • The $h$-step prediction: $$P_n X_{n+h} = P_n P_{n+h-1} X_{n+h} = P_n\left[\sum_{j=1}^{n+h-1} \theta_{n+h-1,j}\left(X_{n+h-j} - \hat{X}_{n+h-j}\right)\right] = \sum_{j=h}^{n+h-1} \theta_{n+h-1,j}\left(X_{n+h-j} - \hat{X}_{n+h-j}\right)$$

References

  • Brockwell, Peter J. and Davis, Richard A. (2016), Introduction to Time Series and Forecasting, Third Edition. New York: Springer