Book Notes: Introduction to Time Series and Forecasting -- Ch2 Stationary Processes

Best linear predictor

  • Goal: find a function of $X_n$ that gives the “best” predictor of $X_{n+h}$.

    • “Best” means achieving the minimum mean squared error (MSE)
    • Under a joint normality assumption on $X_n$ and $X_{n+h}$, the best predictor is $m(X_n) = E(X_{n+h} \mid X_n) = \mu + \rho(h)(X_n - \mu)$ (illustrated in the sketch below)
  • Best linear predictor $\ell(X_n) = aX_n + b$

    • For Gaussian processes, $\ell(X_n)$ and $m(X_n)$ are the same.
    • The best linear predictor depends only on the mean and ACF of the series $\{X_n\}$
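
A minimal numerical sketch of the formula above, assuming an AR(1) series so that $\rho(h) = \phi^h$; the values of $\mu$, $\phi$, $h$, and the observed $X_n$ are made up for illustration:

```python
# Sketch: best predictor of X_{n+h} given only X_n, m(X_n) = mu + rho(h) * (X_n - mu).
# Assumed AR(1) example, so rho(h) = phi**h; all numbers below are made up.
mu, phi, h = 10.0, 0.8, 2
rho_h = phi ** h                 # ACF of an AR(1) at lag h
x_n = 12.5                       # "current" observation
print(mu + rho_h * (x_n - mu))   # 10 + 0.64 * 2.5 = 11.6
```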

Properties of ACVF $\gamma(\cdot)$ and ACF $\rho(\cdot)$

  • $\gamma(0) \geq 0$

  • $|\gamma(h)| \leq \gamma(0)$ for all $h$

  • $\gamma(\cdot)$ is an even function, i.e., $\gamma(h) = \gamma(-h)$ for all $h$

  • A real-valued function $\kappa: \mathbb{Z} \to \mathbb{R}$ is nonnegative definite if $\sum_{i,j=1}^{n} a_i \kappa(i-j) a_j \geq 0$ for all $n \in \mathbb{N}^+$ and all vectors $\mathbf{a} = (a_1, \ldots, a_n)' \in \mathbb{R}^n$

  • Theorem: a real-valued function defined on the integers is the autocovariance function of a stationary time series if and only if it is even and nonnegative definite (see the numerical check below)

  • ACF $\rho(\cdot)$ has all the above properties of ACVF $\gamma(\cdot)$

    • Plus one more: $\rho(0) = 1$
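
One way to see the nonnegative definiteness condition in action is to check it numerically: a candidate ACVF must make every matrix $[\kappa(i-j)]_{i,j=1}^{n}$ positive semidefinite. The sketch below (checking only a finite $n$, which gives evidence rather than a proof) uses an MA(1) ACVF and a made-up function that violates the condition:

```python
import numpy as np

# Sketch: test positive semidefiniteness of the Toeplitz matrix [gamma(i-j)] for one n.
def is_nonneg_definite(gamma, n):
    G = np.array([[gamma(i - j) for j in range(n)] for i in range(n)])
    return bool(np.all(np.linalg.eigvalsh(G) >= -1e-10))

# MA(1) ACVF with theta = 0.5, sigma^2 = 1: gamma(0) = 1.25, gamma(+-1) = 0.5, 0 otherwise
ma1_acvf = lambda h: {0: 1.25, 1: 0.5, -1: 0.5}.get(h, 0.0)
print(is_nonneg_definite(ma1_acvf, 20))    # True: this is a valid ACVF

# A made-up even function that is NOT an ACVF: a 1-correlated series cannot have |rho(1)| > 0.5
bad = lambda h: {0: 1.0, 1: 0.9, -1: 0.9}.get(h, 0.0)
print(is_nonneg_definite(bad, 20))         # False
```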

MA($q$) process, $q$-dependent, and $q$-correlated

  • A time series $\{X_t\}$ is

    • $q$-dependent: if $X_s$ and $X_t$ are independent whenever $|t - s| > q$.

    • $q$-correlated: if $\rho(h) = 0$ whenever $|h| > q$.

  • Moving-average process of order $q$: $\{X_t\}$ is an MA($q$) process if $X_t = Z_t + \theta_1 Z_{t-1} + \cdots + \theta_q Z_{t-q}$, where $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$

  • An MA($q$) process is $q$-correlated (see the simulation sketch below)

  • Theorem: a stationary $q$-correlated time series with mean 0 can be represented as an MA($q$) process
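
A quick simulation sketch of the $q$-correlated property: an MA(2) with made-up coefficients should have sample ACF near zero beyond lag 2 (within roughly $\pm 1.96/\sqrt{n}$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch: simulate X_t = Z_t + 0.6 Z_{t-1} - 0.3 Z_{t-2} (coefficients made up)
# and check that the sample ACF is close to zero for lags h > q = 2.
n = 5000
theta = np.array([1.0, 0.6, -0.3])            # theta_0 = 1 by convention
z = rng.standard_normal(n + len(theta) - 1)   # white noise
x = np.convolve(z, theta, mode="valid")       # length-n MA(2) sample path

def sample_acf(x, max_lag):
    xc = x - x.mean()
    c0 = xc @ xc / len(x)
    return np.array([xc[h:] @ xc[:len(x) - h] / len(x) / c0 for h in range(max_lag + 1)])

print(np.round(sample_acf(x, 5), 3))   # lags 3, 4, 5 should be within about +-1.96/sqrt(n) of 0
```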

Linear Processes

Linear processes: definitions

  • A time series $\{X_t\}$ is a linear process if $X_t = \sum_{j=-\infty}^{\infty} \psi_j Z_{t-j}$, where $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$ and the constants $\{\psi_j\}$ satisfy $\sum_{j=-\infty}^{\infty} |\psi_j| < \infty$

  • Equivalent representation using the backward shift operator $B$: $X_t = \psi(B) Z_t$, where $\psi(B) = \sum_{j=-\infty}^{\infty} \psi_j B^j$

  • Special case: moving average MA($\infty$), where $X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}$

Linear processes: properties

  • In the linear process definition $X_t = \sum_{j=-\infty}^{\infty} \psi_j Z_{t-j}$, the condition $\sum_{j=-\infty}^{\infty} |\psi_j| < \infty$ ensures

    • The infinite sum $X_t$ converges with probability 1
    • $\sum_{j=-\infty}^{\infty} \psi_j^2 < \infty$, and hence $X_t$ converges in mean square, i.e., $X_t$ is the mean square limit of the partial sums $\sum_{j=-n}^{n} \psi_j Z_{t-j}$

Applying a linear filter to a stationary time series produces another stationary series

  • Theorem: let $\{Y_t\}$ be a stationary time series with mean 0 and ACVF $\gamma_Y$. If $\sum_{j=-\infty}^{\infty} |\psi_j| < \infty$, then the time series $X_t = \sum_{j=-\infty}^{\infty} \psi_j Y_{t-j} = \psi(B) Y_t$ is stationary with mean 0 and ACVF $\gamma_X(h) = \sum_{j=-\infty}^{\infty}\sum_{k=-\infty}^{\infty} \psi_j \psi_k \gamma_Y(h + k - j)$

  • Special case of the above result: if $\{X_t\}$ is a linear process, then its ACVF is $\gamma_X(h) = \sigma^2 \sum_{j=-\infty}^{\infty} \psi_j \psi_{j+h}$ (see the sketch below)
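
A small sketch of the special-case ACVF formula, using an MA(2) (a finite linear process) with made-up coefficients; the values can be checked by hand:

```python
import numpy as np

# Sketch: gamma_X(h) = sigma^2 * sum_j psi_j * psi_{j+h} for a finite linear process.
# psi = (1, 0.6, -0.3) and sigma^2 = 1 are made-up MA(2) values.
psi, sigma2 = np.array([1.0, 0.6, -0.3]), 1.0

def linear_process_acvf(psi, sigma2, h):
    h = abs(h)
    if h >= len(psi):
        return 0.0
    return sigma2 * float(psi[: len(psi) - h] @ psi[h:])

print([round(linear_process_acvf(psi, sigma2, h), 3) for h in range(4)])
# [1.45, 0.42, -0.3, 0.0]
```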

Combine multiple linear filters

  • Linear filters with absolutely summable coefficients, $\alpha(B) = \sum_{j=-\infty}^{\infty} \alpha_j B^j$ and $\beta(B) = \sum_{j=-\infty}^{\infty} \beta_j B^j$, can be applied successively to a stationary series $\{Y_t\}$ to generate a new stationary series $W_t = \sum_{j=-\infty}^{\infty} \psi_j Y_{t-j}$, where $\psi_j = \sum_k \alpha_k \beta_{j-k} = \sum_k \beta_k \alpha_{j-k}$, or equivalently $W_t = \psi(B) Y_t$, where $\psi(B) = \alpha(B)\beta(B) = \beta(B)\alpha(B)$ (see the convolution sketch below)
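
For finite filters, the combined coefficients $\psi_j = \sum_k \alpha_k \beta_{j-k}$ are just a discrete convolution, i.e., polynomial multiplication of $\alpha(B)$ and $\beta(B)$. A sketch with made-up coefficients:

```python
import numpy as np

# Sketch: composing two finite linear filters; psi(B) = alpha(B) * beta(B),
# so the coefficient sequence psi is the convolution of alpha and beta.
alpha = np.array([1.0, -0.5])       # alpha(B) = 1 - 0.5 B              (made up)
beta = np.array([1.0, 0.4, 0.1])    # beta(B)  = 1 + 0.4 B + 0.1 B^2    (made up)
psi = np.convolve(alpha, beta)
print(psi)                          # [ 1.   -0.1  -0.1  -0.05]
```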

AR(1) process $X_t - \phi X_{t-1} = Z_t$, in linear process form

  • If $|\phi| < 1$, then $X_t = \sum_{j=0}^{\infty} \phi^j Z_{t-j}$ (see the sketch after this list)

    • Since $X_t$ only depends on $\{Z_s, s \leq t\}$, we say $\{X_t\}$ is causal or future-independent
  • If $|\phi| > 1$, then $X_t = -\sum_{j=1}^{\infty} \phi^{-j} Z_{t+j}$

    • This is because $X_t = -\phi^{-1} Z_{t+1} + \phi^{-1} X_{t+1}$
    • Since $X_t$ depends on $\{Z_s, s > t\}$, we say $\{X_t\}$ is noncausal
  • If $\phi = \pm 1$, then there is no stationary linear process solution
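
A sketch of the causal case: truncating $X_t = \sum_{j \geq 0} \phi^j Z_{t-j}$ at a large $J$ and applying the linear-process ACVF formula should reproduce the AR(1) ACVF $\gamma(h) = \sigma^2\phi^{|h|}/(1-\phi^2)$ (the parameter values and truncation point below are made up):

```python
import numpy as np

# Sketch: causal AR(1) as a truncated MA(infinity) with psi_j = phi**j.
phi, sigma2, J = 0.8, 1.0, 200
psi = phi ** np.arange(J)

for h in range(4):
    truncated = sigma2 * float(psi[: J - h] @ psi[h:])    # sum_j psi_j psi_{j+h}
    exact = sigma2 * phi ** h / (1 - phi ** 2)            # closed-form AR(1) ACVF
    print(h, round(truncated, 4), round(exact, 4))        # the two columns agree closely
```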

Introduction to ARMA Processes

ARMA(1,1) process

ARMA(1,1) process: definitions

  • The time series $\{X_t\}$ is an ARMA(1,1) process if it is stationary and satisfies $X_t - \phi X_{t-1} = Z_t + \theta Z_{t-1}$, where $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$ and $\phi + \theta \neq 0$

  • Equivalent representation using the backward shift operator: $\phi(B) X_t = \theta(B) Z_t$, where $\phi(B) = 1 - \phi B$ and $\theta(B) = 1 + \theta B$

ARMA(1,1) process in linear process form

  • If $\phi \neq \pm 1$, by letting $\chi(z) = 1/\phi(z)$, we can write an ARMA(1,1) process as $X_t = \chi(B)\theta(B) Z_t = \psi(B) Z_t$, where $\psi(B) = \sum_{j=-\infty}^{\infty} \psi_j B^j$

    • If $|\phi| < 1$, then $\chi(z) = \sum_{j=0}^{\infty} \phi^j z^j$, and $\psi_j = \begin{cases} 0, & \text{if } j \leq -1, \\ 1, & \text{if } j = 0, \\ (\phi + \theta)\phi^{j-1}, & \text{if } j \geq 1 \end{cases}$ $\Rightarrow$ causal (checked numerically in the sketch after this list)

    • If $|\phi| > 1$, then $\chi(z) = -\sum_{j=-\infty}^{-1} \phi^j z^j$, and $\psi_j = \begin{cases} -(\theta + \phi)\phi^{j-1}, & \text{if } j \leq -1, \\ -\theta\phi^{-1}, & \text{if } j = 0, \\ 0, & \text{if } j \geq 1 \end{cases}$ $\Rightarrow$ noncausal

  • If $\phi = \pm 1$, then there is no such stationary ARMA(1,1) process
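
A numerical check of the causal $\psi$ weights, comparing the closed form against a truncated power-series expansion of $\theta(z)/\phi(z)$ (the parameter values and truncation length are made up):

```python
import numpy as np

# Sketch: causal ARMA(1,1) weights psi_0 = 1, psi_j = (phi + theta) * phi**(j-1) for j >= 1.
phi, theta, J = 0.5, 0.4, 10

psi_formula = np.r_[1.0, (phi + theta) * phi ** np.arange(J - 1)]

# chi(z) = 1/(1 - phi z) = sum_{j>=0} phi^j z^j (truncated), then psi(z) = chi(z) * (1 + theta z)
chi = phi ** np.arange(J)
psi_series = np.convolve(chi, [1.0, theta])[:J]

print(np.allclose(psi_formula, psi_series))   # True
```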

Invertibility

  • Invertibility is the dual concept of causality
    • Causal: $X_t$ can be expressed in terms of $\{Z_s, s \leq t\}$
    • Invertible: $Z_t$ can be expressed in terms of $\{X_s, s \leq t\}$
  • For an ARMA(1,1) process,
    • If $|\theta| < 1$, then it is invertible
    • If $|\theta| > 1$, then it is noninvertible

Properties of the Sample ACVF and Sample ACF

Estimation of the series mean $\mu = E(X_t)$

  • The sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ is an unbiased estimator of $\mu$

    • Mean squared error: $E(\bar{X}_n - \mu)^2 = \frac{1}{n}\sum_{h=-n}^{n}\left(1 - \frac{|h|}{n}\right)\gamma(h)$
  • Theorem: if $\{X_t\}$ is a stationary time series with mean $\mu$ and ACVF $\gamma(\cdot)$, then as $n \to \infty$, $V(\bar{X}_n) = E(\bar{X}_n - \mu)^2 \to 0$ if $\gamma(n) \to 0$, and $nE(\bar{X}_n - \mu)^2 \to \sum_{|h| < \infty}\gamma(h)$ if $\sum_{h=-\infty}^{\infty}|\gamma(h)| < \infty$

Confidence bounds for $\mu$

  • If $\{X_t\}$ is Gaussian, then $\sqrt{n}(\bar{X}_n - \mu) \sim N\left(0, \sum_{|h| < n}\left(1 - \frac{|h|}{n}\right)\gamma(h)\right)$

  • For many common time series, such as linear and ARMA models, when $n$ is large, $\bar{X}_n$ is approximately normal: $\bar{X}_n \approx N(\mu, v/n)$, where $v = \sum_{|h| < \infty}\gamma(h)$
    • An approximate 95% confidence interval for $\mu$ is $\left(\bar{X}_n - 1.96\, v^{1/2}/\sqrt{n},\ \bar{X}_n + 1.96\, v^{1/2}/\sqrt{n}\right)$
    • To estimate $v$, we can use $\hat{v} = \sum_{|h| < \sqrt{n}}\left(1 - \frac{|h|}{\sqrt{n}}\right)\hat{\gamma}(h)$ (see the sketch after this list)
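
A sketch of the interval in practice, on a simulated AR(1) path (all parameter values are made up; the truncation of the sum in $\hat{v}$ follows the estimator above and is itself an approximation):

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch: approximate 95% CI for mu with v_hat = sum_{|h|<sqrt(n)} (1 - |h|/sqrt(n)) gamma_hat(h),
# on a simulated AR(1) path with made-up parameters mu = 2, phi = 0.6, sigma = 1.
n, phi, mu = 500, 0.6, 2.0
x = np.empty(n)
x[0] = mu + rng.standard_normal() / np.sqrt(1 - phi ** 2)   # start from the stationary distribution
for t in range(1, n):
    x[t] = mu + phi * (x[t - 1] - mu) + rng.standard_normal()

def sample_acvf(x, h):
    xc = x - x.mean()
    return float(xc[h:] @ xc[: len(x) - h]) / len(x)

root_n = int(np.sqrt(n))
v_hat = sample_acvf(x, 0) + 2 * sum(
    (1 - h / np.sqrt(n)) * sample_acvf(x, h) for h in range(1, root_n + 1)
)
half_width = 1.96 * np.sqrt(v_hat / n)
print(round(x.mean() - half_width, 3), round(x.mean() + half_width, 3))  # should usually cover mu = 2
```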

Estimation of ACVF $\gamma(\cdot)$ and ACF $\rho(\cdot)$

  • Use the sample ACVF $\hat{\gamma}(\cdot)$ and sample ACF $\hat{\rho}(\cdot)$: $\hat{\gamma}(h) = \frac{1}{n}\sum_{t=1}^{n-|h|}(X_{t+|h|} - \bar{X}_n)(X_t - \bar{X}_n)$, $\hat{\rho}(h) = \hat{\gamma}(h)/\hat{\gamma}(0)$ (implemented in the sketch after this list)
    • Even if the factor $1/n$ is replaced by $1/(n-h)$, these estimators are still biased
    • They are nearly unbiased for large $n$
  • When $h$ is only slightly smaller than $n$, the estimators $\hat{\gamma}(h)$ and $\hat{\rho}(h)$ are unreliable, since there are only a few pairs $(X_{t+h}, X_t)$ available.

    • A useful guide for them to be reliable (from Box and Jenkins): $n \geq 50$ and $h \leq n/4$
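
A direct implementation sketch of the sample ACVF and ACF with the $1/n$ divisor used above, tried on iid noise:

```python
import numpy as np

# Sketch: sample ACVF and ACF with the 1/n divisor (not 1/(n-h)).
def sample_acvf(x, h):
    n, h = len(x), abs(h)
    xc = x - x.mean()
    return float(xc[h:] @ xc[: n - h]) / n

def sample_acf(x, h):
    return sample_acvf(x, h) / sample_acvf(x, 0)

rng = np.random.default_rng(2)
x = rng.standard_normal(200)        # iid noise; n = 200 satisfies n >= 50
print([round(sample_acf(x, h), 3) for h in range(1, 6)])
# each value should fall within about +-1.96/sqrt(200) ~ +-0.14 roughly 95% of the time
```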

Bartlett’s Formula

Asymptotic distribution of $\hat{\rho}(\cdot)$

  • For linear models, especially ARMA models, when $n$ is large, $\hat{\boldsymbol{\rho}}_k = (\hat{\rho}(1), \ldots, \hat{\rho}(k))'$ is approximately normal: $\hat{\boldsymbol{\rho}}_k \approx N(\boldsymbol{\rho}, n^{-1}W)$

  • By Bartlett’s formula, $W$ is the covariance matrix with entries $w_{ij} = \sum_{k=1}^{\infty}\left[\rho(k+i) + \rho(k-i) - 2\rho(i)\rho(k)\right]\left[\rho(k+j) + \rho(k-j) - 2\rho(j)\rho(k)\right]$ (evaluated numerically in the sketch after this list)

  • Special cases
    • Marginally, for any $j \geq 1$, $\hat{\rho}(j) \approx N(\rho(j), n^{-1}w_{jj})$

    • For iid noise, $w_{ij} = 1$ if $i = j$ and $0$ otherwise, so $\hat{\rho}(k) \approx N(0, 1/n)$, $k = 1, \ldots, n$
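
Bartlett's formula can be evaluated numerically by truncating the sum over $k$; the sketch below does this for an MA(1) with $\theta = 0.5$ (a made-up value) and compares against the closed forms $w_{11} = 1 - 3\rho(1)^2 + 4\rho(1)^4$ and $w_{ii} = 1 + 2\rho(1)^2$ for $i \geq 2$:

```python
import numpy as np

# Sketch: Bartlett's formula for w_ij, with the infinite sum over k truncated at K
# (K is an assumption; rho must decay quickly for the truncation to be harmless).
def bartlett_w(rho, i, j, K=200):
    total = 0.0
    for k in range(1, K + 1):
        a = rho(k + i) + rho(k - i) - 2 * rho(i) * rho(k)
        b = rho(k + j) + rho(k - j) - 2 * rho(j) * rho(k)
        total += a * b
    return total

# MA(1) with theta = 0.5: rho(1) = theta / (1 + theta^2) = 0.4, rho(h) = 0 for |h| >= 2
r = 0.4
rho_ma1 = lambda h: 1.0 if h == 0 else (r if abs(h) == 1 else 0.0)
print(round(bartlett_w(rho_ma1, 1, 1), 4), round(1 - 3 * r**2 + 4 * r**4, 4))   # 0.6224 0.6224
print(round(bartlett_w(rho_ma1, 2, 2), 4), round(1 + 2 * r**2, 4))              # 1.32 1.32
```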

Forecast Stationary Time Series

Best linear predictor: minimizes MSE

Best linear predictor: definition

  • For a stationary time series $\{X_t\}$ with known mean $\mu$ and ACVF $\gamma$, our goal is to find the linear combination of $1, X_n, X_{n-1}, \ldots, X_1$ that forecasts $X_{n+h}$ with minimum mean squared error

  • Best linear predictor: $P_n X_{n+h} = a_0 + a_1 X_n + \cdots + a_n X_1 = a_0 + \sum_{i=1}^{n} a_i X_{n+1-i}$

    • We need to find the coefficients $a_0, a_1, \ldots, a_n$ that minimize $E\left(X_{n+h} - a_0 - a_1 X_n - \cdots - a_n X_1\right)^2$
    • We can take partial derivatives and solve the system of equations $E\left[X_{n+h} - a_0 - \sum_{i=1}^{n} a_i X_{n+1-i}\right] = 0$ and $E\left[\left(X_{n+h} - a_0 - \sum_{i=1}^{n} a_i X_{n+1-i}\right) X_{n+1-j}\right] = 0$, $j = 1, \ldots, n$

Best linear predictor: the solution

  • Plugging the solution $a_0 = \mu\left(1 - \sum_{i=1}^{n} a_i\right)$ in, the linear predictor becomes $P_n X_{n+h} = \mu + \sum_{i=1}^{n} a_i (X_{n+1-i} - \mu)$

  • The solution for the coefficients is $\mathbf{a}_n = (a_1, \ldots, a_n)' = \Gamma_n^{-1}\gamma_n(h)$ (computed numerically in the sketch after the properties below)
    • $\Gamma_n = \left[\gamma(i-j)\right]_{i,j=1}^{n}$ and $\gamma_n(h) = \left(\gamma(h), \gamma(h+1), \ldots, \gamma(h+n-1)\right)'$

Best linear predictor $\hat{X}_{n+h} = P_n X_{n+h}$: properties

  • Unbiasedness: $E(\hat{X}_{n+h} - X_{n+h}) = 0$

  • Mean squared error (MSE): $E(X_{n+h} - \hat{X}_{n+h})^2 = E(X_{n+h}^2) - E(\hat{X}_{n+h}^2) = \gamma(0) - \mathbf{a}_n'\gamma_n(h)$

  • Orthogonality: $E\left[(\hat{X}_{n+h} - X_{n+h})X_j\right] = 0$, $j = 1, \ldots, n$

    • In general, orthogonality means $E[(\text{Error}) \times (\text{Predictor Variable})] = 0$
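
A sketch of the solution in code: build $\Gamma_n$ and $\gamma_n(h)$ from a known ACVF, solve the linear system, and read off the MSE. The AR(1) ACVF and the parameter values are made-up inputs, chosen so the answer can be checked against the example that follows:

```python
import numpy as np

# Sketch: best linear predictor coefficients a_n = Gamma_n^{-1} gamma_n(h) and
# MSE = gamma(0) - a_n' gamma_n(h), for a known ACVF (here an AR(1) with phi = 0.8).
phi, sigma2, n, h = 0.8, 1.0, 5, 1
acvf = lambda lag: sigma2 * phi ** abs(lag) / (1 - phi ** 2)

Gamma = np.array([[acvf(i - j) for j in range(n)] for i in range(n)])
gamma_h = np.array([acvf(h + k) for k in range(n)])      # (gamma(h), ..., gamma(h+n-1))

a = np.linalg.solve(Gamma, gamma_h)
mse = acvf(0) - a @ gamma_h
print(np.round(a, 6))      # approximately [0.8, 0, 0, 0, 0], matching the AR(1) example below
print(round(mse, 6))       # approximately 1.0 = sigma^2
```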

Example: one-step prediction of an AR(1) series

  • We predict $X_{n+1}$ from $X_1, \ldots, X_n$: $\hat{X}_{n+1} = \mu + a_1(X_n - \mu) + \cdots + a_n(X_1 - \mu)$

  • The coefficients $\mathbf{a}_n = (a_1, \ldots, a_n)'$ satisfy $\begin{bmatrix} 1 & \phi & \phi^2 & \cdots & \phi^{n-1} \\ \phi & 1 & \phi & \cdots & \phi^{n-2} \\ \vdots & \vdots & \vdots & & \vdots \\ \phi^{n-1} & \phi^{n-2} & \phi^{n-3} & \cdots & 1 \end{bmatrix}\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} \phi^1 \\ \phi^2 \\ \vdots \\ \phi^n \end{bmatrix}$

  • By guessing, we find a solution $(a_1, a_2, \ldots, a_n) = (\phi, 0, \ldots, 0)$, i.e., $\hat{X}_{n+1} = \mu + \phi(X_n - \mu)$

    • It does not depend on $X_{n-1}, \ldots, X_1$
    • MSE: $E(X_{n+1} - \hat{X}_{n+1})^2 = \sigma^2$

WLOG, we can assume $\mu = 0$ when predicting

  • A stationary time series $\{X_t\}$ has mean $\mu$

  • To predict its future values, we can first create another time series $Y_t = X_t - \mu$ and predict $\hat{Y}_{n+h} = P_n(Y_{n+h})$ by $\hat{Y}_{n+h} = a_1 Y_n + \cdots + a_n Y_1$

  • Since the ACVFs satisfy $\gamma_Y(h) = \gamma_X(h)$, the coefficients $a_1, \ldots, a_n$ are the same for $\{X_t\}$ and $\{Y_t\}$

  • The best linear predictor $\hat{X}_{n+h} = P_n(X_{n+h})$ then satisfies $\hat{X}_{n+h} - \mu = a_1(X_n - \mu) + \cdots + a_n(X_1 - \mu)$

Prediction operator $P(\cdot \mid \mathbf{W})$

  • $X$ and $W_1, \ldots, W_n$ are random variables with finite second moments

    • Note: $W_1, \ldots, W_n$ do not need to come from a stationary series
  • Best linear predictor: $\hat{X} = P(X \mid \mathbf{W}) = E(X) + a_1[W_n - E(W_n)] + \cdots + a_n[W_1 - E(W_1)]$

  • The coefficients $\mathbf{a} = (a_1, \ldots, a_n)'$ satisfy $\Gamma \mathbf{a} = \gamma$, where $\Gamma = \left[\mathrm{Cov}(W_{n+1-i}, W_{n+1-j})\right]_{i,j=1}^{n}$ and $\gamma = \left[\mathrm{Cov}(X, W_n), \ldots, \mathrm{Cov}(X, W_1)\right]'$

Properties of $\hat{X} = P(X \mid \mathbf{W})$

  • Unbiased: $E(\hat{X} - X) = 0$

  • Orthogonal: $E[(\hat{X} - X)W_i] = 0$ for $i = 1, \ldots, n$

  • MSE: $E(\hat{X} - X)^2 = \mathrm{Var}(X) - (a_1, \ldots, a_n)\left[\mathrm{Cov}(X, W_n), \ldots, \mathrm{Cov}(X, W_1)\right]'$

  • Linear: $P(\alpha_1 X_1 + \alpha_2 X_2 + \beta \mid \mathbf{W}) = \alpha_1 P(X_1 \mid \mathbf{W}) + \alpha_2 P(X_2 \mid \mathbf{W}) + \beta$

  • Extreme cases
    • Perfect prediction: $P\left(\sum_{i=1}^{n} \alpha_i W_i + \beta \mid \mathbf{W}\right) = \sum_{i=1}^{n} \alpha_i W_i + \beta$
    • Uncorrelated: if $\mathrm{Cov}(X, W_i) = 0$ for all $i = 1, \ldots, n$, then $P(X \mid \mathbf{W}) = E(X)$

Examples: predictions of AR($p$) series

  • A time series $\{X_t\}$ is an autoregression of order $p$, i.e., AR($p$), if it is stationary and satisfies $X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + Z_t$, where $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$ and $\mathrm{Cov}(X_s, Z_t) = 0$ for all $s < t$

  • When $n > p$, the one-step predictor of an AR($p$) series is $P_n X_{n+1} = \phi_1 X_n + \phi_2 X_{n-1} + \cdots + \phi_p X_{n+1-p}$, with MSE $E(X_{n+1} - P_n X_{n+1})^2 = E(Z_{n+1}^2) = \sigma^2$

  • $h$-step prediction of an AR(1) series (proof by recursion): $P_n X_{n+h} = \phi^h X_n$, with MSE $\sigma^2\,\dfrac{1 - \phi^{2h}}{1 - \phi^2}$ (checked numerically in the sketch below)
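
A quick check of the $h$-step AR(1) formulas against the general $\Gamma_n \mathbf{a} = \gamma_n(h)$ solution (all parameter values are made up):

```python
import numpy as np

# Sketch: for an AR(1), the h-step predictor should be phi**h * X_n with
# MSE sigma^2 * (1 - phi**(2h)) / (1 - phi**2).
phi, sigma2, n, h = 0.7, 1.0, 6, 3
acvf = lambda lag: sigma2 * phi ** abs(lag) / (1 - phi ** 2)

Gamma = np.array([[acvf(i - j) for j in range(n)] for i in range(n)])
gamma_h = np.array([acvf(h + k) for k in range(n)])
a = np.linalg.solve(Gamma, gamma_h)

print(np.round(a, 6))                                     # approximately [phi**h, 0, ..., 0]
print(round(acvf(0) - a @ gamma_h, 6),                    # MSE from gamma(0) - a' gamma_n(h)
      round(sigma2 * (1 - phi ** (2 * h)) / (1 - phi ** 2), 6))   # closed form; they agree
```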

Recursive methods: the Durbin-Levinson and Innovations Algorithms

Recursive methods for one-step prediction

  • The best linear predictor solution $\mathbf{a} = \Gamma^{-1}\gamma$ requires matrix inversion

  • Alternatively, we can use recursion to simplify the one-step prediction $P_n X_{n+1}$, building on $P_j X_{j+1}$ for $j = 1, \ldots, n-1$

  • We will introduce
    • The Durbin-Levinson algorithm: good for AR($p$)
    • The innovations algorithm: good for MA($q$); innovations are uncorrelated

Durbin-Levinson algorithm

  • Assume $\{X_t\}$ is mean zero and stationary with ACVF $\gamma(h)$: $\hat{X}_{n+1} = \phi_{n,1}X_n + \cdots + \phi_{n,n}X_1$, with MSE $v_n = E(\hat{X}_{n+1} - X_{n+1})^2$ (implemented in the sketch after the steps below)
  1. Start with $\hat{X}_1 = 0$ and $v_0 = \gamma(0)$

For $n = 1, 2, \ldots$, compute steps 2-4 successively:

  2. Compute $\phi_{n,n}$ (the partial autocorrelation function (PACF) at lag $n$): $\phi_{n,n} = \left[\gamma(n) - \sum_{j=1}^{n-1}\phi_{n-1,j}\,\gamma(n-j)\right]\Big/v_{n-1}$

  3. Compute $\phi_{n,1}, \ldots, \phi_{n,n-1}$: $\begin{bmatrix}\phi_{n,1} \\ \vdots \\ \phi_{n,n-1}\end{bmatrix} = \begin{bmatrix}\phi_{n-1,1} \\ \vdots \\ \phi_{n-1,n-1}\end{bmatrix} - \phi_{n,n}\begin{bmatrix}\phi_{n-1,n-1} \\ \vdots \\ \phi_{n-1,1}\end{bmatrix}$

  4. Compute $v_n = v_{n-1}\left(1 - \phi_{n,n}^2\right)$
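
A sketch implementation of the recursion (the AR(1) ACVF used for the check is a made-up test case; for an AR(1) the PACF $\phi_{n,n}$ should vanish for $n \geq 2$ and $v_n$ should equal $\sigma^2 = 1$):

```python
import numpy as np

# Sketch of the Durbin-Levinson recursion: one-step coefficients phi_{n,1..n} and MSEs v_n
# from gamma(0), ..., gamma(N) of a zero-mean stationary series.
def durbin_levinson(gamma):
    N = len(gamma) - 1
    phi = [np.array([])]            # phi[n] = (phi_{n,1}, ..., phi_{n,n})
    v = [gamma[0]]                  # step 1: v_0 = gamma(0)
    for n in range(1, N + 1):
        prev = phi[n - 1]
        # step 2: phi_{n,n} = [gamma(n) - sum_{j=1}^{n-1} phi_{n-1,j} gamma(n-j)] / v_{n-1}
        phi_nn = (gamma[n] - sum(prev[j - 1] * gamma[n - j] for j in range(1, n))) / v[n - 1]
        # step 3: (phi_{n,1},...,phi_{n,n-1}) = prev - phi_{n,n} * reverse(prev); append phi_{n,n}
        phi.append(np.r_[prev - phi_nn * prev[::-1], phi_nn])
        # step 4: v_n = v_{n-1} (1 - phi_{n,n}^2)
        v.append(v[n - 1] * (1 - phi_nn ** 2))
    return phi, v

# Check on an AR(1) ACVF gamma(h) = phi^|h| / (1 - phi^2) with phi = 0.8, sigma^2 = 1
p = 0.8
gamma = [p ** h / (1 - p ** 2) for h in range(6)]
phi_coeffs, v = durbin_levinson(gamma)
print(np.round(phi_coeffs[3], 6), round(v[3], 6))   # approximately [0.8, 0, 0] and 1.0
```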

Innovations algorithm

  • Assume $\{X_t\}$ is any mean-zero (not necessarily stationary) time series with covariances $\kappa(i,j) = \mathrm{Cov}(X_i, X_j)$

  • Predict $\hat{X}_{n+1} = P_n X_{n+1}$ based on the innovations, i.e., the one-step prediction errors $X_j - \hat{X}_j$, $j = 1, \ldots, n$: $\hat{X}_{n+1} = \theta_{n,1}(X_n - \hat{X}_n) + \cdots + \theta_{n,n}(X_1 - \hat{X}_1)$, with MSE $v_n$ (implemented in the sketch after the steps below)

  1. Start with $\hat{X}_1 = 0$ and $v_0 = \kappa(1,1)$

For $n = 1, 2, \ldots$, compute steps 2-3 successively:

  2. For $k = 0, 1, \ldots, n-1$, compute the coefficients $\theta_{n,n-k} = \left[\kappa(n+1, k+1) - \sum_{j=0}^{k-1}\theta_{k,k-j}\,\theta_{n,n-j}\,v_j\right]\Big/v_k$

  3. Compute the MSE $v_n = \kappa(n+1, n+1) - \sum_{j=0}^{n-1}\theta_{n,n-j}^2\, v_j$
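
A sketch implementation of the recursion, checked on an MA(1) covariance (the $\theta = 0.5$, $\sigma^2 = 1$ test case is made up; for an MA(1), $\theta_{n,1}$ should approach $\theta$ and $v_n$ should approach $\sigma^2$ as $n$ grows):

```python
import numpy as np

# Sketch of the innovations algorithm: theta[n][m-1] = theta_{n,m} (m = 1..n) and v[n] = MSE.
def innovations(kappa, N):
    v = [kappa(1, 1)]               # step 1: v_0 = kappa(1,1)
    theta = [[]]
    for n in range(1, N + 1):
        th = [0.0] * n
        for k in range(n):          # step 2: k = 0, ..., n-1
            s = sum(theta[k][k - j - 1] * th[n - j - 1] * v[j] for j in range(k))
            th[n - k - 1] = (kappa(n + 1, k + 1) - s) / v[k]
        # step 3: v_n = kappa(n+1, n+1) - sum_j theta_{n,n-j}^2 v_j
        v.append(kappa(n + 1, n + 1) - sum(th[n - j - 1] ** 2 * v[j] for j in range(n)))
        theta.append(th)
    return theta, v

# MA(1) with theta = 0.5, sigma^2 = 1: kappa(i,i) = 1.25, kappa(i,j) = 0.5 if |i-j| = 1, else 0
kappa = lambda i, j: 1.25 if i == j else (0.5 if abs(i - j) == 1 else 0.0)
theta, v = innovations(kappa, 20)
print(round(theta[20][0], 4), round(v[20], 4))   # theta_{20,1} near 0.5 and v_20 near 1
```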

$h$-step predictors using innovations

  • For any $k \geq 1$, orthogonality ensures $E\left[(X_{n+k} - P_{n+k-1}X_{n+k})X_j\right] = 0$, $j = 1, \ldots, n$. Thus, we have $P_n\left(X_{n+k} - P_{n+k-1}X_{n+k}\right) = 0$

  • The $h$-step prediction: $P_n X_{n+h} = P_n P_{n+h-1} X_{n+h} = P_n\left[\sum_{j=1}^{n+h-1}\theta_{n+h-1,j}\left(X_{n+h-j} - \hat{X}_{n+h-j}\right)\right] = \sum_{j=h}^{n+h-1}\theta_{n+h-1,j}\left(X_{n+h-j} - \hat{X}_{n+h-j}\right)$

References

  • Brockwell, Peter J. and Davis, Richard A. (2016), Introduction to Time Series and Forecasting, Third Edition. New York: Springer