For the pdf slides, click here
Best linear predictor
Goal: find a function of $X_n$ that gives the "best" predictor of $X_{n+h}$.
- By "best" we mean achieving minimum mean squared error
- Under the joint normality assumption on $X_n$ and $X_{n+h}$, the best estimator is $m(X_n) = E(X_{n+h} \mid X_n) = \mu + \rho(h)(X_n - \mu)$
Best linear predictor $\ell(X_n) = aX_n + b$
- For Gaussian processes, $\ell(X_n)$ and $m(X_n)$ are the same.
- The best linear predictor only depends on the mean and ACF of the series $\{X_n\}$
Properties of ACVF $\gamma(\cdot)$ and ACF $\rho(\cdot)$
$\gamma(0) \ge 0$
$|\gamma(h)| \le \gamma(0)$ for all $h$
$\gamma(h)$ is an even function, i.e., $\gamma(h) = \gamma(-h)$ for all $h$
A function $\kappa: \mathbb{Z} \to \mathbb{R}$ is nonnegative definite if $\sum_{i,j=1}^{n} a_i \kappa(i-j) a_j \ge 0$ for all $n \in \mathbb{N}^+$ and vectors $\mathbf{a} = (a_1, \ldots, a_n)' \in \mathbb{R}^n$
Theorem: a real-valued function defined on the integers is the autocovariance function of a stationary time series if and only if it is even and nonnegative definite (a small numerical check follows below)
ACF $\rho(\cdot)$ has all the above properties of ACVF $\gamma(\cdot)$
- Plus one more: $\rho(0) = 1$
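As a quick numerical illustration of the nonnegative definiteness condition above, here is a minimal numpy sketch (the MA(1) parameter $\theta = 0.6$ and the matrix sizes checked are arbitrary choices for the example) verifying that an MA(1) ACVF is even and yields nonnegative definite matrices $[\gamma(i-j)]_{i,j=1}^n$.

```python
import numpy as np

# ACVF of an MA(1) process X_t = Z_t + theta * Z_{t-1} with sigma^2 = 1:
# gamma(0) = 1 + theta^2, gamma(+-1) = theta, gamma(h) = 0 for |h| >= 2
theta = 0.6

def gamma(h):
    h = abs(h)                        # evenness: gamma(h) = gamma(-h)
    return 1 + theta**2 if h == 0 else (theta if h == 1 else 0.0)

# Nonnegative definiteness: every matrix [gamma(i - j)]_{i,j=1..n} must satisfy
# a' G a >= 0 for all vectors a, i.e., its smallest eigenvalue is >= 0
for n in (2, 5, 20):
    G = np.array([[gamma(i - j) for j in range(n)] for i in range(n)])
    print(n, np.linalg.eigvalsh(G).min() >= -1e-10)   # expect True for every n
```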
Linear Processes
Linear processes: definitions
A time series $\{X_t\}$ is a linear process if $X_t = \sum_{j=-\infty}^{\infty} \psi_j Z_{t-j}$, where $\{Z_t\} \sim WN(0, \sigma^2)$ and the constants $\{\psi_j\}$ satisfy $\sum_{j=-\infty}^{\infty} |\psi_j| < \infty$
Equivalent representation using the backward shift operator $B$: $X_t = \psi(B) Z_t$, where $\psi(B) = \sum_{j=-\infty}^{\infty} \psi_j B^j$
Special case: moving average MA($\infty$): $X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}$
Linear processes: properties
In the definition of the linear process $X_t = \sum_{j=-\infty}^{\infty} \psi_j Z_{t-j}$, the condition $\sum_{j=-\infty}^{\infty} |\psi_j| < \infty$ ensures
- The infinite sum $X_t$ converges with probability 1
- $\sum_{j=-\infty}^{\infty} \psi_j^2 < \infty$, and hence $X_t$ converges in mean square, i.e., $X_t$ is the mean square limit of the partial sums $\sum_{j=-n}^{n} \psi_j Z_{t-j}$
Applying a linear filter to a stationary time series produces an output series that is also stationary
Theorem: let $\{Y_t\}$ be a stationary time series with mean 0 and ACVF $\gamma_Y$. If $\sum_{j=-\infty}^{\infty} |\psi_j| < \infty$, then the time series $X_t = \sum_{j=-\infty}^{\infty} \psi_j Y_{t-j} = \psi(B) Y_t$ is stationary with mean 0 and ACVF $\gamma_X(h) = \sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} \psi_j \psi_k \gamma_Y(h + k - j)$
Special case of the above result: if $\{X_t\}$ is a linear process, then its ACVF is $\gamma_X(h) = \sigma^2 \sum_{j=-\infty}^{\infty} \psi_j \psi_{j+h}$
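To illustrate this ACVF formula, here is a minimal numpy sketch (the values $\sigma^2 = 1$, $\theta = 0.6$ and the sample size are arbitrary choices) that evaluates $\gamma_X(h) = \sigma^2 \sum_j \psi_j \psi_{j+h}$ for an MA(1) filter and compares it with the sample autocovariance of a long simulated path.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, theta = 1.0, 0.6
psi = np.array([1.0, theta])          # MA(1) weights: psi_0 = 1, psi_1 = theta, rest 0

# Theoretical ACVF of a linear process: gamma_X(h) = sigma^2 * sum_j psi_j * psi_{j+h}
def gamma_linear(h):
    h = abs(h)
    if h >= len(psi):
        return 0.0
    return sigma2 * np.sum(psi[: len(psi) - h] * psi[h:])

# Empirical check on a long simulated path X_t = sum_j psi_j * Z_{t-j}
n = 200_000
Z = rng.normal(scale=np.sqrt(sigma2), size=n + len(psi) - 1)
X = np.convolve(Z, psi, mode="valid")
for h in range(3):
    sample = np.sum((X[h:] - X.mean()) * (X[: n - h] - X.mean())) / n
    print(h, gamma_linear(h), round(sample, 3))
```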
Combine multiple linear filters
- Linear filters with absolutely summable coefficients $\alpha(B) = \sum_{j=-\infty}^{\infty} \alpha_j B^j$ and $\beta(B) = \sum_{j=-\infty}^{\infty} \beta_j B^j$ can be applied successively to a stationary series $\{Y_t\}$ to generate a new stationary series $W_t = \sum_{j=-\infty}^{\infty} \psi_j Y_{t-j}$, where $\psi_j = \sum_{k=-\infty}^{\infty} \alpha_k \beta_{j-k} = \sum_{k=-\infty}^{\infty} \beta_k \alpha_{j-k}$, or equivalently, $W_t = \psi(B) Y_t$ with $\psi(B) = \alpha(B)\beta(B) = \beta(B)\alpha(B)$
AR(1) process $X_t - \phi X_{t-1} = Z_t$, in linear process form
If $|\phi| < 1$, then $X_t = \sum_{j=0}^{\infty} \phi^j Z_{t-j}$
- Since $X_t$ only depends on $\{Z_s, s \le t\}$, we say $\{X_t\}$ is causal or future-independent
If $|\phi| > 1$, then $X_t = -\sum_{j=1}^{\infty} \phi^{-j} Z_{t+j}$
- This is because $X_t = -\phi^{-1} Z_{t+1} + \phi^{-1} X_{t+1}$
- Since $X_t$ depends on $\{Z_s, s \ge t\}$, we say $\{X_t\}$ is noncausal
If $\phi = \pm 1$, then there is no stationary linear process solution
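The causal representation above can be checked numerically. Below is a minimal sketch (with illustrative values $\phi = 0.7$ and $\sigma = 1$) that generates an AR(1) recursively and compares one observation against the truncated sum $\sum_{j=0}^{J} \phi^j Z_{t-j}$.

```python
import numpy as np

rng = np.random.default_rng(1)
phi, sigma, n = 0.7, 1.0, 500         # |phi| < 1: the causal case
Z = rng.normal(scale=sigma, size=n)

# Generate the AR(1) recursively: X_t = phi * X_{t-1} + Z_t
X = np.zeros(n)
for t in range(1, n):
    X[t] = phi * X[t - 1] + Z[t]

# Causal linear-process form, truncated at J terms: sum_{j=0}^{J} phi^j Z_{t-j}
J, t = 50, 300
approx = sum(phi**j * Z[t - j] for j in range(J + 1))
print(X[t], approx)                   # agree up to an O(phi^{J+1}) truncation error
```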
Introduction to ARMA Processes
ARMA(1,1) process
ARMA(1,1) process: definitions
The time series $\{X_t\}$ is an ARMA(1,1) process if it is stationary and satisfies $X_t - \phi X_{t-1} = Z_t + \theta Z_{t-1}$, where $\{Z_t\} \sim WN(0, \sigma^2)$ and $\phi + \theta \ne 0$
Equivalent representation using the backward shift operator: $\phi(B) X_t = \theta(B) Z_t$, where $\phi(B) = 1 - \phi B$ and $\theta(B) = 1 + \theta B$
ARMA(1,1) process in linear process form
If $\phi \ne \pm 1$, by letting $\chi(z) = 1/\phi(z)$, we can write an ARMA(1,1) as $X_t = \chi(B)\theta(B) Z_t = \psi(B) Z_t$, where $\psi(B) = \sum_{j=-\infty}^{\infty} \psi_j B^j$
If $|\phi| < 1$, then $\chi(z) = \sum_{j=0}^{\infty} \phi^j z^j$, and $\psi_j = \begin{cases} 0, & \text{if } j \le -1, \\ 1, & \text{if } j = 0, \\ (\phi + \theta)\phi^{j-1}, & \text{if } j \ge 1 \end{cases}$ (causal)
If $|\phi| > 1$, then $\chi(z) = -\sum_{j=-\infty}^{-1} \phi^j z^j$, and $\psi_j = \begin{cases} -(\theta + \phi)\phi^{j-1}, & \text{if } j \le -1, \\ -\theta\phi^{-1}, & \text{if } j = 0, \\ 0, & \text{if } j \ge 1 \end{cases}$ (noncausal)
If $\phi = \pm 1$, then there is no such stationary ARMA(1,1) process
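In the causal case, the $\psi$-weights above can be verified numerically by expanding $\theta(z)/\phi(z) = (1 + \theta z)\sum_{k \ge 0} \phi^k z^k$. A minimal sketch follows, with illustrative values $\phi = 0.5$ and $\theta = 0.4$.

```python
import numpy as np

phi, theta = 0.5, 0.4                 # |phi| < 1, so the ARMA(1,1) is causal

# Closed-form psi-weights: psi_0 = 1, psi_j = (phi + theta) * phi^(j-1) for j >= 1
def psi_closed(j):
    return 1.0 if j == 0 else (phi + theta) * phi ** (j - 1)

# Numerical check: expand chi(z) * theta(z) = (sum_{k>=0} phi^k z^k) * (1 + theta z)
J = 10
chi = phi ** np.arange(J + 1)                        # coefficients of 1/(1 - phi z)
psi_num = np.convolve(chi, [1.0, theta])[: J + 1]    # polynomial product, truncated
print(np.allclose(psi_num, [psi_closed(j) for j in range(J + 1)]))   # True
```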
Invertibility
- Invertibility is the dual concept of causality
- Causal: $X_t$ can be expressed by $\{Z_s, s \le t\}$
- Invertible: $Z_t$ can be expressed by $\{X_s, s \le t\}$ (see the sketch after this list)
- For an ARMA(1,1) process,
    - If $|\theta| < 1$, then it is invertible
    - If $|\theta| > 1$, then it is noninvertible
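To make invertibility concrete, the sketch below recovers $Z_t$ from past observations of an invertible MA(1), i.e., an ARMA(1,1) with $\phi = 0$ (the value $\theta = 0.6$ and the truncation point are illustrative assumptions), using $Z_t = \sum_{j \ge 0} (-\theta)^j X_{t-j}$.

```python
import numpy as np

rng = np.random.default_rng(5)
theta, n = 0.6, 300                   # |theta| < 1, so the MA(1) is invertible
Z = rng.normal(size=n)
X = np.empty(n)
X[0] = Z[0]
X[1:] = Z[1:] + theta * Z[:-1]        # X_t = Z_t + theta * Z_{t-1}

# Invertibility: recover Z_t from past X's via Z_t = sum_{j>=0} (-theta)^j X_{t-j}
J, t = 40, 200                        # truncate the infinite sum at J terms
Z_rec = sum((-theta) ** j * X[t - j] for j in range(J + 1))
print(Z[t], Z_rec)                    # agree up to an O(theta^{J+1}) truncation error
```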
Properties of the Sample ACVF and Sample ACF
Estimation of the series mean $\mu = E(X_t)$
The sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ is an unbiased estimator of $\mu$
- Mean squared error $E(\bar{X}_n - \mu)^2 = \frac{1}{n}\sum_{h=-n}^{n}\left(1 - \frac{|h|}{n}\right)\gamma(h)$
Theorem: If $\{X_t\}$ is a stationary time series with mean $\mu$ and ACVF $\gamma(\cdot)$, then as $n \to \infty$: $V(\bar{X}_n) = E(\bar{X}_n - \mu)^2 \longrightarrow 0$ if $\gamma(n) \to 0$, and $n E(\bar{X}_n - \mu)^2 \longrightarrow \sum_{|h| < \infty} \gamma(h)$ if $\sum_{h=-\infty}^{\infty} |\gamma(h)| < \infty$
Confidence bounds of $\mu$
If $\{X_t\}$ is Gaussian, then $\sqrt{n}(\bar{X}_n - \mu) \sim N\left(0, \sum_{|h| < n}\left(1 - \frac{|h|}{n}\right)\gamma(h)\right)$
- For many common time series, such as linear and ARMA models, when $n$ is large, $\bar{X}_n$ is approximately normal: $\bar{X}_n \sim N\left(\mu, \frac{v}{n}\right)$, where $v = \sum_{|h| < \infty} \gamma(h)$
- An approximate 95% confidence interval for $\mu$ is $\left(\bar{X}_n - 1.96\, v^{1/2}/\sqrt{n},\ \bar{X}_n + 1.96\, v^{1/2}/\sqrt{n}\right)$
- To estimate $v$, we can use $\hat{v} = \sum_{|h| < \sqrt{n}} \left(1 - \frac{|h|}{\sqrt{n}}\right)\hat{\gamma}(h)$
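The sketch below puts these pieces together for a simulated AR(1) around a mean (the values $\phi = 0.6$, $\mu = 5$, $n = 2000$ are illustrative assumptions): it computes $\hat{v}$ with the sample ACVF $\hat{\gamma}(h)$ defined in the next subsection and reports the approximate 95% confidence interval for $\mu$.

```python
import numpy as np

rng = np.random.default_rng(2)
phi, mu, n = 0.6, 5.0, 2000

# Simulate an AR(1) around mean mu (started at 0 for simplicity)
X = np.zeros(n)
for t in range(1, n):
    X[t] = phi * X[t - 1] + rng.normal()
X += mu
xbar = X.mean()

# Sample ACVF gamma_hat(h), as defined in the next subsection
def gamma_hat(h):
    return np.sum((X[h:] - xbar) * (X[: n - h] - xbar)) / n

# v_hat = sum over |h| < sqrt(n) of (1 - |h| / sqrt(n)) * gamma_hat(|h|)
root_n = np.sqrt(n)
v_hat = sum((1 - abs(h) / root_n) * gamma_hat(abs(h))
            for h in range(-n + 1, n) if abs(h) < root_n)

half = 1.96 * np.sqrt(v_hat / n)
print(f"approximate 95% CI for mu: ({xbar - half:.3f}, {xbar + half:.3f})")
```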
Estimation of ACVF $\gamma(\cdot)$ and ACF $\rho(\cdot)$
- Use the sample ACVF $\hat{\gamma}(\cdot)$ and sample ACF $\hat{\rho}(\cdot)$
$\hat{\gamma}(h) = \frac{1}{n}\sum_{t=1}^{n-|h|}\left(X_{t+|h|} - \bar{X}_n\right)\left(X_t - \bar{X}_n\right), \qquad \hat{\rho}(h) = \hat{\gamma}(h)/\hat{\gamma}(0)$
- Even if the factor $1/n$ is replaced by $1/(n-h)$, they are still biased
- They are nearly unbiased for large $n$
When $h$ is only slightly smaller than $n$, the estimators $\hat{\gamma}(\cdot), \hat{\rho}(\cdot)$ are unreliable since there are only a few pairs $(X_{t+h}, X_t)$.
- A useful guide for them to be reliable (by Jenkins): $n \ge 50$, $h \le n/4$
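As a quick check of these estimators, the sketch below (an MA(1) with illustrative $\theta = 0.6$ and $n = 500$, so $n \ge 50$ and the lags used are well below $n/4$) compares the sample ACF with the true ACF.

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n = 0.6, 500                   # n >= 50; we only look at small lags h << n/4
Z = rng.normal(size=n + 1)
X = Z[1:] + theta * Z[:-1]            # MA(1): X_t = Z_t + theta * Z_{t-1}
xbar = X.mean()

def gamma_hat(h):                     # sample ACVF with the 1/n factor
    return np.sum((X[h:] - xbar) * (X[: n - h] - xbar)) / n

def rho_hat(h):                       # sample ACF
    return gamma_hat(h) / gamma_hat(0)

rho_true = theta / (1 + theta**2)     # true ACF of an MA(1) at lag 1
print(round(rho_hat(1), 3), round(rho_true, 3))      # close for moderately large n
print([round(rho_hat(h), 3) for h in range(2, 6)])   # near 0 beyond lag 1
```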
Bartlett’s Formula
Asymptotic distribution of $\hat{\rho}(\cdot)$
For linear models, especially ARMA models, when $n$ is large, $\hat{\boldsymbol{\rho}}_k = (\hat{\rho}(1), \ldots, \hat{\rho}(k))'$ is approximately normal: $\hat{\boldsymbol{\rho}}_k \sim N(\boldsymbol{\rho}_k, n^{-1}W)$, where $\boldsymbol{\rho}_k = (\rho(1), \ldots, \rho(k))'$
By Bartlett's formula, $W$ is the covariance matrix with entries $w_{ij} = \sum_{k=1}^{\infty}\left[\rho(k+i) + \rho(k-i) - 2\rho(i)\rho(k)\right] \times \left[\rho(k+j) + \rho(k-j) - 2\rho(j)\rho(k)\right]$
- Special cases
Marginally, for any $j \ge 1$, $\hat{\rho}(j) \sim N(\rho(j), n^{-1}w_{jj})$
For iid noise, $w_{ij} = \begin{cases} 1, & \text{if } i = j, \\ 0, & \text{otherwise} \end{cases} \iff \hat{\rho}(k) \sim N(0, 1/n),\ k = 1, \ldots, n$
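A quick simulation check of the iid-noise special case (a minimal sketch; the sample size and number of lags are arbitrary choices): roughly 95% of the sample ACF values should fall inside the bounds $\pm 1.96/\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, max_lag = 1000, 40
Z = rng.normal(size=n)                # iid noise
zbar = Z.mean()

def rho_hat(h):
    return np.sum((Z[h:] - zbar) * (Z[: n - h] - zbar)) / np.sum((Z - zbar) ** 2)

# Under iid noise, rho_hat(h) ~ N(0, 1/n), so about 95% of the lags should fall
# inside +/- 1.96 / sqrt(n)
bound = 1.96 / np.sqrt(n)
inside = sum(abs(rho_hat(h)) < bound for h in range(1, max_lag + 1))
print(f"{inside}/{max_lag} sample ACF values inside +/-{bound:.3f}")
```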
Forecast Stationary Time Series
Best linear predictor: minimizes MSE
Best linear predictor: definition
For a stationary time series $\{X_t\}$ with known mean $\mu$ and ACVF $\gamma$, our goal is to find the linear combination of $1, X_n, X_{n-1}, \ldots, X_1$ that forecasts $X_{n+h}$ with minimum mean squared error
Best linear predictor: $P_n X_{n+h} = a_0 + a_1 X_n + \cdots + a_n X_1 = a_0 + \sum_{i=1}^{n} a_i X_{n+1-i}$
- We need to find the coefficients $a_0, a_1, \ldots, a_n$ that minimize $E\left(X_{n+h} - a_0 - a_1 X_n - \cdots - a_n X_1\right)^2$
- We can take partial derivatives and solve the system of equations $E\left[X_{n+h} - a_0 - \sum_{i=1}^{n} a_i X_{n+1-i}\right] = 0$ and $E\left[\left(X_{n+h} - a_0 - \sum_{i=1}^{n} a_i X_{n+1-i}\right)X_{n+1-j}\right] = 0$, $j = 1, \ldots, n$
Best linear predictor: the solution
Plugging the solution $a_0 = \mu\left(1 - \sum_{i=1}^{n} a_i\right)$ in, the linear predictor becomes $P_n X_{n+h} = \mu + \sum_{i=1}^{n} a_i (X_{n+1-i} - \mu)$
- The solution for the remaining coefficients is $\mathbf{a}_n = (a_1, \ldots, a_n)' = \Gamma_n^{-1}\gamma_n(h)$
- Here $\Gamma_n = [\gamma(i-j)]_{i,j=1}^{n}$ and $\gamma_n(h) = (\gamma(h), \gamma(h+1), \ldots, \gamma(h+n-1))'$
Best linear predictor $\hat{X}_{n+h} = P_n X_{n+h}$: properties
Unbiasedness: $E(\hat{X}_{n+h} - X_{n+h}) = 0$
Mean squared error (MSE): $E(X_{n+h} - \hat{X}_{n+h})^2 = E(X_{n+h}^2) - E(\hat{X}_{n+h}^2) = \gamma(0) - \mathbf{a}_n'\gamma_n(h)$
Orthogonality: $E[(\hat{X}_{n+h} - X_{n+h})X_j] = 0$, $j = 1, \ldots, n$
- In general, orthogonality means $E[(\text{Error}) \times (\text{Predictor variable})] = 0$
Example: one-step prediction of an AR(1) series
We predict $X_{n+1}$ from $X_1, \ldots, X_n$: $\hat{X}_{n+1} = \mu + a_1(X_n - \mu) + \cdots + a_n(X_1 - \mu)$
The coefficients $\mathbf{a}_n = (a_1, \ldots, a_n)'$ satisfy $\begin{bmatrix} 1 & \phi & \phi^2 & \cdots & \phi^{n-1} \\ \phi & 1 & \phi & \cdots & \phi^{n-2} \\ \vdots & \vdots & \vdots & & \vdots \\ \phi^{n-1} & \phi^{n-2} & \phi^{n-3} & \cdots & 1 \end{bmatrix}\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} \phi \\ \phi^2 \\ \vdots \\ \phi^n \end{bmatrix}$
By guessing, we find a solution $(a_1, a_2, \ldots, a_n) = (\phi, 0, \ldots, 0)$, i.e., $\hat{X}_{n+1} = \mu + \phi(X_n - \mu)$
- Does not depend on $X_{n-1}, \ldots, X_1$
- MSE $E(X_{n+1} - \hat{X}_{n+1})^2 = \sigma^2$
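A minimal numpy sketch confirming this example by solving $\Gamma_n \mathbf{a}_n = \gamma_n(1)$ directly, using the AR(1) ACVF $\gamma(h) = \sigma^2\phi^{|h|}/(1-\phi^2)$ (the values $\phi = 0.7$, $\sigma^2 = 1$, $n = 8$ are illustrative):

```python
import numpy as np

phi, sigma2, n = 0.7, 1.0, 8

# AR(1) ACVF: gamma(h) = sigma^2 * phi^|h| / (1 - phi^2)
gamma = lambda h: sigma2 * phi ** abs(h) / (1 - phi**2)

# One-step prediction (h = 1): solve Gamma_n a = gamma_n(1)
Gamma_n = np.array([[gamma(i - j) for j in range(n)] for i in range(n)])
gamma_n = np.array([gamma(1 + k) for k in range(n)])
a = np.linalg.solve(Gamma_n, gamma_n)
print(np.round(a, 6))                 # (phi, 0, ..., 0), as found by guessing

mse = gamma(0) - a @ gamma_n          # gamma(0) - a_n' gamma_n(1)
print(mse, sigma2)                    # the MSE equals sigma^2
```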
WLOG, we can assume $\mu = 0$ while predicting
A stationary time series $\{X_t\}$ has mean $\mu$
To predict its future values, we can first create another time series $Y_t = X_t - \mu$ and predict $\hat{Y}_{n+h} = P_n(Y_{n+h})$ by $\hat{Y}_{n+h} = a_1 Y_n + \cdots + a_n Y_1$
Since the ACVFs satisfy $\gamma_Y(h) = \gamma_X(h)$, the coefficients $a_1, \ldots, a_n$ are the same for $\{X_t\}$ and $\{Y_t\}$
The best linear predictor $\hat{X}_{n+h} = P_n(X_{n+h})$ then satisfies $\hat{X}_{n+h} - \mu = a_1(X_n - \mu) + \cdots + a_n(X_1 - \mu)$
Prediction operator $P(\cdot \mid \mathbf{W})$
$X$ and $W_1, \ldots, W_n$ are random variables with finite second moments
- Note: $W_1, \ldots, W_n$ need not come from a stationary series
Best linear predictor: $\hat{X} = P(X \mid \mathbf{W}) = E(X) + a_1[W_n - E(W_n)] + \cdots + a_n[W_1 - E(W_1)]$
The coefficients $\mathbf{a} = (a_1, \ldots, a_n)'$ satisfy $\Gamma\mathbf{a} = \gamma$, where $\Gamma = [\mathrm{Cov}(W_{n+1-i}, W_{n+1-j})]_{i,j=1}^{n}$ and $\gamma = [\mathrm{Cov}(X, W_n), \ldots, \mathrm{Cov}(X, W_1)]'$
Properties of $\hat{X} = P(X \mid \mathbf{W})$
Unbiased: $E(\hat{X} - X) = 0$
Orthogonal: $E[(\hat{X} - X)W_i] = 0$ for $i = 1, \ldots, n$
MSE: $E(\hat{X} - X)^2 = \mathrm{Var}(X) - (a_1, \ldots, a_n)\,[\mathrm{Cov}(X, W_n), \ldots, \mathrm{Cov}(X, W_1)]'$
Linear: $P(\alpha_1 X_1 + \alpha_2 X_2 + \beta \mid \mathbf{W}) = \alpha_1 P(X_1 \mid \mathbf{W}) + \alpha_2 P(X_2 \mid \mathbf{W}) + \beta$
- Extreme cases
    - Perfect prediction: $P\left(\sum_{i=1}^{n}\alpha_i W_i + \beta \mid \mathbf{W}\right) = \sum_{i=1}^{n}\alpha_i W_i + \beta$
    - Uncorrelated: if $\mathrm{Cov}(X, W_i) = 0$ for all $i = 1, \ldots, n$, then $P(X \mid \mathbf{W}) = E(X)$
Examples: predictions of AR(p) series
A time series $\{X_t\}$ is an autoregression of order $p$, i.e., AR($p$), if it is stationary and satisfies $X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + Z_t$, where $\{Z_t\} \sim WN(0, \sigma^2)$ and $\mathrm{Cov}(X_s, Z_t) = 0$ for all $s < t$
When $n > p$, the one-step prediction of an AR($p$) series is $P_n X_{n+1} = \phi_1 X_n + \phi_2 X_{n-1} + \cdots + \phi_p X_{n+1-p}$, with MSE $E(X_{n+1} - P_n X_{n+1})^2 = E(Z_{n+1})^2 = \sigma^2$
$h$-step prediction of an AR(1) series (proof by recursion): $P_n X_{n+h} = \phi^h X_n$, with MSE $= \sigma^2\,\dfrac{1 - \phi^{2h}}{1 - \phi^2}$
Recursive methods: the Durbin-Levinson and Innovations Algorithms
Recursive methods for one-step prediction
The best linear predictor solution $\mathbf{a} = \Gamma^{-1}\gamma$ requires a matrix inversion
Alternatively, we can use recursion to simplify the one-step prediction $P_n X_{n+1}$, based on $P_j X_{j+1}$ for $j = 1, \ldots, n-1$
- We will introduce
    - The Durbin-Levinson algorithm: good for AR($p$)
    - The innovations algorithm: good for MA($q$); innovations are uncorrelated
Durbin-Levinson algorithm
- Assume $\{X_t\}$ is mean zero and stationary, with ACVF $\gamma(h)$: $\hat{X}_{n+1} = \phi_{n,1}X_n + \cdots + \phi_{n,n}X_1$, with MSE $v_n = E(\hat{X}_{n+1} - X_{n+1})^2$
1. Start with $\hat{X}_1 = 0$ and $v_0 = \gamma(0)$. For $n = 1, 2, \ldots$, compute steps 2-4 successively:
2. Compute $\phi_{n,n}$ (the partial autocorrelation function (PACF) at lag $n$): $\phi_{n,n} = \left[\gamma(n) - \sum_{j=1}^{n-1}\phi_{n-1,j}\,\gamma(n-j)\right]\Big/ v_{n-1}$
3. Compute $\phi_{n,1}, \ldots, \phi_{n,n-1}$: $\begin{bmatrix}\phi_{n,1}\\ \vdots \\ \phi_{n,n-1}\end{bmatrix} = \begin{bmatrix}\phi_{n-1,1}\\ \vdots \\ \phi_{n-1,n-1}\end{bmatrix} - \phi_{n,n}\begin{bmatrix}\phi_{n-1,n-1}\\ \vdots \\ \phi_{n-1,1}\end{bmatrix}$
4. Compute $v_n = v_{n-1}(1 - \phi_{n,n}^2)$
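A minimal Python implementation of this recursion is sketched below; it is checked against an AR(1) ACVF (illustrative $\phi = 0.7$, $\sigma^2 = 1$), for which we expect $\phi_{n,1} = \phi$, all other coefficients 0, and $v_n = \sigma^2$.

```python
import numpy as np

def durbin_levinson(gamma, n):
    """One-step prediction coefficients phi_{m,j} and MSEs v_m, m = 0, ..., n,
    computed from an ACVF gamma(h) by the recursion above."""
    phi = np.zeros((n + 1, n + 1))    # phi[m, j] stores phi_{m,j}
    v = np.zeros(n + 1)
    v[0] = gamma(0)                   # step 1
    for m in range(1, n + 1):
        # step 2: PACF at lag m
        phi[m, m] = (gamma(m) - sum(phi[m - 1, j] * gamma(m - j)
                                    for j in range(1, m))) / v[m - 1]
        # step 3: update the remaining coefficients
        for j in range(1, m):
            phi[m, j] = phi[m - 1, j] - phi[m, m] * phi[m - 1, m - j]
        # step 4: update the MSE
        v[m] = v[m - 1] * (1 - phi[m, m] ** 2)
    return phi, v

# Check on an AR(1) ACVF (illustrative phi = 0.7, sigma^2 = 1)
ar_phi, sigma2 = 0.7, 1.0
gamma = lambda h: sigma2 * ar_phi ** abs(h) / (1 - ar_phi**2)
coef, v = durbin_levinson(gamma, 5)
print(np.round(coef[5, 1:6], 6), round(v[5], 6))   # (0.7, 0, 0, 0, 0) and 1.0
```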
Innovations algorithm
Assume $\{X_t\}$ is any mean zero (not necessarily stationary) time series with covariance function $\kappa(i, j) = \mathrm{Cov}(X_i, X_j)$
Predict $\hat{X}_{n+1} = P_n X_{n+1}$ based on the innovations, or one-step prediction errors, $X_j - \hat{X}_j$, $j = 1, \ldots, n$: $\hat{X}_{n+1} = \theta_{n,1}(X_n - \hat{X}_n) + \cdots + \theta_{n,n}(X_1 - \hat{X}_1)$, with MSE $v_n$
1. Start with $\hat{X}_1 = 0$ and $v_0 = \kappa(1, 1)$. For $n = 1, 2, \ldots$, compute steps 2-3 successively:
2. For $k = 0, 1, \ldots, n-1$, compute the coefficients $\theta_{n,n-k} = \left[\kappa(n+1, k+1) - \sum_{j=0}^{k-1}\theta_{k,k-j}\,\theta_{n,n-j}\,v_j\right]\Big/ v_k$
3. Compute the MSE $v_n = \kappa(n+1, n+1) - \sum_{j=0}^{n-1}\theta_{n,n-j}^2\,v_j$
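A minimal Python implementation of the innovations recursion, checked on an MA(1) covariance (illustrative $\theta = 0.6$, $\sigma^2 = 1$), for which $\theta_{n,1}$ and $v_n$ should approach $\theta$ and $\sigma^2$ as $n$ grows:

```python
import numpy as np

def innovations(kappa, n):
    """Coefficients theta_{m,j} and MSEs v_m, m = 0, ..., n, computed from a
    covariance function kappa(i, j) by the recursion above."""
    theta = np.zeros((n + 1, n + 1))  # theta[m, j] stores theta_{m,j}
    v = np.zeros(n + 1)
    v[0] = kappa(1, 1)                # step 1
    for m in range(1, n + 1):
        # step 2: coefficients theta_{m,m-k}, k = 0, ..., m-1
        for k in range(m):
            s = sum(theta[k, k - j] * theta[m, m - j] * v[j] for j in range(k))
            theta[m, m - k] = (kappa(m + 1, k + 1) - s) / v[k]
        # step 3: the MSE v_m
        v[m] = kappa(m + 1, m + 1) - sum(theta[m, m - j] ** 2 * v[j]
                                         for j in range(m))
    return theta, v

# Check on an MA(1) covariance kappa(i, j), which depends only on |i - j|
sigma2, th = 1.0, 0.6
def kappa(i, j):
    d = abs(i - j)
    return sigma2 * (1 + th**2) if d == 0 else (sigma2 * th if d == 1 else 0.0)

theta_coef, v = innovations(kappa, 10)
print(round(theta_coef[10, 1], 4), round(v[10], 4))  # close to theta and sigma^2
```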
h-step predictors using innovations
For any $k \ge 1$, orthogonality ensures $E[(X_{n+k} - P_{n+k-1}X_{n+k})X_j] = 0$, $j = 1, \ldots, n$. Thus, we have $P_n(X_{n+k} - P_{n+k-1}X_{n+k}) = 0$
The $h$-step prediction: $P_n X_{n+h} = P_n P_{n+h-1}X_{n+h} = P_n\left[\sum_{j=1}^{n+h-1}\theta_{n+h-1,j}\left(X_{n+h-j} - \hat{X}_{n+h-j}\right)\right] = \sum_{j=h}^{n+h-1}\theta_{n+h-1,j}\left(X_{n+h-j} - \hat{X}_{n+h-j}\right)$
References
- Brockwell, Peter J. and Davis, Richard A. (2016), Introduction to Time Series and Forecasting, Third Edition. New York: Springer