Book Notes: Introduction to Time Series and Forecasting -- Ch1 Introduction

Objective of time series models

  • Seasonal adjustment: recognize seasonal components and remove them to study long-term trends

  • Separate (or filter) noise from signals

  • Prediction

  • Test hypotheses

  • Predicting one series from observations of another

A general approach to time series modeling

  1. Plot the series and check main features:
    • Trend
    • Seasonality
    • Any sharp changes
    • Outliers
  2. Remove trend and seasonal components to get stationary residuals
    • May need data transformation first
  3. Choose a model to fit the residuals

Stationary Models and Autocorrelation Function

Definitions: stationary

  • Series \(\{X_t\}\) has
    • Mean function \(\mu_X(t) = E(X_t)\) and
    • Covariance function \(\gamma_X(r, s) = \textrm{Cov}(X_r, X_s)\)
  • \(\{X_t\}\) is (weakly) stationary if
    • \(\mu_X(t)\) does not depend on \(t\)
    • \(\gamma_X(t+h, t)\) does not depend on \(t\), for each \(h\)
    • (Weakly) stationary is defined based on the first and second order properties of a series
  • \(\{X_t\}\) is strictly stationary if \((X_1, \ldots, X_n)\) and \((X_{1+h}, \ldots, X_{n+h})\) have the same joint distributions for all integers \(h\) and \(n>0\)
    • If \(\{X_t\}\) is strictly stationary, and \(E(X_t^2) < \infty\) for all \(t\), then \(\{X_t\}\) is weakly stationary
    • Weakly stationary does not imply strictly stationary

Definitions: autocovariance and autocorrelation

  • \(\{X_t\}\) is a stationary time series

  • Autocovariance function (ACVF) of \(\{X_t\}\) at lag \(h\)

\[ \gamma_X(h) = \textrm{Cov}(X_{t+h}, X_t) \]

  • Autocorrelation function (ACF) of \(\{X_t\}\) at lag \(h\)

\[ \rho_X(h) = \frac{\gamma_X(h)}{\gamma_X(0)} = \textrm{Cor}(X_{t+h}, X_t) \]

  • Note that \(\gamma(h) = \gamma(-h)\) and \(\rho(h) = \rho(-h)\)

Definitions: sample ACVF and sample ACF

\(x_1, \ldots, x_n\) are observations of a time series with sample mean \(\bar{x}\)

  • Sample autocovariance function: for \(-n < h < n\), \[ \hat{\gamma}(h) = \frac{1}{n}\sum_{t=1}^{n - |h|} \left(x_{t + |h|} - \bar{x} \right) \left(x_{t} - \bar{x} \right) \]

    • Using \(n\) in the denominator ensures that the sample covariance matrix \(\hat{\Gamma}_n = \left[ \hat{\gamma}(i-j) \right]_{i,j = 1}^n\) is nonnegative definite
  • Sample autocorrelation function: for \(-n < h < n\), \[ \hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)} \]
    • Sample correlation matrix \(\hat{R}_n = \left[ \hat{\rho}(i-j) \right]_{i,j = 1}^n\) is also nonnegative definite
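
As a concrete illustration, here is a minimal Python/NumPy sketch of the sample ACVF and ACF formulas above (the function names are mine, not the book's):

```python
import numpy as np

def sample_acvf(x, h):
    """Sample autocovariance at lag h, with n (not n - |h|) in the denominator."""
    x = np.asarray(x, dtype=float)
    n, h = len(x), abs(h)
    xbar = x.mean()
    return np.sum((x[h:] - xbar) * (x[:n - h] - xbar)) / n

def sample_acf(x, h):
    """Sample autocorrelation at lag h."""
    return sample_acvf(x, h) / sample_acvf(x, 0)
```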

Examples of Simple Time Series Models

iid noise and white noise

  • White noise: uncorrelated, with zero mean and variance \(\sigma^2\)

\[ \{X_t\} \sim \textrm{WN}(0, \sigma^2) \]

  • An IID\((0, \sigma^2)\) sequence is \(\textrm{WN}(0, \sigma^2)\), but not conversely

Binary process and random walk

  • Binary process: an example of iid noise \(\{X_t, t = 1, 2, \ldots \}\) \[ P(X_t = 1) = p, \quad P(X_t = -1) = 1-p \]

  • Random walk: \(\{S_t, t = 0, 1, 2, \ldots\}\), with \(S_0 = 0\) and iid noise \(\{X_t\}\) \[ S_t = X_1 + X_2 + \cdots + X_t, \textrm{ for } t = 1, 2, \ldots \]

    • \(\{S_t\}\) is a simple symmetric random walk if \(\{X_t\}\) is a binary process with \(p = 0.5\)

    • A random walk is not stationary: if \(\textrm{Var}(X_t) = \sigma^2\), then \(\gamma_S(t+h, t) = t \sigma^2\) for \(h \geq 0\), which depends on \(t\)
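
As a quick illustration (not from the book), a simple symmetric random walk can be simulated in a few lines; the linearly growing variance is what breaks stationarity:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
steps = rng.choice([-1, 1], size=n)          # binary process with p = 0.5
S = np.concatenate(([0], np.cumsum(steps)))  # S_0 = 0, S_t = X_1 + ... + X_t
# Var(S_t) = t * sigma^2 grows with t, so {S_t} is not stationary
```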

First-order moving average, MA\((1)\) process

Let \(\{Z_t\} \sim \textrm{WN}(0, \sigma^2)\) and \(\theta \in \mathbb{R}\); then \(\{X_t\}\) is an MA\((1)\) process: \[ X_t = Z_t + \theta Z_{t-1}, \quad t = 0, \pm 1, \ldots \]

  • ACVF: does not depend on \(t\), stationary \[ \gamma_X(t+h, t) = \begin{cases} (1 + \theta^2) \sigma^2, & \textrm{ if } h = 0,\\ \theta \sigma^2, & \textrm{ if } h = \pm 1,\\ 0, & \textrm{ if } |h| > 1.\\ \end{cases} \]

  • ACF: \[ \rho_X(h) = \begin{cases} 1, & \textrm{ if } h = 0,\\ \theta / (1 + \theta^2), & \textrm{ if } h = \pm 1,\\ 0, & \textrm{ if } |h| > 1.\\ \end{cases} \]
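
To check the ACF formula empirically, one can simulate an MA(1) series and compare the lag-1 sample ACF with \(\theta/(1+\theta^2)\); a sketch with an arbitrary \(\theta = 0.6\):

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta = 100_000, 0.6
Z = rng.normal(0.0, 1.0, size=n + 1)   # white noise Z_0, ..., Z_n
X = Z[1:] + theta * Z[:-1]             # X_t = Z_t + theta * Z_{t-1}

xbar = X.mean()
rho1 = np.sum((X[1:] - xbar) * (X[:-1] - xbar)) / np.sum((X - xbar) ** 2)
print(rho1, theta / (1 + theta**2))    # both close to 0.441
```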

First-order autoregression, AR\((1)\) process

Let \(\{Z_t\} \sim \textrm{WN}(0, \sigma^2)\) and \(|\phi| < 1\); then \(\{X_t\}\) is an AR\((1)\) process: \[ X_t = \phi X_{t-1} + Z_t, \quad t = 0, \pm 1, \ldots \]

  • ACVF: \[ \gamma_X(h) = \frac{\sigma^2}{1-\phi^2} \cdot \phi^{|h|} \]

  • ACF: \[ \rho_X(h) = \phi^{|h|} \]
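
An AR(1) series can likewise be simulated by iterating the recursion, with a burn-in period so the start is approximately stationary; a sketch with \(\phi = 0.7\) (my choice):

```python
import numpy as np

rng = np.random.default_rng(2)
n, burn, phi = 50_000, 1_000, 0.7
Z = rng.normal(0.0, 1.0, size=n + burn)

X = np.zeros(n + burn)
for t in range(1, n + burn):
    X[t] = phi * X[t - 1] + Z[t]       # X_t = phi * X_{t-1} + Z_t
X = X[burn:]                           # discard the burn-in

xbar = X.mean()
for h in (1, 2, 3):
    rho = np.sum((X[h:] - xbar) * (X[:-h] - xbar)) / np.sum((X - xbar) ** 2)
    print(h, rho, phi**h)              # sample ACF vs. theoretical phi^|h|
```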

Estimate and Eliminate Trend and Seasonal Components

Classical decomposition

Observation \(\{X_t\}\) can be decomposed into

  • a (slowly changing) trend component \(m_t\),
  • a seasonal component \(s_t\) with period \(d\) and \(\sum_{j=1}^d s_j = 0\),
  • a zero-mean series \(Y_t\) \[ X_t = m_t + s_t + Y_t \]

  • Method 1: estimate \(m_t\) and \(s_t\) in turn (trend first, as in the algorithm given later), and hope the residual noise component \(Y_t\) is stationary enough to model

  • Method 2: differencing

  • Method 3: trend and seasonality can be estimated together in a regression, whose design matrix contains both polynomial and harmonic terms

Trend Component Only

Estimate trend: polynomial regression fitting

Observation \(\{X_t\}\) can be decomposed into a trend component \(m_t\) and a zero-mean series \(Y_t\): \[ X_t = m_t + Y_t \]

  • Least squares polynomial regression \[ m_t = a_0 + a_1 t + \cdots + a_p t^p \]
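
For instance, the least squares fit can be done with NumPy's polynomial routines; a minimal sketch (the degree \(p = 2\) and the toy trend are my choices):

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(1, 201)
x = 2 + 0.5 * t + 0.01 * t**2 + rng.normal(0, 5, size=t.size)  # trend + noise

coeffs = np.polyfit(t, x, deg=2)   # least squares estimates of a_0, a_1, a_2
m_hat = np.polyval(coeffs, t)      # fitted trend m_t
y_hat = x - m_hat                  # detrended residual series Y_t
```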

Estimate trend: smoothing with a finite MA filter

  • Linear filter \[ \hat{m}_t = \sum_{j = -\infty}^{\infty} a_j X_{t-j} \]

  • Two-sided moving average filter, with \(q \in \mathbb{N}\) \[ W_t = \frac{\sum_{j = -q}^q X_{t-j}}{2q + 1} \]

    • \(W_t \approx m_t\) for \(q+1 \leq t \leq n-q\), if \(X_t\) only has the trend component \(m_t\) but not seasonality \(s_t\), and \(m_t\) is approximately linear in \(t\)

    • \(W_t\) is a low-pass filter: remove the rapidly fluctuating (high frequency) component \(Y_t\), and let the slowly varying component \(m_t\) pass
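
Concretely, the two-sided filter \(W_t\) above is just a centered moving average, e.g. via np.convolve; a sketch:

```python
import numpy as np

def ma_filter(x, q):
    """Two-sided MA filter: W_t = (X_{t-q} + ... + X_{t+q}) / (2q + 1).

    Returns only the n - 2q interior values, i.e. t = q+1, ..., n-q;
    the filter is undefined at the first and last q time points.
    """
    weights = np.ones(2 * q + 1) / (2 * q + 1)
    return np.convolve(x, weights, mode="valid")
```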

Estimate trend: exponential smoothing

For any fixed \(\alpha \in [0, 1]\), the one-sided moving averages \(\hat{m}_t,\ t = 1, \ldots, n,\) are defined by the recursion \[ \hat{m}_t = \begin{cases} X_1, & \textrm{ if } t = 1 \\ \alpha X_t + (1-\alpha) \hat{m}_{t-1}, & \textrm{ if } t = 2, \ldots, n\\ \end{cases} \]

  • Equivalently, \[ \hat{m}_t = \sum_{j=0}^{t-2} \alpha (1-\alpha)^j X_{t-j} + (1-\alpha)^{t-1}X_1 \]
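
The recursion translates directly into code; a minimal sketch:

```python
import numpy as np

def exponential_smoothing(x, alpha):
    """One-sided EWMA: m_1 = X_1, then m_t = alpha * X_t + (1 - alpha) * m_{t-1}."""
    m = np.empty(len(x))
    m[0] = x[0]
    for t in range(1, len(x)):
        m[t] = alpha * x[t] + (1 - alpha) * m[t - 1]
    return m
```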

Eliminate trend by differencing

  • Backward shift operator \[ B X_t = X_{t-1} \]

  • Lag-1 difference operator \[ \nabla X_t = X_t - X_{t-1} = (1 - B) X_t \]
    • If \(\nabla\) is applied to a linear trend function \(m_t = c_0 + c_1 t\), then \(\nabla m_t = c_1\)
  • Powers of operators \(B\) and \(\nabla\): \[ B^j (X_t) = X_{t-j}, \quad \nabla^j(X_t) = \nabla\left[\nabla^{j-1}(X_t)\right] \textrm{ with } \nabla^0(X_t) = X_t \]
    • \(\nabla^k\) reduces a polynomial trend of degree \(k\) to a constant \[ \nabla^k \left( \sum_{j=0}^k c_j t^j \right) = k! c_k \]
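
A quick numerical check of the last identity, using np.diff for repeated lag-1 differencing:

```python
import numpy as np

t = np.arange(1, 11)
m = 3 + 2 * t + 5 * t**2     # degree-2 polynomial trend with c_2 = 5
print(np.diff(m, n=2))       # every entry equals 2! * c_2 = 10
```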

Also with the Seasonal Component

Estimate seasonal component: harmonic regression

Observation \(\{X_t\}\) can be decomposed into a seasonal component \(s_t\) and a zero-mean series \(Y_t\): \[ X_t = s_t + Y_t \]

  • \(s_t\): a periodic function of \(t\) with period \(d\), i.e., \(s_{t-d} = s_t\)

  • Harmonic regression: a sum of harmonics (or sine waves)

\[ s_t = a_0 + \sum_{j=1}^k \left[ a_j \cos\left( \lambda_j t \right) + b_j \sin\left( \lambda_j t \right) \right] \]

  • Unknown (regression) parameters: \(a_j, b_j\)

  • Specified parameters:
    • Number of harmonics: \(k\)
    • Frequencies \(\lambda_j\), each being some integer multiple of \(\frac{2\pi}{d}\)
    • Sometimes \(\lambda_j\) are instead specified through Fourier indices \(f_j = \frac{n \cdot j}{d}\), so that \(\lambda_j = \frac{2\pi f_j}{n}\)
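
In practice the harmonic fit is ordinary least squares on cosine and sine columns; a sketch with period \(d = 12\), \(k = 2\) harmonics, and frequencies \(\lambda_j = 2\pi j/d\) (choices of mine):

```python
import numpy as np

def fit_harmonics(x, d=12, k=2):
    """Least squares fit of s_t = a_0 + sum_j [a_j cos(l_j t) + b_j sin(l_j t)]."""
    n = len(x)
    t = np.arange(1, n + 1)
    cols = [np.ones(n)]
    for j in range(1, k + 1):
        lam = 2 * np.pi * j / d              # frequency: multiple of 2*pi/d
        cols += [np.cos(lam * t), np.sin(lam * t)]
    design = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(design, x, rcond=None)
    return design @ beta                     # fitted seasonal component
```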

Estimate trend and seasonal components

  1. Estimate \(\hat{m}_t\): use an MA filter chosen to eliminate the seasonality

    • If \(d\) is odd, let \(d = 2q + 1\) and use the simple MA filter \(\hat{m}_t = (x_{t-q} + \cdots + x_{t+q}) / d\)
    • If \(d\) is even, let \(d = 2q\) and \[ \hat{m}_t = (0.5 x_{t-q} + x_{t-q+1} + \cdots + x_{t + q - 1} + 0.5 x_{t+q}) / d \]
  2. Estimate \(\hat{s}_t\): for each \(k = 1, \ldots, d\)

    • Compute the average \(w_k = \textrm{avg}_j (x_{k+jd} - \hat{m}_{k+jd})\), over the \(j\) for which \(q < k + jd \leq n - q\)
    • To ensure \(\sum_{k=1}^d s_k = 0\), let \(\hat{s}_k = w_k - \bar{w}\), where \(\bar{w} = \sum_{k = 1}^d w_k / d\)
  3. Re-estimate \(\hat{m}_t\): based on the deseasonalized data \[ d_t = x_t - \hat{s}_t \]
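
Putting steps 1 and 2 together for even \(d\), a sketch (the indexing details are my own reading of the algorithm):

```python
import numpy as np

def classical_decomposition(x, d):
    """Steps 1-2 of the algorithm above, for even period d = 2q."""
    x = np.asarray(x, dtype=float)
    n, q = len(x), d // 2
    # Step 1: MA filter spanning one full period, with half weights at the ends
    w = np.r_[0.5, np.ones(d - 1), 0.5] / d
    m_hat = np.full(n, np.nan)
    m_hat[q:n - q] = np.convolve(x, w, mode="valid")
    # Step 2: average detrended values at each seasonal position, then center
    detrended = x - m_hat
    w_k = np.array([np.nanmean(detrended[k::d]) for k in range(d)])
    s_hat = np.tile(w_k - w_k.mean(), n // d + 1)[:n]  # sums to 0 over a period
    # Step 3: deseasonalized series d_t, from which the trend is re-estimated
    return m_hat, s_hat, x - s_hat
```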

Eliminate trend and seasonal components: differencing

  • Lag-\(d\) differencing \[ \nabla_d X_t = X_t - X_{t-d} = (1 - B^d) X_t \]

    • Note: the operators \(\nabla_d\) and \(\nabla^d = (1-B)^d\) are different
  • Apply \(\nabla_d\) to \(X_t = m_t + s_t + Y_t\) \[ \nabla_d X_t = m_t - m_{t-d} + Y_t - Y_{t-d} \]

    • Then the trend \(m_t - m_{t-d}\) can be eliminated using methods discussed before, e.g., applying a power of the operator \(\nabla\)
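
In code, \(\nabla_d\) is essentially a one-liner; a sketch:

```python
import numpy as np

def lag_d_diff(x, d):
    """Lag-d differencing: returns x_t - x_{t-d} for t = d+1, ..., n."""
    x = np.asarray(x, dtype=float)
    return x[d:] - x[:-d]
```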

Test Whether Estimated Noises are IID

Test series \(\{Y_1, \ldots, Y_n\}\) for iid: sample ACF based

  • Sample ACF test: statistic \(\hat{\rho}(h)\), for each \(h \in \mathbb{N}\); under \(H_0\), approximately \(\textrm{N}(0, 1/n)\)

  • Portmanteau test: statistic \(Q = n \sum_{j=1}^h \hat{\rho}^2(j)\); under \(H_0\), approximately \(\chi^2(h)\)
  • Under \(H_0\), about 95% of the sample ACF values should fall between \(\pm 1.96/\sqrt{n}\)

  • The Portmanteau test has some refinements
    • Ljung and Box: \(Q_{LB} = n(n+2) \sum_{j=1}^h \hat{\rho}^2(j) / (n-j)\)
    • McLeod and Li: \(Q_{ML} = n(n+2) \sum_{j=1}^h \hat{\rho}_{WW}^2(j) / (n-j)\), where \(\hat{\rho}_{WW}(j)\) is the sample ACF of the squared data
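
For reference, the Ljung-Box statistic can be computed directly from its formula; a sketch (in practice one might instead use a library routine such as statsmodels' `acorr_ljungbox`):

```python
import numpy as np
from scipy import stats

def ljung_box(x, h):
    """Ljung-Box Q_LB at lag h, with its chi-squared(h) p-value."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    denom = np.sum((x - xbar) ** 2)
    rho = np.array([np.sum((x[j:] - xbar) * (x[:-j] - xbar)) / denom
                    for j in range(1, h + 1)])
    q_lb = n * (n + 2) * np.sum(rho**2 / (n - np.arange(1, h + 1)))
    return q_lb, stats.chi2.sf(q_lb, df=h)
```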

Test series \(\{Y_1, \ldots, Y_n\}\) for iid: other methods

  • Fitting an AR model
    • Fit using the Yule-Walker algorithm, choosing the order with the AICC statistic
    • If the selected order is zero, then the series is white noise
  • Normal Q-Q plot: a check for normality

  • A general strategy is to run all of the tests above, and proceed with caution if any of them suggests the series is not iid

References

  • Brockwell, Peter J. and Davis, Richard A. (2016), Introduction to Time Series and Forecasting, Third Edition. New York: Springer