Stationarity

Stationary Process

A stationary process is a stochastic process whose unconditional probability distribution does not change when shifted in time.

The word unconditional is important here. Consider the non-stationary stochastic process $\{X_t\}$ (a random walk with drift) where

\[X_t = \mu + X_{t-1} + \epsilon_t, \quad \epsilon_t \sim \mathcal{N}(0, \sigma^2)\]

Its conditional distribution remains unchanged across time:

\[X_t | X_{t-1} \sim \mathcal{N}(\mu + X_{t-1}, \sigma^2)\]

However, its unconditional distribution (given a fixed starting value $X_0$) changes over time:

\[X_t \sim \mathcal{N}(X_0 + t\mu,\ t\sigma^2)\]

Both the unconditional mean and the unconditional variance grow linearly in $t$.
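
A quick way to see this is to simulate many independent sample paths and check the cross-sectional mean and variance; a minimal numpy sketch, with $\mu = 0.1$ and $\sigma = 1$ as assumed values:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.1, 1.0          # assumed drift and noise scale
n_paths, n_steps = 10_000, 100

# Each increment is mu + eps_t; cumulative sums give X_1..X_n with X_0 = 0.
increments = rng.normal(mu, sigma, size=(n_paths, n_steps))
X = increments.cumsum(axis=1)

for t in (10, 50, 100):
    col = X[:, t - 1]
    print(f"t={t:3d}  mean={col.mean():6.2f} (theory {t * mu:.1f})  "
          f"var={col.var():6.1f} (theory {t * sigma**2:.0f})")
```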

Stabilizing Time Series

  • Differencing
    • Produces a new time series by taking the differences between consecutive observations: $\Delta X_t = X_t - X_{t-1}$
    • Helps stabilize the mean of a time series by removing trends and seasonality
  • Logarithm (log returns)
    • Produces a new time series by taking the log of the ratio between consecutive observations: $\log(X_t / X_{t-1})$
    • Requires the series to be strictly positive (no sign changes)
    • Helps stabilize the variance of a time series (see the sketch after this list)
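
A minimal pandas sketch of both transformations, using a toy strictly-positive series (the values are illustrative):

```python
import numpy as np
import pandas as pd

y = pd.Series([100.0, 102.0, 105.0, 103.0, 108.0])  # toy positive series

diff = y.diff().dropna()                   # X_t - X_{t-1}: stabilizes the mean
log_ret = np.log(y / y.shift(1)).dropna()  # log(X_t / X_{t-1}): stabilizes the variance

print(diff.tolist())
print(log_ret.round(4).tolist())
```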

Tests for Stationarity

Tests for stationarity are generally aimed at detecting evidence of non-stationarity rather than conclusively proving stationarity.

Stationarity is a theoretical property which can be very challenging to verify with absolute certainty.

In empirical data, the presence of noise, finite samples, and model approximations can obscure the underlying process characteristics.

Stationarity tests generally take the presence of a unit root, which implies non-stationarity, as the null hypothesis; the alternative hypothesis is stationarity.

Characteristic Equation

Consider a general linear difference equation that defines a linear stochastic process:

\[X_t = a_1 X_{t-1} + ... + a_p X_{t-p} + \epsilon_t\]

Reorganize the equation to isolate the error term:

\[X_t - a_1 X_{t-1} - ... - a_p X_{t-p} = \epsilon_t\]

The characteristic equation is:

\[m^p - a_1 m^{p-1} - ... - a_{p-1} m - a_p = 0\]

Roots $m$

  • $|m| < 1$
    • Stable components of the process
    • Impacts from shocks or innovations $\epsilon_t$ associated with these components decay geometrically over time
    • These parts of the process do not contribute to non-stationarity
  • $|m| = 1$
    • Boundary between stability and instability
    • A root on the unit circle means the impact of a shock does not decay over time
    • The effects are integrated into the level of the process and persist indefinitely
    • The time series exhibits a type of non-stationarity known as a unit-root (difference-stationary) process; the random walk is the canonical example
  • $|m| > 1$
    • Explosive components
    • The impact of shocks grows over time, leading to increasing variance and, consequently, non-stationarity

Example

Consider AR(1) model:

\[X_t = \phi X_{t-1} + \epsilon_t = \phi^t X_0 + \sum_{i=1}^t \phi^{t-i} \epsilon_i\]

The characteristic equation is:

\[m - \phi = 0\]

With root $m = \phi$

If $\phi = 1$, then $m = 1$ and a unit root is present. The corresponding AR(1) process is:

\[X_t = X_0 + \sum_{i=1}^t \epsilon_i\]

so the impact of any $\epsilon_i$ persists indefinitely into the future.

If $|\phi| > 1$, then $|m| > 1$: the impact of any $\epsilon_i$ grows over time, leading to non-stationarity.

If $|\phi| < 1$, then $|m| < 1$: the impact of any $\epsilon_i$ decays over time, and the process is stationary.
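
A minimal simulation sketch of the three regimes (assumed values: $\sigma = 1$, $X_0 = 0$, and an arbitrary path length):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

for phi in (0.5, 1.0, 1.05):
    x, path = 0.0, []
    for eps in rng.normal(size=n):
        x = phi * x + eps          # AR(1) recursion
        path.append(x)
    print(f"phi={phi:4}: |X_n|={abs(path[-1]):8.1f}, path std={np.std(path):8.1f}")
# phi=0.5 stays bounded (stationary), phi=1.0 wanders (random walk),
# phi=1.05 explodes (variance grows without bound).
```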

Stationarity Tests

Dickey–Fuller Test

Consider the AR(1) model:

\[X_t = \phi X_{t-1} + \epsilon_t\]

Our goal is to test for a unit root ($\phi = 1$), so we rewrite the model as

\[\Delta X_t = X_t - X_{t-1} = (\phi-1)X_{t-1} + \epsilon_t = \gamma X_{t-1} + \epsilon_t\]

Testing for a unit root is then equivalent to testing $\gamma = 0$.

  • Null hypothesis: a unit root is present in the AR model ($\gamma = 0$)
  • Alternative hypothesis: the time series is stationary or trend-stationary ($\gamma < 0$)

The test statistic and critical values come from the Dickey–Fuller distribution rather than the t-distribution.

Augmented Dickey–Fuller (ADF) Test

An augmented version of the Dickey–Fuller test that handles a larger and more complicated set of time series models:

\[\Delta X_t = \alpha + \beta t + \gamma X_{t-1} + \delta_1 \Delta X_{t-1} + ... + \delta_{p-1} \Delta X_{t-p+1} + \epsilon_t\]

Testing for a unit root:

  • Null hypothesis: $\gamma = 0$
  • Alternative hypothesis: $\gamma < 0$
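
A minimal sketch using `adfuller` from statsmodels, comparing white noise (stationary) to a simulated random walk (unit root):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(2)
noise = rng.normal(size=500)   # stationary series
walk = noise.cumsum()          # unit-root (random walk) series

for name, series in [("white noise", noise), ("random walk", walk)]:
    stat, pvalue, *_ = adfuller(series, regression="c", autolag="AIC")
    print(f"{name:12s}: ADF stat={stat:7.2f}, p-value={pvalue:.3f}")
# A small p-value rejects the unit-root null: expect rejection for the
# white noise but not for the random walk.
```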

Phillips-Perron (PP) Test

[TBD]

Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test

[TBD]

Autocorrelation

The correlation of a signal $X_t$ with a lagged version of itself, $X_{t-k}$.

Unit root processes, trend-stationary processes, autoregressive processes, and moving average processes are specific forms of processes with autocorrelation.

An estimate of the lag-$\tau$ autocorrelation for a discrete process:

\[\hat{R}(\tau) = \frac{1}{(n-\tau)\sigma^2} \sum_{t=1}^{n-\tau} (X_t - \mu)(X_{t+\tau} - \mu)\]
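
A direct numpy translation of this estimator, plugging in the sample mean and variance for $\mu$ and $\sigma^2$ (the helper name `acf_hat` is just for illustration):

```python
import numpy as np

def acf_hat(x: np.ndarray, tau: int) -> float:
    """Lag-tau autocorrelation estimate of a 1-D series."""
    n, mu, sigma2 = len(x), x.mean(), x.var()
    return ((x[: n - tau] - mu) * (x[tau:] - mu)).sum() / ((n - tau) * sigma2)

rng = np.random.default_rng(3)
white = rng.normal(size=1000)
print(acf_hat(white, 1))  # near 0 for white noise
```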

Both the ACF and the PACF assume stationarity, which can be checked with the Augmented Dickey–Fuller (ADF) test.

Autocorrelation Function (ACF)

The ACF is a function used to represent autocorrelation across different lags in a structured way: essentially, a way to visualize the autocorrelation at every possible lag in the series.

\[\rho_k = \text{Corr}(X_t, X_{t-k})\]

The autocorrelation function starts at lag 0, which is the correlation of the time series with itself and therefore equals 1.

The ACF plot can provide answers to the following questions:

  • Is the observed time series white noise / random?
  • Can the observed time series be modeled with an MA model? If yes, what is the order?

Partial Autocorrelation Function (PACF)

The PACF is the correlation of $X_t$ with a lagged version of itself, $X_{t-k}$, after removing the influence of the intervening values $X_{t-1}, X_{t-2}, ..., X_{t-k+1}$ (hence partial).

\[\phi_k = \text{Corr}(X_t - \hat{X}_t, X_{t-k} - \hat{X}_{t-k})\]

where $\hat{X}_t$ and $\hat{X}_{t-k}$ are linear regressions on the intervening values:

  • $\hat{X}_t$ is a regression of $X_t$ on $X_{t-1}, ..., X_{t-k+1}$ (excluding $X_{t-k}$).
  • $\hat{X}_{t-k}$ is a regression of $X_{t-k}$ on $X_{t-k+1}, ..., X_{t-1}$ (excluding $X_t$).

In practice, the PACF is computed with specialized algorithms such as the Durbin–Levinson recursion or the Yule–Walker equations, rather than explicit regressions.
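
A minimal sketch with statsmodels' `acf` and `pacf`; the `method` argument of `pacf` selects the algorithm (for example, `"ldb"` runs Durbin–Levinson on the biased ACF estimate). The AR(1) test signal with $\phi = 0.7$ is an assumed example:

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(4)
x = np.zeros(500)
for t in range(1, 500):        # AR(1) test signal with phi = 0.7
    x[t] = 0.7 * x[t - 1] + rng.normal()

print(acf(x, nlags=5).round(2))                  # geometric decay ~ 0.7**k
print(pacf(x, nlags=5, method="ldb").round(2))   # spike at lag 1, ~0 beyond
```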

ACF/PACF Behavior

ACF of an MA(q) model

  • The ACF has significant correlations up to lag $q$, because $X_t$ is a linear combination of the $q$ most recent error terms $\epsilon$
  • Beyond lag $q$, the ACF coefficients should theoretically drop to zero
  • Hence, the ACF plot can be used to identify the order $q$ of an MA model

PACF of an AR(p) model

  • The PACF displays a sharp cutoff after lag $p$, because there is no direct dependency beyond the $p$ most recent values
  • The PACF isolates the direct relationship between an observation and its lag-$p$ counterpart, without the confounding influence of intermediate lags
  • Hence, the PACF plot can be used to identify the order $p$ of an AR model

Behavior of ACF and PACF on different ground truth models:

           AR(p)                               MA(q)                               ARMA(p, q)
  ACF      Geometric decay                     Significant up to lag q, 0 beyond   Geometric decay
  PACF     Significant up to lag p, 0 beyond   Geometric decay                     Geometric decay
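
As a check on the table, a sketch on a simulated MA(2) series (the coefficients 0.6 and 0.3 are arbitrary): the ACF should be significant only up to lag 2, while the PACF tails off.

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(5)
eps = rng.normal(size=5000)
# MA(2): X_t = eps_t + 0.6*eps_{t-1} + 0.3*eps_{t-2}
x = eps[2:] + 0.6 * eps[1:-1] + 0.3 * eps[:-2]

print(acf(x, nlags=5).round(2))   # significant at lags 1-2, ~0 beyond
print(pacf(x, nlags=5).round(2))  # decays gradually
```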

ARMA Model

AR Model

\[X_t = \alpha_1 X_{t-1} + ... + \alpha_p X_{t-p} + \epsilon_t\]

The AR model assumes that $X_t$ depends on its own previous values $(X_{t-1}, X_{t-2}, X_{t-3}, ...)$.

PACF can be used to determine the order $p$ of an AR(p) model.

MA Model

\[X_t = \epsilon_t + \beta_1 \epsilon_{t-1} + ... + \beta_q \epsilon_{t-q}\]

The MA model assumes that $X_t$ depends on the error terms $(\epsilon_t, \epsilon_{t-1}, \epsilon_{t-2}, \epsilon_{t-3}, ...)$, indicating how much past surprises (unexpected deviations from predictions) factor into the current value.

ACF can be used to determine the order $q$ of an MA(q) model.

ARMA Model

\[X_t = \sum_{i=1}^p \alpha_i X_{t-i} + \sum_{i=1}^q \beta_i \epsilon_{t-i} + \epsilon_t\]

The extended autocorrelation function (EACF) can be used to determine $p$ and $q$ simultaneously.
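
A minimal sketch of comparing candidate $(p, q)$ orders by AIC with statsmodels' ARIMA class (with $d = 0$ it fits a pure ARMA; the candidate orders and the MA(1) test series are illustrative):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(6)
eps = rng.normal(size=1000)
x = eps[1:] + 0.6 * eps[:-1]   # MA(1) test series

for p, q in [(1, 0), (0, 1), (1, 1)]:
    res = ARIMA(x, order=(p, 0, q)).fit()
    print(f"ARMA({p},{q}): AIC={res.aic:.1f}")
# The MA(1) specification should achieve (close to) the lowest AIC.
```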