Time Series Analysis
Stationarity
Stationary Process
A stationary process is a stochastic process whose unconditional probability distribution does not change when shifted in time.
The word unconditional is important here. Consider the non-stationary random walk with drift $\{X_t\}$, with $X_0 = 0$ and $\epsilon_t \sim \mathcal{N}(0, \sigma^2)$ i.i.d., where
\[X_t = \mu + X_{t-1} + \epsilon_t\]
Its conditional distribution remains unchanged across time:
\[X_t \mid X_{t-1} \sim \mathcal{N}(\mu + X_{t-1}, \sigma^2)\]
However, its unconditional distribution changes over time (both the mean and the variance grow with $t$):
\[X_t \sim \mathcal{N}(t\mu, t\sigma^2)\]
Stabilizing Time Series
- Differencing
  - Produces a new time series by computing the differences between consecutive observations
  - Helps stabilize the mean of a time series by eliminating trends and seasonality
- Logarithm
  - Produces a new time series by computing the logarithm of the ratio between consecutive observations
  - Requires the series to be strictly positive (no sign changes)
  - Helps stabilize the variance of a time series
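Both transformations can be sketched in a few lines of numpy. The data below is synthetic and illustrative (a linear trend for the differencing case, an exponentially growing positive series for the log case), not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(200)

# Linear trend + noise: differencing removes the trend in the mean.
y = 0.5 * t + rng.standard_normal(200)
dy = np.diff(y)                 # fluctuates around the slope 0.5

# Exponential growth (strictly positive): log ratios stabilize the variance.
x = np.exp(0.02 * t + 0.1 * rng.standard_normal(200))
log_ret = np.diff(np.log(x))    # same as np.log(x[1:] / x[:-1])
```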
Tests for Stationarity
Tests for stationarity are generally aimed at detecting evidence of non-stationarity rather than conclusively proving stationarity.
Stationarity is a theoretical property which can be very challenging to verify with absolute certainty.
In empirical data, the presence of noise, finite samples, and model approximations can obscure the underlying process characteristics.
Stationarity tests generally check for the presence of a unit root, which would imply non-stationarity; the alternative hypothesis suggests stationarity.
Characteristic Equation
Consider a general linear difference equation that defines a linear stochastic process:
\[X_t = a_1 X_{t-1} + ... + a_p X_{t-p} + \epsilon_t\]
Reorganize the equation to isolate the error term:
\[X_t - a_1 X_{t-1} - ... - a_p X_{t-p} = \epsilon_t\]
The characteristic equation is:
\[m^p - a_1 m^{p-1} - ... - a_{p-1} m - a_p = 0\]
Roots $m$
- $|m| < 1$
  - Stable components of the process
  - Impacts from shocks or innovations $\epsilon_t$ related to these components tend to decay geometrically over time
  - These parts of the process do not contribute to non-stationarity
- $|m| = 1$
  - Boundary condition between stability and instability
  - A root of modulus exactly 1 means that the impact of a shock does not decay over time
  - The effects are integrated into the level of the process and persist indefinitely
  - The time series has a type of non-stationarity known as a random walk or, more generally, a difference-stationary process
- $|m| > 1$
  - Explosive components
  - The impact of shocks grows over time, leading to increasing variance and, consequently, non-stationarity
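These cases can be checked numerically by computing the roots of the characteristic polynomial with `numpy.roots`. A minimal sketch (the AR coefficients are illustrative):

```python
import numpy as np

def char_roots(a):
    """Roots of m^p - a_1 m^{p-1} - ... - a_p = 0 for an AR(p) process
    X_t = a_1 X_{t-1} + ... + a_p X_{t-p} + eps_t."""
    return np.roots(np.concatenate(([1.0], -np.asarray(a, dtype=float))))

# AR(2): X_t = 0.5 X_{t-1} + 0.3 X_{t-2} + eps_t -> all |m| < 1, stable
stable = char_roots([0.5, 0.3])

# Random walk: X_t = X_{t-1} + eps_t -> root m = 1 (unit root)
unit = char_roots([1.0])
```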
Example
Consider the AR(1) model:
\[X_t = \phi X_{t-1} + \epsilon_t = \phi^t X_0 + \sum_{i=1}^t \phi^{t-i} \epsilon_i\]
The characteristic equation is:
\[m - \phi = 0\]
with root $m = \phi$.
If $\phi = 1$, then $m = 1$ and a unit root is present. The corresponding AR(1) process is:
\[X_t = X_0 + \sum_{i=1}^t \epsilon_i\]
indicating that the impact of any $\epsilon_i$ persists indefinitely into the future.
If $|\phi| > 1$, then $|m| > 1$: the impact of any $\epsilon_i$ grows over time, leading to non-stationarity.
If $|\phi| < 1$, then $|m| < 1$: the impact of any $\epsilon_i$ decays geometrically over time, and the process is stationary.
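The three cases can be illustrated with the impulse response of the AR(1): track a single unit shock at $t = 0$ with no further noise (a minimal sketch; the values of $\phi$ are illustrative):

```python
import numpy as np

def impulse_response(phi, horizon=20):
    """Path of X_t after a single unit shock eps_0 = 1 in X_t = phi * X_{t-1},
    with no further shocks."""
    x = np.empty(horizon)
    x[0] = 1.0
    for t in range(1, horizon):
        x[t] = phi * x[t - 1]
    return x

decay = impulse_response(0.5)    # |phi| < 1: shock dies out geometrically
persist = impulse_response(1.0)  # phi = 1: shock persists at 1 forever
explode = impulse_response(1.1)  # |phi| > 1: shock grows without bound
```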
Stationarity Test
Dickey–Fuller Test
Consider the AR(1) model:
\[X_t = \phi X_{t-1} + \epsilon_t\]
Our goal is to test for a unit root ($\phi = 1$), so we rewrite the model as
\[\Delta X_t = X_t - X_{t-1} = (\phi - 1) X_{t-1} + \epsilon_t = \gamma X_{t-1} + \epsilon_t\]
Testing for a unit root is equivalent to testing $\gamma = 0$.
- Null hypothesis: a unit root is present in an AR model
- Alternative hypothesis: the time series is stationary or trend-stationary
The test statistic / critical value comes from the Dickey-Fuller distribution, rather than the t-distribution.
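The regression behind the test can be sketched in numpy: estimate $\gamma$ in $\Delta X_t = \gamma X_{t-1} + \epsilon_t$ by OLS (no intercept) and form its t-ratio. The data below is simulated and illustrative; a real test would compare the statistic against tabulated Dickey-Fuller critical values, which are omitted here:

```python
import numpy as np

def df_stat(x):
    """t-ratio of the OLS estimate of gamma in Delta X_t = gamma * X_{t-1} + eps_t.
    Must be compared against Dickey-Fuller critical values, not the t-distribution."""
    dx = np.diff(x)
    xlag = x[:-1]
    gamma = np.sum(xlag * dx) / np.sum(xlag ** 2)   # OLS slope, no intercept
    resid = dx - gamma * xlag
    s2 = np.sum(resid ** 2) / (len(dx) - 1)         # residual variance
    se = np.sqrt(s2 / np.sum(xlag ** 2))
    return gamma / se

rng = np.random.default_rng(1)
rw = np.cumsum(rng.standard_normal(500))   # random walk: gamma = 0 (unit root)
ar = np.zeros(500)                         # stationary AR(1) with phi = 0.5
for t in range(1, 500):
    ar[t] = 0.5 * ar[t - 1] + rng.standard_normal()
# df_stat(ar) should be strongly negative; df_stat(rw) will usually be
# much closer to zero, failing to reject the unit root.
```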
Augmented Dickey–Fuller (ADF) Test
An augmented version of the Dickey-Fuller test for a larger and more complicated set of time series models:
\[\Delta X_t = \alpha + \beta t + \gamma X_{t-1} + \delta_1 \Delta X_{t-1} + ... + \delta_{p-1} \Delta X_{t-p+1} + \epsilon_t\]
Testing for a unit root:
- Null hypothesis: $\gamma = 0$
- Alternative hypothesis: $\gamma < 0$
Phillips-Perron (PP) Test
[TBD]
Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test
[TBD]
Autocorrelation
The correlation of a signal $X_t$ and a lagged version of itself $X_{t-k}$
Unit root processes, trend-stationary processes, autoregressive processes, and moving average processes are specific forms of processes with autocorrelation.
Estimate of the lag-$\tau$ autocorrelation for a discrete process:
\[\hat{R}(\tau) = \frac{1}{(n-\tau)\sigma^2} \sum_{t=1}^{n-\tau} (X_t - \mu)(X_{t+\tau} - \mu)\]
ACF and PACF assume stationarity, which can be checked by the Augmented Dickey-Fuller (ADF) test.
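This estimator translates directly into numpy, using the sample mean and variance as plug-ins for $\mu$ and $\sigma^2$ (a minimal sketch on a toy series):

```python
import numpy as np

def acf_hat(x, tau):
    """Estimate R_hat(tau), with the sample mean and variance plugged in
    for mu and sigma^2."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mu, var = x.mean(), x.var()
    return np.sum((x[: n - tau] - mu) * (x[tau:] - mu)) / ((n - tau) * var)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
r0 = acf_hat(x, 0)   # lag 0: correlation of the series with itself, equals 1
r1 = acf_hat(x, 1)
```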
Autocorrelation Function (ACF)
ACF is a tool / function used to represent autocorrelation across different lags in a structured way, essentially a way to visualize the autocorrelation for every possible lag in the data series.
\[\rho_k = \text{Corr}(X_t, X_{t-k})\]
The autocorrelation function starts at lag 0, which is the correlation of the time series with itself and therefore equals 1.
The ACF plot can provide answers to the following questions:
- Is the observed time series white noise / random?
- Can the observed time series be modeled with an MA model? If yes, what is the order?
Partial Autocorrelation Function (PACF)
PACF is a correlation of $X_t$ and a lagged version of itself $X_{t-k}$, but with interceding correlations $X_{t-1}, X_{t-2}, …, X_{t-k+1}$ removed (hence partial).
\[\phi_k = \text{Corr}(X_t - \hat{X}_t, X_{t-k} - \hat{X}_{t-k})\]
where \(\hat{X}_t\) and \(\hat{X}_{t-k}\) are estimated by regression on the intermediate values $X_{t-1}, ..., X_{t-k+1}$:
- \(\hat{X}_t\) is the regression of $X_t$ on the intermediate values, excluding $X_{t-k}$.
- \(\hat{X}_{t-k}\) is the regression of $X_{t-k}$ on the intermediate values, excluding $X_t$.
In practice, the PACF is often computed with specialized algorithms such as the Durbin-Levinson recursion or the Yule-Walker equations, instead of explicit regressions.
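As a sketch, the Durbin-Levinson recursion can be implemented in a few lines (a minimal version for illustration; library routines such as statsmodels' `pacf` are more robust):

```python
import numpy as np

def pacf_durbin_levinson(rho):
    """PACF values phi_{k,k} from autocorrelations rho[0..K] (rho[0] = 1)
    via the Durbin-Levinson recursion."""
    rho = np.asarray(rho, dtype=float)
    K = len(rho) - 1
    pacf_vals = np.zeros(K + 1)
    pacf_vals[0] = 1.0
    phi = np.zeros(0)                        # phi_{k-1, 1..k-1}
    for k in range(1, K + 1):
        if k == 1:
            phi_kk = rho[1]
            phi = np.array([phi_kk])
        else:
            num = rho[k] - np.dot(phi, rho[k - 1:0:-1])
            den = 1.0 - np.dot(phi, rho[1:k])
            phi_kk = num / den
            phi = np.append(phi - phi_kk * phi[::-1], phi_kk)
        pacf_vals[k] = phi_kk
    return pacf_vals

# For an AR(1) with phi = 0.6, rho_k = 0.6^k, so the PACF is 0.6 at lag 1
# and (theoretically) zero beyond lag 1.
rho = 0.6 ** np.arange(5)
pacf_vals = pacf_durbin_levinson(rho)
```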
ACF/PACF Behavior
ACF of an MA(q) model
- ACF has significant correlations up to lag $q$, because $X_t$ is a linear combination of the $q$ most recent error terms $\epsilon$
- Beyond lag $q$ the ACF coefficients should theoretically drop to zero
- Hence, the ACF plot can be used to identify the order $q$ of an MA model
PACF of an AR(p) model
- PACF displays a sharp cutoff after lag $p$, because there is no direct dependency beyond the $p$ most recent values of $X$
- PACF effectively isolates the direct relationship between an observation and its lag, without the confounding influence of the intermediate lags
- Hence, the PACF plot can be used to identify the order $p$ of an AR model
Behavior of ACF and PACF on different ground truth models:
|      | AR(p) | MA(q) | ARMA(p, q) |
| ---- | ----- | ----- | ---------- |
| ACF  | Geometric decay | Significant up to lag $q$ / 0 beyond lag $q$ | Geometric decay |
| PACF | Significant up to lag $p$ / 0 beyond lag $p$ | Geometric decay | Geometric decay |
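The MA(q) cutoff in the ACF row can be checked by simulation. A minimal numpy sketch with an illustrative MA(1) coefficient of 0.8 (not from the text):

```python
import numpy as np

# Simulate an MA(1) process X_t = eps_t + 0.8 * eps_{t-1}.
rng = np.random.default_rng(42)
eps = rng.standard_normal(20001)
x = eps[1:] + 0.8 * eps[:-1]

def sample_acf(x, tau):
    """Sample autocorrelation at lag tau (mean-centered)."""
    xc = x - x.mean()
    return np.sum(xc[:-tau] * xc[tau:]) / np.sum(xc ** 2)

# Theory for MA(1): rho_1 = beta / (1 + beta^2) = 0.8 / 1.64, rho_k = 0 for k > 1
rho1 = sample_acf(x, 1)   # significantly nonzero
rho2 = sample_acf(x, 2)   # close to zero: the cutoff after lag q = 1
```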
ARMA Model
AR Model
\[X_t = \alpha_1 X_{t-1} + ... + \alpha_p X_{t-p} + \epsilon_t\]
The AR model assumes that $X_t$ depends on its previous values $(X_{t-1}, X_{t-2}, X_{t-3}, ...)$.
PACF can be used to determine the order $p$ of an AR(p) model.
MA Model
\[X_t = \epsilon_t + \beta_1 \epsilon_{t-1} + ... + \beta_q \epsilon_{t-q}\]
The MA model assumes that $X_t$ depends on the error terms $(\epsilon_t, \epsilon_{t-1}, \epsilon_{t-2}, \epsilon_{t-3}, ...)$, indicating how much past surprises (unexpected deviations from the model's predictions) are factored into the current value.
ACF can be used to determine the order $q$ of an MA(q) model.
ARMA Model
\[X_t = \sum_{i=1}^p \alpha_i X_{t-i} + \sum_{i=1}^q \beta_i \epsilon_{t-i} + \epsilon_t\]
The extended autocorrelation function (EACF) can be used to simultaneously determine $p$ and $q$.
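An ARMA process is straightforward to simulate directly from the recursion above. A minimal sketch with illustrative ARMA(1,1) coefficients ($\alpha_1 = 0.7$, $\beta_1 = 0.3$; since the AR root $0.7$ has modulus below 1, the process is stationary):

```python
import numpy as np

# Simulate X_t = 0.7 X_{t-1} + eps_t + 0.3 eps_{t-1} (illustrative coefficients).
rng = np.random.default_rng(7)
n = 5000
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + eps[t] + 0.3 * eps[t - 1]

# For a stationary ARMA(1,1) with sigma^2 = 1:
# Var(X) = (1 + beta^2 + 2*alpha*beta) / (1 - alpha^2) = 1.51 / 0.51
sample_var = x.var()   # should be close to the theoretical value above
```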