Multivariate Singular Spectrum Analysis: A Principled, Practical, and Performant Solution for Time Series Imputation and Forecasting


Bibliographic Details
Main Author: Alomar, Abdullah
Other Authors: Shah, Devavrat
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access: https://hdl.handle.net/1721.1/140365
Description
Summary: The analysis of multivariate time series data is of great interest across many domains, including cyber-physical systems, finance, retail, and healthcare, to name a few. A common goal across all of these domains is accurate imputation and forecasting of multivariate time series in the presence of noisy and/or missing data. Given the growing need to embed predictive functionality in high-performance systems, especially in applications with time series data (e.g., financial systems, control systems), it is increasingly vital that we build principled prediction algorithms that are statistically and computationally performant, and more broadly accessible. To that end, we introduce a novel variant of multivariate Singular Spectrum Analysis (mSSA) that allows for accurate imputation and forecasting of both the time-varying mean and variance of multivariate time series. We further justify this algorithm by introducing a natural spatio-temporal factor model, under which the algorithm is theoretically analyzed. Specifically, we establish the in-sample prediction error of our mSSA variant for both imputation and forecasting. Further, we propose an incremental variant of the algorithm, upon which a real-time prediction system for time series data, tspDB, is instantiated and evaluated. tspDB aims to increase accessibility to predictive functionalities for time series data through direct integration with existing relational time series databases. Finally, through rigorous experiments, we show that tspDB provides state-of-the-art statistical accuracy while maintaining superior computational performance with an incremental model update, low model training time, and low latency for prediction queries.