Algorithms for Large-scale Data Analytics and Applications to the COVID-19 Pandemic


Bibliographic Details
Main Author: Li, Michael Lingzhi
Other Authors: Bertsimas, Dimitris J.
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access: https://hdl.handle.net/1721.1/143205
https://orcid.org/0000-0002-2456-4834
Description
Summary: Operations Research (OR) can be defined as the use of advanced quantitative tools to make better decisions and create impact. In the modern world, generating impact requires both scalable algorithms that extract insights from an ever-increasing amount of data and important applications through which those insights can be applied to the world. In this thesis, we demonstrate both sides of the coin. The first part of the thesis focuses on building scalable algorithms for large-scale data analytics. In Chapter 1, we consider a novel reformulation of the matrix completion problem and develop a projected stochastic gradient descent method, fastImpute, that solves matrix completion 20x faster than state-of-the-art methods while providing optimality guarantees. In Chapter 2, we introduce the Interpretable Matrix Completion (IMC) problem, which uses side information to provide meaningful insights into low-rank matrices. We design an algorithm, OptComplete, based on the novel concept of stochastic cutting planes, which enables us to solve extremely large instances. In Chapter 3, we extend OptComplete to general data-driven mixed-integer optimization problems, including sparse regression, support vector machines, and the knapsack problem, and show that the algorithm matches or exceeds state-of-the-art results. The second part of the thesis applies large-scale data analytics to the COVID-19 pandemic. In Chapter 4, we introduce a novel policy-driven epidemiological model, DELPHI. We show that DELPHI compares favorably with other top epidemiological models and predicted the large-scale epidemics in the US, UK, and Russia months in advance. We demonstrate how DELPHI's explicit modeling of governmental interventions enabled its use in planning the trial of the Janssen Ad26.COV2.S vaccine. In Chapter 5, we apply DELPHI to determine the locations of COVID-19 mass vaccination centers in the US.
We develop an optimization model that allocates the limited vaccine supply to minimize future pandemic deaths while fully incorporating the nonlinear DELPHI dynamics. We propose a coordinate descent method to solve the problem at scale and show that optimized vaccine allocation can save 20% more individuals while still ensuring equity. Our conclusions directly affected how FEMA allocated its vaccines, increasing its focus on states such as Texas and Florida.
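The summary names projected stochastic gradient descent as the technique behind fastImpute (Chapter 1) but does not describe its reformulation. The following is therefore only a generic stand-in, not the thesis's method. It is a minimal Python sketch of projected stochastic gradient descent for low-rank matrix completion. It assumes a factorized model X ≈ U Vᵀ fitted on the observed entries, with each factor row projected back onto a Euclidean ball after every step. The function names (`project_to_ball`, `projected_sgd_complete`), the ball radius, the step size, the ridge weight, and the synthetic demo data are all illustrative assumptions.

```python
import numpy as np


def project_to_ball(x, radius):
    """Euclidean projection of vector x onto the ball {z : ||z|| <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)


def projected_sgd_complete(M, mask, rank=2, lr=0.02, reg=1e-3,
                           radius=10.0, epochs=100, seed=0):
    """Estimate a rank-`rank` matrix U @ V.T from the entries of M where mask
    is True, using projected SGD (hypothetical illustration, not fastImpute)."""
    rng = np.random.default_rng(seed)
    n, m = M.shape
    # Small random initialization of both factors.
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    rows, cols = np.nonzero(mask)
    for _ in range(epochs):
        # Visit the observed entries in a fresh random order each epoch.
        for t in rng.permutation(rows.size):
            i, j = rows[t], cols[t]
            err = U[i] @ V[j] - M[i, j]
            # Gradients of 0.5*err^2 + 0.5*reg*(|U[i]|^2 + |V[j]|^2),
            # both computed before either factor is updated.
            grad_u = err * V[j] + reg * U[i]
            grad_v = err * U[i] + reg * V[j]
            # Gradient step, then project each factor row onto the ball.
            U[i] = project_to_ball(U[i] - lr * grad_u, radius)
            V[j] = project_to_ball(V[j] - lr * grad_v, radius)
    return U @ V.T


if __name__ == "__main__":
    # Synthetic rank-2 matrix with roughly half of its entries observed.
    rng = np.random.default_rng(1)
    M_true = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 30))
    mask = rng.random(M_true.shape) < 0.5
    M_hat = projected_sgd_complete(M_true, mask)
    obs = np.sqrt(np.mean((M_hat - M_true)[mask] ** 2))
    held_out = np.sqrt(np.mean((M_hat - M_true)[~mask] ** 2))
    print(f"observed RMSE: {obs:.4f}, held-out RMSE: {held_out:.4f}")
```

The ball projection keeps the factors bounded, which is one common way to make an SGD scheme "projected". The radius is chosen loosely here so that it rarely binds. Methods with optimality guarantees, such as the one the summary describes, rely on problem-specific reformulations that this sketch does not attempt to reproduce.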