Algorithms for Large-scale Data Analytics and Applications to the COVID-19 Pandemic

Operations Research (OR) can be defined as using advanced quantitative tools to make better decisions and create impact. In the modern world, to generate impact, we need both scalable algorithms that allow us to extract insights from an ever-increasing amount of data, and also important applications...

Bibliographic Details
Main Author: Li, Michael Lingzhi
Other Authors: Bertsimas, Dimitris J.
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access: https://hdl.handle.net/1721.1/143205
https://orcid.org/0000-0002-2456-4834
_version_ 1811095471857336320
author Li, Michael Lingzhi
author2 Bertsimas, Dimitris J.
author_facet Bertsimas, Dimitris J.
Li, Michael Lingzhi
author_sort Li, Michael Lingzhi
collection MIT
description Operations Research (OR) can be defined as using advanced quantitative tools to make better decisions and create impact. In the modern world, generating impact requires both scalable algorithms that extract insights from an ever-increasing amount of data and important applications in which to apply those insights. In this thesis, we demonstrate both sides of the coin. In the first part of the thesis, we focus on building scalable algorithms for large-scale data analytics. In Chapter 1, we consider a novel reformulation of the matrix completion problem and develop a projected stochastic gradient descent method, fastImpute, that solves matrix completion 20x faster than state-of-the-art methods while providing optimality guarantees. In Chapter 2, we introduce the Interpretable Matrix Completion (IMC) problem to provide meaningful insights for low-rank matrices using side information. We design an algorithm, OptComplete, based on the novel concept of stochastic cutting planes, which enables us to solve extremely large instances. In Chapter 3, we extend OptComplete to general data-driven mixed-integer optimization problems, including sparse regression, support vector machines, and the knapsack problem. We show that the algorithm matches or exceeds state-of-the-art results. The second part of the thesis revolves around applying large-scale data analytics to the COVID-19 pandemic. In Chapter 4, we introduce a novel policy-driven epidemiological model, DELPHI. We show that DELPHI compares favorably with other top epidemiological models and that it predicted the large-scale epidemics in the US, UK, and Russia months in advance. We demonstrate how the explicit modeling of governmental interventions in DELPHI enabled its use for planning the trial of the Janssen Ad26.Cov2.S vaccine. In Chapter 5, we apply DELPHI to determine COVID-19 mass vaccination centers in the US. We develop an optimization model to allocate the limited vaccine supply and minimize future pandemic deaths while fully incorporating the nonlinear DELPHI dynamics. We propose a coordinate descent model to solve the problem at scale and show how optimized vaccine allocation can save 20% more individuals while still ensuring equity. Our conclusions directly affected how FEMA allocated its vaccines, increasing its focus on states such as Texas and Florida.
first_indexed 2024-09-23T16:17:17Z
format Thesis
id mit-1721.1/143205
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T16:17:17Z
publishDate 2022
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/143205 2022-06-16T03:03:39Z Li, Michael Lingzhi Bertsimas, Dimitris J. Massachusetts Institute of Technology. Operations Research Center Ph.D. 2022-06-15T13:03:23Z 2022-06-15T13:03:23Z 2022-02 2022-01-06T00:03:26.454Z Thesis In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf
spellingShingle Li, Michael Lingzhi
Algorithms for Large-scale Data Analytics and Applications to the COVID-19 Pandemic
title Algorithms for Large-scale Data Analytics and Applications to the COVID-19 Pandemic
title_sort algorithms for large scale data analytics and applications to the covid 19 pandemic
url https://hdl.handle.net/1721.1/143205
https://orcid.org/0000-0002-2456-4834
work_keys_str_mv AT limichaellingzhi algorithmsforlargescaledataanalyticsandapplicationstothecovid19pandemic