Algorithms for Large-scale Data Analytics and Applications to the COVID-19 Pandemic
Operations Research (OR) can be defined as using advanced quantitative tools to make better decisions and create impact. In the modern world, to generate impact, we need both scalable algorithms that allow us to extract insights from an ever-increasing amount of data, and also important applications...
Main Author: | Li, Michael Lingzhi |
---|---|
Other Authors: | Bertsimas, Dimitris J. |
Format: | Thesis |
Published: | Massachusetts Institute of Technology 2022 |
Online Access: | https://hdl.handle.net/1721.1/143205 https://orcid.org/0000-0002-2456-4834 |
_version_ | 1811095471857336320 |
---|---|
author | Li, Michael Lingzhi |
author2 | Bertsimas, Dimitris J. |
author_facet | Bertsimas, Dimitris J. Li, Michael Lingzhi |
author_sort | Li, Michael Lingzhi |
collection | MIT |
description | Operations Research (OR) can be defined as using advanced quantitative tools to make better decisions and create impact. In the modern world, to generate impact, we need both scalable algorithms that allow us to extract insights from an ever-increasing amount of data, and also important applications to apply our insights to the world. In this thesis, we demonstrate both sides of the coin.
In the first part of the thesis, we focus on building scalable algorithms for large-scale data analytics. In Chapter 1, we consider a novel reformulation of the matrix completion problem and develop a projected stochastic gradient descent method, fastImpute, that solves matrix completion 20x faster than state-of-the-art methods while providing optimality guarantees. In Chapter 2, we introduce the Interpretable Matrix Completion (IMC) problem to provide meaningful insights for low-rank matrices using side information. We design an algorithm, OptComplete, based on the novel concept of stochastic cutting planes, which enables us to solve extremely large instances. In Chapter 3, we extend OptComplete to general data-driven mixed-integer optimization problems, including sparse regression, support vector machines, and the knapsack problem. We show that the algorithm matches or exceeds state-of-the-art results.
The second part of the thesis applies large-scale data analytics to the COVID-19 pandemic. In Chapter 4, we introduce a novel policy-driven epidemiological model, DELPHI. We show that DELPHI compares favorably with other top epidemiological models and predicted the large-scale epidemics in the US, UK, and Russia months in advance. We demonstrate how the explicit modeling of governmental interventions in DELPHI enabled its use in planning the trial of the Janssen Ad26.Cov2.S vaccine. In Chapter 5, we apply DELPHI to determine COVID-19 mass vaccination centers in the US. We develop an optimization model that allocates the limited vaccine supply to minimize future pandemic deaths while fully incorporating the nonlinear DELPHI dynamics. We propose a coordinate descent algorithm to solve the problem at scale and show how optimized vaccine allocation can save 20% more individuals while still ensuring equity. Our conclusions directly affected how FEMA allocated its vaccines, increasing its focus on states such as Texas and Florida. |
first_indexed | 2024-09-23T16:17:17Z |
format | Thesis |
id | mit-1721.1/143205 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T16:17:17Z |
publishDate | 2022 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/1432052022-06-16T03:03:39Z Algorithms for Large-scale Data Analytics and Applications to the COVID-19 Pandemic Li, Michael Lingzhi Bertsimas, Dimitris J. Massachusetts Institute of Technology. Operations Research Center Ph.D. 2022-06-15T13:03:23Z 2022-06-15T13:03:23Z 2022-02 2022-01-06T00:03:26.454Z Thesis https://hdl.handle.net/1721.1/143205 https://orcid.org/0000-0002-2456-4834 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
spellingShingle | Li, Michael Lingzhi Algorithms for Large-scale Data Analytics and Applications to the COVID-19 Pandemic |
title | Algorithms for Large-scale Data Analytics and Applications to the COVID-19 Pandemic |
title_full | Algorithms for Large-scale Data Analytics and Applications to the COVID-19 Pandemic |
title_fullStr | Algorithms for Large-scale Data Analytics and Applications to the COVID-19 Pandemic |
title_full_unstemmed | Algorithms for Large-scale Data Analytics and Applications to the COVID-19 Pandemic |
title_short | Algorithms for Large-scale Data Analytics and Applications to the COVID-19 Pandemic |
title_sort | algorithms for large scale data analytics and applications to the covid 19 pandemic |
url | https://hdl.handle.net/1721.1/143205 https://orcid.org/0000-0002-2456-4834 |
work_keys_str_mv | AT limichaellingzhi algorithmsforlargescaledataanalyticsandapplicationstothecovid19pandemic |