Analytics under Variability, Volume, and Velocity with Applications to Sustainability and Healthcare
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2023 |
Online Access: | https://hdl.handle.net/1721.1/151352 https://orcid.org/0000-0001-6770-7543 |
Summary: | Analytics, machine learning, and optimization provide unique opportunities to harness the massive amounts of data now available and to address some of the most pressing challenges of our time, including climate change and the improvement of healthcare operations. The classical paradigm of analytics, which assumes a dataset is centrally collected and readily available to analyze, is shifting. Modern data science problems present new complexities, including variability (i.e., changing phenomena due to various types of uncertainties), large volumes of data, decisions, or both, and data arriving dynamically with high velocity.
This thesis advances two strands of large-scale analytics. The first is methodological, focusing on the development of predictive and prescriptive machine learning and optimization methodologies, primarily mixed-integer and robust, for problems that exhibit the aforementioned characteristics. The second is applied, and encompasses collaborations with various industry partners in the sustainability and healthcare operations spaces, seeking to reap the benefits of large-scale analytics in these settings.
In Chapters 2 and 3, we introduce the framework of slowly varying machine learning, which provides a tool to deal with variability in an interpretable way. In Chapter 2, our methodology enables the estimation of sparse linear regression models in which the underlying regression coefficients are allowed to vary slowly and sparsely under some graph-based temporal or spatial structure. In Chapter 3, we take a step toward stabilizing decision tree models even under new trends in the training data. In Chapter 4, we introduce the backbone method, a general heuristic framework that scales interpretable machine learning techniques to ultra-high-dimensional datasets, thereby tackling the volume characteristic. Chapter 5 develops an approach based on mixed-integer optimization and machine learning for the problem of frequency estimation in data streams, addressing settings where large amounts of data arrive dynamically with high velocity. Finally, in Chapter 6, we present a framework based on robust optimization and machine learning that guides a 1 billion USD investment in solar panels and batteries by a leading fertilizer producer, with the aim of decarbonizing a significant portion of their production pipeline and reducing operational costs. Our model’s forecast indicates that this decarbonization effort will be profitable, which underscores that investing in renewable energy can be a financially viable option rather than an expensive luxury that developing nations cannot afford while industrializing their economies. |