Overcoming Data Scarcity in Deep Learning of Scientific Problems
Data-driven approaches such as machine learning have been increasingly applied to the natural sciences, e.g. for property prediction and optimization or material discovery. An essential criterion for the success of such methods is access to extensive amounts of labeled data, making them unfeas...
Main Author: | Loh, Charlotte Chang Le |
---|---|
Other Authors: | Soljačić, Marin |
Format: | Thesis |
Published: | Massachusetts Institute of Technology 2022 |
Online Access: | https://hdl.handle.net/1721.1/140165 |
_version_ | 1826193882978713600 |
---|---|
author | Loh, Charlotte Chang Le |
author2 | Soljačić, Marin |
author_facet | Soljačić, Marin Loh, Charlotte Chang Le |
author_sort | Loh, Charlotte Chang Le |
collection | MIT |
description | Data-driven approaches such as machine learning have been increasingly applied to the natural sciences, e.g. for property prediction and optimization or material discovery. An essential criterion for the success of such methods is access to extensive amounts of labeled data, making them unfeasible for data-scarce problems where generating labeled data is computationally expensive or labour- and time-intensive. Here, I introduce surrogate- and invariance-boosted contrastive learning (SIB-CL), a deep learning framework that overcomes data scarcity by incorporating three “inexpensive” and easily obtainable forms of auxiliary information. Specifically, these are: 1) abundant unlabeled data, 2) prior knowledge of known symmetries or invariances of the problem, and 3) a surrogate dataset obtained at near-zero cost through either simplification or approximation. I demonstrate the effectiveness and generality of SIB-CL on various scientific problems, for example, predicting the density of states of 2D photonic crystals and solving the time-independent Schrödinger equation for 3D random potentials. SIB-CL is shown to provide orders-of-magnitude savings in the amount of labeled data needed compared to conventional deep learning techniques, offering opportunities to apply data-driven methods even to data-scarce problems. |
first_indexed | 2024-09-23T09:46:48Z |
format | Thesis |
id | mit-1721.1/140165 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T09:46:48Z |
publishDate | 2022 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/140165; 2022-02-08T03:55:07Z; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; S.M.; 2022-02-07T15:28:01Z; 2022-02-07T15:28:01Z; 2021-09; 2021-09-21T19:54:12.271Z; Thesis; https://hdl.handle.net/1721.1/140165; In Copyright - Educational Use Permitted; Copyright MIT; http://rightsstatements.org/page/InC-EDU/1.0/; application/pdf; Massachusetts Institute of Technology |
title | Overcoming Data Scarcity in Deep Learning of Scientific Problems |
url | https://hdl.handle.net/1721.1/140165 |