Overcoming Data Scarcity in Deep Learning of Scientific Problems

Data-driven approaches such as machine learning have been increasingly applied to the natural sciences, e.g. for property prediction and optimization or material discovery. A key requirement for the success of such methods is an extensive amount of labeled data, making them unfeas...

Bibliographic Details
Main Author: Loh, Charlotte Chang Le
Other Authors: Soljačić, Marin
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access: https://hdl.handle.net/1721.1/140165
collection MIT
description Data-driven approaches such as machine learning have been increasingly applied to the natural sciences, e.g. for property prediction and optimization or material discovery. A key requirement for the success of such methods is an extensive amount of labeled data, making them unfeasible for data-scarce problems where labeled data generation is computationally expensive, or labour and time intensive. Here, I introduce surrogate- and invariance-boosted contrastive learning (SIB-CL), a deep learning framework which overcomes data scarcity by incorporating three “inexpensive” and easily obtainable sources of auxiliary information. Specifically, these are: 1) abundant unlabeled data, 2) prior knowledge of known symmetries or invariances of the problem, and 3) a surrogate dataset obtained at near-zero cost either from simplification or approximation. I demonstrate the effectiveness and generality of SIB-CL on various scientific problems, for example, predicting the density of states of 2D photonic crystals and solving the time-independent Schrödinger equation for 3D random potentials. SIB-CL is shown to provide orders-of-magnitude savings in the amount of labeled data needed compared to conventional deep learning techniques, offering opportunities to apply data-driven methods even to data-scarce problems.
spelling mit-1721.1/140165 2022-02-08T03:55:07Z Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science S.M. 2022-02-07T15:28:01Z 2022-02-07T15:28:01Z 2021-09 2021-09-21T19:54:12.271Z Thesis https://hdl.handle.net/1721.1/140165 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology