Improving supervised machine learning for materials science

Bibliographic Details
Main Author: Gong, Sheng
Other Authors: Grossman, Jeffrey C.
Format: Thesis
Published: Massachusetts Institute of Technology 2023
Online Access: https://hdl.handle.net/1721.1/147240
Description
Summary: Despite the widespread application of machine learning models in materials science, in many cases their performance is not accurate enough to meet the needs of materials design. In this thesis, we propose and apply a series of strategies to examine and improve the performance of machine learning models for specific materials problems. First, we examine whether current deep representation learning models for atomistic systems can capture human knowledge of crystal structures, and find that current graph neural networks can capture knowledge of local atomic environments but cannot capture the periodicity of crystal structures. As an initial solution, we propose to hybridize human knowledge with deep representation learning models, and find that this hybridization can lead to a large improvement in predicting the vibrational properties of materials. Then, for situations where the dataset of a target materials property is small but large related materials datasets exist, we propose to use transfer learning and multi-fidelity learning to transfer information between the large and small datasets and so facilitate learning of the target property. We use experimentally measured formation enthalpy and lattice thermal conductivity as case studies to examine the usefulness of information transfer and to understand where and why it helps. For situations where expansion of datasets is necessary, we propose to use active learning and Bayesian optimization to sample the materials space efficiently and mitigate bias. As a case study, we apply Bayesian optimization to find the optimal laser processing parameters for poly(acrylonitrile) sheets used as porous carbon electrodes. Finally, for situations where generating data is time-consuming, we propose to use machine learning to accelerate materials experiments and simulations. To this end, we develop a framework that uses graph neural networks to predict the charge density distribution of materials.
The machine learning models developed in this thesis not only deepen human understanding of where and how machine learning can be used to facilitate materials development, but also lead to the discovery of new materials systems, new processes, and new insights, such as new candidate thermoelectric materials, new processes for laser processing of poly(acrylonitrile), and new insights into evaluating the stability of materials.