Role of multifidelity data in sequential active learning materials discovery campaigns: case study of electronic bandgap
Materials discovery and design typically proceeds through iterative evaluation (both experimental and computational) to obtain data, generally targeting improvement of one or more properties under one or more constraints (e.g. time or budget). However, there can be great variation in the quality and...
Main Authors: | Ryan Jacobs, Philip E Goins, Dane Morgan |
---|---|
Format: | Article |
Language: | English |
Published: | IOP Publishing 2023-01-01 |
Series: | Machine Learning: Science and Technology |
Subjects: | machine learning, multifidelity data, active learning, materials discovery |
Online Access: | https://doi.org/10.1088/2632-2153/ad1627 |
_version_ | 1827397768041201664 |
---|---|
author | Ryan Jacobs Philip E Goins Dane Morgan |
author_facet | Ryan Jacobs Philip E Goins Dane Morgan |
author_sort | Ryan Jacobs |
collection | DOAJ |
description | Materials discovery and design typically proceeds through iterative evaluation (both experimental and computational) to obtain data, generally targeting improvement of one or more properties under one or more constraints (e.g. time or budget). However, there can be great variation in the quality and cost of different data, and when they are mixed together in what we here call multifidelity data, the optimal approaches to their utilization are not established. It is therefore important to develop strategies to acquire and use multifidelity data to realize the most efficient iterative materials exploration. In this work, we assess the impact of using multifidelity data through mock demonstration of designing solar cell materials, using the electronic bandgap as the target property. We propose a new approach of using multifidelity data through leveraging machine learning models of both low- and high-fidelity data, where using predicted low-fidelity data as an input feature in the high-fidelity model can improve the impact of a multifidelity data approach. We show how tradeoffs of low- versus high-fidelity measurement cost and acquisition can impact the materials discovery process. We find that the use of multifidelity data has maximal impact on the materials discovery campaign when approximately five low-fidelity measurements per high-fidelity measurement are performed, and when the cost of low-fidelity measurements is approximately 5% or less than that of high-fidelity measurements. This work provides practical guidance and useful qualitative measures for improving materials discovery campaigns that involve multifidelity data. |
first_indexed | 2024-03-08T19:16:22Z |
format | Article |
id | doaj.art-49f12f81b3754681aa713d0f5fa64b63 |
institution | Directory Open Access Journal |
issn | 2632-2153 |
language | English |
last_indexed | 2024-03-08T19:16:22Z |
publishDate | 2023-01-01 |
publisher | IOP Publishing |
record_format | Article |
series | Machine Learning: Science and Technology |
spelling | doaj.art-49f12f81b3754681aa713d0f5fa64b63; 2023-12-27T06:15:37Z; eng; IOP Publishing; Machine Learning: Science and Technology; 2632-2153; 2023-01-01; 4; 4; 045060; 10.1088/2632-2153/ad1627; Role of multifidelity data in sequential active learning materials discovery campaigns: case study of electronic bandgap; Ryan Jacobs [0] (https://orcid.org/0000-0003-2229-6730); Philip E Goins [1]; Dane Morgan [2]; [0] Department of Materials Science and Engineering, University of Wisconsin-Madison, Madison, WI 53706, United States of America; [1] U.S. C.C.D.C. Army Research Laboratory, 6300 Rodman Road, Aberdeen Proving Ground, Aberdeen, MD 21005, United States of America; [2] Department of Materials Science and Engineering, University of Wisconsin-Madison, Madison, WI 53706, United States of America; https://doi.org/10.1088/2632-2153/ad1627; machine learning; multifidelity data; active learning; materials discovery |
title | Role of multifidelity data in sequential active learning materials discovery campaigns: case study of electronic bandgap |
title_sort | role of multifidelity data in sequential active learning materials discovery campaigns case study of electronic bandgap |
topic | machine learning multifidelity data active learning materials discovery |
url | https://doi.org/10.1088/2632-2153/ad1627 |
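
The description above mentions using predicted low-fidelity data as an input feature in the high-fidelity model. The following is a minimal sketch of that general idea only, not the authors' implementation: the two toy bandgap functions, the 1-D composition variable `x`, the piecewise-linear low-fidelity model and the straight-line high-fidelity model are all hypothetical stand-ins chosen so the example runs with the Python standard library. The 20:4 split of cheap to expensive points loosely mirrors the roughly five-to-one ratio quoted in the description.

```python
import math

# Hypothetical toy problem: x is a 1-D composition variable in [0, 1].
# Neither function comes from the paper; they only mimic a cheap estimate
# and an expensive measurement that are strongly correlated.
def low_fidelity_gap(x):
    return 1.0 + math.sin(3.0 * x)

def high_fidelity_gap(x):
    return 1.2 * low_fidelity_gap(x) + 0.3

def fit_line(features, targets):
    """Ordinary least-squares fit of targets = slope * feature + intercept."""
    n = len(features)
    mean_f = sum(features) / n
    mean_t = sum(targets) / n
    sxx = sum((f - mean_f) ** 2 for f in features)
    sxy = sum((f - mean_f) * (t - mean_t) for f, t in zip(features, targets))
    slope = sxy / sxx
    intercept = mean_t - slope * mean_f
    return lambda f: slope * f + intercept

def fit_interpolator(xs, ys):
    """Piecewise-linear model, standing in for a low-fidelity ML model."""
    points = sorted(zip(xs, ys))
    def predict(x):
        if x <= points[0][0]:
            return points[0][1]
        if x >= points[-1][0]:
            return points[-1][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return predict

# Many cheap low-fidelity points, few expensive high-fidelity points (20 vs 4).
lf_x = [i / 19 for i in range(20)]
hf_x = [0.10, 0.35, 0.60, 0.85]
hf_y = [high_fidelity_gap(x) for x in hf_x]

# Step 1: train the low-fidelity model on the cheap data.
lf_model = fit_interpolator(lf_x, [low_fidelity_gap(x) for x in lf_x])

# Step 2a: baseline high-fidelity model that sees only x.
baseline = fit_line(hf_x, hf_y)

# Step 2b: high-fidelity model whose input feature is the *predicted*
# low-fidelity value at each high-fidelity point.
augmented_line = fit_line([lf_model(x) for x in hf_x], hf_y)

def augmented(x):
    return augmented_line(lf_model(x))

# Compare mean absolute error on held-out compositions.
test_x = [(j + 0.5) / 50 for j in range(50)]

def mae(model):
    return sum(abs(model(x) - high_fidelity_gap(x)) for x in test_x) / len(test_x)

mae_baseline = mae(baseline)
mae_augmented = mae(augmented)
print(f"baseline MAE:  {mae_baseline:.3f} eV")
print(f"augmented MAE: {mae_augmented:.3f} eV")
```

In this toy setting the augmented model does well because the high-fidelity value is, by construction, a linear function of the low-fidelity value. With real data the benefit would depend on how strongly the two fidelities correlate and on the relative measurement costs, which is the tradeoff the description says the paper examines.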