A data-driven method for automated data superposition with applications in soft matter science

The superposition of data sets with internal parametric self-similarity is a longstanding and widespread technique for the analysis of many types of experimental data across the physical sciences. Typically, this superposition is performed manually, or recently through the application of one of a fe...


Bibliographic Details
Main Authors: Kyle R. Lennon, Gareth H. McKinley, James W. Swan
Format: Article
Language: English
Published: Cambridge University Press 2023-01-01
Series: Data-Centric Engineering
Subjects: Bayesian statistics; Gaussian process regression; method of reduced variables; self-similarity
Online Access: https://www.cambridge.org/core/product/identifier/S2632673623000035/type/journal_article
collection DOAJ
description The superposition of data sets with internal parametric self-similarity is a longstanding and widespread technique for the analysis of many types of experimental data across the physical sciences. Typically, this superposition is performed manually, or recently through the application of one of a few automated algorithms. However, these methods are often heuristic in nature, are prone to user bias via manual data shifting or parameterization, and lack a native framework for handling uncertainty in both the data and the resulting model of the superposed data. In this work, we develop a data-driven, nonparametric method for superposing experimental data with arbitrary coordinate transformations, which employs Gaussian process regression to learn statistical models that describe the data, and then uses maximum a posteriori estimation to optimally superpose the data sets. This statistical framework is robust to experimental noise and automatically produces uncertainty estimates for the learned coordinate transformations. Moreover, it is distinguished from black-box machine learning in its interpretability—specifically, it produces a model that may itself be interrogated to gain insight into the system under study. We demonstrate these salient features of our method through its application to four representative data sets characterizing the mechanics of soft materials. In every case, our method replicates results obtained using other approaches, but with reduced bias and the addition of uncertainty estimates. This method enables a standardized, statistical treatment of self-similar data across many fields, producing interpretable data-driven models that may inform applications such as materials classification, design, and discovery.
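The two-step idea in the description above (learn a Gaussian process model of the data, then choose the coordinate transformation by maximum a posteriori estimation) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single horizontal shift, a squared-exponential kernel with fixed hyperparameters, a flat prior on the shift (so the MAP estimate coincides with the maximum-likelihood one), a grid search in place of a numerical optimizer, and synthetic noise-free data drawn from a `tanh` master curve. All function and variable names are hypothetical.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel with unit amplitude (assumed, fixed hyperparameters)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def backward_sub(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(xs, ys, noise=0.05):
    """Fit a zero-mean GP and return a function giving the predictive mean and variance."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise ** 2 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, ys))  # alpha = K^{-1} y

    def predict(x):
        k_star = [rbf(x, xi) for xi in xs]
        mean = sum(k * a for k, a in zip(k_star, alpha))
        v = forward_sub(L, k_star)
        var = 1.0 + noise ** 2 - sum(vi * vi for vi in v)  # includes observation noise
        return mean, var

    return predict

def log_posterior(shift, predict, xs, ys):
    """Pointwise Gaussian log-likelihood of the shifted data under the GP.
    With a flat prior on the shift, this equals the log-posterior up to a constant."""
    total = 0.0
    for x, y in zip(xs, ys):
        mean, var = predict(x + shift)
        total += -0.5 * math.log(2 * math.pi * var) - 0.5 * (y - mean) ** 2 / var
    return total

# Synthetic example: both data sets sample the same master curve tanh(x),
# but the second set is recorded on an axis offset by true_shift.
true_shift = 1.0
x_ref = [-2.0 + 0.2 * i for i in range(16)]          # reference set, x from -2 to 1
y_ref = [math.tanh(x) for x in x_ref]
x_new = [-3.0 + 0.2 * i for i in range(16)]          # second set, x from -3 to 0
y_new = [math.tanh(x + true_shift) for x in x_new]

predict = fit_gp(x_ref, y_ref)
candidates = [0.05 * i for i in range(41)]           # candidate shifts from 0 to 2
map_shift = max(candidates, key=lambda a: log_posterior(a, predict, x_new, y_new))
print(f"MAP shift estimate: {map_shift:.2f} (true value {true_shift})")
```

In a sketch like this, the curvature of `log_posterior` around its maximum is one common way to attach an uncertainty estimate to the shift (a Laplace approximation). Whether that matches the uncertainty treatment in the article is not stated in this record.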
first_indexed 2024-03-13T09:46:57Z
id doaj.art-d6c14fb609044866a24089a864a232d4
institution Directory Open Access Journal
issn 2632-6736
last_indexed 2024-03-13T09:46:57Z
spelling doaj.art-d6c14fb609044866a24089a864a232d4
2023-05-25T04:00:24Z
eng
Cambridge University Press
Data-Centric Engineering
2632-6736
2023-01-01
4
10.1017/dce.2023.3
A data-driven method for automated data superposition with applications in soft matter science
Kyle R. Lennon (0) https://orcid.org/0000-0002-1251-5461
Gareth H. McKinley (1)
James W. Swan (2)
(0) Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
(1) Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
(2) Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
https://www.cambridge.org/core/product/identifier/S2632673623000035/type/journal_article
Bayesian statistics
Gaussian process regression
method of reduced variables
self-similarity
topic Bayesian statistics
Gaussian process regression
method of reduced variables
self-similarity