Exploratory data analysis (EDA) machine learning approaches for ocean world analog mass spectrometry
Many upcoming and proposed missions to ocean worlds such as Europa, Enceladus, and Titan aim to evaluate their habitability and the existence of potential life on these moons. These missions will suffer from communication challenges and technology limitations. We review and investigate the applicabi...
Main Authors: | Victoria Da Poian, Bethany Theiling, Lily Clough, Brett McKinney, Jonathan Major, Jingyi Chen, Sarah Hörst |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2023-05-01 |
Series: | Frontiers in Astronomy and Space Sciences |
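Subjects: | machine learning; exploratory data analysis; mass spectrometry; ocean worlds analog data; unsupervised learning; science autonomy |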
Online Access: | https://www.frontiersin.org/articles/10.3389/fspas.2023.1134141/full |
_version_ | 1797824974764900352 |
---|---|
author | Victoria Da Poian; Bethany Theiling; Lily Clough; Brett McKinney; Jonathan Major; Jingyi Chen; Sarah Hörst |
author_facet | Victoria Da Poian; Bethany Theiling; Lily Clough; Brett McKinney; Jonathan Major; Jingyi Chen; Sarah Hörst |
author_sort | Victoria Da Poian |
collection | DOAJ |
description | Many upcoming and proposed missions to ocean worlds such as Europa, Enceladus, and Titan aim to evaluate their habitability and the existence of potential life on these moons. These missions will face communication challenges and technology limitations. We review and investigate the applicability of data science and unsupervised machine learning (ML) techniques to isotope ratio mass spectrometry (IRMS) data from volatile laboratory analogs of Europa and Enceladus seawaters, as a case study for developing new strategies for icy ocean world missions. Our driving science goal is to determine whether the mass spectra of volatile gases could contain information about the composition of the seawater and potential biosignatures. We implement data science and ML techniques to investigate what inherent information the spectra contain and to determine whether a data science pipeline could be designed to quickly analyze data from future ocean worlds missions. In this study, we focus on the exploratory data analysis (EDA) step in the analytics pipeline. This is a crucial unsupervised learning step that allows us to understand the data in depth before subsequent steps such as predictive/supervised learning. EDA identifies and characterizes recurring patterns and significant correlation structure, and it helps determine which variables are redundant and which contribute to significant variation in the lower-dimensional space. In addition, EDA helps to identify irregularities such as outliers that might be due to poor data quality. We compared two dimensionality reduction methods, Uniform Manifold Approximation and Projection (UMAP) and Principal Component Analysis (PCA), for transforming our data from a high-dimensional space to a lower-dimensional one, and we compared clustering algorithms for identifying data-driven groups (“clusters”) in the ocean worlds analog IRMS data and mapping these clusters to experimental conditions such as seawater composition and CO2 concentration. Such data analysis and characterization efforts are the first steps toward the longer-term science autonomy goal, in which similar automated ML tools could be used onboard a spacecraft to prioritize data transmissions for bandwidth-limited outer Solar System missions. |
first_indexed | 2024-03-13T10:46:15Z |
format | Article |
id | doaj.art-5e149d31267a4520be12da7707b942bd |
institution | Directory of Open Access Journals |
issn | 2296-987X |
language | English |
last_indexed | 2024-03-13T10:46:15Z |
publishDate | 2023-05-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Astronomy and Space Sciences |
spelling | doaj.art-5e149d31267a4520be12da7707b942bd; 2023-05-17T17:01:50Z; eng; Frontiers Media S.A.; Frontiers in Astronomy and Space Sciences; 2296-987X; 2023-05-01; 10; 10.3389/fspas.2023.1134141; 1134141; Exploratory data analysis (EDA) machine learning approaches for ocean world analog mass spectrometry; Victoria Da Poian (NASA Goddard Space Flight Center, Greenbelt, MD, United States / Microtel LLC, Greenbelt, MD, United States / Johns Hopkins University, Baltimore, MD, United States); Bethany Theiling (NASA Goddard Space Flight Center, Greenbelt, MD, United States); Lily Clough (Tandy School of Computer Science, The University of Tulsa, Tulsa, OK, United States); Brett McKinney (Tandy School of Computer Science, The University of Tulsa, Tulsa, OK, United States); Jonathan Major (University of South Florida, Tampa, FL, United States); Jingyi Chen (Tandy School of Computer Science, The University of Tulsa, Tulsa, OK, United States); Sarah Hörst (Johns Hopkins University, Baltimore, MD, United States); https://www.frontiersin.org/articles/10.3389/fspas.2023.1134141/full; machine learning; exploratory data analysis; mass spectrometry; ocean worlds analog data; unsupervised learning; science autonomy |
title | Exploratory data analysis (EDA) machine learning approaches for ocean world analog mass spectrometry |
topic | machine learning; exploratory data analysis; mass spectrometry; ocean worlds analog data; unsupervised learning; science autonomy |
url | https://www.frontiersin.org/articles/10.3389/fspas.2023.1134141/full |
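
The EDA workflow summarized in the description (reduce dimensionality, cluster the reduced data, then check how the clusters map onto experimental conditions) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the article: the data are synthetic stand-ins rather than IRMS measurements, the helper names `pca`, `kmeans`, and `purity` are hypothetical, k-means stands in for whichever clustering algorithms the authors compared, and purity is just one simple way to score the cluster-to-condition mapping. UMAP is omitted because it requires a third-party package; only NumPy is used.

```python
import numpy as np

def pca(X, n_components=2):
    """Project rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                      # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)        # explained variance ratio
    return Xc @ Vt[:n_components].T, explained[:n_components]

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns an integer cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)            # nearest center per sample
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):    # converged
            break
        centers = new_centers
    return labels

def purity(labels, conditions):
    """Fraction of samples that belong to their cluster's majority condition."""
    total = 0
    for c in np.unique(labels):
        _, counts = np.unique(conditions[labels == c], return_counts=True)
        total += counts.max()
    return total / len(labels)

# Synthetic stand-in data (NOT the article's IRMS data): three hypothetical
# experimental conditions, 30 samples each, 20 made-up spectral channels.
rng = np.random.default_rng(42)
n_per_group, n_channels = 30, 20
conditions = np.repeat(np.arange(3), n_per_group)
group_means = rng.normal(0.0, 5.0, size=(3, n_channels))
X = group_means[conditions] + rng.normal(0.0, 1.0, size=(len(conditions), n_channels))

scores, explained = pca(X, n_components=2)       # high-dimensional -> 2D
labels = kmeans(scores, k=3)                     # data-driven groups
print("explained variance ratio:", explained.round(3))
print("cluster purity vs. condition:", purity(labels, conditions))
```

A purity close to 1 would suggest that the data-driven clusters line up with the experimental conditions; a value near 1/3 (for three equally sized conditions) would suggest they do not. On real spectra, one would likely also standardize the channels before PCA and try several cluster counts.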