Exploratory data analysis (EDA) machine learning approaches for ocean world analog mass spectrometry

Many upcoming and proposed missions to ocean worlds such as Europa, Enceladus, and Titan aim to evaluate their habitability and the existence of potential life on these moons. These missions will suffer from communication challenges and technology limitations. We review and investigate the applicabi...

Bibliographic Details
Main Authors: Victoria Da Poian, Bethany Theiling, Lily Clough, Brett McKinney, Jonathan Major, Jingyi Chen, Sarah Hörst
Format: Article
Language: English
Published: Frontiers Media S.A. 2023-05-01
Series: Frontiers in Astronomy and Space Sciences
Online Access: https://www.frontiersin.org/articles/10.3389/fspas.2023.1134141/full
collection DOAJ
description Many upcoming and proposed missions to ocean worlds such as Europa, Enceladus, and Titan aim to evaluate their habitability and the existence of potential life on these moons. These missions will suffer from communication challenges and technology limitations. We review and investigate the applicability of data science and unsupervised machine learning (ML) techniques to isotope ratio mass spectrometry (IRMS) data from volatile laboratory analogs of Europa and Enceladus seawaters, as a case study for the development of new strategies for icy ocean world missions. Our driving science goal is to determine whether the mass spectra of volatile gases could contain information about the composition of the seawater and potential biosignatures. We implement data science and ML techniques to investigate what inherent information the spectra contain and to determine whether a data science pipeline could be designed to quickly analyze data from future ocean worlds missions. In this study, we focus on the exploratory data analysis (EDA) step in the analytics pipeline. This is a crucial unsupervised learning step that allows us to understand the data in depth before subsequent steps such as predictive/supervised learning. EDA identifies and characterizes recurring patterns and significant correlation structure, and it helps determine which variables are redundant and which contribute to significant variation in the lower-dimensional space. In addition, EDA helps to identify irregularities such as outliers that might be due to poor data quality. We compared the dimensionality reduction methods Uniform Manifold Approximation and Projection (UMAP) and Principal Component Analysis (PCA) for transforming our data from a high-dimensional space to a lower-dimensional one, and we compared clustering algorithms for identifying data-driven groups (“clusters”) in the ocean worlds analog IRMS data and mapping these clusters to experimental conditions such as seawater composition and CO2 concentration. Such data analysis and characterization efforts are the first steps toward the longer-term science autonomy goal, where similar automated ML tools could be used onboard a spacecraft to prioritize data transmissions for bandwidth-limited outer Solar System missions.
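The EDA workflow summarized in the description (reduce dimensionality, cluster, then map clusters to experimental conditions) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic random numbers rather than IRMS spectra, the three "conditions" are hypothetical, PCA is computed directly with NumPy's SVD, and plain k-means stands in for the clustering algorithms, which this record does not name. UMAP is not shown.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project rows of X onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)                      # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)        # variance fraction per component
    return Xc @ Vt[:n_components].T, explained[:n_components]


def kmeans(Z, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one integer label per row of Z."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # squared distance from every point to every center
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # recompute centers; keep the old center if a cluster is empty
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Synthetic stand-in (NOT real data): 3 hypothetical conditions,
# 40 "spectra" each, 20 channels per spectrum.
rng = np.random.default_rng(42)
condition = np.repeat(np.arange(3), 40)
group_means = rng.normal(0.0, 5.0, size=(3, 20))
spectra = group_means[condition] + rng.normal(0.0, 0.5, size=(120, 20))

scores, explained = pca_project(spectra, n_components=2)
labels = kmeans(scores, k=3)

# Cross-tabulate known conditions (rows) against data-driven clusters (columns).
crosstab = np.zeros((3, 3), dtype=int)
for c, l in zip(condition, labels):
    crosstab[c, l] += 1
print(crosstab)
```

A cross-tabulation in which each row has most of its counts in a single column suggests that the clusters track the experimental conditions; k-means can converge to a poor local optimum, so in practice several random initializations would be compared.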
first_indexed 2024-03-13T10:46:15Z
format Article
id doaj.art-5e149d31267a4520be12da7707b942bd
institution Directory of Open Access Journals
issn 2296-987X
language English
last_indexed 2024-03-13T10:46:15Z
publishDate 2023-05-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Astronomy and Space Sciences
spelling doaj.art-5e149d31267a4520be12da7707b942bd
2023-05-17T17:01:50Z
Volume 10
DOI 10.3389/fspas.2023.1134141
Article 1134141
Victoria Da Poian (NASA Goddard Space Flight Center, Greenbelt, MD, United States; Microtel LLC, Greenbelt, MD, United States; Johns Hopkins University, Baltimore, MD, United States)
Bethany Theiling (NASA Goddard Space Flight Center, Greenbelt, MD, United States)
Lily Clough (Tandy School of Computer Science, The University of Tulsa, Tulsa, OK, United States)
Brett McKinney (Tandy School of Computer Science, The University of Tulsa, Tulsa, OK, United States)
Jonathan Major (University of South Florida, Tampa, FL, United States)
Jingyi Chen (Tandy School of Computer Science, The University of Tulsa, Tulsa, OK, United States)
Sarah Hörst (Johns Hopkins University, Baltimore, MD, United States)
title Exploratory data analysis (EDA) machine learning approaches for ocean world analog mass spectrometry
topic machine learning
exploratory data analysis
mass spectrometry
ocean worlds analog data
unsupervised learning
science autonomy
url https://www.frontiersin.org/articles/10.3389/fspas.2023.1134141/full