Inference and Robotic Path Planning over High Dimensional Categorical Observations

Advances in marine autonomy, deep learning, and in-situ marine sensing technology have enabled oceanographers to collect vast amounts of spatiotemporally distributed, sparse, high-dimensional categorical data. Statistical models, particularly in streaming and computationally constrained settings, ha...

Bibliographic Details
Main Author: San Soucie, John Edward
Other Authors: Girdhar, Yogesh
Format: Thesis
Published: Massachusetts Institute of Technology 2024
Online Access: https://hdl.handle.net/1721.1/156616
_version_ 1811084181245001728
author San Soucie, John Edward
author2 Girdhar, Yogesh
author_facet Girdhar, Yogesh
San Soucie, John Edward
author_sort San Soucie, John Edward
collection MIT
description Advances in marine autonomy, deep learning, and in-situ marine sensing technology have enabled oceanographers to collect vast amounts of spatiotemporally distributed, sparse, high-dimensional categorical data. Statistical models, particularly in streaming and computationally constrained settings, have lagged behind data collection. Recent developments in topic modeling for robotics have highlighted the potential to efficiently extract meaningful relationships from categorical data and to adjust robotic path planning based on real-time inference. This dissertation seeks to fill the gap in streaming statistical models for sparse, high-dimensional categorical data, in the context of open-ocean phytoplankton community ecology. We begin by exploring the use of existing topic modeling approaches for plankton community characterization. Topic models are compared to standard ecological techniques for dimensionality reduction. The increased fidelity and expressiveness of the topic modeling approach allow for greater resolution of plankton co-occurrence relationships. By analyzing these relationships and ocean physics in and around a retentive eddy, we trace the source of phytoplankton variability to storm-driven advection on the ocean surface. We conclude that topic models offer unique insights into the causal mechanisms underlying plankton community variability. Next, we turn our focus to the development of a streaming belief model for categorical path planning. Such a model must be capable of predicting in regions without data, and it must be able to process streaming data in a computationally efficient manner. We introduce the Gaussian Dirichlet Random Field model, a novel topic model with spatially continuous latent log-probabilities. In addition to producing a more accurate model than the state of the art in locations with data, the Gaussian Dirichlet Random Field model can interpolate and extrapolate. The model is initially presented with a batch hybrid Markov chain Monte Carlo inference procedure. We then develop a streaming, fully variational inference approach, called Streaming Gaussian Dirichlet Random Fields, which satisfies both the prediction and efficiency requirements for path-planning belief models. In silico experiments demonstrate that this model accurately maps latent co-occurrence patterns. Comparisons to a standard Gaussian process on both path-planning and observation-mapping tasks show that the ability of Streaming Gaussian Dirichlet Random Fields to leverage additional categorical observations enables superior performance.
first_indexed 2024-09-23T12:46:39Z
format Thesis
id mit-1721.1/156616
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T12:46:39Z
publishDate 2024
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/1566162024-09-04T03:31:43Z Inference and Robotic Path Planning over High Dimensional Categorical Observations San Soucie, John Edward Girdhar, Yogesh Sosik, Heidi M. Massachusetts Institute of Technology. Department of Mechanical Engineering Joint Program in Applied Ocean Science and Engineering Advances in marine autonomy, deep-learning, and in-situ marine sensing technology have enabled oceanographers to collect vast amounts of spatiotemporally-distributed, sparse, high dimensional categorical data. Statistical models, particularly in streaming and computationally constrained settings, have lagged behind data collection. Recent developments in topic modeling for robotics have highlighted the potential to efficiently extract meaningful relationships from categorical data, and adjust robotic path-planning based on real-time inference. This dissertation seeks to fill the gap in streaming statistical models for sparse, high-dimensional categorical data, in the context of open-ocean phytoplankton community ecology. We begin by exploring the use of existing topic modeling approaches for plankton community characterization. Topic models are compared to standard ecological techniques for dimensionality reduction. The increased fidelity and expressiveness of the topic modeling approach allows for greater resolution of plankton co-occurrence relationships. By analyzing these relationships and ocean physics in and around a retentive eddy, the source of phytoplankton variability is traced to storm-driven advection on the ocean surface. We conclude that topic models offer unique insights into the causal mechanisms underlying plankton community variability. Next, we turn our focus to the development of a streaming belief model for categorical path planning. Such a model must be capable of predicting in regions without data, and it must be able to process streaming data in a computationally efficient manner. 
We introduce the Gaussian Dirichlet Random Field model, a novel topic model with spatially continuous latent log-probabilities. In addition to producing a more accurate model than the state-of-the-art in locations with data, the Gaussian Dirichlet Random Field model can interpolate and extrapolate. The model is initially presented with a batch hybrid Markov Chain-Monte Carlo inference procedure. We develop a streaming fully-variational inference approach for inference, called Streaming Gaussian Dirichlet Random Fields, which satisfies both the prediction and efficiency requirements for path planning belief models. In-silico experiments demonstrate the ability of this model to accurately map latent co-occurrence patterns. Comparisons to a standard Gaussian process on both path-planning tasks and observation mapping tasks show how the ability of Streaming Gaussian Dirichlet Random Fields to leverage additional categorical observations enables superior performance. Ph.D. 2024-09-03T21:12:04Z 2024-09-03T21:12:04Z 2024-05 2024-08-29T14:24:25.071Z Thesis https://hdl.handle.net/1721.1/156616 In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
spellingShingle San Soucie, John Edward
Inference and Robotic Path Planning over High Dimensional Categorical Observations
title Inference and Robotic Path Planning over High Dimensional Categorical Observations
title_full Inference and Robotic Path Planning over High Dimensional Categorical Observations
title_fullStr Inference and Robotic Path Planning over High Dimensional Categorical Observations
title_full_unstemmed Inference and Robotic Path Planning over High Dimensional Categorical Observations
title_short Inference and Robotic Path Planning over High Dimensional Categorical Observations
title_sort inference and robotic path planning over high dimensional categorical observations
url https://hdl.handle.net/1721.1/156616
work_keys_str_mv AT sansouciejohnedward inferenceandroboticpathplanningoverhighdimensionalcategoricalobservations