Continuous representations and models from random walk diffusion limits

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.

Bibliographic Details
Main Author: Hashimoto, Tatsunori B. (Tatsunori Benjamin)
Other Authors: Tommi S. Jaakkola and David K. Gifford.
Format: Thesis
Language: English
Published: Massachusetts Institute of Technology, 2016
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/105670
Notes: Cataloged from PDF version of thesis. Includes bibliographical references (pages 193-202).
Abstract: Structured data such as sequences and networks pose substantial difficulty for traditional statistical theory, which has focused on data drawn independently from a vector space. A popular and empirically effective technique for dealing with such data is to map elements of the data to a vector space and to operate over the embedding as a summary statistic. Such a vector representation of discrete objects is known as a 'continuous representation'. Continuous space models of words, objects, and signals have become ubiquitous tools for learning rich representations of data, from natural language processing to computer vision. Even in cases where the embedding is not explicit, many algorithms operate over similarity measures which implicitly embed the original dataset. In this thesis, we attempt to understand the intuition behind continuous representations. Can we construct a general theory of continuous representations? Are there general principles for semantically meaningful representations? To answer these questions, we develop a framework for analyzing continuous representations through diffusion limits of random walks. We show that measurable quantities of discrete random walks with a latent metric structure have closed-form diffusion limits. These diffusion limits allow us to approximate attributes of the discrete random walk, such as the stationary distribution, hitting time, or co-occurrence, using closed-form expressions from diffusions. We establish limits which guarantee asymptotic consistency of such estimators, and show that they work well in practice. Using this new approach, we solve three classes of problems. First, we derive principled network algorithms which connect statistical estimation tasks such as density estimation to network algorithms such as PageRank. Next, we demonstrate that continuous representations of words are a type of random walk metric estimator with close connections to manifold learning. Finally, we apply our theory to single-cell RNA-seq data, and derive a way to learn time-series models without trajectories by using stochastic recurrent neural networks.
Statement of responsibility: by Tatsunori B. Hashimoto.
Degree: Ph. D.
Date added: 2016-12-05
Other identifier: 964448601
Physical description: 202 pages, application/pdf
Rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582