Machine learning and coresets for automated real-time data segmentation and summarization

Thesis: Ph. D. in Computer Science and Engineering, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.

Bibliographic Details
Main Author: Volkov, Mikhail, Ph. D. Massachusetts Institute of Technology
Other Authors: Daniela Rus.
Format: Thesis
Language: English
Published: Massachusetts Institute of Technology 2017
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/107865
Record ID: mit-1721.1/107865 (2019-04-10T12:32:59Z)
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Notes: Cataloged from PDF version of thesis. Includes bibliographical references (pages 160-174).
Abstract:
In this thesis, we develop a family of real-time data reduction algorithms for large data streams by computing a compact and meaningful representation of the data called a coreset. This representation can then be used to enable efficient analysis such as segmentation, summarization, classification, and prediction. Our proposed algorithms support large streams and datasets that are too large to store in memory, allow easy parallelization, and generalize to different data types and analyses.

We discuss some of the challenges that arise when dealing with real Big Data systems. Such systems are designed to routinely process unseen, possibly unbounded, data streams; are expected to perform reliably, online, in real-time, in the presence of noise, and under many performance and bandwidth limitations; and are required to produce results that are provably close to optimal. We will motivate the need for new data reduction techniques, in the form of theoretical and practical open problems in computer science, robotics, and medicine, and show how coresets can help to overcome these challenges and enable us to build several practical systems that meet these specifications.

We propose a theoretical framework for constructing several coreset algorithms that efficiently compress the data while preserving its semantic content. We provide an efficient construction of our algorithms and present several systems that are capable of handling unbounded, real-time data streams, and are easily scalable and parallelizable. Finally, we demonstrate the performance of our systems with numerous experimental results on a variety of data sources, from financial price data to laparoscopic surgery video.

Statement of Responsibility: by Mikhail Volkov.
Degree: Ph. D. in Computer Science and Engineering
Dates: 2017-04-05T16:00:43Z; 2016
Identifiers: http://hdl.handle.net/1721.1/107865; 976168223
Physical Description: 174 pages; application/pdf
Rights: MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582