Machine learning and coresets for automated real-time data segmentation and summarization
Thesis: Ph. D. in Computer Science and Engineering, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.
Main Author: | Volkov, Mikhail |
---|---|
Other Authors: | Daniela Rus |
Format: | Thesis |
Language: | eng |
Published: | Massachusetts Institute of Technology 2017 |
Subjects: | Electrical Engineering and Computer Science. |
Online Access: | http://hdl.handle.net/1721.1/107865 |
_version_ | 1811075873331216384 |
---|---|
author | Volkov, Mikhail, Ph. D. Massachusetts Institute of Technology |
author2 | Daniela Rus. |
author_facet | Daniela Rus. Volkov, Mikhail, Ph. D. Massachusetts Institute of Technology |
author_sort | Volkov, Mikhail, Ph. D. Massachusetts Institute of Technology |
collection | MIT |
description | Thesis: Ph. D. in Computer Science and Engineering, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. |
first_indexed | 2024-09-23T10:13:09Z |
format | Thesis |
id | mit-1721.1/107865 |
institution | Massachusetts Institute of Technology |
language | eng |
last_indexed | 2024-09-23T10:13:09Z |
publishDate | 2017 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/1078652019-04-10T12:32:59Z Machine learning and coresets for automated real-time data segmentation and summarization Volkov, Mikhail, Ph. D. Massachusetts Institute of Technology Daniela Rus. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis: Ph. D. in Computer Science and Engineering, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. Cataloged from PDF version of thesis. Includes bibliographical references (pages 160-174). In this thesis, we develop a family of real-time data reduction algorithms for large data streams, by computing a compact and meaningful representation of the data called a coreset. This representation can then be used to enable efficient analysis such as segmentation, summarization, classification, and prediction. Our proposed algorithms support large streams and datasets that are too large to store in memory, allow easy parallelization, and generalize to different data types and analyses. We discuss some of the challenges that arise when dealing with real Big Data systems. Such systems are designed to routinely process unseen, possibly unbounded, data streams; are expected to perform reliably, online, in real-time, in the presence of noise, and under many performance and bandwidth limitations; and are required to produce results that are provably close to optimal. We will motivate the need for new data reduction techniques, in the form of theoretical and practical open problems in computer science, robotics, and medicine, and show how coresets can help to overcome these challenges and enable us to build several practical systems that meet these specifications. 
We propose a theoretical framework for constructing several coreset algorithms that efficiently compress the data while preserving its semantic content. We provide an efficient construction of our algorithms and present several systems that are capable of handling unbounded, real-time data streams, and are easily scalable and parallelizable. Finally, we demonstrate the performance of our systems with numerous experimental results on a variety of data sources, from financial price data to laparoscopic surgery video. by Mikhail Volkov. Ph. D. in Computer Science and Engineering 2017-04-05T16:00:43Z 2017-04-05T16:00:43Z 2016 2016 Thesis http://hdl.handle.net/1721.1/107865 976168223 eng MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582 174 pages application/pdf Massachusetts Institute of Technology |
spellingShingle | Electrical Engineering and Computer Science. Volkov, Mikhail, Ph. D. Massachusetts Institute of Technology Machine learning and coresets for automated real-time data segmentation and summarization |
title | Machine learning and coresets for automated real-time data segmentation and summarization |
title_full | Machine learning and coresets for automated real-time data segmentation and summarization |
title_fullStr | Machine learning and coresets for automated real-time data segmentation and summarization |
title_full_unstemmed | Machine learning and coresets for automated real-time data segmentation and summarization |
title_short | Machine learning and coresets for automated real-time data segmentation and summarization |
title_sort | machine learning and coresets for automated real time data segmentation and summarization |
topic | Electrical Engineering and Computer Science. |
url | http://hdl.handle.net/1721.1/107865 |
work_keys_str_mv | AT volkovmikhailphdmassachusettsinstituteoftechnology machinelearningandcoresetsforautomatedrealtimedatasegmentationandsummarization |