The single pixel GPS: learning big data signals from tiny coresets

We present algorithms for simplifying and clustering patterns from sensors such as GPS, LiDAR, and other devices that can produce high-dimensional signals. The algorithms are suitable for handling very large (e.g. terabytes) streaming data and can be run in parallel on networks or clouds. Applicatio...

Full description

Bibliographic Details
Main Authors: Feldman, Dan, Sung, Cynthia Rueyi, Rus, Daniela L.
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article
Language:en_US
Published: Association for Computing Machinery (ACM) 2014
Online Access:http://hdl.handle.net/1721.1/90590
https://orcid.org/0000-0001-5473-3566
https://orcid.org/0000-0002-8967-1841
_version_ 1826202042476003328
author Feldman, Dan
Sung, Cynthia Rueyi
Rus, Daniela L.
author2 Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
author_facet Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Feldman, Dan
Sung, Cynthia Rueyi
Rus, Daniela L.
author_sort Feldman, Dan
collection MIT
description We present algorithms for simplifying and clustering patterns from sensors such as GPS, LiDAR, and other devices that can produce high-dimensional signals. The algorithms are suitable for handling very large (e.g. terabytes) streaming data and can be run in parallel on networks or clouds. Applications include compression, denoising, activity recognition, road matching, and map generation. We encode these problems as (k, m)-segment mean problems. Formally, we provide (1 + ε)-approximations to the k-segment and (k, m)-segment mean of a d-dimensional discrete-time signal. The k-segment mean is a k-piecewise linear function that minimizes the regression distance to the signal. The (k,m)-segment mean has an additional constraint that the projection of the k segments on R[superscript d] consists of only m ≤ k segments. Existing algorithms for these problems take O(kn[superscript 2]) and n[superscript O(mk)] time respectively and O(kn[superscript 2]) space, where n is the length of the signal. Our main tool is a new coreset for discrete-time signals. The coreset is a smart compression of the input signal that allows computation of a (1 + ε)-approximation to the k-segment or (k,m)-segment mean in O(n log n) time for arbitrary constants ε,k, and m. We use coresets to obtain a parallel algorithm that scans the signal in one pass, using space and update time per point that is polynomial in log n. We provide empirical evaluations of the quality of our coreset and experimental results that show how our coreset boosts both inefficient optimal algorithms and existing heuristics. We demonstrate our results for extracting signals from GPS traces. However, the results are more general and applicable to other types of sensors.
first_indexed 2024-09-23T12:00:58Z
format Article
id mit-1721.1/90590
institution Massachusetts Institute of Technology
language en_US
last_indexed 2024-09-23T12:00:58Z
publishDate 2014
publisher Association for Computing Machinery (ACM)
record_format dspace
spelling mit-1721.1/905902022-09-27T23:30:53Z The single pixel GPS: learning big data signals from tiny coresets Feldman, Dan Sung, Cynthia Rueyi Rus, Daniela L. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology. School of Engineering Feldman, Dan Sung, Cynthia Rueyi Rus, Daniela L. We present algorithms for simplifying and clustering patterns from sensors such as GPS, LiDAR, and other devices that can produce high-dimensional signals. The algorithms are suitable for handling very large (e.g. terabytes) streaming data and can be run in parallel on networks or clouds. Applications include compression, denoising, activity recognition, road matching, and map generation. We encode these problems as (k, m)-segment mean problems. Formally, we provide (1 + ε)-approximations to the k-segment and (k, m)-segment mean of a d-dimensional discrete-time signal. The k-segment mean is a k-piecewise linear function that minimizes the regression distance to the signal. The (k,m)-segment mean has an additional constraint that the projection of the k segments on R[superscript d] consists of only m ≤ k segments. Existing algorithms for these problems take O(kn[superscript 2]) and n[superscript O(mk)] time respectively and O(kn[superscript 2]) space, where n is the length of the signal. Our main tool is a new coreset for discrete-time signals. The coreset is a smart compression of the input signal that allows computation of a (1 + ε)-approximation to the k-segment or (k,m)-segment mean in O(n log n) time for arbitrary constants ε,k, and m. We use coresets to obtain a parallel algorithm that scans the signal in one pass, using space and update time per point that is polynomial in log n. We provide empirical evaluations of the quality of our coreset and experimental results that show how our coreset boosts both inefficient optimal algorithms and existing heuristics. We demonstrate our results for extracting signals from GPS traces. However, the results are more general and applicable to other types of sensors. United States. Office of Naval Research (Grant ONR-MURI Award N00014-09-1-1051) United States. Office of Naval Research (Grant ONR-MURI Award N00014-09-1-1031) Singapore-MIT Alliance for Research and Technology Google (Firm) 2014-10-07T17:51:17Z 2014-10-07T17:51:17Z 2012-11 Article http://purl.org/eprint/type/ConferencePaper 9781450316910 http://hdl.handle.net/1721.1/90590 Dan Feldman, Cynthia Sung, and Daniela Rus. 2012. The single pixel GPS: learning big data signals from tiny coresets. In Proceedings of the 20th International Conference on Advances in Geographic Information Systems (SIGSPATIAL '12). ACM, New York, NY, USA, 23-32. https://orcid.org/0000-0001-5473-3566 https://orcid.org/0000-0002-8967-1841 en_US http://dx.doi.org/10.1145/2424321.2424325 Proceedings of the 20th International Conference on Advances in Geographic Information Systems (SIGSPATIAL '12) Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf Association for Computing Machinery (ACM) MIT web domain
spellingShingle Feldman, Dan
Sung, Cynthia Rueyi
Rus, Daniela L.
The single pixel GPS: learning big data signals from tiny coresets
title The single pixel GPS: learning big data signals from tiny coresets
title_full The single pixel GPS: learning big data signals from tiny coresets
title_fullStr The single pixel GPS: learning big data signals from tiny coresets
title_full_unstemmed The single pixel GPS: learning big data signals from tiny coresets
title_short The single pixel GPS: learning big data signals from tiny coresets
title_sort single pixel gps learning big data signals from tiny coresets
url http://hdl.handle.net/1721.1/90590
https://orcid.org/0000-0001-5473-3566
https://orcid.org/0000-0002-8967-1841
work_keys_str_mv AT feldmandan thesinglepixelgpslearningbigdatasignalsfromtinycoresets
AT sungcynthiarueyi thesinglepixelgpslearningbigdatasignalsfromtinycoresets
AT rusdanielal thesinglepixelgpslearningbigdatasignalsfromtinycoresets
AT feldmandan singlepixelgpslearningbigdatasignalsfromtinycoresets
AT sungcynthiarueyi singlepixelgpslearningbigdatasignalsfromtinycoresets
AT rusdanielal singlepixelgpslearningbigdatasignalsfromtinycoresets