Efficient discovery of sequence outlier patterns

© 2018, VLDB Endowment. Modern Internet of Things (IoT) applications generate massive amounts of time-stamped data, much of it in the form of discrete, symbolic sequences. In this work, we present a new system called TOP that deTects Outlier Patterns from these sequences. To solve the fundamental li...

Bibliographic Details
Main Authors: Cao, Lei; Yan, Yizhou; Madden, Samuel; Rundensteiner, Elke A; Gopalsamy, Mathan
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: English
Published: VLDB Endowment 2021
Online Access: https://hdl.handle.net/1721.1/136517
_version_ 1811070281797599232
author Cao, Lei
Yan, Yizhou
Madden, Samuel
Rundensteiner, Elke A
Gopalsamy, Mathan
author2 Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
author_sort Cao, Lei
collection MIT
description © 2018, VLDB Endowment. Modern Internet of Things (IoT) applications generate massive amounts of time-stamped data, much of it in the form of discrete, symbolic sequences. In this work, we present a new system called TOP that deTects Outlier Patterns from these sequences. To solve the fundamental limitation of existing pattern mining semantics that miss outlier patterns hidden inside larger frequent patterns, TOP offers new pattern semantics based on contextual patterns that distinguish the independent occurrence of a pattern from its occurrence as part of its super-pattern. We present efficient algorithms for the mining of this new class of contextual patterns. In particular, in contrast to the bottom-up strategy for state-of-the-art pattern mining techniques, our top-down Reduce strategy piggybacks pattern detection on the detection of the context in which a pattern occurs. Our approach achieves linear time complexity in the length of the input sequence. Effective optimization techniques such as context-driven search space pruning and inverted index-based outlier pattern detection are also proposed to further speed up contextual pattern mining. Our experimental evaluation demonstrates the effectiveness of TOP at capturing meaningful outlier patterns in several real-world IoT use cases. We also demonstrate the efficiency of TOP, showing it to be up to two orders of magnitude faster than adapting state-of-the-art mining to produce this new class of contextual outlier patterns, allowing us to scale outlier pattern mining to large sequence datasets.
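The distinction the abstract draws between a pattern's independent occurrences and its occurrences inside a super-pattern can be illustrated with a small sketch. This is not the TOP system or its Reduce strategy: it assumes contiguous matching and a single, user-supplied super-pattern, and the function names and the sample sequence are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # half-open (start, end) index range


def find_occurrences(seq: Sequence[str], pattern: Sequence[str]) -> List[Span]:
    """Return the span of every contiguous match of `pattern` in `seq`.

    Assumption: matches are contiguous. The paper's own pattern
    semantics may differ; this is a simplification.
    """
    n, m = len(seq), len(pattern)
    pat = list(pattern)
    return [(i, i + m) for i in range(n - m + 1) if list(seq[i:i + m]) == pat]


def independent_occurrences(
    seq: Sequence[str], pattern: Sequence[str], super_pattern: Sequence[str]
) -> List[Span]:
    """Return matches of `pattern` that are NOT covered by any match
    of `super_pattern` (a naive check, not a linear-time algorithm)."""
    super_spans = find_occurrences(seq, super_pattern)
    return [
        (s, e)
        for (s, e) in find_occurrences(seq, pattern)
        if not any(ss <= s and e <= se for (ss, se) in super_spans)
    ]


# Hypothetical event log: "AB" usually appears as part of "ABCD".
events = list("ABCDABCDABXABCD")
all_ab = find_occurrences(events, "AB")
lone_ab = independent_occurrences(events, "AB", "ABCD")
print(len(all_ab), len(lone_ab))
```

In this toy sequence, "AB" looks frequent when every match is counted, yet only one match stands outside "ABCD". Under the stated assumptions, that lone occurrence is the kind of candidate that counting all matches alone would hide.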
first_indexed 2024-09-23T08:34:00Z
format Article
id mit-1721.1/136517
institution Massachusetts Institute of Technology
language English
last_indexed 2024-09-23T08:34:00Z
publishDate 2021
publisher VLDB Endowment
record_format dspace
spelling mit-1721.1/136517 2023-09-28T20:17:12Z Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
2021-10-27T20:35:45Z 2021-10-27T20:35:45Z 2019 2021-01-29T18:24:39Z Article http://purl.org/eprint/type/ConferencePaper https://hdl.handle.net/1721.1/136517 en 10.14778/3324301.3324308 Proceedings of the VLDB Endowment Creative Commons Attribution-NonCommercial-NoDerivs License http://creativecommons.org/licenses/by-nc-nd/4.0/ application/pdf VLDB Endowment VLDB Endowment
title Efficient discovery of sequence outlier patterns
url https://hdl.handle.net/1721.1/136517