ProSecCo: progressive sequence mining with convergence guarantees

Abstract We present ProSecCo, an algorithm for the progressive mining of frequent sequences from large transactional datasets: It processes the dataset in blocks and it outputs, after having analyzed each block, a high-quality approximation of the collection of frequent sequences. ProSecCo can be u...

Full description

Bibliographic Details
Main Authors: Servan-Schreiber, Sacha, Riondato, Matteo, Zgraggen, Emanuel
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language:English
Published: Springer London 2021
Online Access:https://hdl.handle.net/1721.1/131797
_version_ 1826204196038246400
author Servan-Schreiber, Sacha
Riondato, Matteo
Zgraggen, Emanuel
author2 Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
author_facet Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Servan-Schreiber, Sacha
Riondato, Matteo
Zgraggen, Emanuel
author_sort Servan-Schreiber, Sacha
collection MIT
description Abstract We present ProSecCo, an algorithm for the progressive mining of frequent sequences from large transactional datasets: It processes the dataset in blocks and it outputs, after having analyzed each block, a high-quality approximation of the collection of frequent sequences. ProSecCo can be used for interactive data exploration, as the intermediate results enable the user to make informed decisions as the computation proceeds. These intermediate results have strong probabilistic approximation guarantees and the final output is the exact collection of frequent sequences. Our correctness analysis uses the Vapnik–Chervonenkis (VC) dimension, a key concept from statistical learning theory. The results of our experimental evaluation of ProSecCo on real and artificial datasets show that it produces fast-converging high-quality results almost immediately. Its practical performance is even better than what is guaranteed by the theoretical analysis, and ProSecCo can even be faster than existing state-of-the-art non-progressive algorithms. Additionally, our experimental results show that ProSecCo uses a constant amount of memory, and orders of magnitude less than other standard, non-progressive, sequential pattern mining algorithms.
first_indexed 2024-09-23T12:50:24Z
format Article
id mit-1721.1/131797
institution Massachusetts Institute of Technology
language English
last_indexed 2024-09-23T12:50:24Z
publishDate 2021
publisher Springer London
record_format dspace
spelling mit-1721.1/1317972023-09-07T21:13:52Z ProSecCo: progressive sequence mining with convergence guarantees Servan-Schreiber, Sacha Riondato, Matteo Zgraggen, Emanuel Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Abstract We present ProSecCo, an algorithm for the progressive mining of frequent sequences from large transactional datasets: It processes the dataset in blocks and it outputs, after having analyzed each block, a high-quality approximation of the collection of frequent sequences. ProSecCo can be used for interactive data exploration, as the intermediate results enable the user to make informed decisions as the computation proceeds. These intermediate results have strong probabilistic approximation guarantees and the final output is the exact collection of frequent sequences. Our correctness analysis uses the Vapnik–Chervonenkis (VC) dimension, a key concept from statistical learning theory. The results of our experimental evaluation of ProSecCo on real and artificial datasets show that it produces fast-converging high-quality results almost immediately. Its practical performance is even better than what is guaranteed by the theoretical analysis, and ProSecCo can even be faster than existing state-of-the-art non-progressive algorithms. Additionally, our experimental results show that ProSecCo uses a constant amount of memory, and orders of magnitude less than other standard, non-progressive, sequential pattern mining algorithms. 2021-09-20T17:30:18Z 2021-09-20T17:30:18Z 2019-08-20 2020-09-24T20:42:09Z Article http://purl.org/eprint/type/JournalArticle https://hdl.handle.net/1721.1/131797 en https://doi.org/10.1007/s10115-019-01393-8 Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. Springer-Verlag London Ltd., part of Springer Nature application/pdf Springer London Springer London
spellingShingle Servan-Schreiber, Sacha
Riondato, Matteo
Zgraggen, Emanuel
ProSecCo: progressive sequence mining with convergence guarantees
title ProSecCo: progressive sequence mining with convergence guarantees
title_full ProSecCo: progressive sequence mining with convergence guarantees
title_fullStr ProSecCo: progressive sequence mining with convergence guarantees
title_full_unstemmed ProSecCo: progressive sequence mining with convergence guarantees
title_short ProSecCo: progressive sequence mining with convergence guarantees
title_sort prosecco progressive sequence mining with convergence guarantees
url https://hdl.handle.net/1721.1/131797
work_keys_str_mv AT servanschreibersacha proseccoprogressivesequenceminingwithconvergenceguarantees
AT riondatomatteo proseccoprogressivesequenceminingwithconvergenceguarantees
AT zgraggenemanuel proseccoprogressivesequenceminingwithconvergenceguarantees