ProSecCo: progressive sequence mining with convergence guarantees
Abstract We present ProSecCo, an algorithm for the progressive mining of frequent sequences from large transactional datasets: It processes the dataset in blocks and it outputs, after having analyzed each block, a high-quality approximation of the collection of frequent sequences. ProSecCo can be u...
Main Authors: | , , |
---|---|
Other Authors: | |
Format: | Article |
Language: | English |
Published: |
Springer London
2021
|
Online Access: | https://hdl.handle.net/1721.1/131797 |
_version_ | 1826204196038246400 |
---|---|
author | Servan-Schreiber, Sacha Riondato, Matteo Zgraggen, Emanuel |
author2 | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |
author_facet | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Servan-Schreiber, Sacha Riondato, Matteo Zgraggen, Emanuel |
author_sort | Servan-Schreiber, Sacha |
collection | MIT |
description | Abstract
We present ProSecCo, an algorithm for the progressive mining of frequent sequences from large transactional datasets: It processes the dataset in blocks and it outputs, after having analyzed each block, a high-quality approximation of the collection of frequent sequences. ProSecCo can be used for interactive data exploration, as the intermediate results enable the user to make informed decisions as the computation proceeds. These intermediate results have strong probabilistic approximation guarantees and the final output is the exact collection of frequent sequences. Our correctness analysis uses the Vapnik–Chervonenkis (VC) dimension, a key concept from statistical learning theory. The results of our experimental evaluation of ProSecCo on real and artificial datasets show that it produces fast-converging high-quality results almost immediately. Its practical performance is even better than what is guaranteed by the theoretical analysis, and ProSecCo can even be faster than existing state-of-the-art non-progressive algorithms. Additionally, our experimental results show that ProSecCo uses a constant amount of memory, and orders of magnitude less than other standard, non-progressive, sequential pattern mining algorithms. |
first_indexed | 2024-09-23T12:50:24Z |
format | Article |
id | mit-1721.1/131797 |
institution | Massachusetts Institute of Technology |
language | English |
last_indexed | 2024-09-23T12:50:24Z |
publishDate | 2021 |
publisher | Springer London |
record_format | dspace |
spelling | mit-1721.1/1317972023-09-07T21:13:52Z ProSecCo: progressive sequence mining with convergence guarantees Servan-Schreiber, Sacha Riondato, Matteo Zgraggen, Emanuel Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Abstract We present ProSecCo, an algorithm for the progressive mining of frequent sequences from large transactional datasets: It processes the dataset in blocks and it outputs, after having analyzed each block, a high-quality approximation of the collection of frequent sequences. ProSecCo can be used for interactive data exploration, as the intermediate results enable the user to make informed decisions as the computation proceeds. These intermediate results have strong probabilistic approximation guarantees and the final output is the exact collection of frequent sequences. Our correctness analysis uses the Vapnik–Chervonenkis (VC) dimension, a key concept from statistical learning theory. The results of our experimental evaluation of ProSecCo on real and artificial datasets show that it produces fast-converging high-quality results almost immediately. Its practical performance is even better than what is guaranteed by the theoretical analysis, and ProSecCo can even be faster than existing state-of-the-art non-progressive algorithms. Additionally, our experimental results show that ProSecCo uses a constant amount of memory, and orders of magnitude less than other standard, non-progressive, sequential pattern mining algorithms. 2021-09-20T17:30:18Z 2021-09-20T17:30:18Z 2019-08-20 2020-09-24T20:42:09Z Article http://purl.org/eprint/type/JournalArticle https://hdl.handle.net/1721.1/131797 en https://doi.org/10.1007/s10115-019-01393-8 Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. Springer-Verlag London Ltd., part of Springer Nature application/pdf Springer London Springer London |
spellingShingle | Servan-Schreiber, Sacha Riondato, Matteo Zgraggen, Emanuel ProSecCo: progressive sequence mining with convergence guarantees |
title | ProSecCo: progressive sequence mining with convergence guarantees |
title_full | ProSecCo: progressive sequence mining with convergence guarantees |
title_fullStr | ProSecCo: progressive sequence mining with convergence guarantees |
title_full_unstemmed | ProSecCo: progressive sequence mining with convergence guarantees |
title_short | ProSecCo: progressive sequence mining with convergence guarantees |
title_sort | prosecco progressive sequence mining with convergence guarantees |
url | https://hdl.handle.net/1721.1/131797 |
work_keys_str_mv | AT servanschreibersacha proseccoprogressivesequenceminingwithconvergenceguarantees AT riondatomatteo proseccoprogressivesequenceminingwithconvergenceguarantees AT zgraggenemanuel proseccoprogressivesequenceminingwithconvergenceguarantees |