Discovery of effective infrequent sequences based on maximum probability path

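Process discovery usually analyses frequent behaviour in event logs to gain an intuitive understanding of processes. However, some infrequent behaviours are effective: they help to improve real-life business processes. Most existing studies either ignore them or treat them as harmful. To distinguish effective infrequent sequences from noisy activities, this paper proposes an algorithm that analyses the distribution states of activities and the strong transfer relationships between behaviours based on maximum probability paths. The algorithm divides episodic traces into two categories, harmful and useful episodes, that is, noisy activities and effective sequences. First, using conditional probability entropy, the infrequent logs are pre-processed to remove individual noisy activities that are distributed extremely irregularly across the traces. Effective sequences are then extracted from the logs based on the state transfer information of the activities. The algorithm is implemented on top of PM4Py and validated on synthetic and real logs. The results show that the algorithm not only preserves the key structure of the model and reduces noisy activities, but also improves the quality of the model.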
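As a rough illustration of the ideas named in the abstract, the sketch below estimates first-order transition probabilities from a toy event log, measures the entropy of each activity's successor distribution, and finds the most probable path between two activities. It is not the authors' algorithm, whose details this record does not give. The toy traces, the function names, and the choice of Dijkstra's algorithm on negative log-probabilities are all assumptions made for the example, and it uses only the Python standard library rather than PM4Py.

```python
import heapq
import math
from collections import Counter, defaultdict


def transition_probabilities(traces):
    """Estimate P(next | current) from directly-follows counts (assumed model)."""
    counts = defaultdict(Counter)
    for trace in traces:
        for current, nxt in zip(trace, trace[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(succ.values()) for nxt, n in succ.items()}
        for current, succ in counts.items()
    }


def successor_entropy(probs, activity):
    """Shannon entropy (bits) of the successor distribution of one activity."""
    return -sum(p * math.log2(p) for p in probs.get(activity, {}).values())


def max_probability_path(probs, start, end):
    """Most probable path from start to end.

    Maximising a product of probabilities equals minimising the sum of
    -log(p), so Dijkstra's algorithm applies (all weights are >= 0).
    """
    heap = [(0.0, [start])]
    settled = set()
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == end:
            return path, math.exp(-cost)
        if node in settled:
            continue
        settled.add(node)
        for nxt, p in probs.get(node, {}).items():
            if nxt not in settled:
                heapq.heappush(heap, (cost - math.log(p), path + [nxt]))
    return None, 0.0


# Hypothetical toy log: each trace is a list of activity labels.
traces = [
    ["a", "b", "c", "d"],
    ["a", "b", "c", "d"],
    ["a", "b", "c", "d"],
    ["a", "e", "d"],
    ["a", "b", "x", "c", "d"],
]
probs = transition_probabilities(traces)
best_path, best_prob = max_probability_path(probs, "a", "d")
print(best_path, round(best_prob, 3))
print({act: round(successor_entropy(probs, act), 3) for act in probs})
```

On this toy log the most probable path from "a" to "d" is a, b, c, d with probability 0.6. An activity whose successors are spread irregularly has higher entropy than one with a single fixed successor, which is the kind of signal an entropy-based pre-filter could threshold on; how the paper actually sets such thresholds is not stated in this record.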


Bibliographic Details
Main Authors: Ke Lu, Xianwen Fang, Na Fang, Esther Asare
Format: Article
Language: English
Published: Taylor & Francis Group 2022-12-01
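Series: Connection Science
Subjects: effective infrequent sequences; noise activity; maximum probability path; conditional probability entropy; state transition matrix; process discovery
Online Access: http://dx.doi.org/10.1080/09540091.2021.1951667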
author Ke Lu
Xianwen Fang
Na Fang
Esther Asare
collection DOAJ
description Process discovery usually analyses frequent behaviour in event logs to gain an intuitive understanding of processes. However, some infrequent behaviours are effective: they help to improve real-life business processes. Most existing studies either ignore them or treat them as harmful. To distinguish effective infrequent sequences from noisy activities, this paper proposes an algorithm that analyses the distribution states of activities and the strong transfer relationships between behaviours based on maximum probability paths. The algorithm divides episodic traces into two categories, harmful and useful episodes, that is, noisy activities and effective sequences. First, using conditional probability entropy, the infrequent logs are pre-processed to remove individual noisy activities that are distributed extremely irregularly across the traces. Effective sequences are then extracted from the logs based on the state transfer information of the activities. The algorithm is implemented on top of PM4Py and validated on synthetic and real logs. The results show that the algorithm not only preserves the key structure of the model and reduces noisy activities, but also improves the quality of the model.
first_indexed 2024-03-12T00:24:02Z
format Article
id doaj.art-5cc9c0d9c1e642c3b7bc902d925ca5c8
institution Directory Open Access Journal
issn 0954-0091
1360-0494
language English
last_indexed 2024-03-12T00:24:02Z
publishDate 2022-12-01
publisher Taylor & Francis Group
record_format Article
series Connection Science
spelling doaj.art-5cc9c0d9c1e642c3b7bc902d925ca5c8
2023-09-15T10:47:59Z
eng
Taylor & Francis Group
Connection Science
0954-0091
1360-0494
2022-12-01
volume 34, issue 1, pages 63-82
10.1080/09540091.2021.1951667
1951667
Discovery of effective infrequent sequences based on maximum probability path
Ke Lu (Anhui University of Science and Technology)
Xianwen Fang (Anhui University of Science and Technology)
Na Fang (Anhui University of Science and Technology)
Esther Asare (Anhui University of Science and Technology)
title Discovery of effective infrequent sequences based on maximum probability path
topic effective infrequent sequences
noise activity
maximum probability path
conditional probability entropy
state transition matrix
process discovery
url http://dx.doi.org/10.1080/09540091.2021.1951667