A Multi-View Framework to Detect Redundant Activity Labels for More Representative Event Logs in Process Mining

Bibliographic Details
Main Authors: Qifan Chen, Yang Lu, Charmaine S. Tam, Simon K. Poon
Format: Article
Language: English
Published: MDPI AG 2022-06-01
Series: Future Internet
Subjects: process mining; activity label; process event log; data quality
Online Access:https://www.mdpi.com/1999-5903/14/6/181
collection DOAJ
description Process mining aims to gain knowledge of business processes by discovering process models from event logs generated by information systems. The insights revealed by process mining rely heavily on the quality of those event logs. Activities extracted from different data sources, or entered as free text within the same system, may receive inconsistent labels. Such inconsistency leads to redundant activity labels: labels that differ in syntax but share the same behaviour. Redundant activity labels introduce unnecessary complexity into event logs. Identifying these labels during data-driven process discovery is difficult and relies heavily on human intervention. Neither existing process discovery algorithms nor event data preprocessing techniques can resolve such redundancy efficiently. In this paper, we propose a multi-view approach that automatically detects redundant activity labels using not only context-aware features, such as control-flow relations and attribute values, but also semantic features from the event logs. Our evaluation on several publicly available datasets and a real-life case study demonstrates that our approach can efficiently detect redundant activity labels, even those with low occurrence frequencies. The proposed approach can add value to the preprocessing step by generating more representative event logs.
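To make the idea of combining several views concrete, the following is a minimal sketch under stated assumptions, not the authors' framework. It scores each pair of activity labels by mixing a control-flow view (Jaccard overlap of directly-follows predecessors and successors), an attribute view (overlap of a hypothetical `resource` attribute), and a label-text view. The paper's semantic features are replaced here by plain character similarity from Python's standard `difflib.SequenceMatcher`; the toy log, the helper names, the weights `(0.4, 0.3, 0.3)` and the `0.8` threshold are all hypothetical choices for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy event log: each trace is a list of (activity label, resource) events.
LOG = [
    [("Register", "clerk"), ("Check Blood Pressure", "nurse"), ("Discharge", "doctor")],
    [("Register", "clerk"), ("check blood-pressure", "nurse"), ("Discharge", "doctor")],
    [("Register", "clerk"), ("Order X-ray", "doctor"), ("Discharge", "doctor")],
]


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_views(log):
    """Collect per-activity context: directly-follows neighbours and resource values."""
    preds, succs, resources = defaultdict(set), defaultdict(set), defaultdict(set)
    for trace in log:
        labels = [label for label, _ in trace]
        for label, resource in trace:
            resources[label].add(resource)
        # Consecutive events in a trace give the directly-follows relation.
        for a, b in zip(labels, labels[1:]):
            succs[a].add(b)
            preds[b].add(a)
    return preds, succs, resources


def similarity(x, y, preds, succs, resources, weights=(0.4, 0.3, 0.3)):
    """Hypothetical weighted mix of a control-flow, an attribute and a label-text view."""
    control_flow = (jaccard(preds[x], preds[y]) + jaccard(succs[x], succs[y])) / 2
    attribute = jaccard(resources[x], resources[y])
    # Stand-in for semantic features: character-level similarity of lowercased labels.
    semantic = SequenceMatcher(None, x.lower(), y.lower()).ratio()
    w_cf, w_attr, w_sem = weights
    return w_cf * control_flow + w_attr * attribute + w_sem * semantic


def candidate_redundant_pairs(log, threshold=0.8):
    """Return label pairs whose combined score reaches the (hypothetical) threshold."""
    preds, succs, resources = build_views(log)
    activities = sorted(resources)
    scored = []
    for x, y in combinations(activities, 2):
        score = similarity(x, y, preds, succs, resources)
        if score >= threshold:
            scored.append((x, y, round(score, 3)))
    return scored


if __name__ == "__main__":
    for x, y, score in candidate_redundant_pairs(LOG):
        print(f"{x!r} ~ {y!r}: {score}")
```

On this toy log only the pair `'Check Blood Pressure'` / `'check blood-pressure'` is flagged, with a score of about 0.985. Note that `'Order X-ray'` has exactly the same control-flow context as both blood-pressure labels, so the control-flow view alone would not separate it from them; it is the attribute and label-text views that keep it below the threshold, which illustrates why more than one view is useful.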
first_indexed 2024-03-09T23:46:06Z
id doaj.art-c023f377e5564b72aa9daaea6f23e2d8
institution Directory of Open Access Journals
issn 1999-5903
last_indexed 2024-03-09T23:46:06Z
doi 10.3390/fi14060181
volume 14
issue 6
article_number 181
affiliation Qifan Chen: School of Computer Science, The University of Sydney, Sydney, NSW 2006, Australia
Yang Lu: School of Computer Science, The University of Sydney, Sydney, NSW 2006, Australia
Charmaine S. Tam: Centre for Translational Data Science and Northern Clinical School, The University of Sydney, Sydney, NSW 2006, Australia
Simon K. Poon: School of Computer Science, The University of Sydney, Sydney, NSW 2006, Australia
topic process mining
activity label
process event log
data quality