A Multi-View Framework to Detect Redundant Activity Labels for More Representative Event Logs in Process Mining

Cited by: 4
Authors
Chen, Qifan [1]
Lu, Yang [1]
Tam, Charmaine S. [2,3]
Poon, Simon K. [1]
Affiliations
[1] Univ Sydney, Sch Comp Sci, Sydney, NSW 2006, Australia
[2] Univ Sydney, Ctr Translat Data Sci, Sydney, NSW 2006, Australia
[3] Univ Sydney, Northern Clin Sch, Sydney, NSW 2006, Australia
Keywords
process mining; activity label; process event log; data quality; MINER AUTOMATED DISCOVERY; PROCESS MODELS; CHOICE
DOI
10.3390/fi14060181
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Process mining aims to gain knowledge of business processes by discovering process models from event logs generated by information systems. The insights revealed by process mining rely heavily on the quality of the event logs. Activities extracted from different data sources, or entered as free text within the same system, may carry inconsistent labels. Such inconsistency leads to redundant activity labels, that is, labels that differ in syntax but share the same behaviour. Redundant activity labels introduce unnecessary complexity into the event logs. Identifying these labels through data-driven process discovery is difficult and relies heavily on human intervention. Neither existing process discovery algorithms nor event data preprocessing techniques can resolve such redundancy efficiently. In this paper, we propose a multi-view approach to automatically detect redundant activity labels by using not only context-aware features, such as control-flow relations and attribute values, but also semantic features from the event logs. Our evaluation on several publicly available datasets and a real-life case study demonstrates that our approach can efficiently detect redundant activity labels, even those with low occurrence frequencies. The proposed approach can add value to the preprocessing step by generating more representative event logs.
Pages: 23
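
The notion of redundant activity labels in the abstract (labels with different syntax but the same behaviour) can be illustrated with a toy sketch. This is not the paper's method: it assumes an event log given as plain lists of activity labels, uses direct predecessor/successor sets as a stand-in for the control-flow view, and uses character-level string similarity as a crude stand-in for the semantic view. The function names, the equal weighting of the two views, and the threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def context_sets(log):
    """Map each label to the (direction, neighbour) pairs observed around it."""
    ctx = defaultdict(set)
    for trace in log:
        # Pad so that first and last activities also get a context.
        padded = ["<start>"] + list(trace) + ["<end>"]
        for prev, cur, nxt in zip(padded, padded[1:], padded[2:]):
            ctx[cur].add(("pred", prev))
            ctx[cur].add(("succ", nxt))
    return ctx


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def label_similarity(a, b):
    """Case-insensitive string similarity; a crude proxy for a semantic view."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_redundant_pairs(log, threshold=0.7):
    """Return label pairs whose averaged two-view score reaches the threshold."""
    ctx = context_sets(log)
    pairs = []
    for a, b in combinations(sorted(ctx), 2):
        # Assumed equal weighting of the control-flow and string views.
        score = 0.5 * jaccard(ctx[a], ctx[b]) + 0.5 * label_similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, round(score, 3)))
    return pairs


# Hypothetical toy log: "Blood test" and "blood-test" play the same role.
toy_log = [
    ["Register", "Blood test", "Discharge"],
    ["Register", "blood-test", "Discharge"],
    ["Register", "X-ray", "Discharge"],
]

for pair in candidate_redundant_pairs(toy_log):
    print(pair)
```

On this toy log, "X-ray" has the same control-flow context as the two blood-test labels, so the control-flow view alone would not separate them; the second view is what keeps it out of the candidate list. That is the motivation for combining views, shown here only in a simplified form.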