Data is Moody: Discovering Data Modification Rules from Process Event Logs

Cited by: 0
Authors
Schuster, Marco Bjarne [1]
Wiegand, Boris [2]
Vreeken, Jilles [3]
Affiliations
[1] Airbus Operat GmbH, Bremen, Germany
[2] Stahl Holding Saar, Dillingen, Germany
[3] CISPA Helmholtz Ctr Informat Secur, Saarbrucken, Germany
Source
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, PT II, ECML PKDD 2024 | 2024 / Vol. 14942
Keywords
Process mining; Rule mining; MDL;
DOI
10.1007/978-3-031-70344-7_17
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Although event logs are a powerful source of insight into the behavior of the underlying business process, existing work primarily focuses on finding patterns in the activity sequences of an event log and ignores event attribute data. Event attribute data has mostly been used to predict event occurrences and process outcomes, but the state of the art neglects to mine succinct and interpretable rules describing how event attribute data changes during process execution. Subgroup discovery and rule-based classification approaches cannot capture the sequential dependencies present in event logs, and thus lead to unsatisfactory results with limited insight into the process behavior. Given an event log, we aim to find accurate yet succinct and interpretable if-then rules that describe how the process modifies data. We formalize the problem in terms of the Minimum Description Length (MDL) principle, by which we choose the model with the best lossless description of the data. Additionally, we propose the greedy Moody algorithm to efficiently search for rules. Through extensive experiments on both synthetic and real-world data, we show that Moody indeed finds compact and interpretable rules, needs little data for accurate discovery, and is robust to noise.
Pages: 285-302
Number of pages: 18