An Empirical Investigation of Different Classifiers, Encoding, and Ensemble Schemes for Next Event Prediction Using Business Process Event Logs

Cited by: 7

Authors:

Tama, Bayu Adhi [1,3]

Comuzzi, Marco [2,4]

Ko, Jonghyeon [2,4]

Affiliations:

[1] Pohang Univ Sci & Technol POSTECH, Dept Mech Engn, Pohang, South Korea

[2] Ulsan Natl Inst Sci & Technol UNIST, Sch Management Engn, Ulsan, South Korea

[3] Inst Basic Sci IBS, Ctr Math & Computat Sci, Data Sci Grp, 55 Expo Ro, Daejeon 34126, South Korea

[4] Ulsan Natl Inst Sci & Technol UNIST, Dept Ind Engn, 50 UNIST Gil, Ulsan 44919, South Korea

Source:

ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY | 2020 / Vol. 11 / Issue 6

Keywords:

Classifier ensembles; individual classifier; business process; predictive monitoring; empirical benchmark; homogeneous ensembles; next event prediction; INTRUSION DETECTION; BEHAVIOR; DRIVEN; TIME;

DOI:

10.1145/3406541

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

There is a growing need for empirical benchmarks that help researchers and practitioners select the best machine learning technique for a given prediction task. In this article, we consider the next event prediction task in business process predictive monitoring, and we extend our previously published benchmark by studying how performance is affected by different encoding windows and by the use of ensemble schemes. Whether to use an ensemble, and which scheme to use, often depends on the type of data and the classification task. While ensembles are generally understood to perform well in predictive monitoring of business processes, no other benchmark involving ensembles is available for next event prediction. The proposed benchmark helps researchers select a high-performing individual classifier or ensemble scheme given the case-level variability of the event log under consideration. Experimental results show that choosing an optimal number of events for feature encoding is challenging, so the optimal value must be selected for each event log individually. Ensemble schemes improve the performance of low-performing classifiers in this task, such as SVM, whereas high-performing classifiers, such as tree-based classifiers, do not benefit from ensemble schemes.
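To make the notion of an encoding window concrete, here is a minimal sketch. It assumes each case in the event log has already been reduced to an ordered list of activity labels; the helpers `encode_prefixes` and `one_hot`, the `"<pad>"` token, and the one-hot-per-position scheme are hypothetical illustrations of a generic prefix-window encoding, not necessarily the exact encoding used in the article. Each training sample pairs the last `window` events of a case prefix with the event that follows it.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[str, ...], str]


def encode_prefixes(
    traces: Sequence[Sequence[str]], window: int, pad: str = "<pad>"
) -> List[Sample]:
    """Hypothetical helper: build (last `window` events, next event) pairs."""
    samples: List[Sample] = []
    for trace in traces:
        # Every proper prefix of the trace predicts the event that follows it.
        for i in range(1, len(trace)):
            prefix = list(trace[max(0, i - window):i])
            # Left-pad short prefixes so every sample has exactly `window` slots.
            prefix = [pad] * (window - len(prefix)) + prefix
            samples.append((tuple(prefix), trace[i]))
    return samples


def one_hot(
    samples: Sequence[Sample], vocabulary: Sequence[str]
) -> List[Tuple[List[int], str]]:
    """Hypothetical helper: one-hot encode each window position.

    The padding token is not in the vocabulary, so padded slots stay all-zero.
    """
    index = {activity: j for j, activity in enumerate(vocabulary)}
    rows: List[Tuple[List[int], str]] = []
    for prefix, label in samples:
        row = [0] * (len(prefix) * len(vocabulary))
        for pos, activity in enumerate(prefix):
            if activity in index:
                row[pos * len(vocabulary) + index[activity]] = 1
        rows.append((row, label))
    return rows


if __name__ == "__main__":
    # Toy log with two hypothetical cases.
    log = [["A", "B", "C"], ["A", "C"]]
    for features, label in one_hot(encode_prefixes(log, window=2), ["A", "B", "C"]):
        print(features, "->", label)
```

Varying `window` changes both the feature dimension (window size times the number of activities) and how much case history each sample carries, which is the setting the abstract reports must be tuned per event log.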

Pages: 34

References: 71 entries in total
