Outcome-Oriented Predictive Process Monitoring on Positive and Unlabelled Event Logs

Cited by: 1
Authors
Peeperkorn, Jari [1]
Vazquez, Carlos Ortega [1]
Stevens, Alexander [1]
De Smedt, Johannes [1]
Vanden Broucke, Seppe [1, 2]
De Weerdt, Jochen [1]
Affiliations
[1] Katholieke Univ Leuven, Res Ctr Informat Syst Engn LIRIS, Leuven, Belgium
[2] Univ Ghent, Dept Business Informat & Operat Management, Ghent, Belgium
Source
PROCESS MINING WORKSHOPS, ICPM 2022 | 2023 / Vol. 468
Keywords
Process mining; Predictive process monitoring; OOPPM; XGBoost; LSTM; PU learning; Label uncertainty
DOI
10.1007/978-3-031-27815-0_19
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Much of the recent literature on outcome-oriented predictive process monitoring focuses on models from machine learning and deep learning. This literature assumes that the outcome labels of all historical cases are known. In some settings, however, the labelling of cases is incomplete or inaccurate. For instance, only negative customer feedback might be observed, or fraudulent cases might remain unnoticed. Such situations are typical of the so-called positive and unlabelled (PU) setting, where the data set consists of a few positively labelled examples and examples that have no positive label but might still have a positive outcome. In this work, we show, using a selection of event logs from the literature, the negative impact of mislabelling cases as negative, specifically when using XGBoost and LSTM neural networks. Furthermore, we show promising results on real-life datasets in mitigating this effect by replacing the loss function used during training with that of unbiased Positive-Unlabelled (uPU) or non-negative Positive-Unlabelled (nnPU) learning.
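The two loss functions named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard risk estimators from the PU learning literature, a sigmoid surrogate loss, and a known class prior; the names `sigmoid_loss`, `pu_risks`, and `class_prior` are hypothetical and chosen only for this example.

```python
import numpy as np


def sigmoid_loss(scores, target):
    """Sigmoid surrogate loss l(z, y) = 1 / (1 + exp(y * z)), with y in {+1, -1}."""
    return 1.0 / (1.0 + np.exp(target * scores))


def pu_risks(scores, is_positive, class_prior):
    """Return (uPU risk, nnPU risk) for raw model scores.

    scores      : real-valued model outputs, one per case
    is_positive : True for positively labelled cases, False for unlabelled ones
    class_prior : assumed fraction of true positives in the population
    """
    scores = np.asarray(scores, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    pos, unl = scores[is_positive], scores[~is_positive]

    risk_p_plus = sigmoid_loss(pos, +1).mean()   # positives treated as positive
    risk_p_minus = sigmoid_loss(pos, -1).mean()  # positives treated as negative
    risk_u_minus = sigmoid_loss(unl, -1).mean()  # unlabelled treated as negative

    # Estimated risk on the (unobserved) negative class.
    negative_part = risk_u_minus - class_prior * risk_p_minus

    upu = class_prior * risk_p_plus + negative_part
    # nnPU clamps the negative-class estimate at zero so it cannot go negative.
    nnpu = class_prior * risk_p_plus + max(0.0, negative_part)
    return upu, nnpu


upu, nnpu = pu_risks([5.0, 5.0, -5.0, -5.0], [True, True, False, False], 0.5)
print(upu, nnpu)
```

In this toy call the model separates the two groups very confidently, so the uPU estimate becomes negative while the nnPU estimate stays non-negative. That clamping is the difference between the two estimators; applying either one to XGBoost or an LSTM would additionally require gradients of the loss, which this sketch does not cover.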
Pages: 255-268
Page count: 14
Related papers
27 in total
[1]   Beyond the Selected Completely at Random Assumption for Learning from Positive and Unlabeled Data [J].
Bekker, Jessa ;
Robberechts, Pieter ;
Davis, Jesse .
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT II, 2020, 11907 :71-85
[2]   Learning from positive and unlabeled data: a survey [J].
Bekker, Jessa ;
Davis, Jesse .
MACHINE LEARNING, 2020, 109 (04) :719-760
[3]   XGBoost: A Scalable Tree Boosting System [J].
Chen, Tianqi ;
Guestrin, Carlos .
KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, :785-794
[4]   Clustering-Based Predictive Process Monitoring [J].
Di Francescomarino, Chiara ;
Dumas, Marlon ;
Maggi, Fabrizio Maria ;
Teinemaa, Irene .
IEEE TRANSACTIONS ON SERVICES COMPUTING, 2019, 12 (06) :896-909
[5]  
du Plessis MC, 2015, PR MACH LEARN RES, V37, P1386
[6]   Semi-Supervised Discovery of DNN-Based Outcome Predictors from Scarcely-Labeled Process Logs [J].
Folino, Francesco ;
Folino, Gianluigi ;
Guarascio, Massimo ;
Pontieri, Luigi .
BUSINESS & INFORMATION SYSTEMS ENGINEERING, 2022, 64 (06) :729-749
[7]  
Hochreiter S, 1997, NEURAL COMPUT, V9, P1735, DOI [10.1162/neco.1997.9.1.1, 10.1007/978-3-642-24797-2]
[8]   POSITIVE AND UNLABELED LEARNING ALGORITHMS AND APPLICATIONS: A SURVEY [J].
Jaskie, Kristen ;
Spanias, Andreas .
2019 10TH INTERNATIONAL CONFERENCE ON INFORMATION, INTELLIGENCE, SYSTEMS AND APPLICATIONS (IISA), 2019, :144-151
[9]  
Kiryo Ryuichi., 2017, ADV NEUR IN, V30
[10]   Machine Learning in Business Process Monitoring: A Comparison of Deep Learning and Classical Approaches Used for Outcome Prediction [J].
Kratsch, Wolfgang ;
Manderscheid, Jonas ;
Roglinger, Maximilian ;
Seyfried, Johannes .
BUSINESS & INFORMATION SYSTEMS ENGINEERING, 2021, 63 (03) :261-276