The Impact of Event Log Subset Selection on the Performance of Process Discovery Algorithms

Cited by: 13
Authors
Sani, Mohammadreza Fani [1]
van Zelst, Sebastiaan J. [1, 2]
van der Aalst, Wil M. P. [1, 2]
Affiliations
[1] Rhein Westfal TH Aachen, Proc & Data Sci Chair, Aachen, Germany
[2] Fraunhofer FIT, St Augustin, Germany
Source
NEW TRENDS IN DATABASES AND INFORMATION SYSTEMS, ADBIS 2019 | 2019 / Vol. 1064
Keywords
Process mining; Process discovery; Subset selection; Event log preprocessing; Performance enhancement;
DOI
10.1007/978-3-030-30278-8_39
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Process discovery algorithms automatically discover process models on the basis of event data captured during the execution of business processes. These algorithms tend to use all of the event data to discover a process model. When dealing with large event logs, this is no longer feasible on standard hardware in limited time. A straightforward approach to overcome this problem is to downsize the event data by means of sampling. However, little research has been conducted on selecting the right sample, given the available time and the characteristics of the event data. This paper presents various subset selection methods and evaluates their performance on real event data. The proposed methods have been implemented in both the ProM and the RapidProM platforms. Our experiments show that it is possible to speed up discovery considerably using ranking-based strategies. Furthermore, the results show that biased selection of process instances yields process models of higher quality than random selection.
Pages: 391-404
Number of pages: 14