Sampling business process event logs with guarantees

Cited by: 0
Authors
Su, Xuan [1]
Liu, Cong [1, 2, 4]
Zhang, Shuaipeng [3]
Zeng, Qingtian [2]
Affiliations
[1] Shandong Univ Technol, Sch Comp Sci & Technol, Zibo, Peoples R China
[2] Shandong Univ Sci & Technol, Coll Comp Sci & Engn, Qingdao, Peoples R China
[3] Shandong Univ, Sch Software, Jinan, Peoples R China
[4] Shandong Univ Technol, Sch Comp Sci & Technol, Zibo 255000, Peoples R China
Keywords
process mining; model discovery; event log sampling; behavior equivalence; efficiency; PROCESS MODELS; DISCOVERY
DOI
10.1002/cpe.8077
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Event log sampling has emerged as a key research focus in process mining. It aims to improve the efficiency of process mining tasks such as model discovery, conformance checking, and process prediction. However, current log sampling techniques often fail to ensure high-quality sample logs. This paper introduces a novel framework that supports efficient event log sampling without compromising the quality of the sample log relative to the original one. The approach treats the directly-follows relation (DFR) among business tasks as the fundamental behavior unit of an event log. By ensuring DFR equivalence between the original and sample logs, the proposed technique addresses the challenge of sample log quality from the model discovery point of view. The framework is instantiated by seven distinct sampling strategies, each with its own specialty, and is fully implemented in the open-source process mining platform ProM. To validate its effectiveness, we conducted a comprehensive experimental evaluation on 12 publicly available real-life event logs, comparing against state-of-the-art sampling techniques. The results demonstrate that our technique significantly improves model discovery efficiency while preserving the quality of the discovered models.
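The idea of DFR equivalence can be illustrated with a minimal sketch. This is not one of the paper's seven strategies or its ProM implementation; it is a simple greedy selection written only for illustration, and it assumes a log is a list of traces, each trace a sequence of activity labels. The function names (`directly_follows`, `log_dfr`, `sample_dfr_equivalent`) are hypothetical, and start and end activities are ignored.

```python
import random
from typing import Hashable, List, Sequence, Set, Tuple

Trace = Sequence[Hashable]
Pair = Tuple[Hashable, Hashable]


def directly_follows(trace: Trace) -> Set[Pair]:
    """Return the set of (a, b) pairs where b occurs immediately after a."""
    return {(a, b) for a, b in zip(trace, trace[1:])}


def log_dfr(log: Sequence[Trace]) -> Set[Pair]:
    """Return the union of the directly-follows pairs of all traces in the log."""
    pairs: Set[Pair] = set()
    for trace in log:
        pairs |= directly_follows(trace)
    return pairs


def sample_dfr_equivalent(log: Sequence[Trace], seed: int = 0) -> List[Trace]:
    """Greedy illustration (an assumption, not the paper's algorithm):
    visit traces in random order and keep a trace only if it contributes
    a directly-follows pair that the sample does not yet contain."""
    rng = random.Random(seed)
    order = list(range(len(log)))
    rng.shuffle(order)

    target = log_dfr(log)
    covered: Set[Pair] = set()
    sample: List[Trace] = []
    for index in order:
        new_pairs = directly_follows(log[index]) - covered
        if new_pairs:
            sample.append(log[index])
            covered |= new_pairs
        if covered == target:
            break  # the sample now has the same DFR set as the original log
    return sample


if __name__ == "__main__":
    log = [
        ("a", "b", "c"),
        ("a", "c", "b"),
        ("a", "b", "c"),
        ("a", "b", "c", "d"),
    ]
    sample = sample_dfr_equivalent(log)
    print(len(sample), "of", len(log), "traces kept")
    print(log_dfr(sample) == log_dfr(log))
```

By construction the sample's DFR set equals that of the original log, so a discovery algorithm that relies only on directly-follows pairs sees the same relation on the smaller log. Trace frequencies are not preserved in this sketch.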
Pages: 18
Related papers
50 in total
  • [41] Efficiently interpreting traces of low level events in business process logs
    Fazzinga, Bettina
    Flesca, Sergio
    Furfaro, Filippo
    Masciari, Elio
    Pontieri, Luigi
    INFORMATION SYSTEMS, 2018, 73 : 1 - 24
  • [42] ProcessChain: a blockchain-based framework for privacy preserving cross-organizational business process mining from distributed event logs
    Singh, Sandeep Kumar
    Jenamani, Mamata
    BUSINESS PROCESS MANAGEMENT JOURNAL, 2024, 30 (01) : 239 - 269
  • [43] Process Discovery for Event Logs with Multi-Occurrence Event Types
    Kovacs, Laszlo
    Jlidi, Ali
    ALGORITHMS, 2025, 18 (02)
  • [44] Differentially private release of event logs for process mining
    Elkoumy, Gamal
    Pankova, Alisa
    Dumas, Marlon
    INFORMATION SYSTEMS, 2023, 115
  • [45] Configurable Process Mining: Semantic Variability in Event Logs
    Khannat, Aicha
    Sbai, Hanae
    Kjiri, Laila
    PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS (ICEIS 2021), VOL 1, 2021, : 768 - 775
  • [46] On process model synthesis based on event logs with noise
    Mitsyuk, A. A.
    Shugurov, I. S.
    AUTOMATIC CONTROL AND COMPUTER SCIENCES, 2016, 50 (07) : 460 - 470
  • [47] Using Event Logs for Local Correction of Process Models
    Mitsyuk, A. A.
    Lomazova, I. A.
    van der Aalst, W. M. P.
    AUTOMATIC CONTROL AND COMPUTER SCIENCES, 2017, 51 (07) : 709 - 723
  • [48] Sequence partitioning for process mining with unlabeled event logs
    Walicki, Michal
    Ferreira, Diogo R.
    DATA & KNOWLEDGE ENGINEERING, 2011, 70 (10) : 821 - 841
  • [49] Repairing Event Logs Using Timed Process Models
    Rogge-Solti, Andreas
    Mans, Ronny S.
    van der Aalst, Wil M. P.
    Weske, Mathias
    ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2013 WORKSHOPS, 2013, 8186 : 705 - 708
  • [50] Process Mining of Event Logs from Horde Helpdesk
    Dolak, Radim
    Botlik, Josef
    SMART TECHNOLOGIES AND INNOVATION FOR A SUSTAINABLE FUTURE, 2019, : 303 - 309