Scalable process discovery and conformance checking

Times cited: 149
Authors
Leemans, Sander J. J. [1]
Fahland, Dirk [1]
Van der Aalst, Wil M. P. [1]
Affiliation
[1] Eindhoven Univ Technol, Eindhoven, Netherlands
Keywords
Big data; Scalable process mining; Block-structured process discovery; Directly follows graphs; Algorithm evaluation; Rediscoverability; Conformance checking; PROCESS MODELS; PETRI NETS
DOI
10.1007/s10270-016-0545-x
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Considerable amounts of data, including process events, are collected and stored by organisations nowadays. Discovering a process model from such event data and verifying the quality of discovered models are important steps in process mining. Many discovery techniques have been proposed, but none of them combines scalability with strong quality guarantees. We would like such techniques to handle billions of events or thousands of activities, to produce sound models (without deadlocks and other anomalies), and to guarantee that the underlying process can be rediscovered when sufficient information is available. In this paper, we introduce a framework for process discovery that ensures these properties while passing over the log only once and introduce three algorithms using the framework. To measure the quality of discovered models for such large logs, we introduce a model-model and model-log comparison framework that applies a divide-and-conquer strategy to measure recall, fitness, and precision. We experimentally show that these discovery and measuring techniques sacrifice little compared to other algorithms, while gaining the ability to cope with event logs of 100,000,000 traces and processes of 10,000 activities on a standard computer.
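The abstract states that the discovery framework passes over the log only once, and the keywords name directly-follows graphs. The following is a minimal sketch, assuming that single pass collects a directly-follows graph (edge counts plus start and end activities) from a stream of traces. The function name `directly_follows_graph` and the list-of-activity-names trace format are hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def directly_follows_graph(
    log: Iterable[Sequence[str]],
) -> Tuple[Counter, Counter, Counter]:
    """Hypothetical sketch: build a directly-follows graph in one pass over a log.

    Each trace is assumed to be a sequence of activity names. The log may be
    any iterable (e.g. a generator reading from disk), so it is consumed once.
    """
    edges: Counter = Counter()   # (a, b) -> times b directly follows a
    starts: Counter = Counter()  # activity -> times it starts a trace
    ends: Counter = Counter()    # activity -> times it ends a trace
    for trace in log:
        if not trace:
            continue  # an empty trace contributes no activities or edges
        starts[trace[0]] += 1
        ends[trace[-1]] += 1
        # Pair every event with its direct successor in the same trace.
        for a, b in zip(trace, trace[1:]):
            edges[(a, b)] += 1
    return edges, starts, ends


if __name__ == "__main__":
    log = [["a", "b", "c"], ["a", "c", "b"], ["a", "b", "c"]]
    edges, starts, ends = directly_follows_graph(iter(log))
    print(dict(edges))
    print(dict(starts), dict(ends))
```

In this sketch, memory grows with the number of distinct activity pairs rather than with the number of traces, which is why a single-pass summary of this kind can scale to very large logs; any model construction would then work on the graph instead of re-reading the log.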
Pages: 599-631
Number of pages: 33