Evaluating Large Language Models in Process Mining: Capabilities, Benchmarks, and Evaluation Strategies

Cited by: 3
Authors
Berti, Alessandro [1 ,2 ]
Kourani, Humam [1 ,2 ]
Haefke, Hannes [1 ]
Li, Chiao-Yun [1 ,2 ]
Schuster, Daniel [1 ,2 ]
Affiliations
[1] Fraunhofer FIT, St Augustin, Germany
[2] Rhein Westfal TH Aachen, Proc & Data Sci Chair, Aachen, Germany
Source
ENTERPRISE, BUSINESS-PROCESS AND INFORMATION SYSTEMS MODELING, BPMDS 2024, EMMSAD 2024 | 2024 / Vol. 511
Keywords
Large Language Models (LLMs); Output Evaluation; Benchmarking Strategies
D O I
10.1007/978-3-031-61007-3_2
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Subject Classification Code
0812
Abstract
Using Large Language Models (LLMs) for Process Mining (PM) tasks is becoming increasingly essential, and initial approaches yield promising results. However, little attention has been given to developing strategies for evaluating and benchmarking the utility of incorporating LLMs into PM tasks. This paper reviews current implementations of LLMs in PM and reflects on three questions: 1) What is the minimal set of capabilities an LLM requires for PM tasks? 2) Which benchmarking strategies help select optimal LLMs for PM? 3) How do we evaluate the output of LLMs on specific PM tasks? Answering these questions is fundamental to developing comprehensive LLM-based process mining benchmarks that cover different tasks and implementation paradigms.
Pages: 13-21
Page count: 9
References
38 items in total
[1] Bang Y., 2023, arXiv, DOI 10.48550/arXiv.2302.04023
[2] Berti A., 2023, arXiv, DOI 10.48550/arXiv.2307.02194
[3] Berti A., 2023, arXiv, DOI 10.48550/arXiv.2307.12701
[4] Chang Y. P., 2023, arXiv, DOI 10.48550/arXiv.2307.03109
[5] Dong Z. C., 2024, arXiv, DOI 10.48550/arXiv.2309.13345
[6] Grohs M., Abb L., Elsayed N., Rehse J.-R., Large Language Models Can Accomplish Business Process Management Tasks [J], BUSINESS PROCESS MANAGEMENT WORKSHOPS, BPM 2023, 2024, 492: 453-465
[7] Gu Z. H., 2024, arXiv, DOI 10.48550/arXiv.2306.05783
[8] Harer F., 2023, ER 2023, V3618
[9] Hendrycks D., 2021, NeurIPS Datasets and Benchmarks Track
[10] Jessen U., 2023, arXiv, DOI 10.48550/arXiv.2307.09909