Leveraging Data Augmentation for Process Information Extraction

Cited by: 1

Authors:

Neuberger, Julian [1]

Doll, Leonie [1]

Engelmann, Benedikt [1]

Ackermann, Lars [1]

Jablonski, Stefan [1]

Affiliations:

[1] Univ Bayreuth, Bayreuth, Germany

Source:

ENTERPRISE, BUSINESS-PROCESS AND INFORMATION SYSTEMS MODELING, BPMDS 2024, EMMSAD 2024 | 2024 / Vol. 511

Keywords:

Business Process Extraction; Data Augmentation; Natural Language Processing;

DOI:

10.1007/978-3-031-61007-3_6

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Business Process Modeling projects often require formal process models as a central component. The high cost of creating such formal process models has motivated many different fields of research aimed at the automated generation of process models from readily available data. These include process mining on event logs and generating business process models from natural language texts. Research in the latter field is regularly faced with the problem of limited data availability, which hinders both the evaluation and the development of new techniques, especially learning-based ones. To overcome this data scarcity issue, in this paper we investigate the application of data augmentation to natural language text data. Data augmentation methods are well established in machine learning for creating new, synthetic data without human assistance. We find that many of these methods are applicable to the task of business process information extraction and improve the accuracy of extraction. Our study shows that data augmentation is an important component in enabling machine learning methods for the task of business process model generation from natural language text, where rule-based systems are currently still mostly state of the art. Simple data augmentation techniques improved the F1 score of mention extraction by 2.9 percentage points, and that of relation extraction by 4.5 percentage points. To better understand how data augmentation alters human-annotated texts, we analyze the resulting text, visualizing and discussing the properties of augmented textual data. We make all code and experiment results publicly available (code for our framework can be found at https://github.com/JulianNeuberger/pet-data-augmentation; detailed results for our experiments can be downloaded as a MySQL dump from https://zenodo.org/doi/10.5281/zenodo.10941423).


Pages: 57-70

Page count: 14
