Leveraging Data Augmentation for Process Information Extraction

Cited by: 1
Authors
Neuberger, Julian [1]
Doll, Leonie [1]
Engelmann, Benedikt [1]
Ackermann, Lars [1]
Jablonski, Stefan [1]
Affiliations
[1] Univ Bayreuth, Bayreuth, Germany
Keywords
Business Process Extraction; Data Augmentation; Natural Language Processing
DOI
10.1007/978-3-031-61007-3_6
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Business process modeling projects often require formal process models as a central component. The high cost of creating such formal process models has motivated many different fields of research aimed at the automated generation of process models from readily available data. These include process mining on event logs and generating business process models from natural language texts. Research in the latter field regularly faces the problem of limited data availability, which hinders both the evaluation and the development of new techniques, especially learning-based ones. To overcome this data scarcity issue, in this paper we investigate the application of data augmentation to natural language text data. Data augmentation methods are well established in machine learning for creating new, synthetic data without human assistance. We find that many of these methods are applicable to the task of business process information extraction and improve the accuracy of extraction. Our study shows that data augmentation is an important component in enabling machine learning methods for the task of business process model generation from natural language text, where rule-based systems are currently still mostly the state of the art. Simple data augmentation techniques improved the F-1 score of mention extraction by 2.9 percentage points, and the F-1 score of relation extraction by 4.5. To better understand how data augmentation alters human-annotated texts, we analyze the resulting text, visualizing and discussing the properties of augmented textual data. We make all code and experimental results publicly available (code for our framework can be found at https://github.com/JulianNeuberger/pet-data-augmentation; detailed results for our experiments can be downloaded as a MySQL dump from https://zenodo.org/doi/10.5281/zenodo.10941423).
Pages: 57-70
Number of pages: 14