Leveraging Data Augmentation for Process Information Extraction

Cited by: 1
Authors
Neuberger, Julian [1]
Doll, Leonie [1]
Engelmann, Benedikt [1]
Ackermann, Lars [1]
Jablonski, Stefan [1]
Affiliations
[1] Univ Bayreuth, Bayreuth, Germany
Keywords
Business Process Extraction; Data Augmentation; Natural Language Processing
DOI
10.1007/978-3-031-61007-3_6
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Business Process Modeling projects often require formal process models as a central component. The high cost of creating such formal process models has motivated several fields of research aimed at automatically generating process models from readily available data. These include process mining on event logs and generating business process models from natural language texts. Research in the latter field regularly faces the problem of limited data availability, which hinders both the evaluation and the development of new techniques, especially learning-based ones. To overcome this data scarcity, in this paper we investigate the application of data augmentation to natural language text data. Data augmentation methods are well established in machine learning for creating new, synthetic data without human assistance. We find that many of these methods are applicable to the task of business process information extraction and improve the accuracy of extraction. Our study shows that data augmentation is an important component in enabling machine learning methods for the task of business process model generation from natural language text, where rule-based systems are currently still mostly state of the art. Simple data augmentation techniques improved the F1 score of mention extraction by 2.9 percentage points, and the F1 score of relation extraction by 4.5. To better understand how data augmentation alters human-annotated texts, we analyze the resulting text, visualizing and discussing the properties of augmented textual data. We make all code and experiment results publicly available (code for our framework can be found at https://github.com/JulianNeuberger/pet-data-augmentation; detailed results for our experiments can be downloaded as a MySQL dump from https://zenodo.org/doi/10.5281/zenodo.10941423).
Pages: 57-70
Page count: 14