ALDANER: Active Learning based Data Augmentation for Named Entity Recognition

Cited by: 1
Authors
Moscato, Vincenzo [1]
Postiglione, Marco [1]
Sperli, Giancarlo [1]
Vignali, Andrea [1]
Affiliations
[1] Univ Naples Federico II, Dept Elect Engn & Informat Technol DIETI, Via Claudio 21, Naples, Italy
Keywords
Data augmentation; Named Entity Recognition; Active Learning; RESOURCE
DOI
10.1016/j.knosys.2024.112682
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Training Named Entity Recognition (NER) models typically necessitates the use of extensively annotated datasets. This requirement presents a significant challenge due to the labor-intensive and costly nature of manual annotation, especially in specialized domains such as medicine and finance. To address data scarcity, two strategies have emerged as effective: (1) Active Learning (AL), which autonomously identifies samples that would most enhance model performance if annotated, and (2) data augmentation, which automatically generates new samples. However, while AL reduces human effort, it does not eliminate it entirely, and data augmentation often leads to incomplete and noisy annotations, presenting new hurdles in NER model training. In this study, we integrate AL principles into a data augmentation framework, named Active Learning-based Data Augmentation for NER (ALDANER), to prioritize the selection of informative samples from an augmented pool and mitigate the impact of noisy annotations. Our experiments across various benchmark datasets and few-shot scenarios demonstrate that our approach surpasses several data augmentation baselines, offering insights into promising avenues for future research.
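The abstract does not specify how ALDANER scores augmented samples, so the following is only a minimal sketch of one common AL criterion, mean token-level entropy, applied to an augmented pool. It is not the authors' method. The function names (`token_entropy`, `sentence_uncertainty`, `select_informative`) and the input layout (one list of per-token label probability distributions per sentence) are assumptions made for illustration.

```python
import math
from typing import List, Sequence

# Hypothetical input layout: each sentence is a list of per-token
# probability distributions over NER labels, as produced by some tagger.
TokenDist = Sequence[float]
Sentence = Sequence[TokenDist]


def token_entropy(probs: TokenDist) -> float:
    """Shannon entropy (in nats) of one token's label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sentence_uncertainty(token_probs: Sentence) -> float:
    """Mean token entropy; higher means the tagger is less certain."""
    if not token_probs:
        return 0.0
    return sum(token_entropy(p) for p in token_probs) / len(token_probs)


def select_informative(pool: Sequence[Sentence], k: int) -> List[int]:
    """Return indices of the k most uncertain sentences in the pool."""
    ranked = sorted(
        range(len(pool)),
        key=lambda i: sentence_uncertainty(pool[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy augmented pool: three one-token sentences with two labels each.
pool = [
    [[1.0, 0.0]],  # fully confident prediction
    [[0.5, 0.5]],  # maximally uncertain prediction
    [[0.9, 0.1]],  # mildly uncertain prediction
]
selected = select_informative(pool, k=2)
```

Ranking purely by uncertainty can also favor noisily annotated samples, so a practical pipeline would likely pair a score of this kind with some noise filter; the abstract states that ALDANER addresses noisy annotations but does not say how.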
Pages: 13