ALDANER: Active Learning based Data Augmentation for Named Entity Recognition

Cited by: 1
Authors
Moscato, Vincenzo [1]
Postiglione, Marco [1]
Sperli, Giancarlo [1]
Vignali, Andrea [1]
Affiliations
[1] Univ Naples Federico II, Dept Elect Engn & Informat Technol DIETI, Via Claudio 21, Naples, Italy
Keywords
Data augmentation; Named Entity Recognition; Active Learning; RESOURCE;
DOI
10.1016/j.knosys.2024.112682
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Training Named Entity Recognition (NER) models typically necessitates the use of extensively annotated datasets. This requirement presents a significant challenge due to the labor-intensive and costly nature of manual annotation, especially in specialized domains such as medicine and finance. To address data scarcity, two strategies have emerged as effective: (1) Active Learning (AL), which autonomously identifies samples that would most enhance model performance if annotated, and (2) data augmentation, which automatically generates new samples. However, while AL reduces human effort, it does not eliminate it entirely, and data augmentation often leads to incomplete and noisy annotations, presenting new hurdles in NER model training. In this study, we integrate AL principles into a data augmentation framework, named Active Learning-based Data Augmentation for NER (ALDANER), to prioritize the selection of informative samples from an augmented pool and mitigate the impact of noisy annotations. Our experiments across various benchmark datasets and few-shot scenarios demonstrate that our approach surpasses several data augmentation baselines, offering insights into promising avenues for future research.
Pages: 13
Related papers
50 in total
  • [41] On active annotation for named entity recognition
    Asif Ekbal
    Sriparna Saha
    Utpal Kumar Sikdar
    International Journal of Machine Learning and Cybernetics, 2016, 7 : 623 - 640
  • [42] EPT: Data Augmentation with Embedded Prompt Tuning for Low-Resource Named Entity Recognition
    YU Hongfei
    NI Kunyu
    XU Rongkang
    YU Wenjun
    HUANG Yu
    Wuhan University Journal of Natural Sciences, 2023, 28 (04) : 299 - 308
  • [43] Evaluation on Network Social Media Named Entity Recognition Model Based on Active Learning
    He, Guijiao
    Zhou, Yunfeng
    Zheng, Yaodong
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (08)
  • [44] Named entity recognition based on a machine learning model
    Wang, Jing
    Liu, Zhijing
    Zhao, Hui
    Research Journal of Applied Sciences, Engineering and Technology, 2012, 4 (20) : 3973 - 3980
  • [45] LTP: A New Active Learning Strategy for CRF-Based Named Entity Recognition
    Mingyi Liu
    Zhiying Tu
    Tong Zhang
    Tonghua Su
    Xiaofei Xu
    Zhongjie Wang
    Neural Processing Letters, 2022, 54 : 2433 - 2454
  • [46] An improved data augmentation approach and its application in medical named entity recognition
    Chen, Hongyu
    Dan, Li
    Lu, Yonghe
    Chen, Minghong
    Zhang, Jinxia
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2024, 24 (01)
  • [47] Data augmentation via context similarity: An application to biomedical Named Entity Recognition
    Bartolini, Ilaria
    Moscato, Vincenzo
    Postiglione, Marco
    Sperli, Giancarlo
    Vignali, Andrea
    INFORMATION SYSTEMS, 2023, 119
  • [48] Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation
    Liang, Zheng
    Song, Zheshu
    Ma, Ziyang
    Du, Chenpeng
    Yu, Kai
    Chen, Xie
    INTERSPEECH 2023, 2023, : 919 - 923
  • [49] A study of active learning methods for named entity recognition in clinical text
    Chen, Yukun
    Lasko, Thomas A.
    Mei, Qiaozhu
    Denny, Joshua C.
    Xu, Hua
    JOURNAL OF BIOMEDICAL INFORMATICS, 2015, 58 : 11 - 18
  • [50] The Named Entity Recognition of Chinese Cybersecurity Using an Active Learning Strategy
    Xie, Bo
    Shen, Guowei
    Guo, Chun
    Cui, Yunhe
    WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2021, 2021