ALDANER: Active Learning based Data Augmentation for Named Entity Recognition

Cited by: 1
Authors
Moscato, Vincenzo [1]
Postiglione, Marco [1]
Sperli, Giancarlo [1]
Vignali, Andrea [1]
Affiliations
[1] Univ Naples Federico II, Dept Elect Engn & Informat Technol DIETI, Via Claudio 21, Naples, Italy
Keywords
Data augmentation; Named Entity Recognition; Active Learning; RESOURCE;
DOI
10.1016/j.knosys.2024.112682
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Training Named Entity Recognition (NER) models typically requires extensively annotated datasets. This requirement is a significant challenge because manual annotation is labor-intensive and costly, especially in specialized domains such as medicine and finance. To address data scarcity, two strategies have emerged as effective: (1) Active Learning (AL), which autonomously identifies the samples that would most enhance model performance if annotated, and (2) data augmentation, which automatically generates new samples. However, while AL reduces human effort, it does not eliminate it entirely, and data augmentation often leads to incomplete and noisy annotations, presenting new hurdles in NER model training. In this study, we integrate AL principles into a data augmentation framework, named Active Learning-based Data Augmentation for NER (ALDANER), to prioritize the selection of informative samples from an augmented pool and mitigate the impact of noisy annotations. Our experiments across various benchmark datasets and few-shot scenarios demonstrate that our approach surpasses several data augmentation baselines, offering insights into promising avenues for future research.
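The abstract does not state which selection criterion ALDANER uses, so the following is only a minimal sketch of the general idea of picking informative samples from an augmented pool. It assumes a common AL heuristic (mean token-level entropy of the tagger's predicted label distributions) and a hypothetical pool format in which each sample carries a `probs` list; the function names `mean_token_entropy` and `select_informative` are illustrative and do not come from the paper.

```python
import math
from typing import Dict, List, Sequence


def mean_token_entropy(token_probs: Sequence[Sequence[float]]) -> float:
    """Average Shannon entropy (in nats) over the tokens of one sentence.

    token_probs[i] is the model's predicted label distribution for token i.
    An empty sentence is given an entropy of 0.0.
    """
    if not token_probs:
        return 0.0
    total = 0.0
    for dist in token_probs:
        # Skip zero probabilities, since p * log(p) -> 0 as p -> 0.
        total -= sum(p * math.log(p) for p in dist if p > 0.0)
    return total / len(token_probs)


def select_informative(pool: List[Dict], k: int) -> List[Dict]:
    """Return the k pool samples the model is most uncertain about."""
    ranked = sorted(
        pool,
        key=lambda sample: mean_token_entropy(sample["probs"]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical augmented pool: three generated sentences, each with
# per-token label distributions over three made-up tags.
augmented_pool = [
    {"text": "Aspirin reduces fever",
     "probs": [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01], [0.90, 0.05, 0.05]]},
    {"text": "Ibuprofen treats pain",
     "probs": [[0.40, 0.30, 0.30], [0.50, 0.25, 0.25], [0.34, 0.33, 0.33]]},
    {"text": "Paris is a city",
     "probs": [[0.70, 0.20, 0.10], [0.90, 0.05, 0.05],
               [0.80, 0.10, 0.10], [0.85, 0.10, 0.05]]},
]

selected = select_informative(augmented_pool, k=2)
```

Under these assumptions, `selected` keeps the two sentences with the flattest label distributions. Uncertainty alone does not address noisy annotations, which the abstract names as a separate goal, so a fuller pipeline would need an additional filtering step that this sketch leaves out.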
Pages: 13
Related papers
50 in total
  • [1] A Framework of Data Augmentation While Active Learning for Chinese Named Entity Recognition
    Li, Qingqing
    Huang, Zhen
    Dou, Yong
    Zhang, Ziwen
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2021, PT II, 2021, 12816 : 88 - 100
  • [2] Novel data augmentation for named entity recognition
    Hemateja A.V.N.M.
    Kondakath G.
    Das S.
    Kothandaraman M.
    Shoba S.
    Pandey A.
    Babu R.
    Jain A.
    International Journal of Speech Technology, 2023, 26 (4) : 869 - 878
  • [3] Entity-to-Text based Data Augmentation for various Named Entity Recognition Tasks
    Hu, Xuming
    Jiang, Yong
    Liu, Aiwei
    Huang, Zhongqiang
    Xie, Pengjun
    Huang, Fei
    Wen, Lijie
    Yu, Philip S.
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 9072 - 9087
  • [4] Composited-Nested-Learning with Data Augmentation for Nested Named Entity Recognition
    Liao, Xingming
    Lin, Nankai
    Li, Haowen
    Cheng, Lianglun
    Wang, Zhuowei
    Chen, Chong
    PROCEEDINGS OF THE 2024 27TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024, 2024, : 618 - 623
  • [5] Correction to: Novel data augmentation for named entity recognition
    Aluru V. N. M. Hemateja
    Gopikrishnan Kondakath
    Susruta Das
    Mohanaprasad Kothandaraman
    S. Shoba
    Abhishek Pandey
    Rajin Babu
    Abhinav Jain
    International Journal of Speech Technology, 2023, 26 (4) : 879 - 879
  • [6] Data Augmentation for Chinese Clinical Named Entity Recognition
    Wang P.-H.
    Li M.-Z.
    Li S.
    Li, Si (lisi@bupt.edu.cn), 1600, Beijing University of Posts and Telecommunications (43): : 84 - 90
  • [7] Data Augmentation Techniques on Arabic Data for Named Entity Recognition
    Sabty, Caroline
    Omar, Islam
    Wasfalla, Fady
    Islam, Mohamed
    Abdennadher, Slim
    AI IN COMPUTATIONAL LINGUISTICS, 2021, 189 : 292 - 299
  • [8] Loss-based Active Learning for Named Entity Recognition
    Linh, Le Thai
    Nguyen, Minh-Tien
    Zuccon, Guido
    Demartini, Gianluca
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [9] Subsequence Based Deep Active Learning for Named Entity Recognition
    Radmard, Puria
    Fathullah, Yassir
    Lipani, Aldo
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 4310 - 4321
  • [10] Clustering Based Active Learning for Biomedical Named Entity Recognition
    Han, Xu
    Kwoh, Chee Keong
    Kim, Jung-jae
    2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016, : 1253 - 1260