ALDANER: Active Learning based Data Augmentation for Named Entity Recognition

Cited by: 1
Authors
Moscato, Vincenzo [1]
Postiglione, Marco [1]
Sperli, Giancarlo [1]
Vignali, Andrea [1]
Affiliations
[1] Univ Naples Federico II, Dept Elect Engn & Informat Technol DIETI, Via Claudio 21, Naples, Italy
Keywords
Data augmentation; Named Entity Recognition; Active Learning; RESOURCE;
DOI
10.1016/j.knosys.2024.112682
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Training Named Entity Recognition (NER) models typically necessitates the use of extensively annotated datasets. This requirement presents a significant challenge due to the labor-intensive and costly nature of manual annotation, especially in specialized domains such as medicine and finance. To address data scarcity, two strategies have emerged as effective: (1) Active Learning (AL), which autonomously identifies samples that would most enhance model performance if annotated, and (2) data augmentation, which automatically generates new samples. However, while AL reduces human effort, it does not eliminate it entirely, and data augmentation often leads to incomplete and noisy annotations, presenting new hurdles in NER model training. In this study, we integrate AL principles into a data augmentation framework, named Active Learning-based Data Augmentation for NER (ALDANER), to prioritize the selection of informative samples from an augmented pool and mitigate the impact of noisy annotations. Our experiments across various benchmark datasets and few-shot scenarios demonstrate that our approach surpasses several data augmentation baselines, offering insights into promising avenues for future research.
Pages: 13
Related papers
50 in total
  • [21] Simple Semantic-based Data Augmentation for Named Entity Recognition in Biomedical Texts
    Phan, Uyen T. P.
    Nguyen, Nhung T. H.
    PROCEEDINGS OF THE 21ST WORKSHOP ON BIOMEDICAL LANGUAGE PROCESSING (BIONLP 2022), 2022, : 123 - 129
  • [22] RoPDA: Robust Prompt-Based Data Augmentation for Low-Resource Named Entity Recognition
    Song, Sihan
    Shen, Furao
    Zhao, Jian
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024, : 19017 - 19025
  • [23] Data augmentation and transfer learning for cross-lingual Named Entity Recognition in the biomedical domain
    Lancheros, Brayan Stiven
    Pastor, Gloria Corpas
    Mitkov, Ruslan
    LANGUAGE RESOURCES AND EVALUATION, 2024,
  • [24] EASAL: Entity-Aware Subsequence-Based Active Learning for Named Entity Recognition
    Liu, Yang
    Hu, Jinpeng
    Chen, Zhihong
    Wan, Xiang
    Chang, Tsung-Hui
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 7, 2023, : 8897 - 8905
  • [25] Named entity recognition based on deep learning
    Ji Z.
    Kong D.
    Liu W.
    Dong W.
    Sang Y.
    Jisuanji Jicheng Zhizao Xitong/Computer Integrated Manufacturing Systems, CIMS, 2022, 28 (06): : 1603 - 1615
  • [26] Label-Guided Data Augmentation for Chinese Named Entity Recognition
    Jiang, Miao
    Chen, Honghui
    APPLIED SCIENCES-BASEL, 2025, 15 (05):
  • [27] Weakly labeled data augmentation for social media named entity recognition
    Kim, Juae
    Kim, Yejin
    Kang, Sangwoo
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 209
  • [28] A Method of Network Attack Named Entity Recognition based on Deep Active Learning
    Wang, Li
    Ma, Yunxiao
    Li, Mingyue
    Li, Hua
    Zhang, Peilong
    2024 IEEE 24TH INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY AND SECURITY, QRS, 2024, : 376 - 387
  • [29] A Low-Cost Named Entity Recognition Research Based on Active Learning
    Huang, Han
    Wang, Hongyu
    Jin, Dawei
    SCIENTIFIC PROGRAMMING, 2018, 2018
  • [30] Widaug. Data augmentation for named entity recognition using Wikidata
    Calleja, Pablo
    Sanchez, Alberto
    Corcho, Oscar
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2023, (70): : 145 - 155