BioAug: Conditional Generation based Data Augmentation for Low-Resource Biomedical NER

Times cited: 2
Authors
Ghosh, Sreyan [1]
Tyagi, Utkarsh [1]
Kumar, Sonal [1]
Manocha, Dinesh [1]
Affiliation
[1] Univ Maryland, College Pk, MD 20742 USA
Source
PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023 | 2023
Keywords
Named Entity Recognition; Information Extraction; Biomedical
DOI
10.1145/3539618.3591957
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Biomedical Named Entity Recognition (BioNER) is the fundamental task of identifying named entities in biomedical text. However, BioNER suffers from severe data scarcity and lacks high-quality labeled data, because annotation requires highly specialized expert knowledge. Although data augmentation has been shown to be highly effective for low-resource NER in general, existing data augmentation techniques fail to produce factual and diverse augmentations for BioNER. In this paper, we present BioAug, a novel data augmentation framework for low-resource BioNER. BioAug, built on BART, is trained to solve a novel text reconstruction task based on selective masking and knowledge augmentation. After training, we perform conditional generation, producing diverse augmentations by conditioning BioAug on selectively corrupted text, as in the training stage. We demonstrate the effectiveness of BioAug on 5 benchmark BioNER datasets and show that BioAug outperforms all our baselines by a significant margin (1.5%-21.5% absolute improvement) while generating augmentations that are both more factual and more diverse. Code: https://github.com/Sreyan88/BioAug.
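As a rough illustration of what "selective masking" of a labeled BioNER sentence can look like, the Python sketch below keeps every entity token (any BIO tag other than `O`) and replaces non-entity tokens with a BART-style `<mask>` token with probability `p`, merging consecutive masks into a single span. This is a hypothetical sketch, not BioAug's actual procedure: the function name `selective_mask`, the masking rate `p`, and the rule of always preserving entity tokens are assumptions, and the knowledge-augmentation step is not shown.

```python
import random
from typing import List, Optional

MASK = "<mask>"  # BART-style mask token (assumption for illustration)


def selective_mask(tokens: List[str], tags: List[str], p: float = 0.5,
                   seed: Optional[int] = None) -> List[str]:
    """Hypothetical selective masking: keep entity tokens, mask others.

    Tokens tagged 'O' are replaced by MASK with probability p; runs of
    masked tokens collapse into one MASK, similar to BART span infilling.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    rng = random.Random(seed)
    out: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tag != "O":
            out.append(tok)           # entity token: never masked
        elif rng.random() < p:
            if not out or out[-1] != MASK:
                out.append(MASK)      # open a new masked span
        else:
            out.append(tok)           # non-entity token kept as-is
    return out


tokens = ["Aspirin", "reduces", "the", "risk", "of", "stroke", "."]
tags = ["B-Chemical", "O", "O", "O", "O", "B-Disease", "O"]
print(selective_mask(tokens, tags, p=1.0))
# ['Aspirin', '<mask>', 'stroke', '<mask>']
```

Under these assumptions, the masked sequence would serve as the corrupted input that a seq2seq model such as BART learns to reconstruct into a full, labeled sentence; the authors' actual masking and generation code is in the linked repository.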
Pages: 1853-1858
Page count: 6