Improving Low-resource Named Entity Recognition with Graph Propagated Data Augmentation

Citations: 0
Authors
Cai, Jiong [1]
Huang, Shen [3]
Jiang, Yong [3]
Tan, Zeqi [2]
Xie, Pengjun [3]
Tu, Kewei [1]
Affiliations
[1] Univ Chinese Acad Sci, Sch Informat Sci & Technol, Shanghai Engn Res Ctr Intelligent Vis & Imagin, Shanghai Inst Microsyst & Informat Technol,Shangh, Beijing, Peoples R China
[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Zhejiang, Peoples R China
[3] Alibaba Grp, DAMO Acad, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Data augmentation is an effective way to improve model performance and robustness in low-resource named entity recognition (NER). However, synthetic data often suffer from poor diversity, which limits the performance gains. In this paper, we propose a novel Graph Propagated Data Augmentation (GPDA) framework for NER, which leverages graph propagation to build relationships between labeled data and unlabeled natural texts. By projecting the annotations from the labeled texts onto the unlabeled texts, the unlabeled texts become partially labeled and offer more diversity than synthetically annotated data. To improve propagation precision, a simple search engine built on Wikipedia is used to fetch texts related to the labeled data, and the entity labels are propagated to them according to the anchor links. In addition, we construct and run experiments on a real-world low-resource dataset from the E-commerce domain, which will be made publicly available to facilitate low-resource NER research. Experimental results show that GPDA yields substantial improvements over previous data augmentation methods on multiple low-resource NER datasets.
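The label projection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes BIO-tagged input, replaces the Wikipedia search engine and anchor links with plain exact string matching, and uses hypothetical function names (`collect_entities`, `propagate_labels`) and toy sentences. Tokens that receive no label are marked `None`, meaning "unknown", which is what makes the output partially labeled.

```python
from typing import Dict, List, Optional, Tuple


def collect_entities(tokens: List[str], tags: List[str]) -> Dict[Tuple[str, ...], str]:
    """Extract entity spans and their types from a BIO-tagged sentence."""
    entities: Dict[Tuple[str, ...], str] = {}
    span: List[str] = []
    span_type: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if span:  # close the previous span before starting a new one
                entities[tuple(span)] = span_type
            span, span_type = [token], tag[2:]
        elif tag.startswith("I-") and span and tag[2:] == span_type:
            span.append(token)
        else:
            if span:
                entities[tuple(span)] = span_type
            span, span_type = [], None
    if span:  # close a span that runs to the end of the sentence
        entities[tuple(span)] = span_type
    return entities


def propagate_labels(
    entities: Dict[Tuple[str, ...], str], unlabeled_tokens: List[str]
) -> List[Optional[str]]:
    """Project known entities onto an unlabeled sentence.

    Assumption: exact token matching stands in for the anchor-link
    evidence mentioned in the abstract. Unmatched tokens stay None.
    """
    labels: List[Optional[str]] = [None] * len(unlabeled_tokens)
    # Match longer entities first so they are not split by shorter ones.
    for span in sorted(entities, key=len, reverse=True):
        n = len(span)
        for i in range(len(unlabeled_tokens) - n + 1):
            window_free = all(label is None for label in labels[i:i + n])
            if tuple(unlabeled_tokens[i:i + n]) == span and window_free:
                labels[i] = "B-" + entities[span]
                for j in range(i + 1, i + n):
                    labels[j] = "I-" + entities[span]
    return labels


# Toy example (hypothetical sentences, not from the paper's datasets).
labeled_tokens = ["Steve", "Jobs", "founded", "Apple"]
labeled_tags = ["B-PER", "I-PER", "O", "B-ORG"]
retrieved_tokens = ["Apple", "was", "started", "by", "Steve", "Jobs"]

entities = collect_entities(labeled_tokens, labeled_tags)
partial = propagate_labels(entities, retrieved_tokens)
print(list(zip(retrieved_tokens, partial)))
```

A model trained on such data would need a loss that treats `None` positions as unconstrained rather than as `O`, since the absence of a projected label does not imply the absence of an entity.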
Pages: 110-118
Page count: 9
Related papers
50 in total
  • [21] Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation
    Bartelds, Martijn
    San, Nay
    McDonnell, Bradley
    Jurafsky, Dan
    Wieling, Martijn
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 715 - 729
  • [22] A Comparative Study of Pre-trained Encoders for Low-Resource Named Entity Recognition
    Chen, Yuxuan
    Mikkelsen, Jonas
    Binder, Arne
    Alt, Christoph
    Hennig, Leonhard
    PROCEEDINGS OF THE 7TH WORKSHOP ON REPRESENTATION LEARNING FOR NLP, 2022, : 46 - 59
  • [23] MIXSPEECH: DATA AUGMENTATION FOR LOW-RESOURCE AUTOMATIC SPEECH RECOGNITION
    Meng, Linghui
    Xu, Jin
    Tan, Xu
    Wang, Jindong
    Qin, Tao
    Xu, Bo
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7008 - 7012
  • [24] Robust and Informative Text Augmentation (RITA) via Constrained Worst-Case Transformations for Low-Resource Named Entity Recognition
    Sohn, Hyunwoo
    Park, Baekkwan
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 1616 - 1624
  • [25] Improving Low-Resource Chinese Named Entity Recognition Using Bidirectional Encoder Representation from Transformers and Lexicon Adapter
    Dang, Xiaochao
    Wang, Li
    Dong, Xiaohui
    Li, Fenfang
    Deng, Han
    APPLIED SCIENCES-BASEL, 2023, 13 (19):
  • [26] Correction to: Novel data augmentation for named entity recognition
    Aluru V. N. M. Hemateja
    Gopikrishnan Kondakath
    Susruta Das
    Mohanaprasad Kothandaraman
    S. Shoba
    Abhishek Pandey
    Rajin Babu
    Abhinav Jain
    International Journal of Speech Technology, 2023, 26 (4) : 879 - 879
  • [27] Data Augmentation for Chinese Clinical Named Entity Recognition
    Wang P.-H.
    Li M.-Z.
    Li S.
    Li, Si (lisi@bupt.edu.cn), 1600, Beijing University of Posts and Telecommunications (43): : 84 - 90
  • [28] Data Augmentation Techniques on Arabic Data for Named Entity Recognition
    Sabty, Caroline
    Omar, Islam
    Wasfalla, Fady
    Islam, Mohamed
    Abdennadher, Slim
    AI IN COMPUTATIONAL LINGUISTICS, 2021, 189 : 292 - 299
  • [29] MELM: Data Augmentation with Masked Entity Language Modeling for Low-Resource NER
    Zhou, Ran
    Li, Xin
    He, Ruidan
    Bing, Lidong
    Cambria, Erik
    Si, Luo
    Miao, Chunyan
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 2251 - 2262
  • [30] A multimodal approach for few-shot biomedical named entity recognition in low-resource languages
    Chen, Jian
    Su, Leilei
    Li, Yihong
    Lin, Mingquan
    Peng, Yifan
    Sun, Cong
    JOURNAL OF BIOMEDICAL INFORMATICS, 2025, 161