Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training

Cited by: 0
Authors
Meng, Yu [1 ]
Zhang, Yunyi [1 ]
Huang, Jiaxin [1 ]
Wang, Xuan [1 ]
Zhang, Yu [1 ]
Ji, Heng [1 ]
Han, Jiawei [1 ]
Affiliations
[1] Univ Illinois, Champaign, IL 61820 USA
Source
2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021) | 2021
Funding
U.S. National Science Foundation;
Keywords
DOI
Not available
Chinese Library Classification (CLC) Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
We study the problem of training named entity recognition (NER) models using only distantly-labeled data, which can be automatically obtained by matching entity mentions in the raw text with entity types in a knowledge base. The biggest challenge of distantly-supervised NER is that the distant supervision may induce incomplete and noisy labels, rendering the straightforward application of supervised learning ineffective. In this paper, we propose (1) a noise-robust learning scheme comprising a new loss function and a noisy label removal step for training NER models on distantly-labeled data, and (2) a self-training method that uses contextualized augmentations created by pre-trained language models to improve the generalization ability of the NER model. On three benchmark datasets, our method achieves superior performance, outperforming existing distantly-supervised NER models by significant margins.(1)
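To make the training recipe described in the abstract more concrete, below is a minimal PyTorch sketch of a noise-robust token-level objective in that spirit: a bounded, generalized cross-entropy (GCE) style loss combined with a confidence-based removal mask that drops low-confidence distant labels. The function name noise_robust_token_loss, the hyperparameters q and tau, and the exact removal rule are illustrative assumptions for this sketch, not the paper's precise formulation.

# Hedged sketch: GCE-style noise-robust loss plus a confidence-based
# noisy-label removal mask, in the spirit of "a new loss function and a
# noisy label removal step". Hyperparameters and the removal rule are
# assumptions made for illustration only.
import torch
import torch.nn.functional as F

def noise_robust_token_loss(logits, labels, q=0.7, tau=0.7):
    # logits: (num_tokens, num_types) raw scores from an NER tagger
    # labels: (num_tokens,) distantly-assigned type ids (possibly noisy)
    # q:      GCE exponent; q -> 0 approaches cross-entropy, q = 1 gives MAE
    # tau:    confidence threshold below which a distant label is ignored
    probs = F.softmax(logits, dim=-1)
    p_true = probs.gather(1, labels.unsqueeze(1)).squeeze(1)  # model prob of the distant label
    keep = (p_true.detach() >= tau).float()                   # removal step: mask low-confidence labels
    gce = (1.0 - p_true.clamp_min(1e-8).pow(q)) / q           # bounded, noise-tolerant per-token loss
    return (keep * gce).sum() / keep.sum().clamp_min(1.0)

# Toy usage with random scores; tau is lowered here so some labels survive the mask.
torch.manual_seed(0)
logits = torch.randn(8, 5)            # 8 tokens, 5 entity types (including O)
labels = torch.randint(0, 5, (8,))    # distant labels, e.g., from knowledge-base matching
loss = noise_robust_token_loss(logits, labels, tau=0.1)
print(float(loss))

The bounded GCE term limits the gradient contribution of tokens whose distant label the model strongly disagrees with, while the mask removes the most likely mislabeled tokens from the objective entirely; both choices here are a sketch of the general technique rather than the authors' exact method.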
Pages: 10367-10378
Page count: 12