Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training

Citations: 0
Authors
Meng, Yu [1]
Zhang, Yunyi [1]
Huang, Jiaxin [1]
Wang, Xuan [1]
Zhang, Yu [1]
Ji, Heng [1]
Han, Jiawei [1]
Affiliations
[1] Univ Illinois, Champaign, IL 61820 USA
Source
2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021) | 2021
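Funding
U.S. National Science Foundation
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract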
We study the problem of training named entity recognition (NER) models using only distantly-labeled data, which can be automatically obtained by matching entity mentions in the raw text with entity types in a knowledge base. The biggest challenge of distantly-supervised NER is that the distant supervision may induce incomplete and noisy labels, rendering the straightforward application of supervised learning ineffective. In this paper, we propose (1) a noise-robust learning scheme comprised of a new loss function and a noisy label removal step, for training NER models on distantly-labeled data, and (2) a self-training method that uses contextualized augmentations created by pre-trained language models to improve the generalization ability of the NER model. On three benchmark datasets, our method achieves superior performance, outperforming existing distantly-supervised NER models by significant margins.
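The distant-labeling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the greedy longest-match strategy, the BIO tag scheme, the function name `distant_label`, and the toy knowledge base `TOY_KB` are all assumptions chosen for illustration. It only shows how matching text against a knowledge base can yield both incomplete labels (missed entities) and noisy labels (wrong types).

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: lowercased token tuples -> entity type.
TOY_KB: Dict[Tuple[str, ...], str] = {
    ("jiawei", "han"): "PER",
    ("illinois",): "LOC",
    ("washington",): "LOC",
}


def distant_label(
    tokens: List[str],
    kb: Dict[Tuple[str, ...], str],
    max_len: int = 5,
) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup against `kb`."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(t.lower() for t in tokens[i:i + n])
            if span in kb:
                etype = kb[span]
                tags[i] = f"B-{etype}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{etype}"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Incomplete labels: "Heng Ji" is not in the toy KB, so it stays "O".
incomplete = distant_label("Jiawei Han and Heng Ji work in Illinois".split(), TOY_KB)

# Noisy labels: "Washington" is a person here, but the KB only knows the place.
noisy = distant_label("Washington signed the bill".split(), TOY_KB)
```

In this sketch the first sentence leaves a real person mention untagged and the second assigns a location type to a person, which are the two failure modes that the abstract says make plain supervised training on such labels ineffective.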
Pages: 10367-10378
Page count: 12