Distantly Supervised Named Entity Recognition via Confidence-Based Multi-Class Positive and Unlabeled Learning

Cited by: 0
Authors
Zhou, Kang [1]
Li, Yuepei [1]
Li, Qi [1]
Affiliations
[1] Iowa State Univ, Dept Comp Sci, Ames, IA 50011 USA
Source
PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS) | 2022
Funding
National Institute of Food and Agriculture (USA);
DOI
Not available
Chinese Library Classification
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we study the named entity recognition (NER) problem under distant supervision. Due to the incompleteness of the external dictionaries and/or knowledge bases, such distantly annotated training data usually suffer from a high false negative rate. To this end, we formulate the Distantly Supervised NER (DS-NER) problem via Multi-class Positive and Unlabeled (MPU) learning and propose a theoretically and practically novel CONFidence-based MPU (Conf-MPU) approach. To handle the incomplete annotations, Conf-MPU consists of two steps. First, a confidence score is estimated for each token, indicating how likely it is to be an entity token. Then, the proposed Conf-MPU risk estimation is applied to train a multi-class classifier for the NER task. Thorough experiments on two benchmark datasets labeled by various external knowledge demonstrate the superiority of the proposed Conf-MPU over existing DS-NER methods. Our code is available on GitHub.
Pages: 7198-7211
Page count: 14