Constrained Labeled Data Generation for Low-Resource Named Entity Recognition

Cited by: 0
Authors
Guo, Ruohao [1]
Roth, Dan [2]
Affiliations
[1] Univ Illinois, Urbana, IL 61801 USA
[2] Univ Penn, Philadelphia, PA USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Named Entity Recognition (NER) in low-resource languages has been a long-standing challenge in NLP. Recent work has shown great progress in two directions: developing cross-lingual features/models to transfer knowledge to low-resource languages, and translating source-language training data into low-resource target-language training data by projecting annotations with cheap resources. We focus on the second direction in this study. Existing methods suffer from the low quality of the resulting annotated data in the target language; for example, they cannot handle word order and lexical ambiguity well. To address these limitations, we propose a novel approach that uses the projected annotation to generate pseudo-supervised data with a transformer language model and a constrained beam search. This allows us to generate annotated data in the target language that is more diverse, of higher quality, and available in larger quantities. Experiments demonstrate that, when combined with available cross-lingual features, our method achieves state-of-the-art or competitive performance on NER in a low-resource setting, especially for languages that are distant from our source language, English.
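The abstract names constrained beam search as the generation mechanism but gives no details, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a hypothetical toy bigram table (`BIGRAM_LOGP`) in place of a transformer language model, and it treats the projected entity tokens as lexical constraints that must all appear in the output. Hypotheses are grouped into banks by the number of constraints already satisfied, so that partially constrained candidates are not pruned too early. All names and probabilities here are illustrative assumptions.

```python
import math
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical toy bigram "language model": log P(next | prev).
# A real system would score continuations with a transformer LM instead.
BIGRAM_LOGP: Dict[Tuple[str, str], float] = {
    ("<s>", "Obama"): math.log(0.5),
    ("<s>", "the"): math.log(0.4),
    ("Obama", "visited"): math.log(0.6),
    ("visited", "Paris"): math.log(0.5),
    ("visited", "the"): math.log(0.3),
    ("the", "president"): math.log(0.5),
}
UNSEEN_LOGP = math.log(1e-4)  # assumed back-off score for unseen bigrams

Hypothesis = Tuple[float, List[str], FrozenSet[str]]


def score(prev: str, nxt: str) -> float:
    """Log-probability of `nxt` following `prev` under the toy model."""
    return BIGRAM_LOGP.get((prev, nxt), UNSEEN_LOGP)


def constrained_beam_search(
    vocab: List[str],
    constraints: FrozenSet[str],
    length: int,
    beam_size: int,
) -> Optional[List[str]]:
    """Return the best-scoring sequence of exactly `length` tokens that
    contains every constraint token, or None if no beam satisfies them."""
    beams: List[Hypothesis] = [(0.0, ["<s>"], frozenset())]
    for _ in range(length):
        # Expand every hypothesis by every vocabulary item.
        candidates: List[Hypothesis] = []
        for logp, tokens, met in beams:
            for word in vocab:
                new_met = met | {word} if word in constraints else met
                candidates.append(
                    (logp + score(tokens[-1], word), tokens + [word], new_met)
                )
        # Bank candidates by how many constraints they satisfy and keep
        # the top `beam_size` of each bank.
        banks: Dict[int, List[Hypothesis]] = {}
        for cand in candidates:
            banks.setdefault(len(cand[2]), []).append(cand)
        beams = []
        for bank in banks.values():
            bank.sort(key=lambda c: c[0], reverse=True)
            beams.extend(bank[:beam_size])
    finished = [b for b in beams if b[2] == constraints]
    if not finished:
        return None
    best = max(finished, key=lambda b: b[0])
    return best[1][1:]  # drop the "<s>" start symbol


if __name__ == "__main__":
    vocab = ["the", "president", "visited", "Paris", "Obama"]
    entity_tokens = frozenset({"Obama", "Paris"})
    print(constrained_beam_search(vocab, entity_tokens, length=3, beam_size=2))
```

Because the entity tokens are forced into the output, the entity labels projected from the source sentence can be attached to those tokens in the generated sentence, while the language model is free to choose the surrounding words and their order.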
Pages: 4519-4533
Page count: 15