Semi-supervised geological disasters named entity recognition using few labeled data

Cited by: 8
Authors
Lei, Xinya [1, 2]
Song, Weijing [1, 2]
Fan, Runyu [1, 2]
Feng, Ruyi [1, 2]
Wang, Lizhe [1, 2]
Affiliations
[1] China Univ Geosci, Sch Comp Sci, Wuhan 430074, Peoples R China
[2] Hubei Key Lab Intelligent Geoinformat Proc, Wuhan 430074, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Geological disasters named entity recognition; Semi-supervised learning; Self-training; Pre-trained BERT model; Named entity recognition
DOI
10.1007/s10707-022-00474-1
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Geological disaster named entity recognition (NER) aims to identify entities that describe disaster events in unstructured text, so that a geohazard knowledge graph can be built to support disaster emergency response. Without large-scale labeled training data, current deep-learning NER methods cannot identify domain-specific geological disaster entities in geological disaster situation reports, yet manually labeling such reports is tedious and time-consuming. We therefore present Semi-GDNER, a semi-supervised geological disaster NER approach that effectively extracts six kinds of geological disaster entities when only a small amount of manually labeled data and additional unlabeled in-domain data are available. The approach has two stages: (1) the parameters of the pre-trained BERT-base model are transferred to the BERT layer of the backbone model, BERT-BiLSTM-CRF, and the backbone is trained on the few labeled examples; (2) training of the backbone continues while the training set is expanded with unlabeled data through a self-training (ST) strategy. To reduce noise in the second stage, only pseudo-labeled samples with high confidence are added to the training set in each ST iteration. Experiments on our Geological Disaster NER dataset show that the approach achieves a higher F1 score (0.88) than other NER approaches, including five supervised approaches and a semi-supervised approach whose ST strategy expands the training set with all pseudo-labeled data. Further experiments on four general Chinese NER datasets show that the framework is transferable.
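The two-stage procedure in the abstract can be illustrated with a minimal Python sketch of confidence-filtered self-training. This is not the authors' implementation. A toy lexicon tagger stands in for the BERT-BiLSTM-CRF backbone, and the names `self_train` and `train_lexicon_tagger`, the tag labels, the confidence measure, and the values of `threshold` and `max_iters` are all hypothetical choices made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sentence = List[str]
Tags = List[str]
Example = Tuple[Sentence, Tags]
# A "model" is any callable that maps a sentence to (tags, confidence in [0, 1]).
Model = Callable[[Sentence], Tuple[Tags, float]]


def self_train(
    train: Callable[[Sequence[Example]], Model],
    labeled: Sequence[Example],
    unlabeled: Sequence[Sentence],
    threshold: float = 0.9,  # assumed value, not taken from the paper
    max_iters: int = 5,      # assumed value, not taken from the paper
) -> Tuple[Model, List[Example]]:
    """Self-training that keeps only high-confidence pseudo-labels."""
    train_set = list(labeled)
    pool = list(unlabeled)
    model = train(train_set)  # stage 1: train on the few labeled examples
    for _ in range(max_iters):  # stage 2: self-training iterations
        confident: List[Example] = []
        remaining: List[Sentence] = []
        for sent in pool:
            tags, conf = model(sent)
            if conf >= threshold:
                confident.append((sent, tags))  # pseudo-labeled sample
            else:
                remaining.append(sent)  # stays unlabeled for later rounds
        if not confident:
            break  # nothing new to learn from
        train_set.extend(confident)
        pool = remaining
        model = train(train_set)  # retrain on the expanded training set
    return model, train_set


def train_lexicon_tagger(examples: Sequence[Example]) -> Model:
    """Toy stand-in for the backbone: memorizes token -> tag pairs."""
    lexicon: Dict[str, str] = {}
    for sent, tags in examples:
        for tok, tag in zip(sent, tags):
            lexicon[tok] = tag

    def model(sent: Sentence) -> Tuple[Tags, float]:
        tags = [lexicon.get(tok, "O") for tok in sent]
        # Toy confidence: fraction of tokens seen during training.
        conf = (sum(tok in lexicon for tok in sent) / len(sent)) if sent else 0.0
        return tags, conf

    return model


# Hypothetical data; the tag names are illustrative only.
labeled = [(["landslide", "in", "Wuhan"], ["B-TYPE", "O", "B-LOC"])]
unlabeled = [["Wuhan", "landslide"], ["flood", "in", "Yichang"]]
model, train_set = self_train(train_lexicon_tagger, labeled, unlabeled)
print(len(train_set))  # prints 2: only the confident sentence was added
```

Setting `threshold` to 0.0 would reproduce the baseline the abstract compares against, in which every pseudo-labeled sample joins the training set.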
Pages: 263-288
Page count: 26
Source
GeoInformatica, 2023, 27: 263-288