Named Entity Recognition in Government Audit Texts Based on ChineseBERT and Character-Word Fusion

Cited by: 2
Authors
Huang, Baohua [1]
Lin, Yunjie [1]
Pang, Si [1]
Fu, Long [1]
Affiliations
[1] Guangxi Univ, Sch Comp Elect & Informat, Nanning 530004, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 04
Funding
National Natural Science Foundation of China;
Keywords
smart audit; named entity recognition; character-word fusion; GHM loss function; ChineseBERT;
DOI
10.3390/app14041425
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Named entity recognition (NER) in government audit texts is a key task in intelligent auditing. To address the scarcity of corpora in the government audit domain, the insufficient use of word-level information by traditional character vectors, and the inadequate capture of audit entity features, this study builds its own audit-domain dataset and proposes CW-CBGC, a model for recognizing named entities in government audit texts based on ChineseBERT and character-word fusion. First, the ChineseBERT pre-trained model is used to extract character vectors that integrate glyph and pinyin features, and these are combined with word vectors dynamically constructed by the BERT pre-trained model. Then, the sequences of character-word fusion vectors are fed into a bidirectional gated recurrent unit (BiGRU) network to learn textual features. Finally, a conditional random field (CRF) generates the globally optimal label sequence, and the GHM classification loss function is used during training to address the problem of error evaluation under noisy entities and an imbalanced number of entities. The model achieves an F1 value of 97.23% on the audit dataset, 3.64% higher than the baseline model; on the public Resume dataset its F1 value is 96.26%, 0.73-2.78% higher than mainstream models. The experimental results show that the proposed model effectively recognizes entities in government audit texts and has a certain generalization ability.
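The GHM classification loss named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the commonly described GHM-C formulation, in which each token's cross-entropy term is divided by the density of examples sharing a similar gradient norm, and it assumes the gradient norm is taken as 1 minus the predicted probability of the gold label. The function names `ghm_weights` and `ghm_c_loss` and the choice of 10 bins are hypothetical.

```python
import math


def ghm_weights(grad_norms, bins=10):
    """Return one weight per example: beta_i = N / GD(g_i).

    GD is the gradient density, approximated here (an assumption of this
    sketch) by the number of examples in the same equal-width bin over
    [0, 1] divided by the bin width 1 / bins.
    """
    n = len(grad_norms)
    # Map each gradient norm to a bin index; g == 1.0 falls in the last bin.
    idx = [min(int(g * bins), bins - 1) for g in grad_norms]
    counts = [0] * bins
    for b in idx:
        counts[b] += 1
    # density = count / (1 / bins) = count * bins
    return [n / (counts[b] * bins) for b in idx]


def ghm_c_loss(probs, labels, bins=10):
    """Density-weighted cross-entropy over tokens.

    probs[i] is the predicted class distribution for token i and
    labels[i] is its gold class index.
    """
    p_true = [probs[i][labels[i]] for i in range(len(labels))]
    # Magnitude of the cross-entropy gradient w.r.t. the gold-class logit.
    grad_norms = [1.0 - p for p in p_true]
    betas = ghm_weights(grad_norms, bins)
    # Clamp probabilities to avoid log(0).
    losses = [-math.log(max(p, 1e-12)) for p in p_true]
    return sum(b * l for b, l in zip(betas, losses)) / len(labels)
```

With nine easy tokens (gradient norm 0.05) and one hard token (gradient norm 0.85), `ghm_weights` gives each easy token one ninth of the weight of the hard token, so a crowded region of very easy examples no longer dominates the loss. The same mechanism down-weights a crowded region of very hard examples, which is how this family of losses is usually argued to dampen outliers and noisy labels.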
Pages: 15